Lecture 02
Yaregal Assabie
2018/19— Sem I
Introduction: Terminologies, Kinds of Morphemes, Morphological Types, Morphological Rules
English Morphology: Nouns, Adjectives, Verbs
Amharic Morphology: Nouns, Adjectives, Verbs
Models for Morphological Analysis: State Machines, Finite State Automata, Finite State Transducers
Introduction: Terminologies
Morphology (ስነ ምእላድ) - the study of the structure of words.
Morpheme (ምእላድ) - a minimal unit of morphology, e.g. help, -ful and -ness in helpfulness,
and ልጅ and -ነት in ልጅነት.
Stem (አምድ) - the part of the word that never changes even when morphologically
inflected. For example, walk is the stem for the words walk, walks, walking, and
walked. ሰበር is the stem for the words ሰበርኩ, ሰበርሽ, ሰበርን, ሰበርአችሁ, etc.
Root/Lemma (ስር) - the citation form of a set of words, e.g. break is the root form for
the words break, breaks, breaking, broke, and broken. The Amharic root form is
usually a sequence of three consonants known as radicals. For example, ስብር is
the root form for ሰበር, ሰብር, ስበር, አሰበር, ተሰበር, ሰባበር, etc.
Part-of-Speech/Lexical Category/Word Class (የቃል ክፍል) - a linguistic category of
words that explains how the word is used in a sentence. Although different
languages may have different classification schemes, English and Amharic words
are usually classified into eight lexical categories: noun, pronoun, adjective, verb,
adverb, preposition, conjunction and interjection. Morphologically important
parts-of-speech in English and Amharic include: nouns, adjectives and verbs.
Morphological Analysis - the process of finding morphemes of a word. It is an
important component of Spelling Correction, Machine Translation, Information
Retrieval, Text Generation and other natural language systems.
Morphological Generation - the process of generating different words from a
morpheme.
Lemmatisation - the process of finding the root/lemma of a word.
Stemming - the process of finding the stem of a word.
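The difference between stemming and lemmatisation can be made concrete with a toy sketch. The suffix list, the three-letter minimum and the small lemma lexicon below are assumptions made for illustration; they are not part of the lecture, and real systems use much richer rules and dictionaries.

```python
# Toy illustration only: the suffix list and lexicon are assumed, not from the lecture.
SUFFIXES = ("ing", "ed", "s")

# Hypothetical lemma lexicon covering the irregular forms of "break".
LEMMAS = {"broke": "break", "broken": "break"}


def stem(word: str) -> str:
    """Strip the first matching suffix, keeping at least three letters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def lemmatise(word: str) -> str:
    """Look the word up in the lexicon, falling back to the stem."""
    return LEMMAS.get(word, stem(word))
```

Suffix stripping alone leaves an irregular form such as broke unchanged, which is why the lemmatiser consults a lexicon first.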
Department of Computer Science, Addis Ababa University Lecture 02: Morphological Analysis
Introduction: Kinds of Morphemes
Affixes - bound morphemes that either precede, follow or are inserted inside the
root or stem.
e.g. Prefix: en- in enlarge is an affix that precedes the root large
Suffix: -ly in largely is an affix that follows the root large
Infix: -ና- in ትናንሽ is an affix that is inserted inside the root ትንሽ
Circumfix: አል…ኧም in አልሰበረም [አልሰበር(ኧ)ም] is an affix that precedes and
follows the stem ሰበር
Combining Forms - forms built from two roots, each of which may be bound or free.
e.g. two free roots: photo and graph in photograph
two bound roots: electro- and -lysis in electrolysis
bound and free roots: Ethio- and America in Ethio-American
Introduction: Morphological Types
Isolating
In languages with isolating morphological structures, most morphemes are words in
their own right. Words undergo little or no morphological change, so such languages
do not require extensive morphological analysis.
Agglutinative
In languages with agglutinative morphological structures, words are formed from many
morphemes that are glued together, and the morphemes are easily separable.
Inflectional
In languages with inflectional morphological structures, morphemes are fused together
and a complex morphological analyzer is required to separate them. Morphemes may
be fused together in several ways, such as affixation and doubling all or part of a word.
Introduction: Morphological Rules
Derivational Morphology
Derivational morphology is concerned with the way in which words are derived from
morphemes through processes such as affixation or compounding. Derivation usually
changes the part-of-speech category.
Inflectional Morphology
Inflectional morphology deals with the combination of a word with a morpheme, usually
resulting in a word of the same class as the original stem. Inflection does not change
the part-of-speech category, only the grammatical function.
Inflection can be achieved by marking a word category for person (first, second, third),
gender (feminine, neuter, masculine), number (singular, plural), case (subjective/
nominative, objective/accusative/dative, possessive/genitive), definiteness (definite,
indefinite), degree (positive, comparative, superlative), tense (past, present, future),
aspect (perfective, imperfective/continuous), politeness (impolite, polite), etc.
English Morphology: Nouns
Derivation
English nouns can be derived from:
Nouns: e.g. book –> booklet (by affixing -let)
prince –> princess (by affixing -ess)
Ethiopia –> Ethiopian (by affixing -n)
child –> childhood (by affixing -hood)
art –> artist (by affixing -ist)
Adjectives: e.g. equal –>equality (by affixing -ity)
good –> goodness (by affixing -ness)
radical –> radicalism (by affixing -ism)
Verbs: e.g. perform –> performance (by affixing -ance)
move –> movement (by affixing -ment)
build –> building (by affixing -ing)
build –> builder (by affixing -er)
construct –> construction (by affixing -ion)
arrive –> arrival (by affixing -al)
sing –> song (by changing a vowel)
defend –> defense (by changing the last consonant)
Compound words: e.g. shoe maker –> shoemaker (compounding two nouns)
black board –> blackboard (compounding adjective and noun)
play time –> playtime (compounding verb and noun)
over coat –> overcoat (compounding preposition and noun)
Inflection
English nouns can be marked for:
Gender: e.g. actor –> actress (by affixing -ess)
hero –> heroine (by affixing -ine)
Number: e.g. cow –> cows (by affixing -s)
ox –> oxen (by affixing -en)
child –> children (by affixing -ren)
tooth –> teeth (by changing vowels)
English Morphology: Adjectives
Derivation
English adjectives can be derived from:
Nouns: e.g. help –> helpful (by affixing -ful)
help –> helpless (by affixing -less)
person –> personal (by affixing -al)
child –> childish (by affixing -ish)
Adjectives: e.g. readable –> unreadable (by affixing un-)
edible –> inedible (by affixing in-)
legible –> illegible (by affixing il-)
responsible –> irresponsible (by affixing ir-)
possible –> impossible (by affixing im-)
Verbs: e.g. interest –> interesting (by affixing -ing)
damage –> damaged (by affixing -ed)
drink –> drunk (by vowel change)
write –> written (by affixing -en)
read –> readable (by affixing -able)
converse –> conversant (by affixing -ant)
repulse –> repulsive (by affixing -ive)
Compound words: e.g. hand written –> handwritten (compounding noun and adjective)
blue black –> blue-black (compounding two adjectives)
ever green –> evergreen (compounding adverb and adjective)
over active –> overactive (compounding prep. and adjective)
Inflection
Most English adjectives are marked for degree as shown below.
e.g. fast (positive degree) –> faster: comparative degree (by affixing -er)
fastest: superlative degree (by affixing -est)
English Morphology: Verbs
Derivation
English verbs can be derived from:
Nouns: e.g. bug –> debug (by affixing de-)
organ –> organize (by affixing -ize/-ise)
beauty –> beautify (by affixing -ify)
power –> empower (by affixing em-)
throne –> enthrone (by affixing en-)
breath –> breathe (by affixing the vowel -e)
Adjectives: e.g. national –> nationalize (by affixing -ize/-ise)
bold –> embolden (by affixing em-...-en)
pure –> purify (by affixing -ify)
tight –> tighten (by affixing -en)
large –> enlarge (by affixing en-)
Verbs: e.g. do –> redo (by affixing re-)
do –> undo (by affixing un-)
compose –> decompose (by affixing de-)
satisfy –> dissatisfy (by affixing dis-)
Compound words: e.g. stir fry –> stir-fry (compounding two verbs)
hand wash –> hand-wash (compounding noun and verb)
dry clean –> dry-clean (compounding adjective and verb)
over act –> overact (compounding preposition and verb)
Inflection
English verbs can be marked for aspect and tense (and person and number for present
tense). For example, the verb give has the following inflected forms:
give: present tense for plural, first person singular, or second person singular
gives: present tense for third person singular
gave: past tense
given: perfective aspect
giving: imperfective/continuous aspect
As a special case, the verb be is inflected for tense, aspect, person and number
as shown below.
am: present tense for first person singular
is: present tense for third person singular
are: present tense for first person plural, second person, or third person plural
was: past tense for first person singular, or third person singular
were: past tense for plural, or second person singular
been: perfective aspect
being: imperfective/continuous aspect
Amharic Morphology: Nouns
Derivation
Amharic nouns can be derived from:
i. Verbal Roots by infixing vowels between consonants (C) as shown below
Verbal Root (Examples)    Pattern of Derivation    Derived Noun
ጥ-ቅ-ም      CእCእC       ጥእቅእም [ጥቅም]
ም-ር-ት      CእCC        ምእርት [ምርት]
ም-ል-ስ      CኧCC        ምኧልስ [መልስ]
ን-ግ-ር      CኧCኧC       ንኧግኧር [ነገር]
ድ-ክ-ም      CእCኣC       ድእክኣም [ድካም]
ህ-ም-ም      CእCኧC       ህእምኧም [ህመም]
ግ-ብ-ዕ      CእC         ግእብ [ግብ]
ጥ-ው-ም      CኦC         ጥኦም [ጦም]
ቅ-ው-ር-ጥ    CኡCC        ቅኡርጥ [ቁርጥ]
ድ-ብብ-ቅ     CእC1C1እC    ድእብብእቅ [ድብቅ]
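The root-and-pattern derivation in the table can be sketched in two steps: interleave the radicals with the pattern vowels, then fuse each consonant with the vowel that follows it into a single syllable character. The function names are hypothetical. The fusion step assumes the Unicode Ethiopic layout, in which the seven orders of a consonant are stored consecutively and the bare (sixth-order) form sits five positions after the first order. The sketch handles only regular patterns written with plain C slots; gemination marks such as C1C1 and roots containing ው (for example ጥ-ው-ም) would need extra rules.

```python
# Vowel letters used in the patterns, mapped to the syllable order they select.
VOWEL_ORDER = {"ኧ": 0, "ኡ": 1, "ኢ": 2, "ኣ": 3, "ኤ": 4, "እ": 5, "ኦ": 6}


def apply_pattern(root: str, pattern: str) -> str:
    """Fill each C slot of the pattern with the next radical of the root."""
    radicals = root.split("-")
    out, i = [], 0
    for ch in pattern:
        if ch == "C":
            out.append(radicals[i])
            i += 1
        else:
            out.append(ch)
    return "".join(out)


def is_bare_consonant(ch: str) -> bool:
    """True for sixth-order syllables (assumed Unicode layout), except vowel letters."""
    cp = ord(ch)
    return 0x1200 <= cp <= 0x1357 and cp % 8 == 5 and ch not in VOWEL_ORDER


def fuse(segments: str) -> str:
    """Merge each bare consonant with a following vowel letter into one syllable."""
    out = []
    fusable = False  # True when the last output character is an unfused bare consonant
    for ch in segments:
        if ch in VOWEL_ORDER and fusable:
            first_order = ord(out[-1]) - 5
            out[-1] = chr(first_order + VOWEL_ORDER[ch])
            fusable = False
        else:
            out.append(ch)
            fusable = is_bare_consonant(ch)
    return "".join(out)
```

For instance, `fuse(apply_pattern("ን-ግ-ር", "CኧCኧC"))` goes through the intermediate form ንኧግኧር before producing the surface noun. The same `fuse` step also covers regular stem-plus-suffix cases such as ስርቅ + ኦሽ.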
Derivation
iii. Stems by prefixing or suffixing bound morphemes
Stem (Examples)    Morpheme    Derived Noun
ውርድ-    -ኧት    ውርድ-ኧት [ውርደት]
ቅዳስ-    -ኤ     ቅዳስ-ኤ [ቅዳሴ]
እርጅ-    -እና    እርጅ-እና [እርጅና]
ሽልም-    -ኣት    ሽልም-ኣት [ሽልማት]
ስርቅ-    -ኦሽ    ስርቅ-ኦሽ [ስርቆሽ]
ችል-     -ኦታ    ችል-ኦታ [ችሎታ]
ውጥ-     -ኤት    ውጥ-ኤት [ውጤት]
ፍለግ-    -ኣ     ፍለግ-ኣ [ፍለጋ]
ናፍቅ-    -ኦት    ናፍቅ-ኦት [ናፍቆት]
ድርግ-    -ኢት    ድርግ-ኢት [ድርጊት]
ሰባክ-    -ኢ     ሰባክ-ኢ [ሰባኪ]
ዝርፍ-    -ኢያ    ዝርፍ-ኢያ [ዝርፊያ]
ጠቀም-    -ኤታ    ጠቀም-ኤታ [ጠቀሜታ]
-ሄድ     መ-     መ-ሄድ [መሄድ]
Derivation
v. Nouns by suffixing bound morphemes
Noun (Examples)    Morpheme    Derived Noun
ልጅ        -ነት     ልጅ-ነት [ልጅነት]
እግር       -ኧኛ     እግር-ኧኛ [እግረኛ]
ክብር       -ኧት     ክብር-ኧት [ክብረት]
ከተማ       -ኤ      ከተማ-ኤ [ከተሜ]
ጢም        -ኦ      ጢም-ኦ [ጢሞ]
ኢትዮጵያ    -ኣዊ     ኢትዮጵያ-ኣዊ [ኢትዮጵያዊ]
እንግሊዝ    -ኛ      እንግሊዝ-ኛ [እንግሊዝኛ]
Inflection
Amharic nouns can be marked for:
i. Number by affixation of morphemes (and vowel changes) or repetition of words
Noun in Singular Description of the Noun Morpheme Plural Form
Form (Examples)
Ending with consonant -O -O []
Ending with vowel -
A Personal Pronoun E- E-A [E ]
Proper Noun E- E
Plural formation by repetition -- []
A Loanwords from Geez (do not have A
similar patterns for plural formation)
! "
ii. Definiteness by affixation of morphemes or vowels based on number, gender, and/or ending
of the noun.
Indefinite Noun Ending of Number Gender Definite Noun
(Examples) the Noun
Feminine -# [#] / -I& ['&]
Singular
Consonant Masculine -U [)]
Plural -U [*]
Feminine A +-# [A +#] / A +-,& [A +,&]
Singular
A + Vowel Masculine A +-- [A +-]
Plural A + -U [A + *]
Inflection
iii. Gender by affixation of the morpheme -I, e.g. --> -I []
iv. Case
(a) Objective case by affixation of the morpheme -, e.g. (subjective case) --> - []
(b) Possessive case by affixation of morphemes or vowels based on person, number, gender,
and/or ending of the noun (personal pronouns by prefixing -, e.g. E --> -E [ E / ])
Subjective Case Ending of Person Number Gender Possessive
(Examples) the Noun Case
Singular - [ ]
First
Plural - []
Masculine - []
Singular
Ending with Second Feminine - []
consonant Plural - []
Masculine -U []
Singular
Third Feminine - []
Plural - []
Singular A- [A ]
First
Plural A- [A]
Masculine A- [A]
Singular
Ending with Second Feminine A- [A]
A
vowel Plural A- [A]
Masculine A- [A]
Singular
Third Feminine A- [A]
Plural A- [A]
Amharic Morphology: Adjectives
Derivation
Amharic adjectives can be derived from:
i. Verbal Roots by infixing vowels between consonants (C) as shown below
Verbal Root (Examples) Pattern of Derivation Derived Adjective
ድ-ር-ቅ CኧCኧC ድኧርኧቅ [ደረቅ]
ጥ-ቅ-ር CእCኡC ጥእቅኡር [ጥቁር]
ጥ-ብ-ብ CኧC1C1ኢC ጥኧብኢብ [ጠቢብ]
ፍ-ጥ-ን CኧC1C1ኣC ፍኧጥኣን [ፈጣን]
Inflection
Amharic adjectives can be marked for:
i. Number by affixation of morphemes or repetition of consonants (and affixing the vowel -)
Adjective in Singular Description of the Morpheme Plural Form
Form (Examples) Adjective
Ending with consonant -O -O [ ]
Ending with vowel - - [ ]
Plural formation by repetition of consonant -- [ ]
ii. Definiteness by affixation of morphemes or vowels based on number, gender, and/or ending
of the adjective.
Indefinite Adjective Ending of the Number Gender Definite Adjective
(Examples) Adjective
Feminine A- [A] / A-I [A]
Singular
A Consonant Masculine A-U [A]
Plural A-U [A]
Feminine A- [A] / A- [A]
Singular
A Vowel Masculine A- [A]
Plural A -U [A ]
iii. Gender by affixation of the morpheme -I , e.g. A --> A-I [A ]
iV. Case (Objective Case) by affixation of the morpheme -!, e.g. A --> A-! [A!]
Amharic Morphology: Verbs
Derivation
Amharic verbal stems (from which various forms of verbs are formed) can be derived from:
i. Verbal Roots by
(a) affixing the vowel -- to produce CC1C1C-, e.g. -- --> - [-]
(b) repeating penultimate consonants and affixing the vowels -- and -'- to produce
C C1'C1C1C-, e.g. (-)-* --> ()'))*- [+,-*-]
Inflection
Amharic verbs are marked for:
i. Person, gender, number, case, and tense/aspect
Person Singular Plural
Gender
(Subjective Case) Past Tense Non-Past Tense Past Tense Non-Past Tense
First &'(-)/-* E-&,( &'(-- E--&,(
Masculine &'(-//-0 1-&,( &'(-23* 1-&,(-U
Second
Feminine &'(-5 1-&,(-I &'(-23* 1-&,(-U
Masculine &'(-7 8-&,( &'(-U 8-&,(-U
Third
Feminine &'(-73 1-&,( &'(-U 8-&,(-U
Objective Case
Tense Subjective Case
Person Gender Singular Plural
First &'(-7-: &'(-7--
Third Person, Masculine &'(-7-//-0
Second &'(-7-23*
Singular, Feminine &'(-7-5
Masculine Masculine &'(-7-;
Third &'(-7-2<;
Past Feminine &'(-7-21
Tense First &'(-73-: &'(-73--
Third Person, Second Masculine &'(-73-0
&'(-73-23*
Singular, Feminine &'(-73-5
Feminine Third Masculine &'(-73-;
&'(--73-2<;
Feminine &'(-73-21
.. .. .. .. .. ..
. . . . . .
etc etc etc etc etc etc
Inflection
ii. Mood
Mood
Number Person Gender Completed
Command Request Negative
Action
First -!/-# $-% $-% A$-E-%
Masculine -(/-) % *-+ A$-*-%
Second
Singular Feminine -- %-I *-+-I A$-*-%-I
Masculine -/ 0-% 0-% A$-0-%
Third
Feminine -/3 *-% *-% A$-*-%
First -4 E4-% E4-% A$-E4-%
Plural Second -53# %-U *-+-U A$-*-%-U
Third -U 0-%-U 0-%-U A$-0-%-U
Note
Amharic verbs in general show a high degree of inflection since person, case, gender, number,
tense, aspect, mood and others are marked on the verb. For example, A !" indicates:
the subject E& (third person, masculine, singular)
the object E)! (first person, plural)
negation A…"
past tense
Models for Morphological Analysis
State Machines
• State machines are widely used in NLP for modeling phonology, morphology and syntax.
• State machines are formal models that consist of states, transitions among states, and
an input representation.
♦ States – represent the set of properties of an abstract machine
♦ Transitions – represent jumps from one state to another
♦ Inputs – sequences of symbols or letters that can be read by the machine
• A machine with a finite number of states is called a finite state machine (FSM).
• An FSM has two special states: the start state and the final state.
[Figure: an FSM with states S0, S1 and S2, a marked start state and final state, and transitions labeled with the input symbols 0 and 1.]
• There are two types of FSMs: finite state automata and finite state transducers.
Finite State Automata
• A finite state automaton (FSA) is a finite state machine that accepts only a given set of
strings (a language).
• In a deterministic FSA, every state has exactly one transition for each possible input.
[Figure: an FSA with states S0, S1 and S2 and transitions labeled 0, 1 and ε.]
♦ Strings accepted by this deterministic FSA include: ε, 1, 11, 111, 00, 010,
1010, 10110, etc.
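A deterministic FSA can be simulated with a transition table and a single current state. The two-state machine below is a hypothetical example, not a reconstruction of the figure: it accepts strings over {0, 1} that contain an even number of 0s, a language that happens to include every string listed above.

```python
# Hypothetical DFA (not the machine drawn on the slide): even number of 0s.
START = "S0"
FINAL = {"S0"}
DELTA = {
    ("S0", "0"): "S1", ("S0", "1"): "S0",
    ("S1", "0"): "S0", ("S1", "1"): "S1",
}


def accepts(string: str) -> bool:
    """Follow exactly one transition per input symbol, then check the final state."""
    state = START
    for symbol in string:
        if (state, symbol) not in DELTA:
            return False  # symbol outside the alphabet
        state = DELTA[(state, symbol)]
    return state in FINAL
```

Because each (state, symbol) pair has exactly one target, recognition takes one table lookup per input symbol.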
• In a non-deterministic FSA, an input can lead to one, more than one, or no transition for
a given state.
[Figure: a non-deterministic FSA with states S0 to S4 and transitions labeled 0, 1 and ε.]
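Non-determinism can be simulated by tracking the set of states the machine could be in. The sketch below uses a hypothetical automaton (its arcs are assumptions, not those of the figure) that accepts one or more 0s, or one or more 1s. It shows all three cases named above: one target, several targets, and no target.

```python
EPS = ""  # label used here for ε-transitions

# Hypothetical NFA: S0 branches on ε; the arc (S1, "0") has two targets.
NFA = {
    ("S0", EPS): {"S1", "S3"},
    ("S1", "0"): {"S1", "S2"},
    ("S3", "1"): {"S4"},
    ("S4", "1"): {"S4"},
}
FINAL = {"S2", "S4"}


def eps_closure(states):
    """All states reachable from `states` using only ε-transitions."""
    closure, stack = set(states), list(states)
    while stack:
        state = stack.pop()
        for target in NFA.get((state, EPS), set()):
            if target not in closure:
                closure.add(target)
                stack.append(target)
    return closure


def accepts(string: str) -> bool:
    current = eps_closure({"S0"})
    for symbol in string:
        moved = set()
        for state in current:
            moved |= NFA.get((state, symbol), set())  # missing arc: no transition
        current = eps_closure(moved)
    return bool(current & FINAL)
```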
Word Recognition
• FSAs can be used to recognize words in a language.
• Examples:
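Letter by letter: S0 -ሰ-> S1 -በ-> S2 -ረ-> S3
Letter by letter: S0 -w-> S1 -a-> S2 -l-> S3 -k-> S4
Whole word on one transition: S0 -ሰበረ-> S1
Whole word on one transition: S0 -walk-> S1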
Word Recognition
♦ Recognition of multiple words
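[Figure: an FSA with S0 -ሰበ-> S1, followed by two arcs from S1 to S2 labeled ቀ and ብ.]
[Figure: an FSA with states S0 to S5 whose arcs are labeled with the letter sequences in, tern, e, al, c, i, opia, eth and anol, so that words with a common beginning share states.]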
Word Recognition
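♦ Recognition of multiple words (for instance, Amharic pronouns: እኔ, እኛ, አንተ,
አንቺ, እናንተ, እስዎ, እርስዎ, እሱ, እርሱ, እሷ, እርሷ, እሳቸው, እርሳቸው, እነሱ, እነርሱ)
[Figure: an FSA with states S0 to S6 that recognizes these pronouns; its arcs are labeled with letter sequences such as እ, አን, ር, ነ, ኔ, ኛ, ቺ, ሱ, ሷ, ሳቸው and ስዎ.]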
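One simple way to build a recognizer for a word list is a letter tree (trie), which is a deterministic FSA in which words with a common beginning share states. The sketch below builds one for the pronouns listed above, written in Ethiopic script. The helper names and the "#" final-state marker are assumptions of this sketch, and the trie does not merge shared endings the way the hand-drawn automaton does.

```python
PRONOUNS = ["እኔ", "እኛ", "አንተ", "አንቺ", "እናንተ", "እስዎ", "እርስዎ", "እሱ",
            "እርሱ", "እሷ", "እርሷ", "እሳቸው", "እርሳቸው", "እነሱ", "እነርሱ"]


def build_trie(words):
    """Each dict is a state; each key is a transition on one character."""
    root = {}
    for word in words:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node["#"] = True  # marks a final state
    return root


def recognizes(trie, word: str) -> bool:
    node = trie
    for ch in word:
        if ch not in node:
            return False
        node = node[ch]
    return "#" in node


TRIE = build_trie(PRONOUNS)
```

A prefix such as እር reaches a state but is rejected because that state is not final.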
Modeling Morphology
• One word and multiple inflections
S0 -walk-> S1 -{s, ed, ing}-> S2
S0 -ሰበር-> S1 -{ኧን, ኧህ, ኣት, ኧው, ኣቸው, ኧኝ, ኧሽ, ኣችሁ, ኣችሁት, ...}-> S2
Modeling Morphology
• Multiple words and multiple inflections
S0 -{jump, walk, help, ...}-> S1 -{s, ed, ing, ...}-> S2
S0 -{ማረክ, ሰበር, ገደል, ...}-> S1 -{ኧን, ኧህ, ኣት, ኧው, ኣቸው, ኧኝ, ኧሽ, ኣችሁ, ኣችሁት, ...}-> S2
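The three-state model S0, S1, S2 for multiple words and multiple inflections amounts to checking that a word splits into a known stem followed by a known suffix. The sketch below uses the English stems and suffixes from the slide. Treating S1 as an accepting state, so that a bare stem is also a word, is an assumption of this sketch.

```python
STEMS = {"jump", "walk", "help"}
SUFFIXES = {"s", "ed", "ing"}


def analyse(word: str):
    """Return every (stem, suffix) split accepted by S0 -stem-> S1 -suffix-> S2."""
    analyses = []
    for i in range(1, len(word) + 1):
        stem, rest = word[:i], word[i:]
        # rest == "" means the machine stops in S1 (assumed to be accepting)
        if stem in STEMS and (rest == "" or rest in SUFFIXES):
            analyses.append((stem, rest))
    return analyses
```

Adding a stem to `STEMS` extends the recognized vocabulary by every inflected form at once, which is the point of factoring the automaton this way.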
Modeling Morphology
• One word and multiple inflections with affixes
S0 -{እንዲ, እንዳይ, ከሚ, ሊ, የሚ, ...}-> S1 -ሰብር-> S2 -{ኧን, ህ, ኣት, ኧው, ኣቸው, ብን, በት, ለት, ባቸው, ...}-> S3
Modeling Morphology
• Multiple words and multiple inflections with affixes
S0 -{እንዲ, እንዳይ, ከሚ, ሊ, የሚ, ...}-> S1 -{ማርክ, ሰብር, ገድል, ...}-> S2 -{ኧን, ህ, ኣት, ኧው, ኣቸው, ብን, በት, ለት, ባቸው, ...}-> S3
Modeling Morphology
• Marking part-of-speech
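[Figure: an FSA with states S0 to S5. The arc leaving S0 reads any [word]; the remaining arcs read the derivational suffixes -y, -cate, -ion, -ism, -ist and -er.]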
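[Figure: the same FSA with the states after S0 relabeled by the part of speech reached: N, Adj and V on the upper path, N and N on the lower path. The arcs still read [word], -y, -cate, -ion, -ism, -ist and -er.]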
... walk walked walking walks wall walls want wanted wanting
wants warn warned warning warns ...
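[Figure: a letter tree (trie) built from these words. All words share the path w-a, which then branches into l (walk, wall), n (want) and r (warn), followed by the endings -s, -ed and -ing.]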
Discovered Morphology
• Stems - with common suffix tree:
♦ walk
♦ want
♦ warn
• Morphemes - frequent suffix tree:
♦ ε
♦ -ed
♦ -s
♦ -ing
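The discovery of stems and suffixes from a word list can be sketched without building the tree explicitly: for every word, collect the endings that other words add to it, then group words that share the same set of endings. This grouping is a simplification chosen for illustration and is not necessarily the procedure used in the lecture; only words that occur in the list are considered as candidate stems.

```python
from collections import defaultdict

WORDS = ("walk walked walking walks wall walls want wanted wanting "
         "wants warn warned warning warns").split()


def suffix_signatures(words):
    """Map each set of observed endings to the candidate stems that take it."""
    vocab = set(words)
    groups = defaultdict(list)
    for stem in sorted(vocab):
        endings = frozenset(w[len(stem):] for w in vocab if w.startswith(stem))
        groups[endings].append(stem)
    return groups


GROUPS = suffix_signatures(WORDS)
# Stems that take the empty ending plus -ed, -ing and -s:
REGULAR = GROUPS[frozenset({"", "ed", "ing", "s"})]
```

On this word list the shared ending set {ε, -ed, -ing, -s} singles out walk, want and warn, while wall only takes {ε, -s}.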
Discovered Morphology
• Stems - with common suffix tree:
♦ ሰብር
♦ ገድል
• Morphemes - frequent suffix tree:
♦ ε
♦ -ኧው
♦ -ኡበት
♦ -ኡባቸው
♦ -ኡት
• Other affixes:
♦ -እንደ
♦ -ሚ-
♦ -ማይ-
Finite State Transducers
• Finite state transducers (FSTs) are extensions of finite state automata (FSA) that can
generate outputs.
• FSTs can be considered as:
♦ Recognizer: a machine that takes a pair of strings as input and outputs
“accept” if the string-pair is in the string-pair language, and
“reject” if it is not.
♦ Generator: a machine that outputs pairs of strings of the language, i.e. the
output is a “yes” or “no”, and a pair of output strings.
♦ Translator: a machine that reads a string and outputs another string.
♦ Set relater: a machine that computes relations between sets.
[Figure: a two-state FST (S0, S1) drawn in three equivalent notations, with arcs labeled b:b, b:ε, a:b and a:ba.]
Different ways of representing input/output relations in FSTs
N.B.: Identical input/output pairs can be written using one symbol, e.g. “b:b” → “b”.
The ε symbol represents the empty symbol.
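The translator view of an FST can be sketched as a table that maps (state, input symbol) to (output string, next state). The arc labels b:b, b:ε, a:b and a:ba are taken from the figure, but how they connect S0 and S1 and which state is final are assumptions made for this sketch.

```python
# Hypothetical wiring of the arcs b:b, b:ε, a:b and a:ba between S0 and S1.
ARCS = {
    ("S0", "b"): ("b", "S0"),   # b:b
    ("S0", "a"): ("b", "S1"),   # a:b
    ("S1", "b"): ("", "S1"),    # b:ε (reads b, writes nothing)
    ("S1", "a"): ("ba", "S0"),  # a:ba
}
FINAL = {"S1"}  # assumed final state


def transduce(string: str):
    """Return the output string, or None if the input is not accepted."""
    state, output = "S0", []
    for symbol in string:
        if (state, symbol) not in ARCS:
            return None
        written, state = ARCS[(state, symbol)]
        output.append(written)
    return "".join(output) if state in FINAL else None
```

Dropping the output component of every arc turns the same table back into a plain FSA over the input symbols.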
• Depending on the type of accepted input and produced output, FSTs can be:
♦ String-to-string transducers: produce strings as outputs.
♦ String-to-weight transducers: produce weights as outputs.
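[Figure: two example transducers. A string-to-string transducer over S0 and S1 with arcs b:ε, b, a:b and a:ba, and a string-to-weight transducer with arcs a/2, b/3 and b/5, initial weight 4 on S0 and final weight 1 on S2, shown with the input aab.]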
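A string-to-weight transducer can be sketched by accumulating a weight along the path that reads the input. The arc weights 2, 3 and 5, the initial weight 4 and the final weight 1 come from the figure, but the wiring of the arcs and the choice to combine weights by addition are assumptions of this sketch.

```python
INITIAL = ("S0", 4)      # start state and its initial weight
FINAL = {"S2": 1}        # final state and its final weight

# Hypothetical wiring: (state, symbol) -> (arc weight, next state)
ARCS = {
    ("S0", "a"): (2, "S1"),
    ("S1", "b"): (3, "S2"),
    ("S0", "b"): (5, "S2"),
}


def weight(string: str):
    """Sum initial, arc and final weights; None if the string is rejected."""
    state, total = INITIAL
    for symbol in string:
        if (state, symbol) not in ARCS:
            return None
        arc_weight, state = ARCS[(state, symbol)]
        total += arc_weight
    if state not in FINAL:
        return None
    return total + FINAL[state]
```

Under these assumptions the input ab scores 4 + 2 + 3 + 1 and the input b scores 4 + 5 + 1.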
Two-Level Morphology
• In the finite-state morphology paradigm, a word is represented as a correspondence
between a lexical level and the surface level.
♦ Lexical level represents a concatenation of morphemes making up a word.
♦ Surface level represents the concatenation of letters which make up the actual
spelling of the word.
• Morphological parsing is the process of building a structured representation of words by
breaking them down into their component morphemes. For example:
♦ “bigger” is morphologically parsed as “big+ADJ+COMPARATIVE”.
♦ “lower” is morphologically parsed as “low+ADJ+COMPARATIVE”.
♦ “ተማሪዎች” is morphologically parsed as “ተማሪ+N+PLURAL”.
• Thus, a morphological parser is used to identify the correspondence between the lexical
level and the surface level.
♦ For example, the lexical level representation for the surface level word “lower” is
“low+ADJ+COMPARATIVE”.
Two-Level Morphology
• Two-level morphology is an important application of FSTs to morphological
representation and parsing.
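[Figure: an FST path from S0 through S1 to S7 for big, pairing lexical symbols with surface symbols:
lexical level: b  i  g  ε  +ADJ  ε  +COMP
surface level: b  i  g  g  ε     e  r
A second branch from S0 through S8 and S9 reads l, o, w on both levels for low.]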
• FSTs can also be used to implement spelling rules applied during inflection of words.
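The lexical/surface correspondence can be sketched by storing each FST path as a list of (lexical, surface) symbol pairs. The pairs for big follow the figure. The tail of the path for low, which reuses the +ADJ, e and +COMP:r arcs, is an assumption, since the figure only shows the l, o, w branch. The function names are hypothetical.

```python
EPS = ""  # the empty symbol ε

BIG = [("b", "b"), ("i", "i"), ("g", "g"), (EPS, "g"),
       ("+ADJ", EPS), (EPS, "e"), ("+COMP", "r")]
# Assumed continuation for "low" after the l, o, w arcs.
LOW = [("l", "l"), ("o", "o"), ("w", "w"),
       ("+ADJ", EPS), (EPS, "e"), ("+COMP", "r")]


def lexical(path) -> str:
    """Concatenate the lexical side of every pair on the path."""
    return "".join(lex for lex, _ in path)


def surface(path) -> str:
    """Concatenate the surface side of every pair on the path."""
    return "".join(surf for _, surf in path)


def parse(word: str, paths=(BIG, LOW)):
    """Return the lexical forms of all paths whose surface side spells the word."""
    return [lexical(p) for p in paths if surface(p) == word]
```

The ε:g pair is where the spelling rule that doubles the g in bigger is encoded; the lexical side never sees the extra letter.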