
Natural Language Processing (COSC 6405)

Lecture 02: Morphological Analysis

Department of Computer Science,


Addis Ababa University

Yaregal Assabie

2018/19— Sem I
Introduction: Terminologies, Kinds of Morphemes, Morphological Types, Morphological Rules
English Morphology: Nouns, Adjectives, Verbs
Amharic Morphology: Nouns, Adjectives, Verbs
Models for Morphological Analysis: State Machines, Finite State Automata, Finite State Transducers
Introduction: Terminologies

Morphology (ስነ ምእላድ) - the study of the structure of words.
Morpheme (ምእላድ) - the minimal unit of morphology, e.g. help, -ful and -ness in helpfulness, and ልጅ and -ነት in ልጅነት.
Stem (አምድ) - the part of a word that never changes even when the word is morphologically
inflected. For example, walk is the stem for the words walk, walks, walking, and
walked. ሰበር is the stem for the words ሰበርኩ, ሰበርሽ, ሰበርን, ሰበርአችሁ, etc.
Root/Lemma (ስር) - the citation form of a set of words, e.g. break is the root form for
the words break, breaks, breaking, broke, and broken. An Amharic root form is
usually a sequence of three consonants known as radicals. For example, ስብር is
the root form for ሰበር, ሰብር, ስበር, አሰበር, ተሰበር, ሰባበር, etc.
Part-of-Speech/Lexical Category/Word Class ( የቃָ ክፍָ ) - a linguistic category of
words that explains how the word is used in a sentence. Although different
languages may have different classification schemes, English and Amharic words
are usually classified into eight lexical categories: noun, pronoun, adjective, verb,
adverb, preposition, conjunction and interjection. Morphologically important
parts-of-speech in English and Amharic include: nouns, adjectives and verbs.
Morphological Analysis - the process of finding morphemes of a word. It is an
important component of Spelling Correction, Machine Translation, Information
Retrieval, Text Generation and other natural language systems.
Morphological Generation - the process of generating different words from a
morpheme.
Lemmatisation - the process of finding the root/lemma of a word.
Stemming - the process of finding the stems of a word.
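
The difference between stemming and lemmatisation can be sketched in a few lines of Python. This is only a toy illustration: the suffix list, the minimum stem length, and the small table of irregular forms are assumptions made for the example, not part of the lecture.

```python
# Toy illustration only: the suffix list and the lemma table are assumptions.
SUFFIXES = ("ing", "ed", "s")                    # hypothetical inflectional suffixes
LEMMAS = {"broke": "break", "broken": "break"}   # hypothetical irregular forms


def stem(word: str) -> str:
    """Strip the longest matching suffix, keeping at least three letters."""
    for suffix in sorted(SUFFIXES, key=len, reverse=True):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def lemmatise(word: str) -> str:
    """Look up irregular forms first, then fall back to the stem."""
    return LEMMAS.get(word, stem(word))
```

Stemming alone leaves an irregular form such as broken unchanged, which is why the lemmatiser consults a lookup table first.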


Introduction: Kinds of Morphemes

Morphemes can be classified in two ways:


1 Free versus Bound
2 Roots, Affixes versus Combining Forms

Free versus Bound


Free morphemes - morphemes that can stand on their own to give meaning.
e.g. friend in friendly
large in enlarge
help in helpfulness
perform in performance
ልጅ in ልጅነት
ቤት in ቤቶች [ቤ(ትኦ)ች]
ቤት in ከቤት

Bound morphemes - morphemes that cannot stand on their own as a word.


e.g. -ly in friendly
en- in enlarge
-ful and -ness in helpfulness
-ance in performance
-ነት in ልጅነት
-ኦች in ቤቶች [ቤ(ትኦ)ች]
ከ- in ከቤት


Introduction: Kinds of Morphemes

Roots, Affixes versus Combining Forms


Roots - the morpheme (within a non-compound word) that makes the most precise and
concrete contribution to the word’s meaning, and is either the sole morpheme
or else the only one that is not an affix.
e.g. break in breaks
help in unhelpfulness
ስብር in ሰበር
ስብር in ተሰበረች [ተሰበ(ርኧ)ች]

Affixes - bound morphemes that either precede, follow or are inserted inside the
root or stem.
e.g. Prefix: en- in enlarge is an affix that precedes the root large
Suffix: -ly in largely is an affix that follows the root large
Infix: -ና- in ትናንሽ is an affix that is inserted inside the root ትንሽ
Circumfix: አል…ኧም in አልሰበረም [አልሰበ(ርኧ)ም] is an affix that precedes and
follows the stem ሰበር
Combining Forms - morphemes that are formed from two bound or free-like roots.
e.g. two free roots: photo and graph in photograph
two bound roots: electro- and -lysis in electrolysis
bound and free roots: Ethio- and American in Ethio-American


Introduction: Morphological Types

There are three types of morphological structures:


1 Isolating
2 Agglutinative
3 Inflectional

Isolating
In languages with an isolating morphological structure, most words consist of a single
morpheme. There is little or no morphological change in words, so such languages do not
require extensive morphological analysis.

Agglutinative
In languages with an agglutinative morphological structure, words are formed from many
morphemes that are glued together, and these morphemes are easily separable.

Inflectional
In languages with an inflectional morphological structure, morphemes are fused together,
and a complex morphological analyzer is required to separate them. Morphemes may
be fused together in several ways, such as affixation and doubling all or part of a word.


Introduction: Morphological Rules

Words can be formed from morphemes in two ways:


1 Derivational Morphology
2 Inflectional Morphology

Derivational Morphology
Derivational morphology is concerned with the way in which words are
derived from morphemes through processes such as affixation or compounding.
Derivation usually changes the part-of-speech category.

Inflectional Morphology
Inflectional morphology deals with the combination of a word with
a morpheme, usually resulting in a word of the same class as the original stem.
Inflection does not change the part-of-speech category, only
the grammatical function.
Inflection can be achieved by marking a word category for person (first, second, third),
gender (feminine, neuter, masculine), number (singular, plural), case (subjective/
nominative, objective/accusative/dative, possessive/genitive), definiteness (definite,
indefinite), degree (positive, comparative, superlative), tense (past, present, future),
aspect (perfective, imperfective/continuous), politeness (impolite, polite), etc.


English Morphology: Nouns

Derivation
English nouns can be derived from:
Nouns: e.g. book –> booklet (by affixing -let)
prince –> princess (by affixing -ess)
Ethiopia –> Ethiopian (by affixing -n)
child –> childhood (by affixing -hood)
art –> artist (by affixing -ist)
Adjectives: e.g. equal –>equality (by affixing -ity)
good –> goodness (by affixing -ness)
radical –> radicalism (by affixing -ism)
Verbs: e.g. perform –> performance (by affixing -ance)
move –> movement (by affixing -ment)
build –> building (by affixing -ing)
build –> builder (by affixing -er)
construct –> construction (by affixing -ion)
arrive –> arrival (by affixing -al)
sing –> song (by changing a vowel)
defend –> defense (by changing the last consonant)
Compound words: e.g. shoe maker –> shoemaker (compounding two nouns)
black board –> blackboard (compounding adjective and noun)
play time –> playtime (compounding verb and noun)
over coat –> overcoat (compounding preposition and noun)


English Morphology: Nouns

Inflection
English nouns can be marked for:
Gender: e.g. actor –> actress (by affixing -ess)
hero –> heroine (by affixing -ine)
Number: e.g. cow –> cows (by affixing -s)
ox –> oxen (by affixing -en)
child –> children (by affixing -ren)
tooth –> teeth (by changing vowels)


English Morphology: Adjectives

Derivation
English adjectives can be derived from:
Nouns: e.g. help –> helpful (by affixing -ful)
help –> helpless (by affixing -less)
person –> personal (by affixing -al)
child –> childish (by affixing -ish)
Adjectives: e.g. readable –> unreadable (by affixing un-)
edible –> inedible (by affixing in-)
legible –> illegible (by affixing il-)
responsible –> irresponsible (by affixing ir-)
possible –> impossible (by affixing im-)
Verbs: e.g. interest –> interesting (by affixing -ing)
damage –> damaged (by affixing -ed)
drink –> drunk (by vowel change)
write –> written (by affixing -en)
read –> readable (by affixing -able)
converse –> conversant (by affixing -ant)
repulse –> repulsive (by affixing -ive)
Compound words: e.g. hand written –> handwritten (compounding noun and adjective)
blue black –> blue-black (compounding two adjectives)
ever green –> evergreen (compounding adverb and adjective)
over active –> overactive (compounding prep. and adjective)


English Morphology: Adjectives

Inflection
Most English adjectives are marked for degree as shown below.
e.g. fast (positive degree) –> faster: comparative degree (by affixing -er)
fastest: superlative degree (by affixing -est)

Some others have irregular forms.

Positive     Comparative   Superlative
good         better        best
bad          worse         worst
little       less          least
much/many    more          most
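
Degree inflection combines a regular rule with a small set of exceptions, which can be sketched as a lookup followed by a default. The function name `degrees` is hypothetical, and the regular rule is deliberately simplified: it ignores spelling changes such as consonant doubling (big, bigger) and adjectives that use more/most.

```python
# Irregular forms are the ones listed on the slide; the regular rule is a
# simplification that ignores spelling changes such as big -> bigger.
IRREGULAR = {
    "good": ("better", "best"),
    "bad": ("worse", "worst"),
    "little": ("less", "least"),
    "much": ("more", "most"),
    "many": ("more", "most"),
}


def degrees(adjective: str) -> tuple[str, str]:
    """Return (comparative, superlative) for an adjective."""
    if adjective in IRREGULAR:
        return IRREGULAR[adjective]
    return adjective + "er", adjective + "est"
```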


English Morphology: Verbs

Derivation
English verbs can be derived from:
Nouns: e.g. bug –> debug (by affixing de-)
organ –> organize (by affixing -ize/-ise)
beauty –> beautify (by affixing -ify)
power –> empower (by affixing em-)
throne –> enthrone (by affixing en-)
breath –> breathe (by affixing the vowel -e)
Adjectives: e.g. national –> nationalize (by affixing -ize/-ise)
bold –> embolden (by affixing em-...-en)
pure –> purify (by affixing -ify)
tight –> tighten (by affixing -en)
large –> enlarge (by affixing en-)
Verbs: e.g. do –> redo (by affixing re-)
do –> undo (by affixing un-)
compose –> decompose (by affixing de-)
satisfy –> dissatisfy (by affixing dis-)
Compound words: e.g. stir fry –> stir-fry (compounding two verbs)
hand wash –> hand-wash (compounding noun and verb)
dry clean –> dry-clean (compounding adjective and verb)
over act –> overact (compounding preposition and verb)


English Morphology: Verbs

Inflection
English verbs can be marked for aspect and tense (and person and number for present
tense). For example, the verb give has the following inflected forms:
give: present tense for plural, first person singular, or second person singular
gives: present tense for third person singular
gave: past tense
given: perfective aspect
giving: imperfective/continuous aspect
As a special case, the verb be is inflected for tense, aspect, person and number
as shown below.
am: present tense for first person singular
is: present tense for third person singular
are: present tense for first person plural, second person, or third person plural
was: past tense for first person singular, or third person singular
were: past tense for plural, or second person singular
been: perfective aspect
being: imperfective/continuous aspect


Amharic Morphology: Nouns

Derivation
Amharic nouns can be derived from:
i. Verbal Roots by infixing vowels between consonants (C) as shown below
Verbal Root (Examples)   Pattern of Derivation   Derived Noun
ጥ-ቅ-ም                    CእCእC                   ጥእቅእም [ጥቅም]
ም-ር-ት                    CእCC                    ምእርት [ምርት]
ም-ል-ስ                    CኧCC                    ምኧልስ [መልስ]
ን-ግ-ር                    CኧCኧC                   ንኧግኧር [ነገር]
ድ-ክ-ም                    CእCኣC                   ድእክኣም [ድካም]
ህ-ም-ም                    CእCኧC                   ህእምኧም [ህመም]
ግ-ብ-ዕ                    CእC                     ግእብ [ግብ]
ጥ-ው-ም                    CኦC                     ጥኦም [ጦም]
ቅ-ው-ር-ጥ                  CኡCC                    ቅኡርጥ [ቁርጥ]
ድ-ብብ-ቅ                   CእC1C1እC                ድእብብእቅ [ድብቅ]
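
The step from the intermediate form (consonants plus vowel markers) to the written form in brackets can be sketched in Python. This is a minimal sketch under one assumption that is not in the lecture: it relies on the regular layout of the Unicode Ethiopic block, in which each consonant series occupies eight consecutive code points and the bare (sixth-order) consonant sits at offset 5. The names `VOWEL_ORDER`, `is_sixth_order` and `fuse` are hypothetical, and gemination (C1C1) is not handled.

```python
# Vowel markers as written on the slides, mapped to the fidel order they select.
VOWEL_ORDER = {"ኧ": 0, "ኡ": 1, "ኢ": 2, "ኣ": 3, "ኤ": 4, "እ": 5, "ኦ": 6}


def is_sixth_order(ch: str) -> bool:
    """True for a bare (sixth-order) consonant in the basic Ethiopic block."""
    code = ord(ch)
    return 0x1200 <= code <= 0x1357 and code % 8 == 5 and ch not in VOWEL_ORDER


def fuse(form: str) -> str:
    """Merge each bare consonant with the vowel marker that follows it."""
    out = []
    i = 0
    while i < len(form):
        ch = form[i]
        nxt = form[i + 1] if i + 1 < len(form) else ""
        if is_sixth_order(ch) and nxt in VOWEL_ORDER:
            # Move from the sixth-order character to the order named by the vowel.
            out.append(chr(ord(ch) - 5 + VOWEL_ORDER[nxt]))
            i += 2
        else:
            out.append(ch)
            i += 1
    return "".join(out)
```

For instance, `fuse("ንኧግኧር")` merges ን+ኧ and ግ+ኧ and leaves the final ር untouched.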

ii. Adjectives by suffixing bound morphemes

Adjective (Examples)   Morpheme   Derived Noun
ደግ                     -ነት        ደግ-ነት [ደግነት]
ቅርብ                    -ኧት        ቅርብ-ኧት [ቅርበት]
ብልህ                    -ኣት        ብልህ-ኣት [ብልሃት]
ብልጥ                    -ኦ         ብልጥ-ኦ [ብልጦ]


Amharic Morphology: Nouns

Derivation
iii. Stems by prefixing or suffixing bound morphemes
Stem (Examples)   Morpheme   Derived Noun
ውርድ-              -ኧት        ውርድ-ኧት [ውርደት]
ቅዳስ-              -ኤ         ቅዳስ-ኤ [ቅዳሴ]
እርጅ-              -እና        እርጅ-እና [እርጅና]
ሽልም-              -ኣት        ሽልም-ኣት [ሽልማት]
ስርቅ-              -ኦሽ        ስርቅ-ኦሽ [ስርቆሽ]
ችል-               -ኦታ        ችል-ኦታ [ችሎታ]
ውጥ-               -ኤት        ውጥ-ኤት [ውጤት]
ፍለግ-              -ኣ         ፍለግ-ኣ [ፍለጋ]
ናፍቅ-              -ኦት        ናፍቅ-ኦት [ናፍቆት]
ድርግ-              -ኢት        ድርግ-ኢት [ድርጊት]
ሰባክ-              -ኢ         ሰባክ-ኢ [ሰባኪ]
ዝርፍ-              -ኢያ        ዝርፍ-ኢያ [ዝርፊያ]
ጠቀም-              -ኤታ        ጠቀም-ኤታ [ጠቀሜታ]
-ሄድ               መ-         መ-ሄድ [መሄድ]

iv. Stem-like Verbs by suffixing the bound morpheme -ታ

Stem-like Verb (Examples)   Morpheme   Derived Noun
ዝም-                         -ታ         ዝም-ታ [ዝምታ]
ደስ-                         -ታ         ደስ-ታ [ደስታ]


Amharic Morphology: Nouns

Derivation
v. Nouns by suffixing bound morphemes
Noun (Examples)   Morpheme   Derived Noun
ልጅ                -ነት        ልጅ-ነት [ልጅነት]
እግር               -ኧኛ        እግር-ኧኛ [እግረኛ]
ክብር               -ኧት        ክብር-ኧት [ክብረት]
ከተማ               -ኤ         ከተማ-ኤ [ከተሜ]
ጢም                -ኦ         ጢም-ኦ [ጢሞ]
ኢትዮጵያ            -ኣዊ        ኢትዮጵያ-ኣዊ [ኢትዮጵያዊ]
እንግሊዝ             -ኛ         እንግሊዝ-ኛ [እንግሊዝኛ]

vi. Compound Words (sometimes by affixing the vowels ኧ and ኦ)

Classes of Compound Words          Example                 Derived Noun
Noun + Noun                        ብረት + ምጣድ              ብረት ምጣድ
Noun + [ኧ] + Noun                  ቤት + [ኧ] + መንግስት       ቤተ መንግስት
Noun + Verbal Stem                 ልብ + ወለድ-              ልብ ወለድ
Verbal Stem + [ኦ] + Verbal Stem    ሰርት- + [ኦ] + አደር-      ሰርቶ አደር
Verbal Stem + [ኦ] + Noun           ሰርት- + [ኦ] + አዳሪ       ሰርቶ አዳሪ


Amharic Morphology: Nouns

Inflection
Amharic nouns can be marked for:
i. Number by affixation of morphemes (and vowel changes) or repetition of words
Description of the Noun            Morpheme
Ending with consonant              -ኦች
Ending with vowel                  -ዎች
Personal pronoun                   እነ-
Proper noun                        እነ-
Plural formation by repetition     (the noun is repeated)
Loanwords from Geez                (do not have similar patterns for plural formation)
ii. Definiteness by affixation of morphemes or vowels based on number, gender, and/or ending
of the noun. The definite form is distinguished by the ending of the noun (consonant or
vowel), number (singular or plural) and, in the singular, gender (feminine or masculine).
Nouns ending in a consonant take -ኡ in the masculine singular and in the plural.


Amharic Morphology: Nouns

Inflection
iii. Gender by affixation of a feminine morpheme beginning with -ኢ.
iv. Case
(a) Objective case by affixation of a morpheme to the subjective-case form.
(b) Possessive case by affixation of morphemes or vowels based on person, number, gender,
and/or ending of the noun (personal pronouns take a prefix instead).
The possessive forms are distinguished by the ending of the noun (consonant or vowel),
person (first, second, third), number (singular, plural) and, in the second and third
person singular, gender (masculine, feminine). A noun ending in a consonant takes -ኡ
for the third person masculine singular.


Amharic Morphology: Adjectives

Derivation
Amharic adjectives can be derived from:
i. Verbal Roots by infixing vowels between consonants (C) as shown below
Verbal Root (Examples) Pattern of Derivation Derived Adjective
ድ-ር-ቅ CኧCኧC ድኧርኧቅ [ደረቅ]
ጥ-ቅ-ር CእCኡC ጥእቅኡር [ጥቁር]
ጥ-ብ-ብ CኧC1C1ኢC ጥኧብኢብ [ጠቢብ]
ፍ-ጥ-ን CኧC1C1ኣC ፍኧጥኣን [ፈጣን]

ii. Nouns by suffixing bound morphemes


Noun (Examples) Morpheme Derived Adjective
ነገር -ኧኛ ነገር-ኧኛ [ነገረኛ]
ተራራ -ኣማ ተራራ-ኣማ [ተራራማ]
ፈርስ -ኣም ፈርስ-ኣም [ፈርሳም]
ህዝብ -ኣዊ ህዝብ-ኣዊ [ህዝባዊ]

iii. Stems by suffixing bound morphemes


Stems (Examples) Morpheme Derived Adjective
ደካም- -ኣ ደካም-ኣ [ደካማ]
ንቅ- -ኡ ንቅ-ኡ [ንቁ]
በል- -ኢታ በል-ኢታ [በሊታ]

iv. Compound Words of nouns and adjectives by affixing the vowel -ኧ


e.g. ሆድ ሰፊ --> ሆድ-ኧ ሰፊ [ሆደ ሰፊ]


Amharic Morphology: Adjectives

Inflection
Amharic adjectives can be marked for:
i. Number by affixation of morphemes or repetition of consonants (and affixing a vowel).
Plural forms are distinguished for adjectives ending with a consonant (suffix beginning
with -ኦ), adjectives ending with a vowel, and plural formation by repetition of a consonant.
ii. Definiteness by affixation of morphemes or vowels based on number, gender, and/or ending
of the adjective. The definite form is distinguished by the ending of the adjective
(consonant or vowel), number (singular or plural) and, in the singular, gender (feminine
or masculine). Adjectives ending in a consonant take -ኡ in the masculine singular and in
the plural.
iii. Gender by affixation of a morpheme beginning with -ኢ.
iv. Case (Objective Case) by affixation of a morpheme.


Amharic Morphology: Verbs

Derivation
Amharic verbal stems (from which various forms of verbs are formed) can be derived from:
i. Verbal Roots by
(a) affixing the vowel -ኧ- to produce CኧC1C1ኧC-, e.g. ስ-ብ-ር --> ሰበር-
(b) repeating the penultimate consonant and affixing the vowels -ኧ- and -ኣ- to produce
CኧC1ኣC1C1ኧC-, e.g. ስ-ብ-ር --> ሰባበር-

ii. Verbal Stems by affixing morphemes (for example the prefix አ-)

iii. Compound Words of
(a) stems and verbs
(b) sub-words and verbs


Amharic Morphology: Verbs

Inflection
Amharic verbs are marked for:
i. Person, gender, number, case, and tense/aspect

Subjective Case
Person   Gender      Singular                      Plural
                     Past Tense    Non-Past Tense  Past Tense    Non-Past Tense
First                ሰበር-ኩ/-ሁ      እ-ሰብር            ሰበር-ን         እን-ሰብር
Second   Masculine   ሰበር-ክ/-ህ      ት-ሰብር            ሰበር-ኣችሁ       ት-ሰብር-ኡ
Second   Feminine    ሰበር-ሽ         ት-ሰብር-ኢ          ሰበር-ኣችሁ       ት-ሰብር-ኡ
Third    Masculine   ሰበር-ኧ         ይ-ሰብር            ሰበር-ኡ         ይ-ሰብር-ኡ
Third    Feminine    ሰበር-ኧች        ት-ሰብር            ሰበር-ኡ         ይ-ሰብር-ኡ

Objective Case (past tense)
Subject                       Object Person   Object Gender   Singular        Plural
Third Person, Singular,       First                           ሰበር-ኧ-ኝ         ሰበር-ኧ-ን
Masculine                     Second          Masculine       ሰበር-ኧ-ክ/-ህ      ሰበር-ኧ-ኣችሁ
                              Second          Feminine        ሰበር-ኧ-ሽ         ሰበር-ኧ-ኣችሁ
                              Third           Masculine       ሰበር-ኧ-ው         ሰበር-ኧ-ኣቸው
                              Third           Feminine        ሰበር-ኧ-ኣት        ሰበር-ኧ-ኣቸው
Third Person, Singular,       First                           ሰበር-ኧች-ኝ        ሰበር-ኧች-ን
Feminine                      Second          Masculine       ሰበር-ኧች-ህ        ሰበር-ኧች-ኣችሁ
                              Second          Feminine        ሰበር-ኧች-ሽ        ሰበር-ኧች-ኣችሁ
                              Third           Masculine       ሰበር-ኧች-ው        ሰበር-ኧች-ኣቸው
                              Third           Feminine        ሰበር-ኧች-ኣት       ሰበር-ኧች-ኣቸው
etc.


Amharic Morphology: Verbs

Inflection

ii. Mood
Verbs are marked for four moods (completed action, command, request, and negative)
according to number (singular, plural), person (first, second, third) and, in the second
and third person singular, gender (masculine, feminine).

Note
Amharic verbs in general show a high degree of inflection since person, case, gender, number,
tense, aspect, mood and others are marked on the verb. For example, a single verb form can indicate:
♦ the subject (third person, masculine, singular)
♦ the object (first person, plural)
♦ negation
♦ past tense


State Machines

• State machines are widely used in NLP for modeling phonology, morphology and syntax.

• State machines are formal models that consist of states, transitions among states, and
an input representation.
♦ States – represent the set of properties of an abstract machine
♦ Transitions – represent jumps from one state to another
♦ Inputs – sequences of symbols or letters that can be read by the machine

• A machine with a finite number of states is called a finite state machine (FSM).

• An FSM has two special kinds of state: a start state and a final state.

Figure: an example FSM with states S0, S1 and S2, annotated with its start state, final
state, transitions, and input symbols (0 and 1).

• There are two types of FSMs: finite state automata and finite state transducers.


Finite State Automata

• A finite state automaton (FSA) is a finite state machine that accepts only a given set of
strings (a language).

• An FSA can be deterministic or non-deterministic.

• In a deterministic FSA, every state has exactly one transition for each possible input.

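♦ Example: A deterministic FSA that determines if a binary string contains
an even number of 0's.

Figure: state diagram with states S0, S1 and S2, an ε arc leaving S0, arcs labelled 0
between S1 and S2, and self-loops labelled 1.

♦ Strings accepted by this deterministic FSA are: ε, 1, 11, 111, 00, 010,
1010, 10110, etc.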
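
A deterministic FSA of this kind can be simulated with a transition table. The sketch below is an assumption-laden illustration, not the slide's exact diagram: it uses two hypothetical state names, EVEN and ODD, and omits the ε arc, since only the parity of 0s needs to be tracked.

```python
# State names are illustrative: EVEN is the start and the only accepting state.
TRANSITIONS = {
    ("EVEN", "0"): "ODD",
    ("EVEN", "1"): "EVEN",
    ("ODD", "0"): "EVEN",
    ("ODD", "1"): "ODD",
}


def accepts(string: str, start: str = "EVEN", finals=frozenset({"EVEN"})) -> bool:
    """Run the automaton over the string and report whether it ends in a final state."""
    state = start
    for symbol in string:
        if (state, symbol) not in TRANSITIONS:
            return False  # symbol outside the alphabet {0, 1}
        state = TRANSITIONS[(state, symbol)]
    return state in finals
```

Every (state, symbol) pair has exactly one entry in the table, which is what makes the machine deterministic.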


Finite State Automata

• In a non-deterministic FSA, an input can lead to one, more than one, or no transition from
a given state.

♦ Example: A non-deterministic FSA that determines if a binary string
contains an even number of 0’s or an even number of 1’s.

Figure: state diagram with states S0 to S4. Two ε arcs leave S0, one towards the branch
over S1 and S2 and one towards the branch over S3 and S4; both branches have arcs
labelled 0 and 1.

♦ Strings accepted by this non-deterministic FSA are: ε, 1, 11, 111, 00,
010, 1010, 10110, 011, 11011, 1010101, etc.
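
A non-deterministic FSA can be simulated by tracking the set of states it could be in. The sketch below assumes a topology consistent with the example: ε arcs from S0 to S1 and S3, the S1/S2 branch tracking the parity of 0s, the S3/S4 branch tracking the parity of 1s, and S1 and S3 as final states. Which states are final is an assumption, as are the names `NFA`, `closure` and `accepts`.

```python
EPS = ""  # label used here for ε-transitions

# Upper branch (S1, S2) tracks the parity of 0s; lower branch (S3, S4) the parity of 1s.
NFA = {
    ("S0", EPS): {"S1", "S3"},
    ("S1", "0"): {"S2"}, ("S1", "1"): {"S1"},
    ("S2", "0"): {"S1"}, ("S2", "1"): {"S2"},
    ("S3", "1"): {"S4"}, ("S3", "0"): {"S3"},
    ("S4", "1"): {"S3"}, ("S4", "0"): {"S4"},
}
FINALS = {"S1", "S3"}


def closure(states):
    """Add every state reachable through ε-transitions alone."""
    stack, seen = list(states), set(states)
    while stack:
        for nxt in NFA.get((stack.pop(), EPS), ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def accepts(string):
    """Accept if any path through the automaton ends in a final state."""
    current = closure({"S0"})
    for symbol in string:
        moved = set()
        for state in current:
            moved |= NFA.get((state, symbol), set())
        current = closure(moved)
    return bool(current & FINALS)
```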


Finite State Automata

Word Recognition
• FSAs can be used to recognize words in a language.

• Examples:

♦ Single word recognition

One letter per arc:
S0 --ሰ--> S1 --በ--> S2 --ረ--> S3
S0 --w--> S1 --a--> S2 --l--> S3 --k--> S4

Whole word on one arc:
S0 --ሰበረ--> S1
S0 --walk--> S1


Finite State Automata

Word Recognition
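♦ Recognition of multiple words

ሰበረ, ሰበቀ, ሰበብ:
S0 --ሰበ--> S1 --ረ | ቀ | ብ--> S2

internal, eternal, ethical, ethiopia, ethanol:
Figure: an FSA with states S0 to S5 whose arcs are labelled in, e, tern, al, eth, i, c,
opia and anol, so that shared substrings such as tern, al and eth are stored only once.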

Finite State Automata

Word Recognition
♦ Recognition of multiple words (for instance, Amharic pronouns: እኔ, እኛ, አንተ,
አንቺ, እናንተ, እስዎ, እርስዎ, እሱ, እርሱ, እሷ, እርሷ, እሳቸው, እርሳቸው, እነሱ, እነርሱ)

Figure: an FSA with states S0 to S6 whose arcs are labelled with shared pronoun fragments
such as እ, አን, ር, ነ, ሱ, ሷ, ስዎ, ሳቸው and ናንተ.


Finite State Automata

Modeling Morphology
• One word and multiple inflections

S0 --walk--> S1 --s | ed | ing--> S2

S0 --ሰበር--> S1 --ኧን | ኧህ | ኣት | ኧው | ኣቸው | ኧኝ | ኧሽ | ኣችሁ | ኣችሁት | ...--> S2


Finite State Automata

Modeling Morphology
• Multiple words and multiple inflections

S0 --jump | walk | help | ...--> S1 --s | ed | ing | ...--> S2

S0 --ማረክ | ሰበር | ገደል | ...--> S1 --ኧን | ኧህ | ኣት | ኧው | ኣቸው | ኧኝ | ኧሽ | ኣችሁ | ኣችሁት | ...--> S2
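
The stem-plus-suffix automaton can be sketched as a function that tries every split of a word into an S0-to-S1 arc and an S1-to-S2 arc. This is a minimal sketch: the names `STEMS`, `SUFFIXES` and `analyse` are hypothetical, and allowing the empty suffix (so that the bare stem is accepted) is an assumption not shown in the diagram.

```python
STEMS = {"jump", "walk", "help"}      # arcs S0 -> S1
SUFFIXES = {"", "s", "ed", "ing"}     # arcs S1 -> S2; "" is assumed, to allow the bare stem


def analyse(word):
    """Return every (stem, suffix) split that the two-arc automaton accepts."""
    return [
        (word[:i], word[i:])
        for i in range(1, len(word) + 1)
        if word[:i] in STEMS and word[i:] in SUFFIXES
    ]
```

A word the automaton does not recognise, such as walker, yields an empty list.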


Finite State Automata

Modeling Morphology
• One word and multiple inflections with affixes

S0 --እንዲ | እንዳይ | ከሚ | ሊ | የሚ | ...--> S1 --ሰብር--> S2 --ኧን | ህ | ኣት | ኧው | ኣቸው | ብን | በት | ለት | ባቸው | ...--> S3


Finite State Automata

Modeling Morphology
• Multiple words and multiple inflections with affixes

S0 --እንዲ | እንዳይ | ከሚ | ሊ | የሚ | ...--> S1 --ማርክ | ሰብር | ገድል | ...--> S2 --ኧን | ህ | ኣት | ኧው | ኣቸው | ብን | በት | ለት | ባቸው | ...--> S3


Finite State Automata

Modeling Morphology
• Marking part-of-speech

Figure: an FSA with states S0 to S5 whose arcs are labelled [word], y, cate, ion, ism,
ist, er and y.


Finite State Automata

Modeling Morphology
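• Marking part-of-speech

Figure: the same FSA with states S1 to S5 relabelled by part-of-speech (N, Adj, V, N, N),
so that the state reached after reading a word gives its lexical category.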


Finite State Automata

Automatically Learning Morphology


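• Collect words in a large corpus and compile them into a trie data structure:

... walk walked walking walks wall walls want wanted wanting
wants warn warned warning warns ...

Figure: the trie for these words, branching after the shared prefix w-a into l (walk, wall),
n (want) and r (warn), with each branch followed by its endings (s, ed, ing).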
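
Compiling a word list into a trie can be sketched with nested dictionaries. The function name `build_trie` and the `"$"` end-of-word marker are assumptions made for this example.

```python
def build_trie(words):
    """Build a trie as nested dicts; "$" is an assumed end-of-word marker."""
    root = {}
    for word in words:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})  # reuse the branch if the prefix exists
        node["$"] = True
    return root
```

Words that share a prefix share a path, so walk, walks and wall all pass through the same w, a, l nodes.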


Finite State Automata

Automatically Learning Morphology


... እንደሚሰብረው እንደሚሰብሩበት እንደሚሰብሩባቸው እንደሚሰብሩት እንደሚሰብር
እንደሚገድለው እንደሚገድሉበት እንደሚገድሉባቸው እንደሚገድሉት እንደሚገድል
እንደማይሰብረው እንደማይሰብሩበት እንደማይሰብሩባቸው እንደማይሰብሩት እንደማይሰብር
እንደማይገድለው እንደማይገድሉበት እንደማይገድሉባቸው እንደማይገድሉት እንደማይገድል ...

Figure: the trie for these words, starting with እንደ, branching into ሚ and ማይ, then into
the stems ሰብር and ገድል, each followed by the endings ኧው, ኡበት, ኡባቸው and ኡት.


Finite State Automata

Automatically Learning Morphology


• Identify frequent suffix trees

Discovered Morphology
• Stems - with common suffix tree:
♦ walk
♦ want
♦ warn
• Morphemes - frequent suffix tree:
♦ ε
♦ -ed
♦ -s
♦ -ing
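
The idea of finding stems that share a frequent suffix tree can be sketched as grouping stems by their set of endings. This is a simplified sketch, not the lecture's algorithm: it assumes that a stem is itself a word in the corpus, and the names `suffix_sets`, `frequent_paradigms` and the threshold `min_stems` are hypothetical.

```python
from collections import defaultdict

WORDS = """walk walked walking walks wall walls want wanted wanting
wants warn warned warning warns""".split()


def suffix_sets(words):
    """Map every prefix that is itself a word to the set of endings that follow it."""
    vocab = set(words)
    endings = defaultdict(set)
    for word in vocab:
        for i in range(1, len(word) + 1):
            if word[:i] in vocab:
                endings[word[:i]].add(word[i:])
    return endings


def frequent_paradigms(words, min_stems=2):
    """Group stems by identical suffix set and keep the sets shared by several stems."""
    groups = defaultdict(list)
    for stem, sufs in suffix_sets(words).items():
        if len(sufs) > 1:  # ignore words that never take an ending
            groups[frozenset(sufs)].append(stem)
    return {sufs: sorted(stems) for sufs, stems in groups.items()
            if len(stems) >= min_stems}
```

On this word list, wall is dropped because no other stem shares its suffix set (ε and -s only).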


Finite State Automata

Automatically Learning Morphology

Discovered Morphology
• Stems - with common suffix tree:
♦ ሰብር
♦ ገድል
• Morphemes - frequent suffix tree:
♦ ε
♦ -ኧው
♦ -ኡበት
♦ -ኡባቸው
♦ -ኡት
• Other affixes:
♦ እንደ-
♦ -ሚ-
♦ -ማይ-


Finite State Transducers

• Finite state transducers (FSTs) are extensions of finite state automata (FSA) that can
generate outputs.
• FSTs can be considered as:
♦ Recognizer: a machine that takes a pair of strings as input and outputs
“accept” if the string-pair is in the string-pair language, and
“reject” if it is not.
♦ Generator: a machine that outputs pairs of strings of the language, i.e. the
output is a “yes” or “no”, and a pair of output strings.
♦ Translator: a machine that reads a string and outputs another string.
♦ Set relater: a machine that computes relations between sets.

Figure: three equivalent drawings of a two-state transducer (S0, S1) with arcs a:b, a:ba,
b:b and b:ε, showing different ways of representing input/output relations in FSTs.
N.B: Identical input/output pairs can be written using one symbol, e.g. “b:b” → “b”.
The ε symbol represents the empty symbol.
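
Used as a translator, an FST reads one string and writes another. The sketch below uses the arc labels from the figure (a:b, a:ba, b:b, b:ε), but which states each arc connects is an assumption, chosen so that the input “aab” yields “bbab”; the names `ARCS` and `transduce` are hypothetical.

```python
# (state, input symbol) -> (next state, output string); "" plays the role of ε.
# The arc labels follow the figure; the arc endpoints are assumed.
ARCS = {
    ("S0", "a"): ("S1", "b"),
    ("S1", "a"): ("S0", "ba"),
    ("S0", "b"): ("S0", "b"),
    ("S1", "b"): ("S1", ""),
}


def transduce(text, start="S0", finals=("S0", "S1")):
    """Translate the input string, or return None if the machine rejects it."""
    state, output = start, []
    for symbol in text:
        if (state, symbol) not in ARCS:
            return None
        state, out = ARCS[(state, symbol)]
        output.append(out)
    return "".join(output) if state in finals else None
```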


Finite State Transducers

• Like FSAs, FSTs can be deterministic (in which case they are called sequential transducers)
or non-deterministic with respect to their input.
• In a sequential transducer, each state has at most one outgoing transition for
each input symbol.
♦ However, sequential transducers may have nondeterministic output.
♦ Thus, multiple outgoing transitions with one output symbol may occur.

• Depending on the type of accepted input and produced output, FSTs can be:
♦ String-to-string transducers: produce strings as outputs.
♦ String-to-weight transducers: produce weights as outputs.

• The weights in string-to-weight transducers in most cases represent probabilities.


♦ Thus, string-to-weight transducers are also known as weighted automata or
probabilistic automata.
♦ In addition to the output weights of the transitions, string-to-weight
transducers are provided with initial and final weights (to the initial and final
states, respectively).


Finite State Transducers

Figure (left): a sequential string-to-string transducer with nondeterministic output, with
states S0 and S1 and arcs a:b, a:ba, b and b:ε. Input “aab” produces “bbab”.

Figure (right): a sequential string-to-weight transducer with states S0, S1 and S2, initial
weight 4 on S0, final weight 1 on S2, and arcs a/2, b/3 and b/5. Input “ab” produces
4+2+3+1 = 10.
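
The weight computation can be sketched as follows: the result is the initial weight, plus the weight of each arc taken, plus the final weight of the state reached. The sketch assumes that a/2 leads from S0 to S1 and b/3 from S1 to S2; the b/5 arc is left out because its endpoints are not recoverable. The names `INITIAL_WEIGHT`, `FINAL_WEIGHT`, `ARCS` and `weight` are hypothetical.

```python
INITIAL_WEIGHT = {"S0": 4}
FINAL_WEIGHT = {"S2": 1}
# (state, input symbol) -> (next state, arc weight); endpoints are assumed.
ARCS = {("S0", "a"): ("S1", 2), ("S1", "b"): ("S2", 3)}


def weight(text, start="S0"):
    """Sum the initial weight, the arc weights, and the final weight."""
    state, total = start, INITIAL_WEIGHT[start]
    for symbol in text:
        if (state, symbol) not in ARCS:
            return None  # no transition: input rejected
        state, arc_weight = ARCS[(state, symbol)]
        total += arc_weight
    if state not in FINAL_WEIGHT:
        return None  # stopped in a non-final state
    return total + FINAL_WEIGHT[state]
```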


Finite State Transducers

Two-Level Morphology
• In the finite-state morphology paradigm, a word is represented as a correspondence
between a lexical level and the surface level.
♦ Lexical level represents a concatenation of morphemes making up a word.
♦ Surface level represents the concatenation of letters which make up the actual
spelling of the word.
• Morphological parsing is the process of building a structured representation of words by
breaking them down into their component morphemes. For example:
♦ “bigger” is morphologically parsed as “big+ADJ+COMPARATIVE”.
♦ “lower” is morphologically parsed as “low+ADJ+COMPARATIVE”.
♦ “ተማሪዎች” is morphologically parsed as “ተማሪ+N+PLURAL”.
• Thus, a morphological parser is used to identify the correspondence between the lexical
level and the surface level.
♦ For example, the lexical level representation for the surface level word “lower” is
“low+ADJ+COMPARATIVE”.


Finite State Transducers

Two-Level Morphology
• Two-level morphology is an important application of FSTs to morphological
representation and parsing.

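States S0 to S6:
Lexical level:   ተ   ማ   ሪ   +N   ε   +PLU
Surface level:   ተ   ማ   ሪ   ε    ዎ   ች

States S0 to S7 (big) and S0, S8, S9 (low, which then shares the remaining arcs):
Lexical level:   b   i   g   ε   +ADJ   ε   +COMP
Surface level:   b   i   g   g   ε      e   r
Lexical level:   l   o   w       +ADJ   ε   +COMP
Surface level:   l   o   w       ε      e   r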

• FSTs can also be used to implement spelling rules applied during inflection of words.
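
The lexical/surface correspondence can be sketched by writing each path as a list of (lexical, surface) pairs and reading off either tape. This is a simplification under stated assumptions: the paths are enumerated explicitly rather than encoded as a state machine, and the names `PATHS`, `parse` and `generate` are hypothetical.

```python
# Each path is a list of (lexical, surface) pairs; "" stands for ε.
PATHS = [
    [("b", "b"), ("i", "i"), ("g", "g"), ("", "g"), ("+ADJ", ""), ("", "e"), ("+COMP", "r")],
    [("l", "l"), ("o", "o"), ("w", "w"), ("+ADJ", ""), ("", "e"), ("+COMP", "r")],
    [("ተ", "ተ"), ("ማ", "ማ"), ("ሪ", "ሪ"), ("+N", ""), ("", "ዎ"), ("+PLU", "ች")],
]


def lexical(path):
    """Concatenate the lexical side of every pair on the path."""
    return "".join(lex for lex, _ in path)


def surface(path):
    """Concatenate the surface side of every pair on the path."""
    return "".join(surf for _, surf in path)


def parse(word):
    """Surface form -> lexical analyses."""
    return [lexical(p) for p in PATHS if surface(p) == word]


def generate(analysis):
    """Lexical analysis -> surface forms."""
    return [surface(p) for p in PATHS if lexical(p) == analysis]
```

Because the same pairs are read in both directions, one description serves both morphological parsing and generation.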

TOC: Course Syllabus

Previous: NLP: Background and Overview

Current: Morphological Analysis

Next: Syntax and Parsing
