NLP Lecture 3
Presented by:
Dr. Esraa Abdalla
Introduction to Lexical Analysis
Words are fundamental units in natural
language texts.
Lexical analysis aims to understand
word structure and morphology.
Words can be seen as strings or
abstract objects representing sets of
strings.
Example: 'delivers' relates to the
lemma 'DELIVER'.
Introduction to Lexical Analysis
• Lexical analysis is the study of word structure in
NLP.
• Allows better understanding of text structure and meaning.
• Helps in syntactic parsing and POS tagging.
• Essential for languages with rich morphology.
Lexical Analysis Tasks
• Uses Finite State Transducers (FSTs) to analyze word structure.
• Helps in recognizing and generating word forms.
• Commonly used in NLP applications.
Finite State Morphology (FSM)
▪FSM uses finite state transducers
(FSTs) to map between different levels
of representation.
▪FSTs are efficient and can handle both
parsing and generation.
▪Used to model morphology
▪Example: Mapping 'delivers' to
'DELIVER + {3rd, Sg, Present}'.
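The mapping on this slide can be sketched as a toy transducer. This is only an illustration, not a production FST toolkit: the arc table, the state numbers, the `{Past}` entry, and the function name `transduce` are assumptions made for the example, and the whole stem is treated as a single symbol for brevity.

```python
# Each arc pairs a surface string with a lexical string:
# (source state, surface side, lexical side, target state).
ARCS = [
    (0, "deliver", "DELIVER", 1),
    (1, "s", " + {3rd, Sg, Present}", 2),
    (1, "ed", " + {Past}", 2),  # assumed extra entry, not from the slide
]
FINAL = {2}


def transduce(text, invert=False):
    """Return every output the transducer pairs with `text`.

    invert=False reads the surface side (parsing);
    invert=True reads the lexical side (generation).
    """
    results = []
    stack = [(0, 0, "")]  # (state, position in text, output so far)
    while stack:
        state, pos, out = stack.pop()
        if state in FINAL and pos == len(text):
            results.append(out)
        for src, surface, lexical, dst in ARCS:
            read, write = (lexical, surface) if invert else (surface, lexical)
            if src == state and text.startswith(read, pos):
                stack.append((dst, pos + len(read), out + write))
    return results


print(transduce("delivers"))
print(transduce("DELIVER + {3rd, Sg, Present}", invert=True))
```

The same arc table serves both directions, which is the sense in which FSTs handle both parsing and generation.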
FSM - Morphonology
▪Morphonology deals with phonological changes at morpheme
boundaries.
▪Example: Plural affixation in English (e.g., 'cats' vs. 'dogs').
▪FSTs handle these changes through rule-governed mappings.
▪Efficiently manages orthographic variations and phonological
rules.
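The 'cats' vs. 'dogs' contrast can be sketched as a rule that picks the plural allomorph from the final sound of the stem. This is a simplified sketch: the phoneme sets below are an assumed, partial inventory, and words are given as hand-written phoneme lists rather than derived from spelling.

```python
# Assumed, simplified sound classes for the illustration.
SIBILANTS = {"s", "z", "ʃ", "ʒ", "tʃ", "dʒ"}
VOICELESS = {"p", "t", "k", "f", "θ"}


def pluralize(phonemes):
    """Append the plural allomorph chosen by the stem-final sound."""
    final = phonemes[-1]
    if final in SIBILANTS:
        suffix = "ɪz"   # e.g. 'buses'
    elif final in VOICELESS:
        suffix = "s"    # e.g. 'cats'
    else:
        suffix = "z"    # e.g. 'dogs'
    return phonemes + [suffix]


print(pluralize(["k", "æ", "t"]))
print(pluralize(["d", "ɒ", "g"]))
```

An FST would encode the same conditions as context-sensitive arcs at the morpheme boundary.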
FSM - Morphotactics
• Morphotactics involves the ordering of morphemes in a word.
• FSTs model the morphotactic rules of a language.
• Efficient for handling regular morphology.
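Morpheme ordering can be sketched as a small finite-state automaton over morpheme categories. The categories, state names, and the four-entry lexicon below are assumptions chosen for the example, not a description of any real grammar.

```python
# Allowed orderings: prefixes, then a root, then derivational suffixes,
# then at most one inflectional suffix.
TRANSITIONS = {
    ("start", "prefix"): "start",
    ("start", "root"): "stem",
    ("stem", "deriv"): "stem",
    ("stem", "infl"): "done",
}
ACCEPTING = {"stem", "done"}

# Hypothetical mini-lexicon assigning each morpheme a category.
LEXICON = {"un": "prefix", "lock": "root", "able": "deriv", "s": "infl"}


def is_well_formed(morphemes):
    """Check whether a morpheme sequence follows the ordering rules."""
    state = "start"
    for morpheme in morphemes:
        category = LEXICON.get(morpheme)
        state = TRANSITIONS.get((state, category))
        if state is None:  # no arc for this category in this state
            return False
    return state in ACCEPTING


print(is_well_formed(["un", "lock", "able"]))
print(is_well_formed(["lock", "s", "able"]))
```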
Advantages of FSM
• FSM struggles to handle irregular forms.
• Examples:
- English: go → went
- German: laufen → lief
Infixation in Morphology
• Some languages insert affixes inside words.
• Example (Tagalog): sulat → sumulat (write).
• FSM requires additional rules to handle this.
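The Tagalog example can be sketched as a rule that places the infix before the first vowel of the stem. This is a minimal sketch that assumes a five-vowel romanized spelling; the function name `infix_um` is made up for the illustration.

```python
def infix_um(stem):
    """Insert the infix 'um' before the first vowel of the stem."""
    for i, ch in enumerate(stem):
        if ch in "aeiou":
            return stem[:i] + "um" + stem[i:]
    return stem  # no vowel found: leave the stem unchanged


print(infix_um("sulat"))
```

The affix lands inside the stem rather than at an edge, which is why a purely concatenative model needs extra machinery for it.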
Non-Concatenative Morphology
• Arabic & Hebrew use root-and-pattern systems.
• Example:
- ktb (root) → kitab (book), katib (writer)
• Requires multi-layered processing.
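Root-and-pattern formation can be sketched as filling the consonant slots of a template with the root consonants. The template notation (`C` for a consonant slot, vowels written literally) and the function name `apply_pattern` are assumptions for this example.

```python
def apply_pattern(root, pattern):
    """Interleave root consonants into the 'C' slots of a vowel pattern."""
    if pattern.count("C") != len(root):
        raise ValueError("pattern needs one C slot per root consonant")
    consonants = iter(root)
    return "".join(next(consonants) if slot == "C" else slot for slot in pattern)


print(apply_pattern("ktb", "CiCaC"))
print(apply_pattern("ktb", "CaCiC"))
```

The root and the vowel pattern are kept as separate layers and merged only at the end, which is one way to read "multi-layered processing".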
Case Study: Russian Noun Declensions
• Russian nouns change form based on case and number.
• Example:
- karta (map, nominative singular)
- karty (maps, nominative plural)
• Requires complex analysis.
Problems with FSM for Complex Morphology
• Struggles with non-adjacent (discontinuous) changes within a word.
• Needs additional models for complex word structures.
• Limited in languages with infixes or internal changes.
Possible Solutions
• Infixation and root-and-template morphology require complex FSTs.
• FSM handles these by recasting problems as linear ones.
Paradigm-Based Lexical Analysis
▪Views word structure in terms of paradigms.
▪Each cell in the paradigm represents a unique combination of
morphosyntactic features.
▪Captures generalizations and exceptions through inheritance
hierarchies and default mechanisms.
▪Handles difficult morphology more naturally than FSM.
▪Example: Russian noun inflection classes.
Paradigm-Based - Inheritance
▪Inheritance hierarchies capture shared features across inflectional classes.
▪Default inheritance allows for efficient representation of regular and irregular forms.
▪Overriding defaults to handle exceptions and semi-regular forms.
▪Example: Russian nouns sharing case and number features.
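Default inheritance with overriding can be sketched as a lookup that climbs a small hierarchy until it finds an ending. The node names (`NOUN`, `CLASS_I`, `CLASS_II`, `CLASS_IV`) are hypothetical labels for this example, and only two paradigm cells are modeled, using transliterated Russian nouns.

```python
# The root node holds the default; a class overrides it only where it differs.
HIERARCHY = {
    "NOUN": {"parent": None, "endings": {("nom", "pl"): "y"}},
    "CLASS_I": {"parent": "NOUN", "endings": {("nom", "sg"): ""}},    # zakon
    "CLASS_II": {"parent": "NOUN", "endings": {("nom", "sg"): "a"}},  # karta
    "CLASS_IV": {  # boloto: overrides the default plural ending
        "parent": "NOUN",
        "endings": {("nom", "sg"): "o", ("nom", "pl"): "a"},
    },
}


def ending(node, cell):
    """Find the ending for a paradigm cell, inheriting from parents."""
    while node is not None:
        entry = HIERARCHY[node]
        if cell in entry["endings"]:
            return entry["endings"][cell]
        node = entry["parent"]
    raise KeyError(cell)


def inflect(stem, node, case, number):
    return stem + ending(node, (case, number))


print(inflect("kart", "CLASS_II", "nom", "pl"))
print(inflect("bolot", "CLASS_IV", "nom", "pl"))
```

The shared plural ending is stated once at the top, so each class lists only what is specific to it.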
Handling Exceptions with Paradigms
• Some words have unique forms.
• Example:
- English: mouse → mice (not mouses)
• Paradigm models store exceptions effectively.
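Storing exceptions can be sketched as a listed form that takes priority over the regular rule. This is a deliberately tiny sketch: the dictionary name and the plain "add -s" default are assumptions, and English spelling rules such as -es are ignored.

```python
# Listed exceptions win over the default rule.
IRREGULAR_PLURALS = {"mouse": "mice"}


def pluralize(noun):
    """Return the stored irregular plural if there is one, else add -s."""
    return IRREGULAR_PLURALS.get(noun, noun + "s")


print(pluralize("mouse"))
print(pluralize("cat"))
```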
Combining Paradigms with FSM
• Some NLP systems use both approaches.
• FSM for simple rules, paradigms for complex cases.
• Improves accuracy in word analysis.
Future of Paradigm-Based Models
• AI and machine learning are improving word analysis.
• Hybrid models integrating FSM and paradigms.
• Better handling of multi-language morphology.
Advances in Lexical Analysis
• Deep learning is improving morphological analysis.
• Neural networks can learn complex word patterns.
Hybrid Approaches
• Combining FSM and Paradigm models for better results.
• Example:
- AI systems using rule-based + machine learning models.
Challenges in Lexical Analysis
• Handling low-resource languages.
• Building more robust models for diverse languages.
Why Lexical Analysis Matters
• Helps in machine translation, search engines, and AI
applications.
• Improves text understanding and generation.
Applications of Lexical Analysis
Machine Translation (MT): Mapping between source and target
language morphological structures.
Information Retrieval (IR): Aids in stemming and generating
search terms.
Text Preprocessing: Used for syntactic analysis and tokenization.
Example: Tokenization in languages without clear word
boundaries.
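Tokenization without word boundaries can be sketched with greedy maximum matching against a word list. This is one simple baseline, not the method the slides prescribe; the vocabulary is a made-up English stand-in for a language written without spaces.

```python
def max_match(text, vocabulary):
    """Greedily take the longest vocabulary word at each position."""
    max_len = max(map(len, vocabulary))
    tokens, pos = [], 0
    while pos < len(text):
        # Try the longest candidate first, then shorter ones.
        for end in range(min(len(text), pos + max_len), pos, -1):
            if text[pos:end] in vocabulary:
                break
        else:
            end = pos + 1  # unknown character: emit it on its own
        tokens.append(text[pos:end])
        pos = end
    return tokens


print(max_match("thecatsat", {"the", "cat", "sat"}))
```

Greedy matching can pick the wrong split when one vocabulary word is a prefix of another, which is one reason statistical segmenters are often preferred.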
Challenges in Lexical Analysis
Handling morphologically rich languages.
Dealing with ambiguity in morphological analysis.
Integrating rule-based and statistical methods.
Example: Ambiguity in Russian case and number forms.
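The ambiguity in Russian case and number forms can be sketched with an analyzer that returns every reading of a form. The ending table covers only three endings of the transliterated noun 'karta' and is an assumption for the example; a real analyzer would cover the full paradigm.

```python
# One ending may correspond to several paradigm cells.
ENDINGS = {
    "a": [("nom", "sg")],
    "y": [("gen", "sg"), ("nom", "pl")],
    "u": [("acc", "sg")],
}


def analyze(form, stem="kart"):
    """Return all (stem, case, number) readings of a word form."""
    if not form.startswith(stem):
        return []
    ending = form[len(stem):]
    return [(stem, case, number) for case, number in ENDINGS.get(ending, [])]


print(analyze("karty"))
```

The form 'karty' yields two readings, so later processing (for example, syntactic context) has to choose between them.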
Future Directions in Lexical Analysis
Enhancing FSM with paradigm-based insights.
Developing more robust statistical models for morphological
analysis.
Exploring the role of lexical analysis in multilingual NLP
applications.
Integrating symbolic and statistical approaches for better
performance.
Summary of Key Points
• Lexical analysis is crucial for NLP.
• FSM works well for regular words, but has limitations.
• Paradigm-based models are more flexible.
Final Thoughts
• Ongoing research is improving lexical analysis techniques.
• The future lies in combining rule-based and AI models.
• Questions?
Thank You!
References
Indurkhya, Nitin, and Fred J. Damerau. Handbook of natural
language processing. Chapman and Hall/CRC, 2010.