STEMMING AND LEMMATIZATION
INTRODUCTION
Stemming and lemmatization reduce a word to its root or core form
Used in Natural Language Processing (NLP)
Usually a pre-processing step for NLP
Help show that some "different" words come from the same root
Example:
Eaten
Eating
Eats
2023FJ - [email protected] 2
STEMMING
Easiest approach
Words are reduced to their word stems
A stem may not be the same as the root of the word
https://devopedia.org/images/article/218/8583.1569386710.png
Algorithms are usually heuristic
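To make the heuristic idea concrete, here is a minimal sketch of a suffix-stripping stemmer in plain Python. The suffix list and the name naive_stem are hypothetical and far simpler than any real algorithm; the point is only to show why a stem may differ from the true root.

```python
# Hypothetical, heavily simplified rule set; real stemmers use many more rules.
SUFFIXES = ("ing", "ed", "es", "s")


def naive_stem(word: str) -> str:
    """Strip the first matching suffix, keeping at least three characters."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


for w in ["Eating", "Eats", "Eaten", "Studies"]:
    print(w, "->", naive_stem(w))
# Eating -> eat
# Eats -> eat
# Eaten -> eaten   (irregular form, no rule applies)
# Studies -> studi (a stem that is not a dictionary word)
```

The last two results show the typical limits of a purely rule-based approach: irregular forms are missed, and the stem is not guaranteed to be a real word.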
STEMMING ALGORITHMS
Porter stemmer
Created in 1980
Removes common endings to words
Usually applied first, as a starting point
Deterministic: the same word always produces the same stem
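A short sketch of the Porter stemmer using NLTK's PorterStemmer class, assuming the nltk package is installed. The word list is only an illustration, reusing the example from the introduction.

```python
from nltk.stem import PorterStemmer

porter = PorterStemmer()

words = ["eating", "eats", "eaten"]
stems = {w: porter.stem(w) for w in words}
print(stems)

# Running the stemmer again gives the same stems for the same words.
assert stems == {w: porter.stem(w) for w in words}
```

"eating" and "eats" both reduce to "eat". Because the rules only remove common endings, an irregular form such as "eaten" should not be expected to reduce to "eat".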
STEMMING ALGORITHMS…
Snowball stemmer
Also known as Porter2
Generally considered an improvement over the original Porter stemmer
More aggressive than Porter stemmer
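The difference from Porter can be seen on a single word. This is a small sketch assuming nltk is installed; "fairly" is just one illustrative case and does not describe how the two stemmers compare in general.

```python
from nltk.stem import PorterStemmer, SnowballStemmer

porter = PorterStemmer()
snowball = SnowballStemmer("english")  # the language name is required

word = "fairly"
porter_stem = porter.stem(word)
snowball_stem = snowball.stem(word)

print(porter_stem)    # fairli
print(snowball_stem)  # fair
```

Here Snowball strips the "-ly" ending that Porter leaves behind as "-li".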
STEMMING ALGORITHMS…
Lancaster Stemmer
One of the most aggressive stemmers
NLTK allows you to add your own rules
Can transform words into strange stems
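A sketch of NLTK's LancasterStemmer, assuming nltk is installed. The custom rule strings passed to rule_tuple are taken from the example in the NLTK documentation; they are shown only to illustrate that the rule set can be replaced.

```python
from nltk.stem import LancasterStemmer

lancaster = LancasterStemmer()
default_stem = lancaster.stem("maximum")
print(default_stem)  # maxim

# Supplying your own rules replaces the default rule set.
custom = LancasterStemmer(rule_tuple=("ssen4>", "s1t."))
custom_stem = custom.stem("ness")
print(custom_stem)  # nest

# Aggressive stripping can produce stems that are hard to recognise.
for w in ["cement", "provision"]:
    print(w, "->", lancaster.stem(w))
```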
STEMMING ALGORITHMS…
Regular Expression Stemmer
Lets you define a regular expression
Removes any prefixes or suffixes that match the expression
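A sketch of NLTK's RegexpStemmer, assuming nltk is installed. The two patterns are illustrative choices, not recommended rule sets; min is the shortest word length the stemmer will touch.

```python
from nltk.stem import RegexpStemmer

# Suffix-only pattern; words shorter than 4 characters are left alone.
suffix_stemmer = RegexpStemmer("ing$|s$|e$|able$", min=4)
suffix_results = [suffix_stemmer.stem(w) for w in ["cars", "was", "compute", "advisable"]]
print(suffix_results)  # ['car', 'was', 'comput', 'advis']

# A pattern anchored with ^ removes a prefix as well.
affix_stemmer = RegexpStemmer("^un|ing$", min=4)
affix_result = affix_stemmer.stem("undoing")
print(affix_result)  # do
```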
LEMMATIZATION
Involves resolving words to their dictionary form
Requires linguistic knowledge
https://d2mk45aasx86xg.cloudfront.net/Example_to_understand_lemmatization_a73d97a04c.webp
Gives better results than stemming; each result is called a "lemma"
More complex to use than stemming
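The idea of resolving a word to its dictionary form can be sketched with a tiny lookup table. Everything here (IRREGULAR, REGULAR_SUFFIXES, VOCAB, toy_lemmatize) is hypothetical and stands in for the linguistic resources a real lemmatizer relies on.

```python
# Hypothetical mini-lexicon; a real lemmatizer uses a full dictionary.
IRREGULAR = {
    ("eaten", "verb"): "eat",
    ("better", "adj"): "good",
    ("mice", "noun"): "mouse",
}
REGULAR_SUFFIXES = {"verb": ["ing", "s"], "noun": ["s"]}
VOCAB = {"eat", "dog", "good", "mouse"}


def toy_lemmatize(word: str, pos: str) -> str:
    """Return the dictionary form of word for the given part of speech."""
    word = word.lower()
    if (word, pos) in IRREGULAR:
        return IRREGULAR[(word, pos)]
    for suffix in REGULAR_SUFFIXES.get(pos, []):
        candidate = word[: len(word) - len(suffix)]
        # Accept a stripped form only if it is a known dictionary word.
        if word.endswith(suffix) and candidate in VOCAB:
            return candidate
    return word


print([toy_lemmatize(w, "verb") for w in ["Eaten", "Eating", "Eats"]])
# ['eat', 'eat', 'eat']
print(toy_lemmatize("better", "adj"))  # good
print(toy_lemmatize("bus", "noun"))    # bus ("bu" is not a dictionary word)
```

Unlike a stemmer, the result is always a dictionary word, but the function needs both a lexicon and the part of speech, which is why lemmatization is more complex to use.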
https://d2mk45aasx86xg.cloudfront.net/difference_between_Stemming_and_lemmatization_8_11zon_452539721d.webp
https://www.baeldung.com/wp-content/uploads/sites/4/2020/06/stemvslemma.png
WORDNET LEMMATIZER
WordNet is a lexical database of English
Used by most search engines
TEXTBLOB LEMMATIZER
N-GRAMS
Contiguous sequence of N elements
An element can be a word or a smaller unit (like a syllable)
Used extensively in NLP
https://images.deepai.org/django-summernote/2019-04-11/f98290ce-a9e9-48c6-8330-4e9a5fe55331.png
Uses:
Text autocompletion
Auto spell check
Basic grammar check
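A minimal sketch of n-gram extraction in plain Python, followed by a toy bigram-based autocompletion. The helper names ngrams and suggest and the sample sentence are hypothetical; NLTK also ships its own n-gram helper, which is not used here so the example stays self-contained.

```python
from collections import Counter, defaultdict


def ngrams(elements, n):
    """Return every contiguous run of n elements as a tuple."""
    return [tuple(elements[i:i + n]) for i in range(len(elements) - n + 1)]


tokens = "i like to eat and i like to read".split()
bigrams = ngrams(tokens, 2)       # word-level 2-grams
char_trigrams = ngrams("eating", 3)  # character-level 3-grams

print(bigrams[:3])    # [('i', 'like'), ('like', 'to'), ('to', 'eat')]
print(char_trigrams)  # [('e', 'a', 't'), ('a', 't', 'i'), ('t', 'i', 'n'), ('i', 'n', 'g')]

# Toy autocompletion: count which word follows each word.
followers = defaultdict(Counter)
for first, second in bigrams:
    followers[first][second] += 1


def suggest(word):
    """Return the most frequent follower of word, or None if unseen."""
    counts = followers.get(word)
    return counts.most_common(1)[0][0] if counts else None


print(suggest("i"))     # like
print(suggest("read"))  # None
```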
USING NLTK
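The original content of this slide is not available. As a stand-in, here is a small sketch that runs three NLTK stemmers side by side on the words from the introduction, assuming nltk is installed. The compare helper is hypothetical.

```python
from nltk.stem import LancasterStemmer, PorterStemmer, SnowballStemmer

stemmers = {
    "porter": PorterStemmer(),
    "snowball": SnowballStemmer("english"),
    "lancaster": LancasterStemmer(),
}


def compare(words):
    """Map each word to the stem produced by every stemmer."""
    return {
        w: {name: stemmer.stem(w) for name, stemmer in stemmers.items()}
        for w in words
    }


table = compare(["eaten", "eating", "eats"])
for word, row in table.items():
    print(f"{word:8}", row)
```

Printing the stems next to each other makes it easy to see where the algorithms agree and where the more aggressive ones cut further.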