
Stemming and Lemmatization

The document discusses stemming and lemmatization, which are used in natural language processing to reduce words to their root form. Stemming is the simplest approach and reduces words to stems, which may not be the true root. Lemmatization requires linguistic knowledge to identify the correct root word or "lemma". The document provides examples of stemming and lemmatization algorithms like the Porter stemmer and WordNet lemmatizer that are commonly used in NLP tasks.

STEMMING AND LEMMATIZATION

INTRODUCTION
Used in Natural Language Processing (NLP)

Stemming and lemmatization reduce a word to its root or core

Help to recognize that some "different" words come from the same core root

Usually a pre-processing step for NLP

Example:
- Eaten
- Eating
- Eats

2023FJ - [email protected] 2
STEMMING
Easiest approach

Words are reduced to their word stems

A stem may not be the same as the root of the word

Algorithms are usually heuristic

Image: https://devopedia.org/images/article/218/8583.1569386710.png
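
To make the heuristic idea concrete, here is a minimal sketch of a suffix-stripping stemmer in plain Python. The suffix list, the `toy_stem` name, and the `min_stem_len` parameter are assumptions made up for illustration; this is not the Porter algorithm or any library's implementation.

```python
# Toy suffix-stripping stemmer. The rule list below is hypothetical and far
# simpler than the rules used by real stemmers.
SUFFIXES = ("ing", "en", "es", "ed", "s")  # tried in this order


def toy_stem(word, min_stem_len=3):
    """Strip the first matching suffix, keeping at least min_stem_len letters."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem_len:
            return word[: -len(suffix)]
    return word


for w in ["Eaten", "Eating", "Eats", "studies", "ate"]:
    print(w, "->", toy_stem(w))
```

With these rules the three example words share the stem "eat", while "studies" becomes "studi" (a stem that is not a real word) and the irregular form "ate" is left unconnected to "eat".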

STEMMING ALGORITHMS
Porter stemmer
Created in 1980
Removes common endings from words
Usually applied first, as a starting point
Guarantees reproducible results

Snowball stemmer
The English version is also known as Porter2
Generally regarded as an improvement over the Porter stemmer
More aggressive than the Porter stemmer

Lancaster stemmer
One of the most aggressive stemmers
NLTK lets you add your own rules
Can transform words into strange, hard-to-recognize stems

Regular expression stemmer
Lets you define your own regular expression
Removes the prefixes and suffixes that match it
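
The idea can be sketched with Python's standard `re` module alone. The helper name `make_regex_stemmer`, the `min_len` parameter, and the example pattern are assumptions chosen for illustration; the sketch only imitates the behaviour described above and is not NLTK's own code.

```python
import re


def make_regex_stemmer(pattern, min_len=0):
    """Return a function that deletes every match of `pattern` from a word.

    Words shorter than `min_len` are returned unchanged.
    """
    compiled = re.compile(pattern)

    def stem(word):
        if len(word) < min_len:
            return word
        return compiled.sub("", word)

    return stem


# Hypothetical rule: strip a leading "un" or a trailing "ing", "s", or "able".
stem = make_regex_stemmer(r"^un|ing$|s$|able$", min_len=4)

for w in ["eating", "unhappy", "cars", "readable", "is"]:
    print(w, "->", stem(w))
```

Because the pattern is applied blindly, a word such as "sing" would be cut down to "s", which shows why the expression has to be chosen with care.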

LEMMATIZATION
Involves resolving words to their dictionary form

Requires linguistic knowledge

Gives better results, called "lemmas"

More complex to use

Image: https://d2mk45aasx86xg.cloudfront.net/Example_to_understand_lemmatization_a73d97a04c.webp
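
A minimal sketch of why lemmatization needs linguistic knowledge: the lookup below depends on a lexicon and on the word's part of speech. The `LEXICON` table, its entries, and the `toy_lemmatize` function are hypothetical and only stand in for the full lexical resource a real lemmatizer would use.

```python
# Hypothetical mini-lexicon mapping (word form, part of speech) -> lemma.
# A real lemmatizer relies on a full lexical resource instead.
LEXICON = {
    ("eaten", "verb"): "eat",
    ("eating", "verb"): "eat",
    ("eats", "verb"): "eat",
    ("ate", "verb"): "eat",
    ("better", "adj"): "good",
    ("meeting", "verb"): "meet",
    ("meeting", "noun"): "meeting",
}


def toy_lemmatize(word, pos):
    """Look up the dictionary form; fall back to the lowercased word."""
    return LEXICON.get((word.lower(), pos), word.lower())


print(toy_lemmatize("Ate", "verb"))      # irregular form resolved via the lexicon
print(toy_lemmatize("meeting", "verb"))  # same spelling, different lemma per part of speech
print(toy_lemmatize("meeting", "noun"))
```

Unlike suffix stripping, the lookup can map the irregular form "ate" to "eat" and always returns a real dictionary word, at the cost of needing the lexicon and the part of speech.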

Images (stemming vs. lemmatization):
https://d2mk45aasx86xg.cloudfront.net/difference_between_Stemming_and_lemmatization_8_11zon_452539721d.webp
https://www.baeldung.com/wp-content/uploads/sites/4/2020/06/stemvslemma.png

WORDNET LEMMATIZER
Based on WordNet, a lexical database of English

Used by most search engines

TEXTBLOB LEMMATIZER

N-GRAMS
Connected string of N elements

An element can be a word or a smaller unit (like a syllable)

Used extensively in NLP

Uses:
- Text autocompletion
- Auto spell check
- Basic grammar check

Image: https://images.deepai.org/django-summernote/2019-04-11/f98290ce-a9e9-48c6-8330-4e9a5fe55331.png
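
As a rough sketch of how n-grams are built and how they could support autocompletion, the plain-Python example below extracts word bigrams and character trigrams, then counts which word follows which. The function names `ngrams` and `suggest` and the sample sentence are assumptions for illustration only.

```python
from collections import Counter, defaultdict


def ngrams(elements, n):
    """Return all runs of n consecutive elements as tuples."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return [tuple(elements[i:i + n]) for i in range(len(elements) - n + 1)]


words = "the cat sat on the mat".split()
word_bigrams = ngrams(words, 2)    # elements are words
char_trigrams = ngrams("stem", 3)  # elements are characters

# Very small autocompletion model: count which word follows each word.
followers = defaultdict(Counter)
for first, second in word_bigrams:
    followers[first][second] += 1


def suggest(word):
    """Return the words seen after `word`, most frequent first."""
    return [w for w, _ in followers[word].most_common()]


print(word_bigrams)
print(char_trigrams)
print(suggest("the"))
```

The same `ngrams` helper works for any sequence, so the choice of element (word, character, or a smaller unit) is made by the caller.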

USING NLTK
