
NLTK - Stem

The document compares a Porter Stemmer and WordNet Lemmatizer for natural language processing tasks. The Porter Stemmer is simpler and faster but less accurate, while the WordNet Lemmatizer is more complex and slower but more accurate, especially when it is told each word's part of speech.

Uploaded by pranavi
Copyright © All Rights Reserved

import nltk
from nltk.stem import PorterStemmer, WordNetLemmatizer

# The lemmatizer needs the WordNet corpus; download it once if it is missing.
nltk.download("wordnet", quiet=True)

# Porter Stemmer
stemmer = PorterStemmer()
print("Stemmer:")
print("running ->", stemmer.stem("running"))  # Output: run (valid word)
print("better ->", stemmer.stem("better"))  # Output: better (left unchanged)
print("corpora ->", stemmer.stem("corpora"))  # Output: corpora (incorrect, should be corpus)

# WordNet Lemmatizer (treats words as nouns unless pos is given)
lemmatizer = WordNetLemmatizer()
print("\nLemmatizer:")
print("running ->", lemmatizer.lemmatize("running"))  # Output: running (correct as a noun)
print("better ->", lemmatizer.lemmatize("better"))  # Output: better (treated as a noun)
print("better (as adjective) ->", lemmatizer.lemmatize("better", pos="a"))  # Output: good (correct)
print("corpora ->", lemmatizer.lemmatize("corpora"))  # Output: corpus (correct)

Stemmer:
running -> run
better -> better
corpora -> corpora

Lemmatizer:
running -> running
better -> better
better (as adjective) -> good
corpora -> corpus
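In the output above, "better" only becomes "good" when pos="a" is passed explicitly. One common way to supply the part of speech automatically is to run a POS tagger first and translate its tags into the single-letter codes the lemmatizer accepts. The sketch below assumes the tagger emits Penn Treebank tags (the default for nltk.pos_tag); the helper name penn_to_wordnet is hypothetical, and sending every other tag to noun is a simplification.

```python
# Hypothetical helper: map Penn Treebank tags (nltk.pos_tag's default tag set)
# to the POS codes WordNetLemmatizer accepts ("n", "v", "a", "r").


def penn_to_wordnet(tag: str) -> str:
    if tag.startswith("JJ"):
        return "a"  # JJ, JJR, JJS: adjectives
    if tag.startswith("VB"):
        return "v"  # VB, VBD, VBG, VBN, VBP, VBZ: verbs
    if tag.startswith("RB"):
        return "r"  # RB, RBR, RBS: adverbs
    return "n"      # everything else falls back to noun, the lemmatizer's default


# Usage with NLTK (needs the tagger model and WordNet data downloaded):
# from nltk import pos_tag
# from nltk.stem import WordNetLemmatizer
# lemmatizer = WordNetLemmatizer()
# for word, tag in pos_tag(["The", "dogs", "were", "running"]):
#     print(word, lemmatizer.lemmatize(word.lower(), pos=penn_to_wordnet(tag)))

print(penn_to_wordnet("VBG"), penn_to_wordnet("JJR"), penn_to_wordnet("NNS"))  # prints: v a n
```

Keying on the tag prefix keeps the mapping short, because Penn Treebank groups all adjective, verb and adverb variants under the JJ, VB and RB prefixes.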

# Porter Stemmer

Simpler and faster: It uses a rule-based approach to chop off suffixes from words.
Less accurate: It does not always produce actual words, and its fixed rules can lead to stemming errors. For instance, stemming "running" gives "run", which is a valid word, but stemming "studies" gives "studi", which is not. Irregular forms are also missed: "better" and "corpora" come back unchanged, as the output above shows.
Doesn't consider context: It looks only at the word itself, ignoring its part of speech (POS) and the surrounding words.
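To make the rule-based suffix stripping concrete, here is a minimal sketch of only Steps 1a and 1b of Porter's algorithm (plural endings, then -eed/-ed/-ing endings with their clean-up rules). It is not NLTK's PorterStemmer, which runs further steps and adds its own extensions, and all function names are illustrative. Even this fragment turns "studies" into the non-word "studi", while its clean-up rules restore the final "e" in "caring", giving "care".

```python
# Partial sketch of Porter's stemmer: Steps 1a and 1b only.
# Function names are illustrative; this is not NLTK's PorterStemmer.

VOWELS = set("aeiou")


def is_consonant(word: str, i: int) -> bool:
    """Porter's rule: 'y' counts as a consonant unless it follows a consonant."""
    ch = word[i]
    if ch in VOWELS:
        return False
    if ch == "y":
        return i == 0 or not is_consonant(word, i - 1)
    return True


def measure(stem: str) -> int:
    """Porter's m: the number of VC sequences in the form [C](VC)^m[V]."""
    pattern = "".join("c" if is_consonant(stem, i) else "v" for i in range(len(stem)))
    return pattern.count("vc")


def contains_vowel(stem: str) -> bool:
    return any(not is_consonant(stem, i) for i in range(len(stem)))


def ends_double_consonant(word: str) -> bool:
    return len(word) >= 2 and word[-1] == word[-2] and is_consonant(word, len(word) - 1)


def ends_cvc(word: str) -> bool:
    """Consonant-vowel-consonant ending whose last letter is not w, x or y."""
    n = len(word)
    return (
        n >= 3
        and is_consonant(word, n - 3)
        and not is_consonant(word, n - 2)
        and is_consonant(word, n - 1)
        and word[-1] not in "wxy"
    )


def step1a(word: str) -> str:
    """Plurals: sses -> ss, ies -> i, ss -> ss, s -> ''."""
    if word.endswith("sses") or word.endswith("ies"):
        return word[:-2]
    if word.endswith("ss"):
        return word
    if word.endswith("s"):
        return word[:-1]
    return word


def step1b(word: str) -> str:
    """-eed, -ed and -ing endings, followed by Porter's clean-up rules."""
    if word.endswith("eed"):
        return word[:-1] if measure(word[:-3]) > 0 else word
    for suffix in ("ed", "ing"):
        if word.endswith(suffix):
            stem = word[: -len(suffix)]
            if not contains_vowel(stem):
                return word
            if stem.endswith(("at", "bl", "iz")):
                return stem + "e"  # conflat(ed) -> conflate
            if ends_double_consonant(stem) and stem[-1] not in "lsz":
                return stem[:-1]  # runn(ing) -> run
            if measure(stem) == 1 and ends_cvc(stem):
                return stem + "e"  # car(ing) -> care
            return stem
    return word


def stem_step1(word: str) -> str:
    return step1b(step1a(word.lower()))


for w in ["caresses", "studies", "running", "caring", "agreed", "feed", "better"]:
    print(w, "->", stem_step1(w))
```

Each rule only looks at the letters of the word, which is why the stemmer is fast but blind to part of speech and to irregular forms such as "better".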
# WordNet Lemmatizer
More complex and slower: It relies on a lexical database (WordNet) to map words to their dictionary base forms (lemmas).
More accurate: It aims to return actual words that exist in the language, including irregular forms such as "corpora" -> "corpus".
Considers context (ideally): It does not infer the part of speech by itself; every word is treated as a noun unless a pos argument is supplied, for example from a POS tagger. "running" stays "running" as a noun but becomes "run" with pos="v".
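The dictionary-lookup idea can be illustrated with a toy lemmatizer loosely modelled on WordNet's morphological processing: check an exception table for irregular forms, accept the word if the lexicon already lists it for the given POS, and otherwise try suffix substitutions, keeping the first candidate the lexicon knows. LEXICON, EXCEPTIONS and RULES are tiny hypothetical stand-ins, not WordNet's real data, and the real lookup differs in detail.

```python
# Toy dictionary-based lemmatizer in the spirit of WordNet's morphology.
# LEXICON, EXCEPTIONS and RULES are hypothetical stand-ins for WordNet data.

LEXICON = {
    "n": {"running", "dog", "church", "corpus"},
    "v": {"run", "care", "study"},
    "a": {"good", "tall"},
}

# Irregular forms that no suffix rule can recover (hypothetical entries).
EXCEPTIONS = {
    "n": {"corpora": "corpus"},
    "v": {"ran": "run", "running": "run"},
    "a": {"better": "good"},
}

# Suffix-substitution rules per POS, tried in order (a small subset).
RULES = {
    "n": [("s", ""), ("ches", "ch"), ("ies", "y")],
    "v": [("s", ""), ("ies", "y"), ("ed", "e"), ("ed", ""), ("ing", "e"), ("ing", "")],
    "a": [("er", ""), ("est", "")],
}


def lemmatize(word: str, pos: str = "n") -> str:
    """Return a lemma found in LEXICON[pos], or the word unchanged."""
    word = word.lower()
    if word in EXCEPTIONS[pos]:
        return EXCEPTIONS[pos][word]
    if word in LEXICON[pos]:
        return word
    for suffix, replacement in RULES[pos]:
        if word.endswith(suffix):
            candidate = word[: len(word) - len(suffix)] + replacement
            if candidate in LEXICON[pos]:  # only accept real dictionary words
                return candidate
    return word


print(lemmatize("running"), lemmatize("running", pos="v"))
print(lemmatize("better"), lemmatize("better", pos="a"))
print(lemmatize("corpora"), lemmatize("caring", pos="v"))
```

Because every candidate must exist in the lexicon, "caring" with pos="v" becomes "care" rather than a truncated stem, and the answers for "running" and "better" depend on the POS passed in, mirroring the NLTK output earlier.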
