Week-4 NLP 2

The document describes performing tokenization, stemming, and lemmatization on text corpora. Code is provided to tokenize, stem, and lemmatize input text using NLTK modules.

Uploaded by

Varshini Gourani

NATURAL LANGUAGE PROCESSING LAB

WEEK-4
NAME: Varshini        ROLL NO: 21R21A7324
BRANCH: AIML          DATE: 21-03-2024
PROBLEM STATEMENT:
Perform tokenization, stemming, and lemmatization to carry out analysis on a
text corpus.
CODE:
import nltk
from nltk.tokenize import word_tokenize
from nltk.stem import PorterStemmer, WordNetLemmatizer
from nltk.corpus import stopwords  # imported for optional stopword filtering (not used below)
import string                      # imported for optional punctuation filtering (not used below)
nltk.download('punkt')
nltk.download('stopwords')
nltk.download('wordnet')
# Get input text from the user
text_input = input("Enter text for tokenization, stemming, and lemmatization: ")
# Tokenization
tokens = word_tokenize(text_input.lower())
print("Tokens:")
print(tokens[:20])  # Print the first 20 tokens as an example
# Stemming
stemmer = PorterStemmer()
stemmed_tokens = [stemmer.stem(token) for token in tokens]
print("\nStemmed Tokens:")
print(stemmed_tokens[:20]) # Print the first 20 stemmed tokens as an example
# Lemmatization (with no POS argument, lemmatize() treats every token as a noun)
lemmatizer = WordNetLemmatizer()
lemmatized_tokens = [lemmatizer.lemmatize(token) for token in tokens]
print("\nLemmatized Tokens:")
print(lemmatized_tokens[:20])
OUTPUT:
