Lab Programs Week 1 - Output
To get started, you need to install NLTK on your computer:
pip install nltk
Then, in a Python session, download the resources this tutorial relies on:
import nltk
nltk.download('punkt')
nltk.download('wordnet')
nltk.download('averaged_perceptron_tagger')
nltk.download('stopwords')
nltk.download('maxent_ne_chunker')
nltk.download('words')
Text Preprocessing
Text preprocessing is the practice of cleaning and preparing text data
for machine learning algorithms. The primary steps include tokenizing,
removing stop words, stemming, lemmatizing, and more.
These steps help reduce the complexity of the data and extract
meaningful information from it.
In the coming sections of this tutorial, we’ll walk you through each of
these steps using NLTK.
Tokenization
Tokenization is the process of splitting text into smaller units, such as sentences or words. Let's tokenize the paragraph above into sentences and words:
from nltk.tokenize import sent_tokenize, word_tokenize

text = "Text preprocessing is the practice of cleaning and preparing text data for machine learning algorithms. The primary steps include tokenizing, removing stop words, stemming, lemmatizing, and more."

sentences = sent_tokenize(text)
print(sentences)

words = word_tokenize(text)
print(words)
Output:
['Text preprocessing is the practice of cleaning and preparing text data for machine learning algorithms.', 'The primary steps include tokenizing, removing stop words, stemming, lemmatizing, and more.']
['Text', 'preprocessing', 'is', 'the', 'practice', 'of', 'cleaning', 'and', 'preparing', 'text', 'data', 'for', 'machine', 'learning', 'algorithms', '.', 'The', 'primary', 'steps', 'include', 'tokenizing', ',', 'removing', 'stop', 'words', ',', 'stemming', ',', 'lemmatizing', ',', 'and', 'more', '.']
Stopword Removal
In natural language processing, stopwords are words that you want to
ignore, so you filter them out when you’re processing your text.
These are usually words that occur very frequently in any text and do
not convey much meaning, such as “is”, “an”, “the”, “in”, etc.
NLTK comes with a predefined list of stopwords in several languages,
including English.
Let’s use NLTK to filter out stopwords from our list of tokenized words:
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize

text = "Text preprocessing is the practice of cleaning and preparing text data for machine learning algorithms. The primary steps include tokenizing, removing stop words, stemming, lemmatizing, and more."

stop_words = set(stopwords.words('english'))
words = word_tokenize(text)
# keep only the words that are not stopwords (case-insensitive)
filtered_words = [word for word in words if word.lower() not in stop_words]
print(filtered_words)
Output: