Aiml P4
Date: ……………
Practical No. 4: Perform the following data preprocessing on a text/paragraph using the NLTK
library:
a. Write a Python program to tokenize a text into words and into sentences.
b. Write a Python program that accepts a list of tokenized words and stems
each one to its root word.
c. Write a Python program to identify the part of speech of each word in
the text.
d. Write a Python NLTK program to remove stop words from a given text.
e. Write a Python program to identify and correct misspelled words
in a given text, such as an essay or a letter.
A. Objective: Learn data preprocessing by writing Python programs with the NLTK
library.
B. Expected Program Outcomes (POs): PO1, PO2, PO3, PO4, PO5, PO6, PO7
C. Expected Skills to be developed based on competency:
1. https://www.nltk.org/
2. https://realpython.com/nltk-nlp-python/
Here is the program logic for a Python program that uses NLTK for various NLP tasks:
1. Import the necessary modules and libraries:
• nltk for NLP functionalities
• Specific modules like PorterStemmer or WordNetLemmatizer for word
stemming or lemmatization
2. Define functions for each task:
• Tokenization:
• Use word_tokenize() to tokenize words
• Use sent_tokenize() to tokenize sentences
• Word Stemming:
• Initialize a stemmer object (e.g., PorterStemmer())
• Use the stemmer's stem() function to stem each word
• Part-of-Speech (POS) Tagging:
• Use pos_tag() to get POS tags for each word
• Stop Words Removal:
• Use stopwords.words() to get a list of stopwords for a specific
language
• Filter out the stopwords from the tokenized words
• Misspelled Words Correction:
• Initialize a spell checker object (e.g., SpellChecker() from the
pyspellchecker package, which is installed separately from NLTK)
• Use the spell checker's correction() function to correct misspelled
words
3. Get user input or load text from a file.
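The stop-word filtering and spelling-correction steps above can be sketched without any downloads. This is only an illustration under stated assumptions: STOP_WORDS and VOCABULARY are small hypothetical lists standing in for stopwords.words("english") and for a real dictionary, and difflib.get_close_matches from the standard library is swapped in for SpellChecker's correction(), so its results will not match that package exactly.

```python
from difflib import get_close_matches

# Hypothetical stop-word list; in the practical, NLTK's
# stopwords.words("english") would supply this instead.
STOP_WORDS = {"a", "an", "the", "is", "in", "of", "for", "and", "to"}

# Hypothetical vocabulary of correctly spelled words (an assumption,
# standing in for the dictionary a real spell checker ships with).
VOCABULARY = ["language", "natural", "processing", "python", "text"]


def remove_stop_words(tokens):
    """Keep only the tokens whose lowercase form is not a stop word."""
    return [t for t in tokens if t.lower() not in STOP_WORDS]


def correct_word(word, vocabulary=VOCABULARY):
    """Return the closest vocabulary word, or the word unchanged if none is close."""
    if word.lower() in vocabulary:
        return word
    matches = get_close_matches(word.lower(), vocabulary, n=1, cutoff=0.8)
    return matches[0] if matches else word


# str.split() is used here only to keep the sketch free of NLTK data.
tokens = "Python is a langauge for natural procesing of text".split()
filtered = remove_stop_words(tokens)
corrected = [correct_word(t) for t in filtered]
print(filtered)
print(corrected)
```

The cutoff of 0.8 is an arbitrary choice for this sketch: a lower value corrects more aggressively but also risks replacing valid words that are simply missing from the vocabulary.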
Foundation of AI and ML (4351601) 216120316055
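# Tokenization
import nltk
from nltk.tokenize import sent_tokenize, word_tokenize

nltk.download("punkt")  # tokenizer data; newer NLTK releases may need "punkt_tab" instead

def tokenize_words(text):
    return word_tokenize(text)

def tokenize_sentences(text):
    return sent_tokenize(text)

# Tokenization of the text obtained in step 3
text = input("Enter text: ")
words = tokenize_words(text)
sentences = tokenize_sentences(text)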
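The stemming task (part b) could follow the same function-per-task pattern. The sketch below assumes NLTK is installed; stem_words is a hypothetical helper name, and the token list is written by hand so that no NLTK data download is needed to run it.

```python
from nltk.stem import PorterStemmer


def stem_words(tokens):
    """Stem each token of an already tokenized list with the Porter algorithm."""
    stemmer = PorterStemmer()
    return [stemmer.stem(token) for token in tokens]


# Hand-written tokens (an assumption for illustration); in the full
# program this list would be the output of tokenize_words(text).
tokens = ["running", "jumps", "cats", "jumped"]
stems = stem_words(tokens)
print(stems)
```

A stemmer strips suffixes by rule, so some results are not dictionary words. WordNetLemmatizer, mentioned in the program logic, is the alternative when real words are required.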
I. Resources/Equipment Required
_____________________________________________________________________
M. References / Suggestions
1. https://www.geeksforgeeks.org/machine-learning/
2. https://www.geeksforgeeks.org/natural-language-processing-nlp-tutorial/
3. https://www.tutorialspoint.com/machine_learning_with_python/index.htm
N. Assessment-Rubrics
Criteria | Total Marks | Exceptional (5 Marks) | Satisfactory (4 to 3 Marks) | Developing (2 Marks) | Limited (1 Mark)