
Aiml P4

The document outlines a practical assignment for data preprocessing using the NLTK library in Python, including tasks such as tokenization, stemming, part-of-speech tagging, stop words removal, and misspelled words correction. It provides objectives, expected outcomes, prerequisites, and a detailed program logic flow for implementing the tasks. Additionally, it includes safety precautions, resources required, and assessment rubrics for evaluating student performance.

Uploaded by

smitkathrotiya17
Copyright
© All Rights Reserved

Foundation of AI and ML (4351601) 216120316055

Date: ……………
Practical No. 4: Perform the following data preprocessing on a text/paragraph using the
NLTK library:
a. Write a Python program to tokenize words, sentence-wise.
b. Write a Python program that accepts a list of tokenized words and stems each
one to its root word.
c. Write a Python program to identify the part of speech of each word in
the text.
d. Write a Python NLTK program to remove stop words from a given text.
e. Write a Python program to identify and correct misspelled words
in a given text, such as an essay or a letter.

A. Objective: Learn data preprocessing with the NLTK library by writing Python
programs.
B. Expected Program Outcomes (POs): PO1, PO2, PO3, PO4, PO5, PO6, PO7
C. Expected Skills to be developed based on competency:

• Able to apply data preprocessing on text/paragraph using the NLTK library.

D. Expected Course Outcomes (COs)
CO4
E. Practical Outcome (PRo)
The program demonstrates the usage by tokenizing the example text and printing each
tokenized sentence.
F. Expected Affective domain Outcome (ADos)
Follow ethical practices
G. Prerequisite Theory:
NLTK (Natural Language Toolkit) is a widely used library for natural language
processing (NLP) in Python. It provides a wide range of functionalities and resources
for tasks such as tokenization, stemming, part-of-speech tagging, syntactic and
semantic analysis, and much more. Here is an overview of the key components and
capabilities of NLTK:
• Tokenization: NLTK offers tokenization functions to break down text into
individual words or sentences. It provides functions like word_tokenize() and
sent_tokenize() to split text accordingly.
• Stemming: Stemming is the process of reducing words to their base or root form.
NLTK includes several stemmers, such as the Porter Stemmer and the Snowball
Stemmer, which can be used to perform stemming operations on words.
• Part-of-Speech (POS) Tagging: POS tagging assigns grammatical tags to words
based on their context and role in a sentence. NLTK provides the pos_tag()
function, which uses pre-trained models to identify and tag the part of speech for
each word in a given text.
• Stop Words Removal: Stop words are common words like "a," "the," "and," etc.,
that often carry little or no meaningful information. NLTK includes a corpus of
stop words for various languages. You can use this corpus to filter out stop words
from your text and focus on more relevant words.
• Named Entity Recognition (NER): NLTK offers NER capabilities to identify and
classify named entities in text, such as names of persons, organizations, locations,
and other specified categories.
• Syntax and Semantic Analysis: NLTK provides tools for syntactic and semantic
analysis, including parsing algorithms, semantic role labeling, and semantic
similarity calculations.
• WordNet: NLTK integrates WordNet, a large lexical database of English words,
which provides a rich resource for semantic relationships, synsets (groups of
synonymous words), and definitions. You can use WordNet to perform tasks like
word sense disambiguation or synonym expansion.
• Machine Learning Integration: NLTK facilitates the integration of machine
learning algorithms for various NLP tasks. It provides support for feature
extraction, classification, clustering, and other machine learning techniques.
• Corpora and Language Resources: NLTK includes numerous pre-processed
corpora and language resources for tasks like sentiment analysis, text
classification, language modeling, and more. These resources can be leveraged
to train models and perform evaluations.
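The stemming entry above names two stemmers, so a short comparison may help. The following is a minimal sketch, assuming NLTK is installed; the sample words are chosen only for illustration. Both stemmers are rule-based, so this snippet should not need any downloaded NLTK data.

```python
from nltk.stem import PorterStemmer, SnowballStemmer

# Both stemmers are rule-based; no corpus download is needed to build them.
porter = PorterStemmer()
snowball = SnowballStemmer("english")

# Sample words picked for illustration only.
words = ["running", "cats", "generously", "studies"]

# Map each word to the pair (Porter stem, Snowball stem).
stems = {word: (porter.stem(word), snowball.stem(word)) for word in words}

for word, (porter_stem, snowball_stem) in stems.items():
    print(f"{word:12} porter={porter_stem:10} snowball={snowball_stem}")
```

The two algorithms agree on many common words but can differ on others, which is why it is worth printing both stems side by side before choosing one for a task.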
NLTK is highly extensible and allows users to customize and extend its
functionalities as per their requirements. It is widely used by researchers, students,
and professionals in the NLP field due to its comprehensive set of tools, extensive
documentation, and active community support. Overall, NLTK is a powerful
toolkit for various NLP tasks and a good starting point for developing NLP
applications in Python.

Explore more at the following links:

1. https://www.nltk.org/
2. https://realpython.com/nltk-nlp-python/

H. Experimental set up/Program Logic-Flow chart:

Here is the program logic for a Python program that uses NLTK for various NLP tasks:
1. Import the necessary modules and libraries:
• nltk for NLP functionalities
• Specific modules like PorterStemmer or WordNetLemmatizer for word
stemming or lemmatization
2. Define functions for each task:
• Tokenization:
• Use word_tokenize() to tokenize words
• Use sent_tokenize() to tokenize sentences
• Word Stemming:
• Initialize a stemmer object (e.g., PorterStemmer())
• Use the stemmer's stem() function to stem each word
• Part-of-Speech (POS) Tagging:
• Use pos_tag() to get POS tags for each word
• Stop Words Removal:
• Use stopwords.words() to get a list of stopwords for a specific
language
• Filter out the stopwords from the tokenized words
• Misspelled Words Correction:
• Initialize a spell checker object (e.g., SpellChecker())
• Use the spell checker's correction() function to correct misspelled
words
3. Get user input or load text from a file.
4. Perform the desired NLP tasks:
• Tokenize the text into words and sentences.
• Stem or lemmatize the words if required.
• Perform POS tagging on the words.
• Remove stop words from the text.
• Correct any misspelled words in the text.
5. Display the results or store them for further processing.
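Several of the steps above (tokenization, POS tagging, stop words) depend on NLTK data packages that are downloaded separately from the library itself. The following is a minimal sketch of a setup helper. The name ensure_nltk_data and the resource list are assumptions made for illustration, and the exact resource names depend on the installed NLTK release.

```python
import nltk

# Data packages assumed to back the steps above. Resource names vary by
# NLTK release: newer versions look for the "_tab" / "_eng" variants.
REQUIRED_RESOURCES = [
    "punkt",                           # word_tokenize / sent_tokenize
    "punkt_tab",                       # tokenizer data in newer releases
    "averaged_perceptron_tagger",      # pos_tag
    "averaged_perceptron_tagger_eng",  # tagger data in newer releases
    "stopwords",                       # stopwords.words("english")
]


def ensure_nltk_data(resources=REQUIRED_RESOURCES):
    """Try to download each resource; return the names that failed."""
    failed = []
    for name in resources:
        # nltk.download returns True on success and False otherwise.
        if not nltk.download(name, quiet=True):
            failed.append(name)
    return failed
```

Calling ensure_nltk_data() once before running the tasks should be enough on a machine with internet access. Names that the installed release does not know are expected to show up in the returned list rather than stop the program.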
Here is an example program structure that incorporates these steps:

# Requires: pip install nltk pyspellchecker, plus the NLTK data packages
# "punkt", "averaged_perceptron_tagger" and "stopwords" (via nltk.download();
# newer NLTK releases may ask for "punkt_tab" and
# "averaged_perceptron_tagger_eng" instead).
import nltk
from nltk import pos_tag
from nltk.corpus import stopwords
from nltk.stem import PorterStemmer
from nltk.tokenize import sent_tokenize, word_tokenize
from spellchecker import SpellChecker

# Tokenization
def tokenize_words(text):
    return word_tokenize(text)

def tokenize_sentences(text):
    return sent_tokenize(text)

# Word Stemming
def stem_words(words):
    stemmer = PorterStemmer()
    return [stemmer.stem(word) for word in words]

# POS Tagging
def identify_pos(words):
    return pos_tag(words)

# Stop Words Removal
def remove_stop_words(words):
    stop_words = set(stopwords.words("english"))
    return [word for word in words if word.lower() not in stop_words]

# Misspelled Words Correction
def correct_spelling(words):
    spell = SpellChecker()
    # correction() can return None in recent pyspellchecker releases when it
    # finds no candidate; keep the original word in that case.
    return [spell.correction(word) or word for word in words]

# Example usage
text = "This is an example sentence. And here's another one!"

# Tokenization
words = tokenize_words(text)
sentences = tokenize_sentences(text)

# Word Stemming
stemmed_words = stem_words(words)

# POS Tagging
pos_tags = identify_pos(words)

# Stop Words Removal
filtered_words = remove_stop_words(words)

# Misspelled Words Correction
corrected_words = correct_spelling(words)

# Display the results
print("Tokenized words:", words)
print("Tokenized sentences:", sentences)
print("Stemmed words:", stemmed_words)
print("POS tags:", pos_tags)
print("Filtered words:", filtered_words)
print("Corrected words:", corrected_words)
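The spelling step in the program above delegates everything to SpellChecker, which hides how a correction is chosen. The following is a library-free sketch of the underlying idea: ranking candidate words by Levenshtein edit distance. It is an illustration only, not the algorithm of any particular library. The function names, the max_distance threshold and the tiny vocabulary are all assumptions; a real checker uses a large word-frequency list.

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum insertions, deletions, substitutions."""
    # previous[j] holds the distance between a[:i-1] and b[:j].
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute or match
            ))
        previous = current
    return previous[-1]


def correct_word(word, vocabulary, max_distance=2):
    """Return the closest vocabulary word, or the input if none is close."""
    if word in vocabulary:
        return word
    best = min(vocabulary, key=lambda candidate: edit_distance(word, candidate))
    return best if edit_distance(word, best) <= max_distance else word


# Hypothetical mini-vocabulary, for illustration only.
VOCABULARY = ["this", "is", "an", "example", "sentence", "letter", "essay"]

misspelled = ["this", "is", "an", "exampel", "sentance"]
corrected = [correct_word(word, VOCABULARY) for word in misspelled]
print(corrected)  # ['this', 'is', 'an', 'example', 'sentence']
```

Words farther than max_distance from every vocabulary entry are returned unchanged, which keeps names and rare terms from being "corrected" into unrelated words. NLTK also ships a ready-made edit_distance function in nltk.metrics.distance that could replace the hand-written one.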

I. Resources/Equipment Required

Sr. No. | Instrument/Equipment/Components/Trainer kit | Specification
1 | Computer system with operating system | Windows 7 or higher, macOS, or Linux, with 4 GB or more RAM; Python versions: 2.7.X, 3.6.X
2 | Python IDEs and Code Editors | Jupyter, Spyder, Google Colab; Open Source: Anaconda Navigator

J. Safety and necessary Precautions followed

• Read the experiment thoroughly before starting and ensure that you understand
all the steps and concepts involved from the underpinning theory.
• Keep the workspace clean and organized, free from clutter and unnecessary
materials.
• Use the software according to its intended purpose and instructions.
• Ensure that all the necessary equipment and software are in good working
condition.
• Never eat or drink in the lab, as it can cause contamination and create safety
hazards.
• If any accidents or injuries occur, immediately notify the instructor and seek
medical attention if necessary.

K. Procedure to be followed/Source code

Students must use the space below to write their source code. Understand and re-implement
different methods for handling data. (Functions must be used extensively.)
Source Code & Output
_____________________________________________________________________
_____________________________________________________________________
_____________________________________________________________________
_____________________________________________________________________
_____________________________________________________________________
_____________________________________________________________________
_____________________________________________________________________
_____________________________________________________________________
_____________________________________________________________________
_____________________________________________________________________
_____________________________________________________________________
_____________________________________________________________________
_____________________________________________________________________
_____________________________________________________________________
_____________________________________________________________________
_____________________________________________________________________
_____________________________________________________________________
_____________________________________________________________________
_____________________________________________________________________
_____________________________________________________________________
_____________________________________________________________________
_____________________________________________________________________
_____________________________________________________________________
_____________________________________________________________________

L. Practical related Quiz

1. Which of the following is an example of a natural language generation task?
a) Identifying named entities
b) Part-of-speech tagging in a text
c) Machine translation
d) Generating new text based on input
2. Which of the following is an example of a pre-processing step in natural language
processing?
a) Creating a language model
b) Identifying named entities in a text
c) Tokenization
d) Text classification

M. References / Suggestions
1. https://www.geeksforgeeks.org/machine-learning/
2. https://www.geeksforgeeks.org/natural-language-processing-nlp-tutorial/
3. https://www.tutorialspoint.com/machine_learning_with_python/index.htm

N. Assessment-Rubrics

Criteria | Total Marks | Exceptional (5 Marks) | Satisfactory (4 to 3 Marks) | Developing (2 Marks) | Limited (1 Mark)
Engagement | /5 | Performed practical him/herself | Performed practical with others' help | Watched other students performing practical but not tried him/herself | Present in practical session but not attentively participated in performance
Accuracy | /5 | Accurately done | 1-2 errors/mistakes found | 3-5 errors/mistakes identified | More than 5 errors/mistakes committed
Documentation | /5 | No errors, Program is well Executed and Documented Properly. | Complete write-up and output tables but presentation is poor | Some of the commands missing with missing outputs | Poor write-up and diagram or missing content
Understanding & Explanation | /5 | Fully understood the performance & can explain perfectly | Understood the performance but cannot explain | Partially understood the performance & can give little explanation | Partially understood and cannot give explanation
Time | /5 | Completed the work within 1 week | Work is submitted later than 1 week but by the end of 2nd week | Work done after 2nd week but before the end of 3rd week | Work submitted after 3 weeks' time

Total Marks: /25 Signature with Date:
