
VIVEKANANDHA

COLLEGE OF ARTS AND SCIENCES FOR WOMEN


(AUTONOMOUS)
(An ISO 9001:2015 Certified Institution;
Affiliated to Periyar University, Salem, Approved by AICTE,
Re-accredited with “A++” Grade by NAAC, Recognized U/S 12(B), 2(f) of UGC Act, 1956)
Elayampalayam, Tiruchengode - 637205.

MASTER OF SCIENCE IN COMPUTER SCIENCE


PRACTICAL RECORD

NAME :

REG.NO :

NATURAL LANGUAGE PROCESSING LAB

(24P2CSEP01)

SEMESTER–II

2024-2026
VIVEKANANDHA
COLLEGE OF ARTS AND SCIENCES FOR WOMEN
(AUTONOMOUS)
An ISO 9001:2015 Certified Institution
(Affiliated to Periyar University - Salem, Approved by AICTE,
Re-accredited with “A++” Grade by NAAC, Recognized U/S 12(B), 2(f) of UGC Act, 1956)
Elayampalayam, Tiruchengode - 637205.

MASTER OF SCIENCE IN COMPUTER SCIENCE

Certified that this is a bonafide record of practical work done by
Ms/Mrs ______________________ Reg. No: ______________ in the NATURAL
LANGUAGE PROCESSING LAB (24P2CSEP01) at the Vivekanandha College of Arts and
Sciences for Women (Autonomous), Elayampalayam, Tiruchengode.

Staff In-Charge Head of the Department

Submitted for the University Practical Examinations held on ______________ at the PG
and Research Department of Computer Science and Applications, Vivekanandha
College of Arts and Sciences for Women (Autonomous), Elayampalayam,
Tiruchengode.

Internal Examiner External Examiner


INDEX

S.NO   DATE   CONTENTS                                                        PAGE NO.   SIGN

1             Tokenize a given text
2             Sentences of a text document
3             Tokenize text with stop words as delimiters
4             Remove stop words and punctuations in a text
5             A. Perform Stemming
              B. Lemmatize a given text
6             Extract Usernames from Email
7             Common words in text excluding stop words
8             Spell correction in a given text
9             Classify a text as Positive/Negative sentiment
10            Root word of any word in a sentence
11            a) Load the iris data from a given CSV file into a dataframe
              b) Extract Noun and Verb phrases from a text
12            Sets of synonyms and antonyms of a given word
13            Print the first 15 random combined labeled male and female names
              from the names corpus
PROGRAM

1. Tokenize a text

from nltk.tokenize import word_tokenize, sent_tokenize


import nltk

nltk.download('punkt') # Download tokenizer data

# Example text
text = "NLP makes machines understand language. Tokenization is the first step."

# Sentence Tokenization
print("Sentences:", sent_tokenize(text))

# Word Tokenization
print("Words:", word_tokenize(text))
OUTPUT
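Expected output for the example text (shown for reference; an actual run should match apart from minor formatting):

Sentences: ['NLP makes machines understand language.', 'Tokenization is the first step.']
Words: ['NLP', 'makes', 'machines', 'understand', 'language', '.', 'Tokenization', 'is', 'the', 'first', 'step', '.']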
PROGRAM

2. Sentences of a text document

from nltk.tokenize import sent_tokenize


import nltk

nltk.download('punkt') # Download tokenizer data

# Read the text from a file
file_path = "example.txt"  # Replace with your file path
with open(file_path, 'r') as file:
    text = file.read()

# Sentence Tokenization
sentences = sent_tokenize(text)

# Display the sentences
print("Sentences in the document:")
for i, sentence in enumerate(sentences, 1):
    print(f"{i}: {sentence}")

Note: save a text file named example.txt in the same folder as the Jupyter notebook before running this program.
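If example.txt does not exist yet, it can be created from inside the notebook; a minimal sketch (the sample sentences are only illustrative):

# Create a small example.txt for testing
sample_text = "NLP is interesting. It has many applications. Tokenization is the first step."
with open("example.txt", "w") as f:
    f.write(sample_text)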

OUTPUT
PROGRAM

3. Tokenize text with stop words as delimiters

from nltk.tokenize import word_tokenize


from nltk.corpus import stopwords
import nltk
# Download necessary data
nltk.download('punkt')
nltk.download('stopwords')
# Example text
text = "I enjoy learning Python and coding."
# Define stop words
stop_words = set(stopwords.words('english'))
# Tokenize the text
words = word_tokenize(text)
# Tokenize using stop words as delimiters
tokens_without_stopwords = [word for word in words if word.lower() not in stop_words]
# Output the result
print("Original Tokens:", words)
print("Tokens without Stop Words:", tokens_without_stopwords)
OUTPUT
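Expected output for the example sentence (shown for reference):

Original Tokens: ['I', 'enjoy', 'learning', 'Python', 'and', 'coding', '.']
Tokens without Stop Words: ['enjoy', 'learning', 'Python', 'coding', '.']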
PROGRAM

4. Remove stop words and punctuations in a text

from nltk.tokenize import word_tokenize


from nltk.corpus import stopwords
import string
import nltk

# Download necessary data


nltk.download('punkt')
nltk.download('stopwords')

# Example text
text = "Python is great! It's simple and powerful."

# Define stop words


stop_words = set(stopwords.words('english'))

# Tokenize the text


words = word_tokenize(text)

# Remove stop words and punctuation


tokens_cleaned = [word for word in words if word.lower() not in stop_words and word not in string.punctuation]

# Output the result


print("Tokens without Stop Words and Punctuation:", tokens_cleaned)
OUTPUT
PROGRAM

5. A. Perform Stemming

# import these modules


from nltk.stem import PorterStemmer
from nltk.tokenize import word_tokenize

ps = PorterStemmer()

# choose some words to be stemmed


words = ["pythonprogramming", "programs", "programmer", "event", "thankyou"]

for w in words:
    print(w, " : ", ps.stem(w))
OUTPUT
PROGRAM

5. B. Lemmatize A Given Text

from nltk.tokenize import word_tokenize


from nltk.stem import WordNetLemmatizer
import nltk

# Download necessary resources


nltk.download('punkt')
nltk.download('wordnet')

def lemmatize_text(text):
    lemmatizer = WordNetLemmatizer()
    tokens = word_tokenize(text)
    lemmatized_text = ' '.join([lemmatizer.lemmatize(word) for word in tokens])
    return lemmatized_text

text = "The cats are chasing mice and playing in the garden"
lemmatized_text = lemmatize_text(text)

print("Original Text:", text)


print("Lemmatized Text:", lemmatized_text)
OUTPUT
PROGRAM
6. Extract Usernames from Email

# Using regular expressions
import re

# Defining an email string (a hypothetical example address; the original value was redacted)
e = "user123@example.com"

# Using the search function to match the username part of the email (the part before '@')
match = re.search(r'^([a-zA-Z0-9._%+-]+)@', e)

# If a match is found, extract the username using the group() method
if match:
    username = match.group(1)
    print(username)
OUTPUT
PROGRAM
7. Find the most common words in the text excluding stop words

import nltk
from nltk.corpus import stopwords
from collections import Counter
import string
# Download stopwords (only needed once)
nltk.download("stopwords")

def most_common_words(text, n=10):
    stop_words = set(stopwords.words("english"))  # Load stop words
    # Convert to lowercase and remove punctuation
    words = text.lower().translate(str.maketrans("", "", string.punctuation)).split()
    filtered_words = [word for word in words if word not in stop_words]  # Remove stop words
    word_counts = Counter(filtered_words)  # Count words
    return word_counts.most_common(n)  # Get most common words

# Example text
text = "This is a simple example text. This text is just for testing the most common words."

# Get the top 5 most common words
result = most_common_words(text, 5)

# Print result
print(result)
OUTPUT
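Expected output for the example text (shown for reference; Counter breaks ties by first occurrence in the text):

[('text', 2), ('simple', 1), ('example', 1), ('testing', 1), ('common', 1)]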
PROGRAM

8. Spell correction in a given text

# Spell correction using the Jaccard distance over character bigrams
import nltk
from nltk.metrics.distance import jaccard_distance
from nltk.util import ngrams
from nltk.corpus import words

# Download the list of correct English words (only needed once)
nltk.download('words')
correct_words = words.words()

# List of incorrect spellings that need to be corrected
incorrect_words = ['happpy', 'azmaing', 'intelliengt']

# For each misspelling, find the closest correct word (smallest Jaccard distance
# between bigram sets) among words sharing the same first letter, and print it
for word in incorrect_words:
    temp = [(jaccard_distance(set(ngrams(word, 2)), set(ngrams(w, 2))), w)
            for w in correct_words if w[0] == word[0]]
    print(sorted(temp, key=lambda val: val[0])[0][1])
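OUTPUT

Expected corrections (may vary slightly depending on the installed words corpus):

happy
amazing
intelligent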
PROGRAM

9. Classify A Text as Positive/Negative Sentiment

from textblob import TextBlob


text_1 = "The movie was so awesome." text_2
= "The food here tastes terrible."

#Determining the Polarity


p_1 = TextBlob(text_1).sentiment.polarity
p_2 = TextBlob(text_2).sentiment.polarity

#Determining the Subjectivity


s_1 = TextBlob(text_1).sentiment.subjectivity
s_2 = TextBlob(text_2).sentiment.subjectivity

print("Polarity of Text 1 is", p_1)


print("Polarity of Text 2 is", p_2)
print("Subjectivity of Text 1 is", s_1)
print("Subjectivity of Text 2 is", s_2)
PROGRAM

10. Find the ROOT word of any word in a sentence

from nltk.stem.porter import PorterStemmer

stemmer = PorterStemmer()
words = ["renting", "renter", "rental", "rents", "apple"]

# Group the words by their stem (root word)
all_rents = {}
for word in words:
    stem = stemmer.stem(word)
    if stem not in all_rents:
        all_rents[stem] = []
        all_rents[stem].append(word)
    else:
        all_rents[stem].append(word)
print(all_rents)
OUTPUT

{'rent': ['renting', 'rents'], 'renter': ['renter'], 'rental': ['rental'], 'appl': ['apple']}


PROGRAM

11. a) Load the iris data from a given CSV file into a dataframe and print the
shape of the data, the type of the data, and the first 3 rows.

import pandas as pd

data = pd.read_csv("iris.csv")

print("Shape of the data:")
print(data.shape)

print("\nData Type:")
print(type(data))

print("\nFirst 3 rows:")
print(data.head(3))
OUTPUT
Shape of the data:
(150, 6)

Data Type:
<class 'pandas.core.frame.DataFrame'>

First 3 rows:
Id SepalLengthCm SepalWidthCm PetalLengthCm PetalWidthCm Species

0 1 5.1 3.5 1.4 0.2 Iris-setosa


1 2 4.9 3.0 1.4 0.2 Iris-setosa
2 3 4.7 3.2 1.3 0.2 Iris-setosa
PROGRAM

11.b) Extract Noun and Verb phrases from a text

import nltk

from nltk.tokenize import word_tokenize

from nltk import pos_tag, RegexpParser

nltk.download('punkt')

nltk.download('averaged_perceptron_tagger')

def chunk_sentence(sentence):
    words = word_tokenize(sentence)   # Tokenize words
    tagged_words = pos_tag(words)     # Perform POS tagging

    # Define grammar for chunking
    grammar = r"""
        NP: {<DT|JJ|NN.*>+}          # Chunk sequences of DT, JJ, NN
        PP: {<IN><NP>}               # Chunk prepositions followed by NP
        VP: {<VB.*><NP|PP|CLAUSE>+$} # Chunk verbs and their arguments
        CLAUSE: {<NP><VP>}           # Chunk NP, VP pair
    """

    parser = RegexpParser(grammar)                 # Create a chunk parser
    chunked_sentence = parser.parse(tagged_words)  # Apply parsing
    return chunked_sentence

# Example sentence

sentence = "The quick brown fox jumps over the lazy dog"

# Perform chunking

chunked_sentence = chunk_sentence(sentence)

# Print chunked result

print(chunked_sentence)

# Optional: Draw chunk tree (Only works in GUI-supported environments)

chunked_sentence.draw()
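Since the exercise asks for the noun and verb phrases specifically, the NP and VP subtrees can be read off the chunk tree; a small sketch:

# Print only the NP and VP chunks from the parse tree
for subtree in chunked_sentence.subtrees(filter=lambda t: t.label() in ("NP", "VP")):
    print(subtree.label(), ":", " ".join(word for word, tag in subtree.leaves()))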
OUTPUT
PROGRAM

12. Write a Python NLTK program to find the sets of synonyms and
antonyms of a given word.

import nltk
from nltk.corpus import wordnet

# Download the WordNet corpus (only needed once)
nltk.download('wordnet')

def synonym_antonym_extractor(phrase):
    synonyms = []
    antonyms = []

    for syn in wordnet.synsets(phrase):
        for l in syn.lemmas():
            synonyms.append(l.name())
            if l.antonyms():
                antonyms.append(l.antonyms()[0].name())

    print(set(synonyms))
    print(set(antonyms))

synonym_antonym_extractor(phrase="word")
OUTPUT
PROGRAM

13. Print the first 15 random combined labeled male and female names from the
names corpus.

import random
import nltk
from nltk.corpus import names

# Download the names corpus (only needed once)
nltk.download('names')

male_names = names.words('male.txt')
female_names = names.words('female.txt')

labeled_male_names = [(str(name), 'male') for name in male_names]
labeled_female_names = [(str(name), 'female') for name in female_names]

# Combine labeled male and labeled female names
labeled_all_names = labeled_male_names + labeled_female_names

# Shuffle the labeled names array
random.shuffle(labeled_all_names)

print("First 15 random labeled combined names:")
print(labeled_all_names[:15])
OUTPUT
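Because the list is shuffled, the output changes on every run; it is a list of 15 (name, 'male' or 'female') tuples drawn from the combined corpus.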
