NLP- (NATURAL LANGUAGE PROCESSING LAB MANUAL)

EXPERIMENT-1

Implementing a regular expression to parse all the plural words

Aim: Write a program to implement a regular expression to parse all the plural words

Code:

import re

plural = 'foods'
match = re.search(r's$', plural)
if match is None:
    print("Parsing is not successful and is not a plural word")
else:
    print("Parsing is successful and is a plural word")

Output:

Parsing is successful and is a plural word
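The snippet above tests only a single word for a trailing 's'. A minimal sketch that scans a whole sentence and collects every plural-looking word (the sample sentence is made up, and the trailing-'s' rule is only a rough heuristic) could look like this:

import re

text = "The dogs chased three cats into the gardens"
# \b\w+s\b matches whole words ending in 's' -- a rough plural heuristic
plurals = re.findall(r'\b\w+s\b', text)
print(plurals)   # ['dogs', 'cats', 'gardens']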

Conclusion:

The program to parse plural words using a regular expression ran successfully.
EXPERIMENT – 2

Split the text into words and remove the special characters and punctuations

Aim: Write a program to split the text into words and remove the special characters and
punctuations from the text.

Code:

punctuations = '''!()-[]{};:'"\,<>./?@#$%^&*_~'''
my_str = "Hi!!!,..I am Roochita and my gmail is 221910310047@gitam.in."
no_punct = ""
for char in my_str:
    if char not in punctuations:
        no_punct = no_punct + char
print(no_punct)
print(no_punct.split())

Output:

['Hi', 'I', 'am', 'Roochita', 'and', 'my', 'gmail', 'is', '221910310047gitamin']
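An equivalent sketch using the standard-library string.punctuation constant together with str.translate (it covers the same characters, plus a few more such as the backtick and plus sign); the input string is the same one used above:

import string

my_str = "Hi!!!,..I am Roochita and my gmail is 221910310047@gitam.in."
# Build a translation table that deletes every punctuation character
no_punct = my_str.translate(str.maketrans('', '', string.punctuation))
print(no_punct.split())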

Conclusion:
The program to split the text into words and remove special characters and punctuation from
the text ran successfully.
EXPERIMENT – 3

Stemming and Lemmatization

Aim: Write a program to find the root word of each word present in a text using stemming and
lemmatization.
Code:
Stemming

from nltk.stem import PorterStemmer
from nltk.tokenize import sent_tokenize, word_tokenize

words = ["gitam", "university", "sunflower", "beautiful"]
ps = PorterStemmer()
for w in words:
    rootWord = ps.stem(w)
    print(rootWord)

Output:

gitam

univers

sunflow

beauti
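The word_tokenize import above is not used in the snippet; a small follow-up sketch that stems every token of a sentence (assuming the punkt tokenizer data has been downloaded; the sentence is made up) might be:

import nltk
from nltk.stem import PorterStemmer
from nltk.tokenize import word_tokenize

nltk.download("punkt")   # tokenizer model required by word_tokenize
ps = PorterStemmer()
sentence = "The universities planted beautiful sunflowers"
print([ps.stem(w) for w in word_tokenize(sentence)])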
Lemmatization

import nltk
nltk.download("wordnet")
nltk.download("omw-1.4")

from nltk.stem import WordNetLemmatizer

lemmatizer = WordNetLemmatizer()
print(lemmatizer.lemmatize("foods"))

Output:

food
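By default WordNetLemmatizer treats every word as a noun; passing a part-of-speech hint changes the result. A small follow-up sketch using the lemmatizer created above:

print(lemmatizer.lemmatize("running"))            # running  (treated as a noun)
print(lemmatizer.lemmatize("running", pos="v"))   # run      (treated as a verb)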

Conclusion:
Performed stemming and lemmatization to obtain the root words of the words present in the text.
EXPERIMENT – 4

Bi-Gram Model and Add-one Smoothing

Aim: Write a program to build a bi-gram model from the words present in the text. Apply the
add-one smoothing to smooth the probability distribution of the bi-gram model.

Code:

def readData():
    data = ['This is a dog', 'This is a cat', 'I love my cat', 'This is my name ']
    dat = []
    for i in range(len(data)):
        for word in data[i].split():
            dat.append(word)
    print(dat)
    return dat

def createBigram(data):
    listOfBigrams = []
    bigramCounts = {}
    unigramCounts = {}
    for i in range(len(data) - 1):
        if i < len(data) - 1 and data[i + 1].islower():
            listOfBigrams.append((data[i], data[i + 1]))
            if (data[i], data[i + 1]) in bigramCounts:
                bigramCounts[(data[i], data[i + 1])] += 1
            else:
                bigramCounts[(data[i], data[i + 1])] = 1
        if data[i] in unigramCounts:
            unigramCounts[data[i]] += 1
        else:
            unigramCounts[data[i]] = 1
    return listOfBigrams, unigramCounts, bigramCounts

def calcBigramProb(listOfBigrams, unigramCounts, bigramCounts):
    listOfProb = {}
    for bigram in listOfBigrams:
        word1 = bigram[0]
        word2 = bigram[1]
        listOfProb[bigram] = bigramCounts.get(bigram) / unigramCounts.get(word1)
    return listOfProb

if __name__ == '__main__':
    data = readData()
    listOfBigrams, unigramCounts, bigramCounts = createBigram(data)

    print("\n All the possible Bigrams are ")
    print(listOfBigrams)

    print("\n Bigrams along with their frequency ")
    print(bigramCounts)

    print("\n Unigrams along with their frequency ")
    print(unigramCounts)

    bigramProb = calcBigramProb(listOfBigrams, unigramCounts, bigramCounts)

    print("\n Bigrams along with their probability ")
    print(bigramProb)

    inputList = "This is my cat"
    splt = inputList.split()
    outputProb1 = 1
    bilist = []

    for i in range(len(splt) - 1):
        bilist.append((splt[i], splt[i + 1]))

    print("\n The bigrams in given sentence are ")
    print(bilist)
    for i in range(len(bilist)):
        if bilist[i] in bigramProb:
            outputProb1 *= bigramProb[bilist[i]]
        else:
            outputProb1 *= 0
    print('\n' + 'Probability of sentence "This is my cat" = ' + str(outputProb1))

Output:
['This', 'is', 'a', 'dog', 'This', 'is', 'a', 'cat', 'I', 'love', 'my',
'cat', 'This', 'is', 'my', 'name']
All the possible Bigrams are
[('This', 'is'), ('is', 'a'), ('a', 'dog'), ('This', 'is'), ('is', 'a'),
('a', 'cat'), ('I', 'love'), ('love', 'my'), ('my', 'cat'), ('This', 'is'),
('is', 'my'), ('my', 'name')]

Bigrams along with their frequency
{('This', 'is'): 3, ('is', 'a'): 2, ('a', 'dog'): 1, ('a', 'cat'): 1,
('I', 'love'): 1, ('love', 'my'): 1, ('my', 'cat'): 1, ('is', 'my'): 1,
('my', 'name'): 1}

Unigrams along with their frequency
{'This': 3, 'is': 3, 'a': 2, 'dog': 1, 'cat': 2, 'I': 1, 'love': 1, 'my': 2}

Bigrams along with their probability
{('This', 'is'): 1.0, ('is', 'a'): 0.6666666666666666, ('a', 'dog'): 0.5,
('a', 'cat'): 0.5, ('I', 'love'): 1.0, ('love', 'my'): 1.0, ('my', 'cat'): 0.5,
('is', 'my'): 0.3333333333333333, ('my', 'name'): 0.5}

The bigrams in given sentence are
[('This', 'is'), ('is', 'my'), ('my', 'cat')]

Probability of sentence "This is my cat" = 0.16666666666666666
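The calcBigramProb function above computes plain maximum-likelihood probabilities; nothing is smoothed yet. A minimal add-one (Laplace) smoothing sketch over the same counts, where every bigram count is incremented by one and the denominator grows by the vocabulary size V, might look like this:

def calcAddOneSmoothing(listOfBigrams, unigramCounts, bigramCounts):
    V = len(unigramCounts)                 # vocabulary size seen in the counts
    smoothedProb = {}
    for bigram in listOfBigrams:
        word1 = bigram[0]
        # (count(w1, w2) + 1) / (count(w1) + V): unseen bigrams get a small non-zero probability
        smoothedProb[bigram] = (bigramCounts[bigram] + 1) / (unigramCounts[word1] + V)
    return smoothedProb

With the counts above (V = 8), P('is' | 'This') drops from 1.0 to (3 + 1) / (3 + 8) ≈ 0.36, and an unseen bigram starting with 'This' would receive 1 / 11 instead of 0.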

Conclusion:

Built a bi-gram model from the words present in the text and computed its probability
distribution; add-one smoothing of that distribution is sketched above.

N-Gram Model

Aim: Write a program to build an N-gram model from the words present in the text.

Code:

# Method 1
from nltk import ngrams
gvn_str = "Hello Welcome to Python Programs"
gvn_n_val = 2
splt_lst = gvn_str.split()
rslt_n_grms = ngrams(splt_lst, gvn_n_val)
for itr in rslt_n_grms:
    print(itr)
Output:

('Hello', 'Welcome')
('Welcome', 'to')
('to', 'Python')
('Python', 'Programs')

# Method 2

from nltk import ngrams


gvn_str = input("Enter some random string = ")
gvn_n_val = int(input("Enter some random number(n) = "))
splt_lst = gvn_str.split()
rslt_n_grms = ngrams(splt_lst, gvn_n_val)
for itr in rslt_n_grms:
    print(itr)

Output:

Enter some random string = qwwusu xfycxgv xyufqv
Enter some random number(n) = 3
('qwwusu', 'xfycxgv', 'xyufqv')
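A small follow-up sketch: counting how often each n-gram occurs, using collections.Counter on top of the same ngrams helper (the sample text here is made up):

from collections import Counter
from nltk import ngrams

text = "this is a dog this is a cat"
counts = Counter(ngrams(text.split(), 2))   # bigram frequencies
print(counts.most_common(3))   # [(('this', 'is'), 2), (('is', 'a'), 2), (('a', 'dog'), 1)]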

Conclusion:

Built an N-gram model from the words present in the text.


EXPERIMENT – 5

HMM Parts of Speech Tagging

Aim: Write a program to construct an HMM part-of-speech tagger.

Code:

import nltk
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize, sent_tokenize

# Requires nltk.download('punkt'), nltk.download('stopwords') and
# nltk.download('averaged_perceptron_tagger') to have been run once.
stop_words = set(stopwords.words('english'))

txt = ("The Natural Language Toolkit NLTK is a platform used for building programs "
       "for text analysis. One of the more powerful aspects of the NLTK module is "
       "the Part of Speech tagging.")

tokenized = sent_tokenize(txt)
for i in tokenized:
    wordsList = nltk.word_tokenize(i)
    wordsList = [w for w in wordsList if w not in stop_words]
    tagged = nltk.pos_tag(wordsList)
    print(tagged)

Output:

[('The', 'DT'), ('Natural', 'NNP'), ('Language', 'NNP'), ('Toolkit', 'NNP'),
('NLTK', 'NNP'), ('platform', 'NN'), ('used', 'VBN'), ('building', 'NN'),
('programs', 'NNS'), ('text', 'JJ'), ('analysis', 'NN'), ('.', '.')]
[('One', 'CD'), ('powerful', 'JJ'), ('aspects', 'NNS'), ('NLTK', 'NNP'),
('module', 'NN'), ('Part', 'NNP'), ('Speech', 'NNP'), ('tagging', 'NN'),
('.', '.')]
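Note that nltk.pos_tag uses NLTK's default tagger (currently perceptron-based), not an HMM. A minimal sketch of an explicitly HMM-based tagger, trained on the tagged Penn Treebank sample that ships with NLTK (assumes nltk.download('treebank') has been run; the test sentence is made up):

import nltk
from nltk.corpus import treebank
from nltk.tag import hmm

nltk.download('treebank')
train_sents = treebank.tagged_sents()[:3000]      # tagged sentences for supervised training
trainer = hmm.HiddenMarkovModelTrainer()
hmm_tagger = trainer.train_supervised(train_sents)

test_sent = ['The', 'toolkit', 'is', 'a', 'platform', 'for', 'text', 'analysis']
print(hmm_tagger.tag(test_sent))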

Conclusion:

Performed part-of-speech tagging with NLTK; an explicitly HMM-based tagger is sketched above.


EXTRA EXPERIMENTS

• Stemming

Aim: Perform Stemming using Various Stemmers.

Code:

import nltk
from nltk.stem import PorterStemmer
from nltk.stem import LancasterStemmer
from nltk.stem import RegexpStemmer
from nltk.stem import SnowballStemmer

porter = PorterStemmer()
words = ['Connects', 'Connecting', 'Connections', 'Connected', 'Connection',
         'Connectings', 'Connect']
for word in words:
    print(word, "--->", porter.stem(word))

Output:

Connects ---> connect
Connecting ---> connect
Connections ---> connect
Connected ---> connect
Connection ---> connect
Connectings ---> connect
Connect ---> connect

snowball = SnowballStemmer(language='english')
words = ['generous', 'generate', 'generously', 'generation']
for word in words:
    print(word, "--->", snowball.stem(word))

Output:

generous ---> generous
generate ---> generat
generously ---> generous
generation ---> generat

lancaster = LancasterStemmer()
words = ['eating', 'eats', 'eaten', 'puts', 'putting']
for word in words:
    print(word, "--->", lancaster.stem(word))

Output:

eating ---> eat
eats ---> eat
eaten ---> eat
puts ---> put
putting ---> put

regexp = RegexpStemmer('ing$|s$|e$|able$', min=4)
words = ['mass', 'was', 'bee', 'computer', 'advisable']
for word in words:
    print(word, "--->", regexp.stem(word))

Output:

mass ---> mas
was ---> was
bee ---> bee
computer ---> computer
advisable ---> advis
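A short comparison sketch, running one word through all four stemmer objects created above to contrast how aggressive each one is (the word is chosen arbitrarily, and the outputs differ from stemmer to stemmer):

word = 'happiness'
for name, stemmer in [('Porter', porter), ('Snowball', snowball),
                      ('Lancaster', lancaster), ('Regexp', regexp)]:
    print(name, '--->', stemmer.stem(word))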

Conclusion:

Performed Stemming using various Stemmers.


• Counting length, number of words and number of spaces in a text

Aim: Write a program to count the length of a text, the number of words in the text, and the
number of spaces in the text.

Code:
st = input("Enter String : ")
print(st)

Output:

Enter String : sdfghj w yug e66tugu8
sdfghj w yug e66tugu8

# number of spaces
count = 0
for i in st:
    if i == " ":
        count += 1
print(count)
Output:
3
# number of words
stk = []
s = ""
for i in st:
    if i == " ":
        stk.append(s)   # a space ends the current word
        s = ""
        continue
    s += i              # otherwise keep building the current word
if s:
    stk.append(s)       # append the last word after the loop
print(stk)
print(len(stk))

Output:
['sdfghj', 'w', 'yug', 'e66tugu8']
4

# total number of characters
print("Total number of chars : ", len(st))

Output:

Total number of chars : 21
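The same three counts can also be obtained with built-in string methods; a quick sketch using the same input string st:

print("Total number of chars :", len(st))    # length of the text
print("Number of spaces      :", st.count(" "))
print("Number of words       :", len(st.split()))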


• Pull a random string from a text file

Aim: Write a program to pull a random string from a text file.

Code:

import random

# Read the whole file, then print a slice between two random indices.
with open('a.txt', 'r') as f:
    txt = f.read()
r = random.randint(0, len(txt))   # random end index
r1 = random.randint(0, r)         # random start index, never past r
print(txt[r1:r])

Output:

dcuk cwtqvxg vwuqvukxvd vxuqvwyx
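An alternative sketch that picks a random whole word from the file instead of an arbitrary character slice (still assuming the same a.txt file):

import random

with open('a.txt', 'r') as f:
    words = f.read().split()   # all whitespace-separated tokens in the file
print(random.choice(words))    # one randomly chosen word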

Conclusion:

Pulled a random string from a text file.
