Mrunal Aher Natural Language Processing Roll No: 1

Deccan Education Society’s


Kirti M. Doongursee College of Arts, Science and Commerce
[AUTONOMOUS]

M.Sc. [Information Technology]

Practical Journal

Course Name: Natural Language Processing

Seat Number [ ]

(Academic Year 2022-2023)

Department of Computer Science and Information Technology



Deccan Education Society’s
Kirti M. Doongursee College of Arts, Science and Commerce
[AUTONOMOUS]

CERTIFICATE

This is to certify that Miss. Aishwarya Nandkishore Sonawane of M.Sc. (CS) with
Seat No. 23 has completed 8 practicals of the paper (Course Name: Natural Language
Processing) under my supervision in this College during the Fourth Semester of the
academic year 2022-2023.

Prof. Jaymala Deshpande                                   Dr. Apurva Yadav
Lecturer-In-Charge                                        H.O.D., Department of Computer Science & IT

Date: / /2023                                             Date:

Examined by: Remarks:

Date:
INDEX

Sr. No.  Practical                                                                              Sign

1  A. Convert the given text to speech.
   B. Convert audio file Speech to Text.
2  A. Create and use your own corpora (plaintext, categorical).
   B. Study of tagged corpora with methods like tagged_sents, tagged_words.
   C. Map Words to Properties Using Python Dictionaries.
3  A. Study DefaultTagger.
   B. Study UnigramTagger.
4  A. Study of Wordnet Dictionary with methods as synsets, definitions, examples, antonyms.
   B. Write a program using python to find synonym and antonym of word "active" using Wordnet.
   C. Compare two nouns.
5  A. Tokenization using Python’s split() function.
   B. Tokenization using Regular Expressions (RegEx).
   C. Tokenization using Keras.
6  A. Named Entity recognition using user defined text.
   B. Named Entity recognition with diagram using NLTK corpus – treebank.
7  A. Define grammar using nltk. Analyze a sentence using the same.
   B. Implementation of Deductive Chart Parsing using context free grammar and a given sentence.
8  Study PorterStemmer, LancasterStemmer, RegexpStemmer, SnowballStemmer. Study WordNetLemmatizer.
Practical 1(A)

Aim: Convert the given text to speech.

Program:
from playsound import playsound

from gtts import gTTS

mytext="happy birthday to you"

language="en"

myobj=gTTS(text=mytext,lang=language,slow=False)

myobj.save("myfile.mp3")

playsound("myfile.mp3")

Output:
The myfile.mp3 audio file is created and then played with the playsound() method while running the program.
Practical 1(B)

Aim: Convert audio file Speech to Text.

Program:
import speech_recognition as sr

filename="C:/Users/kcmlab cs/Desktop/NLP PRACS/kirti.wav"

r=sr.Recognizer()

with sr.AudioFile(filename)as source:

audio_data=r.record(source)

text=r.recognize_google(audio_data)

print(text)

Output:
Practical 2(A)

Aim: Create and use your own corpora (plaintext, categorical)

Program:
import nltk

from nltk.corpus import PlaintextCorpusReader

corpus_root = 'C:/Users/kcmlab cs/Desktop/NLP PRACS'

filelist = PlaintextCorpusReader(corpus_root, '.*')

print ('\n File list: \n')

print (filelist.fileids())

print (filelist.root)

'''display other information about each text, by looping over all the values of fileid

corresponding to the filelist file identifiers listed earlier and then computing statistics

for each text.'''

print ('\n\nStatistics for each text:\n')

print ('AvgWordLen\tAvgSentenceLen\tno.ofTimesEachWordAppearsOnAvg\tFileName')

for fileid in filelist.fileids():
    num_chars = len(filelist.raw(fileid))
    num_words = len(filelist.words(fileid))
    num_sents = len(filelist.sents(fileid))
    num_vocab = len(set([w.lower() for w in filelist.words(fileid)]))
    print(int(num_chars/num_words), '\t\t\t', int(num_words/num_sents), '\t\t\t',
          int(num_words/num_vocab), '\t\t', fileid)
Output:
Practical 2(B)
Aim: Study of tagged corpora with methods like tagged_sents, tagged_words.

Program:
import nltk

from nltk import tokenize

nltk.download('punkt')

nltk.download('words')

para = "Hello! My name is Beena Kapadia. Today you'll be learning NLTK."

sents = tokenize.sent_tokenize(para)

print("\nsentence tokenization\n===================\n",sents)

# word tokenization

print("\nword tokenization\n===================\n")

for index in range(len(sents)):
    words = tokenize.word_tokenize(sents[index])
    print(words)
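
The program above tokenizes plain text; a minimal sketch of the tagged_sents and tagged_words methods named in the aim, assuming the Brown corpus has been downloaded, is:

import nltk
from nltk.corpus import brown
nltk.download('brown')

# tagged_words() gives (word, tag) pairs; tagged_sents() gives whole sentences of such pairs
print(brown.tagged_words()[:10])
print(brown.tagged_sents()[0])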

Output:
Practical 2(C)
Aim: Map Words to Properties Using Python Dictionaries

Program:
thisdict = {
    "brand": "Mercedes",
    "model": "G-Class",
    "year": 1964
}

print(thisdict)

print(thisdict["brand"])

print(len(thisdict))

print(type(thisdict))
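
The dictionary above maps a car's attributes; the same idea applies to mapping words to linguistic properties, as in this small sketch (the words and part-of-speech tags are assumed for illustration only):

# Sketch: map words to an assumed part-of-speech property
pos = {}
pos['ideas'] = 'N'
pos['sleep'] = 'V'
pos['furiously'] = 'ADV'
print(pos)
print(pos['sleep'])
print(list(pos.keys()))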

Output:
Practical 3(A)
Aim: Study Default Tagger

Program:
import nltk

from nltk.tag import DefaultTagger

exptagger=DefaultTagger('NN')

from nltk.corpus import treebank

testsentences=treebank.tagged_sents()[1000:]

print(exptagger.evaluate(testsentences))

import nltk

from nltk.tag import DefaultTagger

exptagger=DefaultTagger('NN')

print(exptagger.tag_sents([['Hey',','],['How','are','you','?']]))

Output:
Practical 3(B)
Aim: Study Unigram Tagger

Program:
# Loading Libraries

from nltk.tag import UnigramTagger

from nltk.corpus import treebank

# Training using first 10 tagged sentences of the treebank corpus as data.

# Using data

train_sents = treebank.tagged_sents()[:10]

# Initializing

tagger = UnigramTagger(train_sents)

# Lets see the first sentence

# (of the treebank corpus) as list

print(treebank.sents()[0])

print('\n',tagger.tag(treebank.sents()[0]))

#Finding the tagged results after training.

tagger.tag(treebank.sents()[0])

#Overriding the context model

tagger = UnigramTagger(model ={'Pierre': 'NN'})

print('\n',tagger.tag(treebank.sents()[0]))
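
The DefaultTagger from Practical 3(A) can also act as a backoff for the UnigramTagger, so that words unseen in the training sentences still receive a tag instead of None; a minimal sketch, reusing the same treebank training data:

from nltk.tag import UnigramTagger, DefaultTagger
from nltk.corpus import treebank

train_sents = treebank.tagged_sents()[:10]
# Unseen words fall back to the DefaultTagger's 'NN' tag
backoff_tagger = UnigramTagger(train_sents, backoff=DefaultTagger('NN'))
print(backoff_tagger.tag(treebank.sents()[0]))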
Output:
Practical 4(A)
Aim: Study of Wordnet Dictionary with methods as synsets, definitions, examples, antonyms

Program:
'''WordNet provides synsets which is the collection of synonym words also called

“lemmas”'''

import nltk

from nltk.corpus import wordnet

print(wordnet.synsets("computer"))

# definition and example of the word ‘computer’

print(wordnet.synset("computer.n.01").definition())

#examples

print("Examples:", wordnet.synset("computer.n.01").examples())

#get Antonyms

print(wordnet.lemma('buy.v.01.buy').antonyms())

Output:
Practical 4(B)
Aim: Write a program using python to find synonym and antonym of word "active" using Wordnet.

Program:
from nltk.corpus import wordnet

print( wordnet.synsets("active"))

print(wordnet.lemma('active.a.01.active').antonyms())
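
The synsets printed above contain the synonyms as lemmas; a small sketch to collect them as plain words:

# Sketch: gather synonym strings from every synset of "active"
synonyms = set()
for syn in wordnet.synsets("active"):
    for lemma in syn.lemmas():
        synonyms.add(lemma.name())
print(synonyms)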

Output:
Practical 4(C)
Aim: Compare two nouns

Program:
import nltk

from nltk.corpus import wordnet

syn1 = wordnet.synsets('football')

syn2 = wordnet.synsets('soccer')

# A word may have multiple synsets, so need to compare each synset of word1 with synset of word2

for s1 in syn1:
    for s2 in syn2:
        print("Path similarity of: ")
        print(s1, '(', s1.pos(), ')', '[', s1.definition(), ']')
        print(s2, '(', s2.pos(), ')', '[', s2.definition(), ']')
        print(" is", s1.path_similarity(s2))
        print()

Output:
Practical 5(A)

Aim: Tokenization using Python’s split() function

Program:
text = """ This tool is an a beta stage. Alexa developers can use Get Metrics API to seamlessly analyse
metric. It also supports custom skill model, prebuilt Flash Briefing model, and the Smart Home Skill API.
You can use this tool for creation of monitors, alarms, and dashboards that spotlight changes. The
release of these three tools will enable developers to create visual rich skills for Alexa devices with
screens. Amazon describes these tools as the collection of tech and tools for creating visually rich and
interactive voice experiences. """

data = text.split('.')

for i in data:
    print(i)

Output:
Practical 5(B)

Aim: Tokenization using Regular Expressions (RegEx)

Program:
import nltk

# import RegexpTokenizer() method from nltk

from nltk.tokenize import RegexpTokenizer

# Create a reference variable for Class RegexpTokenizer

tk = RegexpTokenizer(r'\s+', gaps = True)

# Create a string input

str = "I love to study CHATGPT 4"

# Use tokenize method

tokens = tk.tokenize(str)

print(tokens)
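
With gaps = True the pattern marks the separators; a brief sketch of the opposite style, where the pattern matches the tokens themselves, is:

# Alternative sketch: the pattern matches word characters, so no gaps flag is needed
tk2 = RegexpTokenizer(r'\w+')
print(tk2.tokenize("I love to study CHATGPT 4"))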

Output:
Practical 5(C)

Aim: Tokenization using Keras

Program:
import keras

from keras.preprocessing.text import text_to_word_sequence

# Create a string input

str = "I love to study Chat GPT 4"

# tokenizing the text

tokens = text_to_word_sequence(str)

print(tokens)

Output:
Practical 6(A)

Aim: Named Entity recognition using user defined text.

Program:
import spacy

# Load English tokenizer, tagger, parser and NER

nlp = spacy.load("en_core_web_sm")

# Process whole documents

text = ("When Sebastian Thrun started working on self-driving cars at "

"Google in 2007, few people outside of the company took him "

"seriously. “I can tell you very senior CEOs of major American "

"car companies would shake my hand and turn away because I wasn’t "

"worth talking to,” said Thrun, in an interview with Recode earlier "

"this week.")

doc = nlp(text)

# Analyse syntax

print("Noun phrases:", [chunk.text for chunk in doc.noun_chunks])

print("Verbs:", [token.lemma_ for token in doc if token.pos_ == "VERB"])

Output:
Practical 6(B)

Aim: Named Entity recognition with diagram using NLTK corpus – treebank.

Program:
import nltk

nltk.download('treebank')

from nltk.corpus import treebank_chunk

treebank_chunk.tagged_sents()[0]

treebank_chunk.chunked_sents()[0]

treebank_chunk.chunked_sents()[0].draw()

Output:
Practical 7(A)

Aim: Define grammar using nltk. Analyze a sentence using the same

Program:
import nltk

from nltk import tokenize

grammar1 = nltk.CFG.fromstring("""

S -> VP

VP -> VP NP

NP -> Det NP

Det -> 'that'

NP -> singular Noun

NP -> 'flight'

VP -> 'Book'

""")

sentence = "Book that flight"

all_tokens = tokenize.word_tokenize(sentence)

print(all_tokens)

parser = nltk.ChartParser(grammar1)

for tree in parser.parse(all_tokens):
    print(tree)
    tree.draw()
Output:
Practical 7(B)

Aim: Implementation of Deductive Chart Parsing using context free grammar and a given
sentence.

Program:
import nltk

from nltk import tokenize

grammar1 = nltk.CFG.fromstring("""

S -> NP VP

PP -> P NP

NP -> Det N | Det N PP | 'I'

VP -> V NP | VP PP

Det -> 'a' | 'my'

N -> 'bird' | 'balcony'

V -> 'saw'

P -> 'in'

""")

sentence = "I saw a bird in my balcony"

all_tokens = tokenize.word_tokenize(sentence)

print(all_tokens)

# all_tokens = ['I', 'saw', 'a', 'bird', 'in', 'my', 'balcony']

parser = nltk.ChartParser(grammar1)

for tree in parser.parse(all_tokens):
    print(tree)
    tree.draw()
Output:
Practical 8

Aim: Study PorterStemmer, LancasterStemmer, RegexpStemmer, SnowballStemmer Study


WordNetLemmatizer

Program:
print('PorterStemmer')

import nltk

from nltk.stem import PorterStemmer

word_stemmer = PorterStemmer()

print(word_stemmer.stem('writing'))

print('LancasterStemmer')

import nltk

from nltk.stem import LancasterStemmer

Lanc_stemmer = LancasterStemmer()

print(Lanc_stemmer.stem('writing'))

print('RegexpStemmer')

import nltk

from nltk.stem import RegexpStemmer

Reg_stemmer = RegexpStemmer('ing$|s$|e$|able$', min=4)

print(Reg_stemmer.stem('writing'))

print('SnowballStemmer')

import nltk

from nltk.stem import SnowballStemmer


english_stemmer = SnowballStemmer('english')

print(english_stemmer.stem ('writing'))

print('WordNetLemmatizer')

from nltk.stem import WordNetLemmatizer

lemmatizer = WordNetLemmatizer()

print("word :\tlemma")

print("rocks :", lemmatizer.lemmatize("rocks"))

print("corpora :", lemmatizer.lemmatize("corpora"))

# a denotes adjective in "pos"

print("better :", lemmatizer.lemmatize("better", pos ="a"))

Output:
