NLP-Lab Manual - Ashwini - Kachare
Semester: VI
Experiment No.1
Aim: Write a program to implement text processing (word and sentence tokenization, lowercase and uppercase conversion) using Python.
Theory:
Tokenization is the process by which a large quantity of text is divided into smaller parts called
tokens. These tokens are very useful for finding patterns and are considered a base step for
stemming and lemmatization. Tokenization also helps to substitute sensitive data elements with
non-sensitive data elements.
Natural language processing is used for building applications such as text classification,
intelligent chatbots, sentiment analysis, language translation, etc. It is therefore vital to
understand the patterns in the text to achieve the above-stated purposes.
The Natural Language Toolkit (NLTK) has a very important module, nltk.tokenize, which
comprises the following sub-modules:
1. Word tokenize
2. Sentence tokenize
Tokenization of words
We use the method word_tokenize() to split a sentence into words. The output of word
tokenization can be converted to a DataFrame for better text understanding in machine learning
applications. It can also be provided as input for further text-cleaning steps such as punctuation
removal, numeric character removal or stemming. Machine learning models need numeric data
to be trained and to make predictions, so word tokenization becomes a crucial part of the text
(string) to numeric data conversion.
Tokenization of Sentences
The sub-module available for this is sent_tokenize. An obvious question in your mind would
be why sentence tokenization is needed when we have the option of word tokenization. Imagine
you need to count the average number of words per sentence; how will you calculate it? To
accomplish such a task, you need both the NLTK sentence tokenizer and the NLTK word
tokenizer to calculate the ratio. Such output serves as an important feature for machine training,
as the answer would be numeric.
Code:
#Tokenization
text = "Hello Everyone"
import nltk
nltk.download('punkt')
from nltk.tokenize import sent_tokenize,word_tokenize
Output:
[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data] Unzipping tokenizers/punkt.zip.
Word Tokenization
word_tokenize(text)
Output
['Hello', 'Everyone']
Sentence Tokenization
Code:
text = "God is Great! I Won a Lottery."
sent_tokenize(text)
Output:
['God is Great!', 'I Won a Lottery.']
Screenshot:
Lower Case
Code:
import string
raw_docs = ["I am wariting some very basic english sentences", "I`m just writing it for the
demo PURPOSE to make audience understand the basics .""The Point is to learn HOW it
works_on #simple # data. "]
raw_docs = [doc.lower() for doc in
raw_docs] print(raw_docs)
Output:
['i am wariting some very basic english sentences', 'i`m just writing it for the demo purpose to
make audience understand the basics .the point is to learn how it works_on #simple # data. ']
Upper Case
Code:
Output:
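The code and output cells for the uppercase step are blank in the manual; a minimal sketch, assuming the same raw_docs list from the lowercase step above:
# Uppercase conversion, mirroring the lowercase step (raw_docs assumed from above)
upper_docs = [doc.upper() for doc in raw_docs]
print(upper_docs)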
Outcome: After finishing this practical, the student will be able to understand the basics of text
processing.
Experiment No.2
Aim: Write a program to implement text processing: stop word removal, punctuation removal and
filtration.
Theory:
English text may contain stop words like ‘the’, ‘is’, ‘are’. Stop words can be filtered from the
text to be processed. There is no universal list of stop words in NLP research; however, the
NLTK module contains a list of stop words. In this experiment you will learn how to remove
stop words using NLTK.
Filtration:
Many of the words used in a phrase are insignificant and hold no meaning. For example:
"English is a subject." Here, ‘English’ and ‘subject’ are the most significant words, while ‘is’
and ‘a’ are almost useless. "English subject" and "subject English" hold the same meaning even
if we remove the insignificant words (‘is’, ‘a’).
Stop-Word Removal
Code:
import nltk
nltk.download("stopwords")
from nltk.corpus import stopwords
Output:
[nltk_data] Downloading package stopwords to /root/nltk_data...
[nltk_data] Unzipping corpora/stopwords.zip.
True
Code:
print(stopwords.words("english"))
Output:
['i', 'me', 'my', 'myself', 'we', 'our', 'ours', 'ourselves', 'you', "you're"]
Screenshot
Filtering
Code:
import nltk
text = "This is an example text for stopword removal and filtering. This is done using
NLTK's stopwords."
words = nltk.word_tokenize(text)
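The filtering step itself is not shown in the manual; a minimal sketch (assuming the stopwords corpus downloaded above) that removes stop words and punctuation from the token list:
import string
from nltk.corpus import stopwords

stop_words = set(stopwords.words("english"))
# Keep only tokens that are neither stop words nor punctuation marks
filtered_words = [w for w in words if w.lower() not in stop_words and w not in string.punctuation]
print(filtered_words)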
Screenshot
Outcome: - After the practical, the student has understood and implemented text processing:
stop word removal and filtration.
Experiment No.3
Stemming and Lemmatization Algorithms
Objective: To make the student learn how to stem and lemmatize words in NLP
Theory:
What is Stemming?
Stemming is the process of reducing a word to its root form. In other words, there is one root
word, but there are many variations of the same word. For example, the root word is "eat" and
its variations are "eats, eating, eaten" and so on. In the same way, with the help of stemming,
we can find the root word of any variation.
Example
He was riding.
He was taking the ride.
In the above two sentences, the meaning is the same, i.e., a riding activity in the past. A human
can easily understand that both meanings are the same, but for machines the two sentences are
different, so it becomes hard to convert them into the same data row. If we do not provide the
same dataset, then the machine fails to predict. So it is necessary to normalize each word to
prepare the dataset for machine learning, and here stemming is used to categorize the same type
of data by getting its root word.
What is Lemmatization?
Lemmatization is the process of reducing a word to its base or dictionary form, called the
lemma, by taking the word's part of speech into account. Unlike stemming, which simply chops
off word endings, lemmatization always returns a valid word.
Stemming Code:
from nltk.stem import LancasterStemmer
lancaster_stemmer = LancasterStemmer()
print(lancaster_stemmer.stem('observing'))
print(lancaster_stemmer.stem('observs'))
print(lancaster_stemmer.stem('observe'))
Output
observ
observ
observ
Screenshot:
Lemmatization Code
import nltk
from nltk.stem import WordNetLemmatizer
nltk.download('wordnet')
Output
[nltk_data] Downloading package wordnet to /root/nltk_data...
True
lemmatizer = WordNetLemmatizer()
print(lemmatizer.lemmatize("running"))
print(lemmatizer.lemmatize("runs"))
output
running
run
Screenshot:
Lemmatizer - returns the verb, noun, adverb and adjective forms
def lemmtize(word):
    lemmatizer = WordNetLemmatizer()
    print("verb form: " + lemmatizer.lemmatize(word, pos="v"))
    print("noun form: " + lemmatizer.lemmatize(word, pos="n"))
    print("adverb form: " + lemmatizer.lemmatize(word, pos="r"))
    print("adjective form: " + lemmatizer.lemmatize(word, pos="a"))
lemmtize("ears")
Output
verb form: ears
noun form: ear
adverb form: ears
adjective form: ears
Screenshot:
The following code snippet shows the comparison between stemming and lemmatization.
Code:
import nltk
nltk.download('wordnet')
Output
[nltk_data] Downloading package wordnet to /root/nltk_data...
True
print(lemmatizer.lemmatize("deactivating", pos="v"))
print(lemmatizer.lemmatize("deactivative", pos="r"))
print(lemmatizer.lemmatize("deactivating", pos="n"))
Output
deactivate
deactivative
deactivating
from nltk.stem import PorterStemmer
stemmer = PorterStemmer()  # the outputs below match the Porter stemmer
print(stemmer.stem('stones'))
print(stemmer.stem('speaking'))
print(stemmer.stem('bedroom'))
print(stemmer.stem('jokes'))
print(stemmer.stem('lisa'))
print(stemmer.stem('purple'))
Output
stone
speak
bedroom
joke
lisa
purpl
print(lemmatizer.lemmatize('stones'))
print(lemmatizer.lemmatize('speaking'))
print(lemmatizer.lemmatize('bedroom'))
print(lemmatizer.lemmatize('jokes'))
print(lemmatizer.lemmatize('lisa'))
print(lemmatizer.lemmatize('purple'))
Output
stone
speaking
bedroom
joke
lisa
purple
Screenshot:
Outcome: - After the practical, the student has understood and implemented stemming and
lemmatization.
Experiment No.4
Aim: Write a program to implement different POS taggers and perform POS tagging on the
text.
Theory:
POS Tagging (Parts of Speech Tagging) is a process to mark up the words in a text for a
particular part of speech based on their definition and context. It is responsible for reading text
in a language and assigning a specific token (part of speech) to each word. It is also called
grammatical tagging.
What is Chunking in NLP
Chunking in NLP is a process of taking small pieces of information and grouping them into
larger units. The primary use of chunking is to make groups of "noun phrases." It is used to add
structure to the sentence by applying regular expressions on top of POS tagging. The resulting
groups of words are called "chunks." Chunking is also called shallow parsing.
In shallow parsing, there is at most one level between roots and leaves, while deep parsing
comprises more than one level. Shallow parsing is also called light parsing or chunking.
Rules for Chunking: There are no pre-defined rules, but you can combine them according to
your needs and requirements.
For example, suppose you need to tag nouns, verbs (past tense), adjectives, and coordinating
conjunctions from a sentence. You can use a rule as below (a sketch applying it follows the rule):
chunk:{<NN.?>*<VBD.?>*<JJ.?>*<CC>?}
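A minimal sketch (not part of the original manual) showing how this rule could be applied with NLTK's RegexpParser; the sample sentence is an assumption chosen to contain a noun, a past-tense verb, an adjective and a conjunction:
import nltk
nltk.download('punkt')
nltk.download('averaged_perceptron_tagger')

# Assumed sample sentence for illustration
sentence = "The dog barked and chased the small cat"
tags = nltk.pos_tag(nltk.word_tokenize(sentence))

# The rule from the theory above, wrapped in a grammar string
grammar = "chunk: {<NN.?>*<VBD.?>*<JJ.?>*<CC>?}"
cp = nltk.RegexpParser(grammar)
print(cp.parse(tags))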
Code:
import nltk
nltk.download('punkt')
from nltk import word_tokenize, pos_tag
nltk.download('averaged_perceptron_tagger')
sentence = "Book the ticket"
sentence_tokens = word_tokenize(sentence)
print(sentence_tokens)
pos_tag(sentence_tokens)
Output
[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data] Unzipping tokenizers/punkt.zip.
True
[nltk_data] Downloading package averaged_perceptron_tagger to
[nltk_data]     /root/nltk_data...
[nltk_data] Unzipping taggers/averaged_perceptron_tagger.zip.
True
['Book', 'the', 'ticket']
Chunking - making word phrases
Code:
import nltk
text = "The clean data is important for application development."
tokens = nltk.word_tokenize(text)
print(tokens)
tag = nltk.pos_tag(tokens)
print(tag)
grammar = "NP: {<DT>?<JJ>*<NN>}"
cp =nltk.RegexpParser(grammar)
result = cp.parse(tag)
print(result)
Output:
(S
  (NP The/DT clean/JJ data/NN)
  is/VBZ
  important/JJ
  for/IN
  (NP application/NN)
  (NP development/NN)
  ./.)
Screenshot:
Outcome: - After the practical, the student has understood and implemented different POS
taggers and performed POS tagging on the text.
Experiment No.5
Aim: Write a program to implement the N-gram model for the given text input.
Theory: N-grams are one of the fundamental concepts every data scientist and computer
science professional must know while working with text data. In this beginner-level experiment,
we will learn what n-grams are and explore them on text data in Python. The objective is to
analyze different types of n-grams on the given text data and hence decide which n-gram works
best for our data.
An N-gram model predicts the most probable word that might follow a given sequence of
words. It is a probabilistic model that is trained on a corpus of text. Such a model is useful in
many NLP applications including speech recognition, machine translation and predictive text
input. An N-gram model is built by counting how often word sequences occur in corpus text
and then estimating the probabilities. Since a simple N-gram model has limitations,
improvements are often made via smoothing, interpolation and backoff. An N-gram model is
one type of Language Model (LM), which is concerned with finding the probability distribution
over word sequences.
Code:
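The code cell for this experiment is blank in the manual; a minimal sketch that generates unigrams, bigrams and trigrams with nltk.util.ngrams (the sample sentence is an assumption):
import nltk
nltk.download('punkt')
from nltk.util import ngrams

# Assumed sample input text
text = "I love natural language processing"
tokens = nltk.word_tokenize(text)

# Generate and print n-grams for n = 1 (unigrams), 2 (bigrams), 3 (trigrams)
for n in (1, 2, 3):
    print(list(ngrams(tokens, n)))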
Screenshot:
Outcome: - After the practical, the student has understood and implemented the N-gram model
for the given text input.
Experiment No. 06
Aim: Write a program for exploratory data analysis on text (Word Cloud).
Objective: To make the student understand exploratory data analysis on text (Word Cloud)
Theory:
Exploratory Data Analysis is the process of exploring data, generating insights, testing
hypotheses, checking assumptions and revealing underlying hidden patterns in the data.
There are no shortcuts in a machine learning project lifecycle. We can’t simply skip to the
model building stage after gathering the data. We need to plan our approach in a structured
manner, and the exploratory data analysis (EDA) stage plays a huge part in that.
We need to perform investigative and detective analysis of our data to see if we can unearth
any insights.
And there’s no shortage of text data, is there? We have data being generated from tweets, digital
media platforms, blogs, and a whole host of other sources. As a data scientist and an NLP
enthusiast, it’s important to analyze all this text data to help your organization make data-driven
decisions.
Code:
import numpy as np
import pandas as pd
from google.colab import files
import matplotlib.pyplot as plt
import seaborn as sns
import string
from wordcloud import WordCloud
upload = files.upload()
for fn in upload.keys():
    print('User uploaded file "{name}" with length {length} bytes'.format(name=fn, length=len(upload[fn])))
upload
import io
Reviews_df = pd.read_csv(io.StringIO(upload['Reviews.csv'].decode('utf-8')))
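The manual stops after loading Reviews.csv and does not show the word cloud itself; a minimal sketch, where the column name 'Text' is an assumption about the uploaded file:
from wordcloud import WordCloud, STOPWORDS
import matplotlib.pyplot as plt

# 'Text' is an assumed column name in Reviews.csv
all_text = " ".join(str(t) for t in Reviews_df['Text'])
wc = WordCloud(width=800, height=400, stopwords=STOPWORDS, background_color="white").generate(all_text)

plt.figure(figsize=(10, 5))
plt.imshow(wc, interpolation="bilinear")
plt.axis("off")
plt.show()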
Outcome: - After the practical, the student has understood and implemented exploratory data
analysis on text (Word Cloud).
Experiment No. 07
Aim: Study of WordNet and implementation of the Lesk algorithm for word sense disambiguation.
Theory:
WordNet is a lexical database for the English language, which was created by Princeton, and
is part of the NLTK corpus. It is a machine-readable database of words which can be accessed
from most popular programming languages (C, C#, Java, Ruby, Python etc.). WordNet
superficially resembles a thesaurus, in that it groups words together based on their meanings.
WordNet is not like your traditional dictionary. WordNet focuses on the relationships between
words along with their definitions, and this makes WordNet a network instead of a list. NLTK
includes the English WordNet, with 155,287 words and 117,659 synonym sets.
In the WordNet network, the words are connected by linguistic relations. These linguistic
relations (hypernym, hyponym, meronym, holonym and other fancy sounding stuff), are
WordNet’s secret sauce. They give you powerful capabilities that are missing in an ordinary
dictionary/thesaurus.
1) Synonyms
WordNet stores synonyms in the form of synsets, where each word in the synset shares
the same meaning. Basically, each synset is a group of synonyms. Each synset has a
definition associated with it, and relations are stored between different synsets. In the
following example, take the word ‘sofa’: we have only one synset for ‘sofa’, which
means that it has only one context or meaning. Another word like ‘jupiter’ will give
two synsets because it has two meanings: one as ‘planet’ and the other as ‘Roman
God’.
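The code that produces the synsets shown below is not included in the manual; a minimal sketch, assuming NLTK's WordNet corpus has been downloaded:
import nltk
nltk.download('wordnet')
from nltk.corpus import wordnet as wn

# Each synset returned for 'jupiter' represents one sense of the word
syns = wn.synsets('jupiter')
print(syns)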
[Synset('jupiter.n.01'), Synset('jupiter.n.02')]
syns[0].definition()
‘the largest planet and the 5th from the sun; has many satellites and is one of the brightest
objects in the night sky’
syns[1].definition()
‘(Roman mythology) supreme god of Romans; counterpart of Greek Zeus’
2) Hyponyms and Hypernyms
Hyponyms and hypernyms are specific and generalized concepts, respectively. For
example, ‘beach house’ and ‘guest house’ are hyponyms of ‘house’: they are more
specific concepts of ‘house’. And ‘house’ is a hypernym of ‘guest house’ because
it is the more general concept. ‘Egg noodle’ is a hyponym of ‘noodle’, and ‘pasta’ is a
hypernym of ‘noodle’.
wn.synset('noodle.n.01').hyponyms()
[Synset('egg_noodle.n.01')]
wn.synset('noodle.n.01').hypernyms()
[Synset('pasta.n.02')]
wn.synset('egg_noodle.n.01').definition()
‘narrow strip of pasta dough made with eggs’
wn.synset('pasta.n.01').definition()
‘a dish that contains pasta as its main ingredient’
3) Meronyms and Holonyms
Meronyms and Holonyms represent the part-whole relationship. The meronym
represents the part and the holonym represents the whole. For example, ‘kitchen’ is
a meronym of ‘home'(the kitchen is a part of the home), ‘mattress’ is a meronym of
‘bed’, and ‘bedroom’ is a holonym of ‘bed’.
wn.synset('bed.n.01').part_holonyms()
[Synset('bedroom.n.01')]
wn.synset('bed.n.01').part_meronyms()
[Synset('bedstead.n.01'), Synset('mattress.n.01')]
4) Word Similarity
We can compute the similarity between two words based on the distance between
words in the WordNet network. The smaller the distance, the more similar the
words. In this way, it is possible to quantitatively figure out that a cat and a dog are
similar, a phone and a computer are similar, but a cat and a phone are not similar!
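The snippets below measure string similarity with edit distance and difflib; the WordNet-based similarity described in this paragraph can be sketched as follows (the chosen synsets are assumptions for illustration):
import nltk
nltk.download('wordnet')
from nltk.corpus import wordnet as wn

dog = wn.synset('dog.n.01')
cat = wn.synset('cat.n.01')
phone = wn.synset('telephone.n.01')

# Path similarity is higher for synsets that sit closer together in the WordNet hierarchy
print(dog.path_similarity(cat))    # cat and dog: relatively similar
print(cat.path_similarity(phone))  # cat and phone: much less similar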
import nltk
nltk.edit_distance("humpty", "dumpty")
Output: 1
import difflib
seq = difflib.SequenceMatcher(None, a, b)
d = seq.ratio()*100
print(d)
Output: 87.32394366197182
import difflib
a = 'phone'
b = 'computer'
seq = difflib.SequenceMatcher(None, a, b)
d = seq.ratio()*100
print(d)
Output: 30.76923076923077
Lesk algorithm
Consider three examples of the distinct senses that exist for the word "bass":
1. a type of fish
2. tones of low frequency
3. a type of instrument
and the sentences:
1. I went fishing for some sea bass.
2. The bass line of the song is too weak.
To a human, it is obvious that the first sentence is using the word "bass (fish)", as in
the first sense above, and that in the second sentence the word "bass (instrument)"
is being used, as in the latter senses. Developing algorithms to replicate this human
ability can often be a difficult task.
In the above example, for the first and the second sentence the Lesk algorithm is
somewhat accurate in understanding the context of the word "bass" in the sentence.
But for a sentence where "bass" appears in the context of a musical instrument, it
estimates the word as Synset('sea_bass.n.01'), which is clearly not correct!
Unfortunately, Lesk’s approach is very sensitive to the exact wording of definitions,
so the absence of a certain word can radically change the results.
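The Lesk code itself is not shown in the manual; a minimal sketch using NLTK's built-in implementation (nltk.wsd.lesk) on the two example sentences above:
import nltk
nltk.download('punkt')
nltk.download('wordnet')
from nltk.wsd import lesk
from nltk.tokenize import word_tokenize

sent1 = "I went fishing for some sea bass."
sent2 = "The bass line of the song is too weak."

# lesk() returns the synset whose gloss overlaps most with the context words
print(lesk(word_tokenize(sent1), 'bass'))
print(lesk(word_tokenize(sent2), 'bass'))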
Outcome: - After the practical, the student has understood and implemented the Lesk algorithm.
Experiment No. 08
Aim: CASE STUDY: Application of NLP-Sentiment Analysis of Real Comments
in Social Media platform
Theory: A Twitter sentiment analysis determines negative, positive, or neutral emotions within the
text of a tweet using NLP and ML models. Sentiment analysis or opinion mining refers to
identifying as well as classifying the sentiments that are expressed in the text source. Tweets are
often useful in generating a vast amount of sentiment data upon analysis. These data are useful in
understanding the opinion of people on social media for a variety of topics.
Practical:
!pip install -q transformers
from transformers import pipeline
sentiment_pipeline = pipeline("sentiment-analysis")
data = ["I Love You", "I hate you"]
sentiment_pipeline(data)
specific_model = pipeline(model="finiteautomata/bertweet-base-sentiment-analysis")
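The manual creates specific_model but does not show it being applied; a minimal usage sketch on the same example data:
# Run the tweet-specific sentiment model on the same two example sentences
specific_model(data)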
Screenshot:
Outcome: After the practical, the student has understood and implemented sentiment analysis of tweets on a
social media platform.