
LAB MANUAL

Subject: Natural Language Processing (CSDL7013)

Semester: VI

Prepared By: Ms. Ashwini Kachare


Lab Outcomes

LO1: Learn capabilities and limitations of current natural language technologies.
LO2: Model linguistic phenomena with formal grammars.
LO3: Design, implement and test algorithms for NLP problems.
LO4: Apply NLP techniques to design real-world NLP applications.
List of experiments

1. Write a program to implement text processing: word and sentence tokenization, and lowercase/uppercase conversion, using Python. (LO1)
2. Write a program to implement text processing: stop word removal, punctuation removal and filtration. (LO1)
3. Write a program to implement stemming and lemmatization. (LO2)
4. Write a program to implement the different POS taggers and perform POS tagging on text. (LO2)
5. Write a program to implement an N-gram model for the given text input. (LO3)
6. Write a program for exploratory data analysis on text (word cloud). (LO3)
7. Study and implement WordNet and the Lesk algorithm. (LO3)
8. Case study: application of NLP - sentiment analysis of real comments on a social media platform. (LO4)
Experiment No.1

Aim: Write a program to implement text processing: word and sentence tokenization, and lowercase/uppercase conversion, using Python.

Objective: To provide students with an overview of how text processing is implemented.

Theory:

Tokenization is the process by which a large quantity of text is divided into smaller parts called
tokens. These tokens are very useful for finding patterns and are considered as a base step for
stemming and lemmatization. Tokenization also helps to substitute sensitive data elements with
non-sensitive data elements.

Natural language processing is used for building applications such as text classification, intelligent chatbots, sentiment analysis, language translation, etc. It is therefore vital to understand the patterns in the text to achieve the above-stated purposes.

The Natural Language Toolkit (NLTK) provides the tokenize module, which comprises two important sub-modules:

1. Word tokenize

2. Sentence tokenize

Tokenization of words

We use the method word_tokenize() to split a sentence into words. The output of word tokenization can be converted to a DataFrame for better text understanding in machine learning applications (see the sketch below). It can also be provided as input for further text cleaning steps such as punctuation removal, numeric character removal or stemming. Machine learning models need numeric data to be trained and to make predictions, so word tokenization becomes a crucial part of the text (string) to numeric data conversion.
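As a minimal illustration (not part of the original manual), the word tokens can be placed in a pandas DataFrame like this:

import nltk
import pandas as pd
from nltk.tokenize import word_tokenize

nltk.download('punkt')
tokens = word_tokenize("Machine learning models need numeric data")
# one row per token; a typical first step before numeric encoding
df = pd.DataFrame({'token': tokens})
print(df)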

Tokenization of Sentences

The sub-module available for this is sent_tokenize. An obvious question is why sentence tokenization is needed when we already have word tokenization. Imagine you need to count the average number of words per sentence: how would you calculate it? To accomplish such a task you need both the NLTK sentence tokenizer and the NLTK word tokenizer to calculate the ratio. Such output serves as an important feature for machine training because the answer is numeric.
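A minimal sketch (not in the original manual) of that average-words-per-sentence calculation, using both tokenizers:

import nltk
nltk.download('punkt')
from nltk.tokenize import sent_tokenize, word_tokenize

text = "God is Great! I Won a Lottery."
sentences = sent_tokenize(text)
words = word_tokenize(text)
# average number of word tokens per sentence (punctuation tokens included)
print(len(words) / len(sentences))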
Code:

#Tokenization
text = "Hello Everyone"
import nltk
nltk.download('punkt')
from nltk.tokenize import sent_tokenize,word_tokenize
Output:
[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data] Unzipping tokenizers/punkt.zip.
Word Tokenization
word_tokenize(text)
Output
['Hello', 'Everyone']
Sentence Tokenization

from nltk.tokenize import sent_tokenize

text = "God is Great! I Won a Lottery."
print(sent_tokenize(text))

Output:
['God is Great!', 'I Won a Lottery.']

Screenshot:
Lower Case
Code:
import string
raw_docs = ["I am wariting some very basic english sentences",
            "I`m just writing it for the demo PURPOSE to make audience understand the basics ."
            "The Point is to learn HOW it works_on #simple # data. "]
raw_docs = [doc.lower() for doc in raw_docs]
print(raw_docs)
Output:
['i am wariting some very basic english sentences', 'i`m just writing it for the demo purpose to make audience understand the basics .the point is to learn how it works_on #simple # data. ']

Upper Case

Code:

Output:
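A minimal sketch (not in the original manual), mirroring the lowercase step above:

raw_docs_upper = [doc.upper() for doc in raw_docs]  # str.upper() on each document
print(raw_docs_upper)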

Outcome: After finishing this practical, students will be able to understand the basics of text processing.
Experiment No.2

Aim: Write a program to implement text processing: stop word removal, punctuation removal and filtration.

Objectives: To make students learn how to remove stop words from text.

Theory:

English text may contain stop words like 'the', 'is', 'are'. Stop words can be filtered from the text to be processed. There is no universal list of stop words in NLP research; however, the NLTK module contains a list of stop words. Below you will learn how to remove stop words using NLTK.

Filtration:

Many of the words used in a phrase are insignificant and hold little meaning. For example, in "English is a subject", 'English' and 'subject' are the most significant words, while 'is' and 'a' are almost useless. "English subject" and "subject English" hold the same meaning even if we remove the insignificant words ('is', 'a').

Stop-Word Removal

Code:

import nltk
nltk.download("stopwords")

Output:
[nltk_data] Downloading package stopwords to /root/nltk_data...
[nltk_data] Unzipping corpora/stopwords.zip.
True

Code:

from nltk.corpus import stopwords

print(stopwords.words("english"))
Output:
['i', 'me', 'my', 'myself', 'we', 'our', 'ours', 'ourselves', 'you', "you're"]
Screenshot
Filtering

Code:

import nltk

text = "This is an example text for stopword removal and filtering. This is done using NLTK's stopwords."
words = nltk.word_tokenize(text)
print("Unfiltered: ", words)

stopwords = nltk.corpus.stopwords.words("english")
# 'This' survives filtering because the stopword list is lowercase ('this');
# lowercasing the tokens first would remove it as well.
cleaned = [word for word in words if word not in stopwords]
print("Filtered: ", cleaned)

Output:
Unfiltered: ['This', 'is', 'an', 'example', 'text', 'for', 'stopword', 'removal', 'and', 'filtering', '.', 'This', 'is', 'done', 'using', 'NLTK', "'s", 'stopwords', '.']
Filtered: ['This', 'example', 'text', 'stopword', 'removal', 'filtering', '.', 'This', 'done', 'using', 'NLTK', "'s", 'stopwords', '.']

Screenshot

Outcome: After the practical, students have understood and implemented stop word removal and filtration of text.
Experiment No.3

Aim: Write a program to implement stemming and lemmatization algorithms.

Objective: To make students learn how to stem and lemmatize words in NLP.

Theory:

What is Stemming?

Stemming is a kind of normalization for words. Normalization is a technique where a set of words in a sentence is converted into a shorter sequence to speed up lookup. Words that have the same meaning but vary according to context or sentence are normalized.

In other words, there is one root word, but there are many variations of the same word. For example, the root word is "eat" and its variations are "eats", "eating", "eaten" and so on. In the same way, with the help of stemming, we can find the root word of any variation.

Example

He was riding.

He was taking the ride.

In the above two sentences, the meaning is the same, i.e., a riding activity in the past. A human can easily understand that both meanings are the same, but for a machine the two sentences are different, so it is hard to map them to the same data row. If we do not provide the same form in the dataset, the machine fails to predict. So it is necessary to normalize each word to prepare the dataset for machine learning, and here stemming is used to categorize the same type of data by reducing it to its root word.

What is Lemmatization

Lemmatization is a text normalization technique used in Natural Language Processing (NLP) that reduces any form of a word to its base (root) form. Lemmatization is responsible for grouping different inflected forms of a word into the root form, which carries the same meaning. Tagging systems, indexing, SEO, information retrieval, and web search all use lemmatization to a great extent. Lemmatization usually involves using a vocabulary and morphological analysis of words, removing inflectional endings, and returning the dictionary form of a word (the lemma).
Example of Lemmatization

Difference between Stemming and Lemmatization


In stemming, the end or beginning of a word is cut off, keeping in mind common prefixes and suffixes that can be found in inflected words. Lemmatization uses dictionaries to conduct a morphological analysis of the word and link it to its lemma. Lemmatization always returns the dictionary form of the word when converting it to its root form.

Stemming Code:

from nltk.stem import PorterStemmer, LancasterStemmer

porter_stemmer = PorterStemmer()
print(porter_stemmer.stem('observing'))
print(porter_stemmer.stem('observs'))
print(porter_stemmer.stem('observe'))
Output
observ
observ
observ

lancaster_stemmer = LancasterStemmer()
print(lancaster_stemmer.stem('observing'))
print(lancaster_stemmer.stem('observs'))
print(lancaster_stemmer.stem('observe'))
Output
observ
observ
observ
Screenshot:

Lemmatization Code
import nltk
from nltk.stem import WordNetLemmatizer
nltk.download('wordnet')
Output
[nltk_data] Downloading package wordnet to /root/nltk_data...
True
lemmatizer = WordNetLemmatizer()
print(lemmatizer.lemmatize("running"))
print(lemmatizer.lemmatize("runs"))
Output
running
run
Screenshot:

Lemmatizer - returns verb, noun, adverb and adjective forms

def lemmatize_word(word):
    lemmatizer = WordNetLemmatizer()
    print("verb form: " + lemmatizer.lemmatize(word, pos="v"))
    print("noun form: " + lemmatizer.lemmatize(word, pos="n"))
    print("adverb form: " + lemmatizer.lemmatize(word, pos="r"))
    print("adjective form: " + lemmatizer.lemmatize(word, pos="a"))

lemmatize_word("ears")
Output
verb form: ears
noun form: ear
adverb form: ears
adjective form: ears

Screenshot:

The following code snippet shows the comparison between stemming and lemmatization:

Code:

from nltk.stem import PorterStemmer
from nltk.stem import WordNetLemmatizer

stemmer = PorterStemmer()
lemmatizer = WordNetLemmatizer()
print(stemmer.stem("deactivating"))
print(stemmer.stem("deactivated"))
print(stemmer.stem("deactivates"))
Output
deactiv
deactiv
deactiv

import nltk
nltk.download('wordnet')
Output
[nltk_data] Downloading package wordnet to /root/nltk_data...
True

print(lemmatizer.lemmatize("deactivating", pos="v"))
print(lemmatizer.lemmatize("deactivative", pos="r"))
print(lemmatizer.lemmatize("deactivating", pos="n"))
Output
deactivate
deactivative
deactivating

print(stemmer.stem('stones'))
print(stemmer.stem('speaking'))
print(stemmer.stem('bedroom'))
print(stemmer.stem('jokes'))
print(stemmer.stem('lisa'))
print(stemmer.stem('purple'))
Output
stone
speak
bedroom
joke
lisa
purpl

print(lemmatizer.lemmatize('stones'))
print(lemmatizer.lemmatize('speaking'))
print(lemmatizer.lemmatize('bedroom'))
print(lemmatizer.lemmatize('jokes'))
print(lemmatizer.lemmatize('lisa'))
print(lemmatizer.lemmatize('purple'))
Output
stone
speaking
bedroom
joke
lisa
purple

Screenshot:

Outcome: After the practical, students have understood and implemented stemming and lemmatization.
Experiment No.4

Aim: Write a program to implement the different POS taggers and perform POS tagging on text.

Objectives: To make students learn POS tagging.

Theory:

POS tagging (Parts-of-Speech tagging) is a process of marking up the words in a text for a particular part of speech, based on both their definition and context. It is responsible for reading text in a language and assigning a specific token (part of speech) to each word. It is also called grammatical tagging.

Let's learn with an NLTK part-of-speech example:

Input: Everything to permit us.

Output: [('Everything', NN), ('to', TO), ('permit', VB), ('us', PRP)]

Steps involved in the POS tagging example:

● Tokenize the text (word_tokenize)

● Apply pos_tag to the tokens from the previous step: nltk.pos_tag(tokenized_text)
What is Chunking in NLP

Chunking in NLP is a process of taking small pieces of information and grouping them into larger units. The primary use of chunking is making groups of "noun phrases". It is used to add structure to the sentence by applying regular expressions on top of POS tagging. The resulting groups of words are called "chunks". Chunking is also called shallow parsing.

In shallow parsing there is at most one level between root and leaves, while deep parsing comprises more than one level. Shallow parsing is also called light parsing or chunking.
Rules for chunking: there are no pre-defined rules, but you can combine them according to need and requirement.

For example, suppose you need to tag nouns, verbs (past tense), adjectives, and coordinating conjunctions in a sentence. You can use the rule below (a usage sketch follows the symbol table):

chunk:{<NN.?>*<VBD.?>*<JJ.?>*<CC>?}

Following table shows what the various symbol means:

Name of symbol Description


. Any character except new line
* Match 0 or more repetitions
? Match 0 or 1 repetitions
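A minimal sketch (not in the original manual) applying the chunk rule above to a tagged sentence:

import nltk
nltk.download('punkt')
nltk.download('averaged_perceptron_tagger')

sentence = "The little yellow dog barked and chased the cat"
tagged = nltk.pos_tag(nltk.word_tokenize(sentence))
# group runs of nouns, past-tense verbs, adjectives and an optional conjunction
grammar = "chunk: {<NN.?>*<VBD.?>*<JJ.?>*<CC>?}"
cp = nltk.RegexpParser(grammar)
print(cp.parse(tagged))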

Part-Of-Speech (POS) Tagging

Code:

import nltk
nltk.download('punkt')
from nltk import word_tokenize, pos_tag
nltk.download('averaged_perceptron_tagger')

sentence = "Book the ticket"
sentence_tokens = word_tokenize(sentence)
print(sentence_tokens)
pos_tag(sentence_tokens)

Output
[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data] Unzipping tokenizers/punkt.zip.
True
[nltk_data] Downloading package averaged_perceptron_tagger to
[nltk_data]     /root/nltk_data...
[nltk_data] Unzipping taggers/averaged_perceptron_tagger.zip.
True
['Book', 'the', 'ticket']

[('Book', 'IN'), ('the', 'DT'), ('ticket', 'NN')]


Screenshot:

Chunking - making word phrases

Code:

import nltk
text = "The clean data is important for application development."
tokens = nltk.word_tokenize(text)
print(tokens)
tag = nltk.pos_tag(tokens)
print(tag)
grammar = "NP: {<DT>?<JJ>*<NN>}"
cp = nltk.RegexpParser(grammar)
result = cp.parse(tag)
print(result)

Output:

['The', 'clean', 'data', 'is', 'important', 'for', 'application', 'development', '.']

[('The', 'DT'), ('clean', 'JJ'), ('data', 'NN'), ('is', 'VBZ'), ('important', 'JJ'), ('for', 'IN'), ('application', 'NN'), ('development', 'NN'), ('.', '.')]

(S
  (NP The/DT clean/JJ data/NN)
  is/VBZ
  important/JJ
  for/IN
  (NP application/NN)
  (NP development/NN)
  ./.)

Screenshot:

Outcome: After the practical, students have understood and implemented different POS taggers and performed POS tagging on text.
Experiment No.5

Aim: Write a program to implement an N-gram model for the given text input.

Objective: To make students understand the N-gram model.

Theory: N-grams are one of the fundamental concepts every data scientist and computer science professional must know while working with text data. In this beginner-level experiment, we will learn what n-grams are and explore them on text data in Python. The objective is to analyze different types of n-grams on the given text data and decide which n-gram works best for our data.

An N-gram model predicts the most probable word that might follow a given sequence. It is a probabilistic model trained on a corpus of text. Such a model is useful in many NLP applications including speech recognition, machine translation and predictive text input. An N-gram model is built by counting how often word sequences occur in corpus text and then estimating the probabilities (see the sketch below). Since a simple N-gram model has limitations, improvements are often made via smoothing, interpolation and backoff. An N-gram model is one type of Language Model (LM), which is concerned with finding the probability distribution over word sequences.
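A minimal sketch (not in the original manual) of estimating bigram probabilities by counting on a toy corpus:

from collections import Counter
import nltk
nltk.download('punkt')

corpus = "the cat sat on the mat . the cat ate the fish ."
tokens = nltk.word_tokenize(corpus)
bigram_counts = Counter(nltk.bigrams(tokens))
unigram_counts = Counter(tokens)

# maximum-likelihood estimate: P(next | word) = count(word, next) / count(word)
def bigram_prob(word, nxt):
    return bigram_counts[(word, nxt)] / unigram_counts[word]

print(bigram_prob("the", "cat"))  # 2 / 4 = 0.5 in this toy corpus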

Code:

# Step 1: Install and import NLTK

import os
import nltk
import nltk.corpus
from nltk.util import bigrams, trigrams, ngrams
nltk.download('punkt')

string = "The best and most beautiful things in the world cannot be seen or ever toched, they must be felt with the heart"
quotes_tokens = nltk.word_tokenize(string)
quotes_tokens
quotes_bigrams = list(nltk.bigrams(quotes_tokens))
quotes_bigrams
quotes_trigrams = list(nltk.trigrams(quotes_tokens))
quotes_trigrams
quotes_ngrams = list(nltk.ngrams(quotes_tokens, 5))
quotes_ngrams

Output:
[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data] Package punkt is already up-to-date!
True
['The',
 'best',
 'and',
 'most',
 'beautiful',
 'things',
 'in',
 'the',
 'world',
 'can',
 'not',
 'be',
 'seen',
 'or',
 'ever',
 'toched',
 ',',
 'they',
 'must',
 'be',
 'felt',
 'with',
 'the',
 'heart']
[('The', 'best'), ('best', 'and'), ('and', 'most'), ('most', 'beautiful'), ('beautiful', 'things'), ('things',
'in'), ('in', 'the'), ('the', 'world'), ('world', 'can'), ('can', 'not'), ('not', 'be'), ('be', 'seen'), ('seen',
'or'), ('or', 'ever'), ('ever', 'toched'), ('toched', ','), (',', 'they'), ('they', 'must'), ('must', 'be'), ('be',
'felt'), ('felt', 'with'), ('with', 'the'), ('the', 'heart')]
[('The', 'best', 'and'), ('best', 'and', 'most'), ('and', 'most', 'beautiful'), ('most', 'beautiful', 'things'),
('beautiful', 'things', 'in'), ('things', 'in', 'the'), ('in', 'the', 'world'), ('the', 'world', 'can'), ('world',
'can', 'not'), ('can', 'not', 'be'), ('not', 'be', 'seen'), ('be', 'seen', 'or'), ('seen', 'or', 'ever'), ('or', 'ever',
'toched'), ('ever', 'toched', ','), ('toched', ',', 'they'), (',', 'they', 'must'), ('they', 'must', 'be'), ('must',
'be', 'felt'), ('be', 'felt', 'with'), ('felt', 'with', 'the'), ('with', 'the', 'heart')]

[('The', 'best', 'and', 'most', 'beautiful'),


('best', 'and', 'most', 'beautiful', 'things'),
('and', 'most', 'beautiful', 'things', 'in'),
('most', 'beautiful', 'things', 'in', 'the'),
('beautiful', 'things', 'in', 'the', 'world'),
('things', 'in', 'the', 'world', 'can'),
('in', 'the', 'world', 'can', 'not'),
('the', 'world', 'can', 'not', 'be'),
('world', 'can', 'not', 'be', 'seen'),
('can', 'not', 'be', 'seen', 'or'),
('not', 'be', 'seen', 'or', 'ever'),
('be', 'seen', 'or', 'ever', 'toched'),
('seen', 'or', 'ever', 'toched', ','),
('or', 'ever', 'toched', ',', 'they'),
('ever', 'toched', ',', 'they', 'must'),
('toched', ',', 'they', 'must', 'be'),
(',', 'they', 'must', 'be', 'felt'),
('they', 'must', 'be', 'felt', 'with'),
('must', 'be', 'felt', 'with', 'the'),
('be', 'felt', 'with', 'the', 'heart')]

Screenshot:
Outcome: After the practical, students have understood and implemented an N-gram model for the given text input.
Experiment No. 06

Aim: Write a program for exploratory data analysis on text (word cloud).

Objective: To make students understand exploratory data analysis on text (word cloud).

Theory:

Exploratory Data Analysis is the process of exploring data, generating insights, testing
hypotheses, checking assumptions and revealing underlying hidden patterns in the data.

There are no shortcuts in a machine learning project lifecycle. We can't simply skip to the model-building stage after gathering the data. We need to plan our approach in a structured manner, and the exploratory data analysis (EDA) stage plays a huge part in that.

We need to perform investigative and detective analysis of our data to see if we can unearth
any insights.

And there’s no shortage of text data, is there? We have data being generated from tweets, digital
media platforms, blogs, and a whole host of other sources. As a data scientist and an NLP
enthusiast, it’s important to analyze all this text data to help your organization make data-driven
decisions.

Code:

import numpy as np
import pandas as pd
from google.colab import files
import matplotlib.pyplot as plt
import seaborn as sns
import string
from wordcloud import WordCloud

upload = files.upload()
for fn in upload.keys():
    print('User uploaded file "{name}" with length {length} bytes'.format(
        name=fn, length=len(upload[fn])))
upload

import io
Reviews_df = pd.read_csv(io.StringIO(upload['Reviews.csv'].decode('utf-8')))
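A minimal sketch (an assumption: the uploaded reviews file has a 'Text' column) of building and plotting the word cloud from the loaded DataFrame:

from wordcloud import WordCloud, STOPWORDS

# join all review texts into a single string
text = " ".join(str(t) for t in Reviews_df['Text'].dropna())
wc = WordCloud(width=800, height=400, background_color='white', stopwords=STOPWORDS).generate(text)
plt.figure(figsize=(10, 5))
plt.imshow(wc, interpolation='bilinear')
plt.axis('off')
plt.show()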
Outcome: After the practical, students have understood and implemented exploratory data analysis on text (word cloud).
Experiment No. 07

Aim: Write a program to implement WordNet and the Lesk algorithm.

Objective: To make students understand WordNet and the Lesk algorithm.

Theory:

WordNet is a lexical database for the English language, which was created by Princeton, and
is part of the NLTK corpus. It is a machine-readable database of words which can be accessed
from most popular programming languages (C, C#, Java, Ruby, Python etc.). WordNet
superficially resembles a thesaurus, in that it groups words together based on their meanings.

WordNet is not like a traditional dictionary. WordNet focuses on the relationships between words along with their definitions, and this makes WordNet a network instead of a list. NLTK includes the English WordNet, with 155,287 words and 117,659 synonym sets.

In the WordNet network, the words are connected by linguistic relations. These linguistic relations (hypernym, hyponym, meronym, holonym and other fancy-sounding relations) are WordNet's secret sauce. They give you powerful capabilities that are missing in an ordinary dictionary or thesaurus.

1) Synonyms
WordNet stores synonyms in the form of synsets, where each word in the synset shares the same meaning. Basically, each synset is a group of synonyms. Each synset has a definition associated with it, and relations are stored between different synsets. In the following example, take the word 'sofa': we get only one synset for 'sofa', which means that it has only one context or meaning. Another word like 'jupiter' will give two synsets because it has two meanings - one as 'planet' and the other as 'Roman God'.

#First we have to import nltk and download the wordnet package
import nltk
nltk.download('wordnet')

#Next we import wordnet from nltk
from nltk.corpus import wordnet as wn

#We can look up the synsets of a word
wn.synsets("star")

[nltk_data] Downloading package wordnet to /root/nltk_data...
[Synset('star.n.01'),
 Synset('ace.n.03'),
 Synset('star.n.03'),
 Synset('star.n.04'),
 Synset('star.n.05'),
 Synset('headliner.n.01'),
 Synset('asterisk.n.01'),
 Synset('star_topology.n.01'),
 Synset('star.v.01'),
 Synset('star.v.02'),
 Synset('star.v.03'),
 Synset('leading.s.01')]
import nltk
nltk.download('wordnet')

#Next we import wordnet from nltk
from nltk.corpus import wordnet as wn

#We can look up the synsets of a word
syns = wn.synsets("Jupiter")
syns

[nltk_data] Downloading package wordnet to /root/nltk_data...
[nltk_data] Package wordnet is already up-to-date!
[Synset('jupiter.n.01'), Synset('jupiter.n.02')]

# Just the word:
print(syns[0].lemmas()[0].name())

syns[0].definition()
'the largest planet and the 5th from the sun; has many satellites and is one of the brightest objects in the night sky'

syns[1].definition()
'(Roman mythology) supreme god of Romans; counterpart of Greek Zeus'
2) Hyponyms and Hypernyms
Hyponyms and Hypernyms are specific and generalized concepts, respectively. For example, 'beach house' and 'guest house' are hyponyms of 'house': they are more specific concepts of 'house'. And 'house' is a hypernym of 'guest house' because it is the more general concept. 'Egg noodle' is a hyponym of 'noodle' and 'pasta' is a hypernym of 'noodle'.

wn.synset('noodle.n.01').hyponyms()
[Synset('egg_noodle.n.01')]
wn.synset('noodle.n.01').hypernyms()
[Synset('pasta.n.02')]

wn.synset('egg_noodle.n.01').definition()
‘narrow strip of pasta dough made with eggs’
wn.synset('pasta.n.01').definition()
‘a dish that contains pasta as its main ingredient’
3) Meronyms and Holonyms
Meronyms and Holonyms represent the part-whole relationship. The meronym
represents the part and the holonym represents the whole. For example, 'kitchen' is a meronym of 'home' (the kitchen is a part of the home), 'mattress' is a meronym of 'bed', and 'bedroom' is a holonym of 'bed'.

wn.synset('bed.n.01').part_holonyms()
[Synset('bedroom.n.01')]

wn.synset('bed.n.01').part_meronyms()
[Synset('bedstead.n.01'), Synset('mattress.n.01')]

4) Word Similarity
We can compute the similarity between two words based on the distance between the words in the WordNet network: the smaller the distance, the more similar the words. In this way, it is possible to quantitatively figure out that a cat and a dog are similar, a phone and a computer are similar, but a cat and a phone are not similar (see the sketch below). The code that follows measures string similarity (edit distance and difflib ratio) rather than WordNet distance.
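A minimal sketch (not in the original manual) of WordNet-based similarity using path distance in the hypernym hierarchy:

from nltk.corpus import wordnet as wn

cat = wn.synset('cat.n.01')
dog = wn.synset('dog.n.01')
phone = wn.synset('telephone.n.01')
# path_similarity returns a score in (0, 1]; closer synsets score higher
print(cat.path_similarity(dog))    # relatively high: cat and dog are close in the hierarchy
print(cat.path_similarity(phone))  # much lower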

import nltk
nltk.edit_distance("humpty", "dumpty")
Output: 1

import difflib

a = 'Thanks for calling America Expansion'
b = 'Thanks for calling American Express'

seq = difflib.SequenceMatcher(None, a, b)
d = seq.ratio() * 100
print(d)
Output: 87.32394366197182

import difflib

a = 'phone'
b = 'computer'

seq = difflib.SequenceMatcher(None, a, b)
d = seq.ratio() * 100
print(d)
Output: 30.76923076923077

Lesk algorithm
Consider three examples of the distinct senses that exist for the word "bass":
1. a type of fish
2. tones of low frequency
3. a type of instrument

and the sentences:
1. I went fishing for some sea bass.
2. The bass line of the song is too weak.

To a human it is obvious that the first sentence uses "bass" in the fish sense and the second sentence uses it in the musical sense. Developing algorithms to replicate this human ability can often be a difficult task. The Lesk algorithm disambiguates a word by choosing the sense whose dictionary definition overlaps most with the words in the surrounding context. For the first and second sentences, the Lesk algorithm is somewhat accurate in understanding the context of the word "bass", but for a sentence where "bass" appears in the context of a musical instrument, it can estimate the sense as Synset('sea_bass.n.01'), which is clearly not correct. Unfortunately, Lesk's approach is very sensitive to the exact wording of definitions, so the absence of a certain word can radically change the results.
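A minimal sketch (not in the original manual) using NLTK's built-in Lesk implementation on the sentences above:

import nltk
nltk.download('wordnet')
nltk.download('punkt')
from nltk.wsd import lesk
from nltk.tokenize import word_tokenize

sent1 = word_tokenize("I went fishing for some sea bass.")
sent2 = word_tokenize("The bass line of the song is too weak.")
# lesk() picks the synset whose definition overlaps most with the context words
print(lesk(sent1, 'bass'), '-', lesk(sent1, 'bass').definition())
print(lesk(sent2, 'bass'), '-', lesk(sent2, 'bass').definition())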


Outcome: After the practical, students have understood and implemented the Lesk algorithm.
Experiment No. 08

Aim: CASE STUDY: Application of NLP - Sentiment Analysis of Real Comments on a Social Media Platform

Theory: A Twitter sentiment analysis determines negative, positive, or neutral emotions within the
text of a tweet using NLP and ML models. Sentiment analysis or opinion mining refers to
identifying as well as classifying the sentiments that are expressed in the text source. Tweets are
often useful in generating a vast amount of sentiment data upon analysis. These data are useful in
understanding the opinion of people on social media for a variety of topics.

Practical:

!pip install -q transformers
from transformers import pipeline

sentiment_pipeline = pipeline("sentiment-analysis")
data = ["I Love You", "I hate you"]
sentiment_pipeline(data)

specific_model = pipeline(model="finiteautomata/bertweet-base-sentiment-analysis")
specific_model(data)  # apply the tweet-specific model to the same data

SENTIMENT ANALYSIS WITH TextBlob

import pandas as pd
import numpy as np
from textblob import TextBlob
import matplotlib.pyplot as plt
from nltk.corpus import stopwords

# Subjectivity tells whether the text states facts or an opinion.
text1 = "flight was smooth and comfortable"  # assumed example; the original definition of text1 is not shown
blob1 = TextBlob(text1)
blob1.sentiment

text2 = "flight was horrible and filled with turbulence"
blob2 = TextBlob(text2)
blob2.sentiment

text3 = "earth revolves around the sun"
blob3 = TextBlob(text3)
blob3.sentiment
Screenshot:
Outcome: After the practical, students have understood and implemented sentiment analysis of tweets on a social media platform.
