ASTW RA03 Practical Manual

This document provides code samples for implementing various natural language processing (NLP) techniques in Python, including sentiment analysis, named entity recognition, stemming and lemmatization, bag of words, term frequency-inverse document frequency, stopwords removal, part-of-speech tagging, chunking, WordNet for synonyms and antonyms, and generating word clouds. Each code sample is accompanied by its output.


Contents

1) Implement Sentiment Analysis by a movie
2) Implement Named Entity Recognition (NER) in Python with Spacy
3) Implement Stemming & Lemmatization
4) Implement Bag of Words
5) Implement Term Frequency–Inverse Document Frequency (TF-IDF)
6) Implement Stopwords
7) Implement POS Tagging
8) Implement Chunking
9) Implement WordNet
10) Implement Word Cloud

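
Before running the practicals, note that NLTK ships without its corpora and spaCy ships without its English model; each must be downloaded once. A minimal setup sketch (resource names assume NLTK 3.x and spaCy 3.x):

import nltk

# one-time downloads of the NLTK resources used in the practicals below
nltk.download('vader_lexicon')               # practical 1: VADER sentiment lexicon
nltk.download('punkt')                       # word/sentence tokenizers (practicals 3, 7, 10)
nltk.download('stopwords')                   # practicals 6, 7 and 10
nltk.download('wordnet')                     # practicals 3 and 9
nltk.download('averaged_perceptron_tagger')  # practical 7: POS tagger model

# the spaCy model for practical 2 is installed from the shell:
# python -m spacy download en_core_web_sm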

1) Implement Sentiment Analysis by a movie

Code:

import pandas as pd

import nltk

from nltk.sentiment.vader import SentimentIntensityAnalyzer

# reading and wrangling data

df_avatar = pd.read_csv('avatar.csv', engine='python')

df_avatar_lines = df_avatar.groupby('character').count()

df_avatar_lines = df_avatar_lines.sort_values(by=['character_words'], ascending=False)[:10]

top_character_names = df_avatar_lines.index.values

# filtering out non-top characters

df_character_sentiment = df_avatar[df_avatar['character'].isin(top_character_names)]

df_character_sentiment = df_character_sentiment[['character', 'character_words']]

# calculating sentiment score

sid = SentimentIntensityAnalyzer()

df_character_sentiment.reset_index(inplace=True, drop=True)

df_character_sentiment[['neg', 'neu', 'pos', 'compound']] = \
    df_character_sentiment['character_words'].apply(sid.polarity_scores).apply(pd.Series)

df_character_sentiment
Output :
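
The avatar.csv dataset is not bundled with this manual, so as a quick sanity check the same analyzer can be run on a single hand-written line (a minimal sketch; the example sentence is made up for illustration):

from nltk.sentiment.vader import SentimentIntensityAnalyzer

sid = SentimentIntensityAnalyzer()
# polarity_scores returns neg/neu/pos proportions plus a compound score in [-1, 1]
print(sid.polarity_scores("This episode was absolutely wonderful!"))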
2) Implement Named Entity Recognition (NER) in Python with Spacy
! pip install spacy

! pip install nltk

! python -m spacy download en_core_web_sm

import spacy

from spacy import displacy

NER = spacy.load("en_core_web_sm")

raw_text = ("The Indian Space Research Organisation (ISRO) is the national space agency of India, "
            "headquartered in Bengaluru. It operates under the Department of Space, which is directly overseen by "
            "the Prime Minister of India, while the Chairman of ISRO acts as the executive of DOS as well.")

text1= NER(raw_text)

for word in text1.ents:
    print(word.text, word.label_)
Output :
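
displacy is imported above but not used; inside a Jupyter notebook the recognised entities can also be highlighted inline (a small optional addition):

# render the entities with colour highlighting (works inside Jupyter notebooks)
displacy.render(text1, style="ent", jupyter=True)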
3) Implement Stemming & Lemmatization

Stemming
import nltk

from nltk.stem.porter import PorterStemmer

porter_stemmer = PorterStemmer()

text = "studies studying cries cry"

tokenization = nltk.word_tokenize(text)

for w in tokenization:
    print("Stemming for {} is {}".format(w, porter_stemmer.stem(w)))

Lemmatization

import nltk

from nltk.stem import WordNetLemmatizer

wordnet_lemmatizer = WordNetLemmatizer()

text = "studies studying cries cry"

tokenization = nltk.word_tokenize(text)

for w in tokenization:
    print("Lemma for {} is {}".format(w, wordnet_lemmatizer.lemmatize(w)))


Output:
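
By default WordNetLemmatizer treats every token as a noun, which is why "studying" comes back unchanged. Passing the part of speech gives better lemmas (a small extension of the code above):

from nltk.stem import WordNetLemmatizer

wordnet_lemmatizer = WordNetLemmatizer()
# pos="v" tells the lemmatizer to treat the word as a verb instead of the default noun
print(wordnet_lemmatizer.lemmatize("studying", pos="v"))   # study
print(wordnet_lemmatizer.lemmatize("cries", pos="v"))      # cry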
4) Implement Bag of Words

import pandas as pd

from sklearn.feature_extraction.text import CountVectorizer

text = ["I love writing code in Python. I love Python code",

"I hate writing code in Java. I hate Java code"]

df = pd.DataFrame({'review': ['review1', 'review2'], 'text':text})

cv = CountVectorizer(stop_words='english')

cv_matrix = cv.fit_transform(df['text'])

df_dtm = pd.DataFrame(cv_matrix.toarray(),

index=df['review'].values,

columns=cv.get_feature_names_out())  # get_feature_names() was removed in newer scikit-learn

df_dtm

Output :
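
To see which column of the matrix belongs to which token, the fitted vectorizer's vocabulary can be printed (a short follow-up, continuing from the snippet above):

# mapping from each token to its column index in the document-term matrix
print(cv.vocabulary_)
# per-token counts summed over both reviews
print(cv_matrix.toarray().sum(axis=0))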
5) Implement Term Frequency–Inverse Document
Frequency (TF-IDF)

import pandas as pd

from sklearn.feature_extraction.text import TfidfVectorizer

text = ["I love writing code in Python. I love Python code",

"I hate writing code in Java. I hate Java code"]

df = pd.DataFrame({'review': ['review1', 'review2'], 'text':text})

tfidf = TfidfVectorizer(stop_words='english', norm=None)

tfidf_matrix = tfidf.fit_transform(df['text'])

df_dtm = pd.DataFrame(tfidf_matrix.toarray(),

index=df['review'].values,

columns=tfidf.get_feature_names_out())  # get_feature_names() was removed in newer scikit-learn

df_dtm

Output :
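
For reference, with its default settings (smooth_idf=True) scikit-learn computes idf(t) = ln((1 + n) / (1 + df(t))) + 1 and multiplies it by the raw term count; because norm=None is passed above, the rows are not length-normalised. A quick check for one term (a sketch; the numbers assume the two reviews above):

import numpy as np

n_docs, doc_freq = 2, 1                       # "python" appears in 1 of the 2 reviews
idf = np.log((1 + n_docs) / (1 + doc_freq)) + 1
print(2 * idf)                                # "python" occurs twice in review1 -> about 2.81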
6) Implement Stopwords

import nltk

from nltk.corpus import stopwords

sw_nltk = stopwords.words('english')

print(sw_nltk)

print(len(sw_nltk))

text = ("When I first met her she was very quiet. She remained quiet during the entire two hour long "
        "journey from Stony Brook to New York.")

words = [word for word in text.split() if word.lower() not in sw_nltk]

new_text = " ".join(words)

print(new_text)

print("Old length: ", len(text))

print("New length: ", len(new_text))

Output :
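
The stopword list returned by stopwords.words() is a plain Python list, so project-specific words can be appended before filtering (an optional variation continuing from the code above; the extra words are only examples):

# extend the standard list with additional words to drop (illustrative choices)
sw_custom = sw_nltk + ['first', 'two', 'hour']
words = [word for word in text.split() if word.lower() not in sw_custom]
print(" ".join(words))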
7) Implement POS Tagging
import nltk

from nltk.corpus import stopwords

from nltk.tokenize import word_tokenize, sent_tokenize

stop_words = set(stopwords.words('english'))

txt = "Sukanya, Rajib and Naba are my good friends. " \
      "Sukanya is getting married next year. " \
      "Marriage is a big step in one's life. " \
      "It is both exciting and frightening. " \
      "But friendship is a sacred bond between people. " \
      "It is a special kind of love between us. " \
      "Many of you must have tried searching for a friend " \
      "but never found the right one."

# sent_tokenize is an instance of
# PunktSentenceTokenizer from the nltk.tokenize.punkt module
tokenized = sent_tokenize(txt)

for i in tokenized:

    # word_tokenize splits each sentence into words and punctuation
    wordsList = nltk.word_tokenize(i)

    # removing stop words from wordsList
    wordsList = [w for w in wordsList if w not in stop_words]

    # nltk.pos_tag is the part-of-speech (POS) tagger
    tagged = nltk.pos_tag(wordsList)

    print(tagged)

Output :
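
The abbreviations in the output (NNP, VBZ, JJ, ...) follow the Penn Treebank tagset; NLTK can print the meaning of any tag once the 'tagsets' resource has been downloaded (an optional lookup, not part of the original program):

import nltk

nltk.download('tagsets')          # one-time download of the tag documentation
nltk.help.upenn_tagset('NNP')     # noun, proper, singular
nltk.help.upenn_tagset('VBZ')     # verb, present tense, 3rd person singular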
8) Implement Chunking

import nltk

sentence = [
    ("the", "DT"),
    ("book", "NN"),
    ("has", "VBZ"),
    ("many", "JJ"),
    ("chapters", "NNS")
]

# chunk a determiner followed by nouns into an NP, then chink (remove) any verbs inside it
chunker = nltk.RegexpParser(
    r'''
    NP: {<DT><NN.*><.*>*<NN.*>}
        }<VB.*>{
    '''
)

Output = chunker.parse(sentence)

print(Output)

Output :
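
The parse result is an nltk.Tree, so besides printing its bracketed form it can be drawn as ASCII art or in a pop-up window (optional, continuing from the code above):

# draw the chunk tree as ASCII art in the console
Output.pretty_print()
# or open a graphical tree viewer (requires a display/Tk)
# Output.draw()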
9) Implement WordNet

import nltk

from nltk.corpus import wordnet

synonyms = []

antonyms = []

for synset in wordnet.synsets("evil"):
    for l in synset.lemmas():
        synonyms.append(l.name())
        if l.antonyms():
            antonyms.append(l.antonyms()[0].name())

print(set(synonyms))

print(set(antonyms))

Output :
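
Each synset also carries a gloss and usage examples, which helps when deciding which sense the synonyms belong to (a small follow-up sketch):

from nltk.corpus import wordnet

for synset in wordnet.synsets("evil")[:3]:
    # name, dictionary-style definition and any example sentences for this sense
    print(synset.name(), "-", synset.definition())
    print("  examples:", synset.examples())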
10) Implement Word Cloud

import matplotlib.pyplot as plt

from nltk.corpus import stopwords

from nltk.tokenize import word_tokenize

from wordcloud import WordCloud

class WordCloudGeneration:

    def preprocessing(self, data):
        # convert all words to lowercase
        data = [item.lower() for item in data]
        # load the English stopwords list
        stop_words = set(stopwords.words('english'))
        # concatenate all the data with spaces
        paragraph = ' '.join(data)
        # tokenize the paragraph using the inbuilt tokenizer
        word_tokens = word_tokenize(paragraph)
        # filter out words present in the stopwords list
        preprocessed_data = ' '.join([word for word in word_tokens if word not in stop_words])
        print("\n Preprocessed Data: ", preprocessed_data)
        return preprocessed_data

    def create_word_cloud(self, final_data):
        # initiate the WordCloud object with width, height, maximum font size and background color,
        # then call its generate method to build the image
        wordcloud = WordCloud(width=1600, height=800, max_font_size=200,
                              background_color="black").generate(final_data)
        # plot the image generated by the WordCloud class
        plt.figure(figsize=(12, 10))
        plt.imshow(wordcloud)
        plt.axis("off")
        plt.show()

wordcloud_generator = WordCloudGeneration()

# you may uncomment the following line to use custom input

# input_text = input("Enter the text here: ")

input_text = ('These datasets are used for machine-learning research and have been cited in '
              'peer-reviewed academic journals. Datasets are an integral part of the field of machine learning. '
              'Major advances in this field can result from advances in learning algorithms (such as deep learning), '
              'computer hardware, and, less-intuitively, the availability of high-quality training datasets. '
              'High-quality labeled training datasets for supervised and semi-supervised machine learning '
              'algorithms are usually difficult and expensive to produce because of the large amount of time '
              'needed to label the data. Although they do not need to be labeled, high-quality datasets for '
              'unsupervised learning can also be difficult and costly to produce.')

input_text = input_text.split('.')

clean_data = wordcloud_generator.preprocessing(input_text)

wordcloud_generator.create_word_cloud(clean_data)

Output :
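
To keep the generated image, the WordCloud object can also write itself to disk; a minimal variation continuing from the code above (the filename is only an example):

# save the generated word cloud to an image file as well as showing it
wordcloud = WordCloud(width=1600, height=800, max_font_size=200,
                      background_color="black").generate(clean_data)
wordcloud.to_file("wordcloud.png")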
