Machine Learning and NLP Lab: Sayak Mallick

This document contains code for two machine learning tasks: sentiment analysis and n-gram modeling. The sentiment-analysis section loads the data, cleans the text by removing punctuation and stopwords, encodes the emotion labels as binary sentiment, and visualizes the label distribution and a word cloud. The n-gram section pads a word with '$' delimiter characters, generates character trigrams from it, strips the internal whitespace to produce a list of n-grams, and then loops over every word in a sample sentence to build the n-gram list for each.


Machine Learning and NLP Lab Examinations

Name: Sayak Mallick

ID: 191001111025
Stream: B.Sc. IT (Machine Learning)

1. Sentiment Analysis
import re
import nltk
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from matplotlib import rcParams
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer
from wordcloud import WordCloud

# the stopword list and WordNet data must be available locally:
# nltk.download('stopwords'); nltk.download('wordnet')

# each line of the data files has the form "<text>;<label>"
df_train = pd.read_csv("train.txt", delimiter=';', names=['text', 'label'])

df_val = pd.read_csv("val.txt", delimiter=';', names=['text', 'label'])
print(df_val)

df = pd.concat([df_train, df_val])
df.reset_index(inplace=True, drop=True)
print(df)
print("Shape of the Data frame: ", df.shape)
print(df.sample(5))

# distribution of the six emotion labels
sns.countplot(x='label', data=df)
plt.show()

# collapse the six emotion labels into binary sentiment:
# 1 = positive (surprise, love, joy), 0 = negative (fear, anger, sadness)
def custom_encoder(df):
    df.replace(to_replace="surprise", value=1, inplace=True)
    df.replace(to_replace="love", value=1, inplace=True)
    df.replace(to_replace="joy", value=1, inplace=True)
    df.replace(to_replace="fear", value=0, inplace=True)
    df.replace(to_replace="anger", value=0, inplace=True)
    df.replace(to_replace="sadness", value=0, inplace=True)

# driver code for custom_encoder
custom_encoder(df['label'])
sns.countplot(x='label', data=df)
plt.show()
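
An equivalent and arguably cleaner way to do this encoding (a sketch added for comparison, not part of the original lab code) is a single pandas map; label_map below simply restates the label-to-sentiment assignment used in custom_encoder:

# hypothetical alternative: one vectorized map instead of six replace() calls
label_map = {"surprise": 1, "love": 1, "joy": 1,
             "fear": 0, "anger": 0, "sadness": 0}
df['label'] = df['label'].map(label_map)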

lm = WordNetLemmatizer()
def text_transformation(df_col):
    stop_words = set(stopwords.words('english'))  # build the stopword set once, not per word
    corpus = []
    for item in df_col:
        new_item = re.sub('[^a-zA-Z]', ' ', str(item))  # keep letters only
        new_item = new_item.lower().split()
        new_item = [lm.lemmatize(word) for word in new_item if word not in stop_words]
        corpus.append(' '.join(new_item))
    return corpus

corpus = text_transformation(df['text'])
rcParams['figure.figsize'] = 20, 8

# join the cleaned documents into one string for the word cloud
# (the original character-by-character loop merged words across rows)
word_cloud = " ".join(corpus)
wordcloud = WordCloud(width=1000, height=500,
                      background_color='white', min_font_size=10).generate(word_cloud)
plt.imshow(wordcloud)
plt.axis('off')
plt.show()
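
As a quick sanity check (an addition, not part of the original lab), text_transformation can be run on a single hand-written sentence; the exact lemmas depend on the installed WordNet data, but with the default English stopword list the output should be close to this:

sample = ["I am feeling really joyful about the results!"]
print(text_transformation(sample))   # expected: ['feeling really joyful result']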

2. N-gram Program
from nltk import ngrams

# helper: drop the spaces that the n-gram join leaves between characters
def remove(string):
    return string.replace(" ", "")

vocab = "Today is a good day to learn natural language processing"
print("Sample Document - ", vocab)

# constructing the lexicon (one entry per word)
lex = vocab.split(" ")
print(lex)
# character trigrams for the first word only, padded with '$' delimiters
spaced = ' '
for i in lex[0]:
    spaced = spaced + i + " "      # "Today" -> " T o d a y "
spaced = "$ " + spaced + " $"      # add start/end delimiters
n = 3
ngrams_ = ngrams(spaced.split(), n)
ngram_list = []
for i in ngrams_:
    ngram_list.append((''.join([w + ' ' for w in i])).strip())
for i in range(len(ngram_list)):
    ngram_list[i] = remove(ngram_list[i])
print(ngram_list)
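
For the first word, "Today", the block above should print the five padded character trigrams:

['$To', 'Tod', 'oda', 'day', 'ay$']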

# repeat the same construction for every word in the lexicon
ngram_list = []
for word in lex:
    spaced = ' '
    for i in word:
        spaced = spaced + i + " "
    spaced = "$ " + spaced + " $"
    n = 3
    ngrams_ = ngrams(spaced.split(), n)
    l = []
    for i in ngrams_:
        l.append((''.join([w + ' ' for w in i])).strip())
    for i in range(len(l)):
        l[i] = remove(l[i])
    ngram_list.append(l)
print(ngram_list)
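
The nltk.ngrams call is doing fairly simple work here, so the same padded character trigrams can also be produced with plain string slicing. The helper below is a minimal sketch added for comparison (char_ngrams is not part of the original lab code); it assumes the same single-'$' padding on each side of the word:

def char_ngrams(word, n=3, pad='$'):
    # pad the word with one delimiter on each side, then slide a window of width n
    padded = pad + word + pad
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]

print(char_ngrams("Today"))            # ['$To', 'Tod', 'oda', 'day', 'ay$']
print([char_ngrams(w) for w in lex])   # same nested structure as ngram_list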
