Workshop - NLP - Ipynb - Colaboratory

The document shows how to build a sentiment analysis model using scikit-learn and NLP techniques like tokenization, lemmatization and CountVectorizer. It loads data, encodes labels, trains a random forest classifier on the transformed text data and evaluates the model on test data.

Uploaded by

andy

#import all necessary libraries

!pip install scikit-plot


from scikitplot.metrics import plot_confusion_matrix
import torch
import pandas as pd
import seaborn as sns
import re
import nltk
nltk.download('wordnet')
nltk.download('stopwords')
nltk.download('omw-1.4')
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score,precision_score,recall_score,confusion_matrix,c

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/p


Collecting scikit-plot
Downloading scikit_plot-0.3.7-py3-none-any.whl (33 kB)
Requirement already satisfied: scipy>=0.9 in /usr/local/lib/python3.7/dist-packages (
Requirement already satisfied: matplotlib>=1.4.0 in /usr/local/lib/python3.7/dist-pac
Requirement already satisfied: joblib>=0.10 in /usr/local/lib/python3.7/dist-packages
Requirement already satisfied: scikit-learn>=0.18 in /usr/local/lib/python3.7/dist-pa
Requirement already satisfied: python-dateutil>=2.1 in /usr/local/lib/python3.7/dist-
Requirement already satisfied: cycler>=0.10 in /usr/local/lib/python3.7/dist-packages
Requirement already satisfied: pyparsing!=2.0.4,!=2.1.2,!=2.1.6,>=2.0.1 in /usr/local
Requirement already satisfied: numpy>=1.11 in /usr/local/lib/python3.7/dist-packages
Requirement already satisfied: kiwisolver>=1.0.1 in /usr/local/lib/python3.7/dist-pac
Requirement already satisfied: typing-extensions in /usr/local/lib/python3.7/dist-pac
Requirement already satisfied: six>=1.5 in /usr/local/lib/python3.7/dist-packages (fr
Requirement already satisfied: threadpoolctl>=2.0.0 in /usr/local/lib/python3.7/dist-
Installing collected packages: scikit-plot
Successfully installed scikit-plot-0.3.7
[nltk_data] Downloading package wordnet to /root/nltk_data...
[nltk_data] Downloading package stopwords to /root/nltk_data...
[nltk_data] Unzipping corpora/stopwords.zip.
[nltk_data] Downloading package omw-1.4 to /root/nltk_data...

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')


device

device(type='cuda')

df = pd.read_csv("/content/train.txt",delimiter=';',names=['text','label'])
df
       text                                                label
0      i didnt feel humiliated                             sadness
1      i can go from feeling so hopeless to so damned...   sadness
2      im grabbing a minute to post i feel greedy wrong    anger
3      i am ever feeling nostalgic about the fireplac...   love
4      i am feeling grouchy                                anger
...    ...                                                 ...
17995  im having ssa examination tomorrow in the morn...   sadness
17996  i constantly worry about their fight against n...   joy
17997  i feel its important to share this info for th...   joy
17998  i truly feel that if you are passionate enough...   joy
17999  i feel like i just wanna buy any cute make up ...   joy

18000 rows × 2 columns

sns.countplot(df['label'])
# sns.countplot(df.label)

/usr/local/lib/python3.7/dist-packages/seaborn/_decorators.py:43: FutureWarning: Pass
FutureWarning
<matplotlib.axes._subplots.AxesSubplot at 0x7f5baa2f20d0>

df.label.unique()

array(['sadness', 'anger', 'love', 'surprise', 'fear', 'joy'],
      dtype=object)

# def custom_encoder(df):
#     df = df.replace(to_replace ="surprise", value =1)
#     df = df.replace(to_replace ="love", value =1)
#     df = df.replace(to_replace ="joy", value =1)
#     df = df.replace(to_replace ="fear", value =0)
#     df = df.replace(to_replace ="anger", value =0)
#     df = df.replace(to_replace ="sadness", value =0)
#     return df

# df['label'] = custom_encoder(df['label'])
def custom_encoder(df):
    df.replace(to_replace ="surprise", value =1, inplace=True)
    df.replace(to_replace ="love", value =1, inplace=True)
    df.replace(to_replace ="joy", value =1, inplace=True)
    df.replace(to_replace ="fear", value =0, inplace=True)
    df.replace(to_replace ="anger", value =0, inplace=True)
    df.replace(to_replace ="sadness", value =0, inplace=True)

custom_encoder(df['label'])
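
The encoder above collapses the six emotion labels into two classes: surprise, love and joy become 1, and fear, anger and sadness become 0. As a minimal sketch of the same mapping, the lookup can be written as a single dictionary. The names `LABEL_TO_CLASS` and `encode_labels` are hypothetical and do not appear in the notebook.

```python
# Hypothetical helper: the same 6-to-2 mapping as custom_encoder, expressed as one dict.
LABEL_TO_CLASS = {
    "surprise": 1, "love": 1, "joy": 1,      # treated as positive
    "fear": 0, "anger": 0, "sadness": 0,     # treated as negative
}

def encode_labels(labels):
    """Return a new list of 0/1 classes; raises KeyError on an unknown label."""
    return [LABEL_TO_CLASS[label] for label in labels]

encoded = encode_labels(["sadness", "anger", "love", "joy"])
print(encoded)  # [0, 0, 1, 1]
```

With pandas, the same dictionary could be passed to `Series.map`, which returns a new column rather than modifying the existing one in place.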

sns.countplot(df.label)

/usr/local/lib/python3.7/dist-packages/seaborn/_decorators.py:43: FutureWarning: Pass


FutureWarning
<matplotlib.axes._subplots.AxesSubplot at 0x7f5baa656590>

lm = WordNetLemmatizer()

stops = set(stopwords.words('english'))
# stops

def text_transformation(df_col):
    corpus = []
    for item in df_col:
        new_item = re.sub('[^a-zA-Z]',' ',str(item)) #get rid of symbols($, Rs.) and only
        new_item = new_item.lower()
        new_item = new_item.split()
        new_item = [lm.lemmatize(word) for word in new_item if word not in stops]
        corpus.append(' '.join(str(x) for x in new_item))
    return corpus

corpus = text_transformation(df['text'])
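
To see what the cleaning steps do to a single sentence, here is a small standard-library sketch of the regex, lowercase, split and stop-word steps. It assumes a tiny hand-written stop-word set (`DEMO_STOPS`, hypothetical) instead of the NLTK list, and it skips lemmatization, so the output is only indicative of what `text_transformation` returns.

```python
import re

# Hypothetical, deliberately tiny stop-word set; the notebook uses NLTK's English list.
DEMO_STOPS = {"i", "t", "a", "the", "to", "so"}

def clean_text(sentence, stops=DEMO_STOPS):
    """Replace non-letters with spaces, lowercase, split, and drop stop words."""
    letters_only = re.sub('[^a-zA-Z]', ' ', str(sentence))
    words = letters_only.lower().split()
    return ' '.join(word for word in words if word not in stops)

cleaned = clean_text("I didn't feel humiliated!!")
print(cleaned)  # didn feel humiliated
```

The apostrophe is replaced by a space, so "didn't" splits into "didn" and "t" before the stop-word filter runs.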

cv = CountVectorizer(ngram_range=(1,2))
traindata = cv.fit_transform(corpus)
X = traindata
y = df.label
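
With `ngram_range=(1,2)`, each document contributes counts for single words and for pairs of adjacent words. The sketch below approximates that with the standard library only, assuming CountVectorizer's default tokenization (lowercased tokens of two or more word characters); `ngram_counts` is a hypothetical helper, not part of scikit-learn.

```python
import re
from collections import Counter

def ngram_counts(text, ngram_range=(1, 2)):
    """Count word n-grams, approximating CountVectorizer's default tokenization."""
    # Assumed default behaviour: lowercase, keep tokens of two or more word characters.
    tokens = re.findall(r"\b\w\w+\b", text.lower())
    low, high = ngram_range
    counts = Counter()
    for n in range(low, high + 1):
        for i in range(len(tokens) - n + 1):
            counts[" ".join(tokens[i:i + n])] += 1
    return counts

counts = ngram_counts("feel greedy wrong")
print(sorted(counts))
# ['feel', 'feel greedy', 'greedy', 'greedy wrong', 'wrong']
```

Three words yield five features here, which is why adding bigrams makes the feature matrix considerably wider than a unigram-only one.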

New speaker: Aditya Shah

rfc = RandomForestClassifier(max_features='sqrt',  # 'auto' meant 'sqrt' for classifiers and was removed in scikit-learn 1.3
                             max_depth=None,
                             n_estimators=500,
                             min_samples_split=5,
                             min_samples_leaf=1)
rfc.fit(X,y)

RandomForestClassifier(min_samples_split=5, n_estimators=500)

test_df = pd.read_csv('/content/test.txt',delimiter=';',names=['text','label'])

X_test,y_test = test_df.text,test_df.label
#encode the labels into two classes , 0 and 1
custom_encoder(y_test)  # replaces in place and returns None, so the result is not assigned
#pre-processing of text
test_corpus = text_transformation(X_test)
#convert text data into vectors
testdata = cv.transform(test_corpus)
#predict the target
predictions = rfc.predict(testdata)
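
The test text is passed through `cv.transform` rather than `cv.fit_transform`, so it is encoded with the vocabulary learned from the training corpus. The sketch below illustrates that idea with unigrams only and the standard library; `fit_vocabulary` and `transform` are hypothetical stand-ins, not the scikit-learn implementation.

```python
def fit_vocabulary(docs):
    """Map each distinct token seen in the training docs to a column index."""
    vocab = sorted({tok for doc in docs for tok in doc.split()})
    return {tok: idx for idx, tok in enumerate(vocab)}

def transform(docs, vocabulary):
    """Count tokens per doc; tokens missing from the vocabulary are ignored."""
    rows = []
    for doc in docs:
        row = [0] * len(vocabulary)
        for tok in doc.split():
            if tok in vocabulary:
                row[vocabulary[tok]] += 1
        rows.append(row)
    return rows

vocabulary = fit_vocabulary(["feel grouchy", "feel nostalgic"])
test_rows = transform(["feel hopeless feel"], vocabulary)
print(vocabulary)  # {'feel': 0, 'grouchy': 1, 'nostalgic': 2}
print(test_rows)   # [[2, 0, 0]]
```

The unseen word "hopeless" contributes nothing, and the test row has the same number of columns as the training matrix, which is what the fitted classifier expects.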

plot_confusion_matrix(y_test,predictions)
acc_score = accuracy_score(y_test,predictions)
pre_score = precision_score(y_test,predictions)
rec_score = recall_score(y_test,predictions)
print('Accuracy_score: ',acc_score)
print('Precision_score: ',pre_score)
print('Recall_score: ',rec_score)
print("-"*50)
cr = classification_report(y_test,predictions)
print(cr)
Accuracy_score: 0.9615
Precision_score: 0.9616648411829135
Recall_score: 0.9543478260869566
--------------------------------------------------
              precision    recall  f1-score   support

           0       0.96      0.97      0.96      1080
           1       0.96      0.95      0.96       920

    accuracy                           0.96      2000
   macro avg       0.96      0.96      0.96      2000
weighted avg       0.96      0.96      0.96      2000
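
The three printed scores can be tied back to confusion-matrix counts. The counts below are inferred from the printed supports (1080 and 920) and the printed precision and recall; they are an assumption, not values read from the plotted matrix.

```python
# Counts inferred from the printed report (assumption, not read from the plot).
tp, fn = 878, 42      # the 920 test rows whose true class is 1
tn, fp = 1045, 35     # the 1080 test rows whose true class is 0

accuracy = (tp + tn) / (tp + tn + fp + fn)
precision = tp / (tp + fp)   # of everything predicted positive, how much was right
recall = tp / (tp + fn)      # of everything actually positive, how much was found

print(round(accuracy, 4), round(precision, 4), round(recall, 4))
# 0.9615 0.9617 0.9543
```

Note that `precision_score` and `recall_score` report the positive class (label 1) by default, which is why they match the second row of the report rather than the averages.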

def expression_check(prediction_input):
    if prediction_input == 0:
        print("Input statement has Negative Sentiment.")
    elif prediction_input == 1:
        print("Input statement has Positive Sentiment.")
    else:
        print("Invalid Statement.")

# function to take the input statement and perform the same transformations we did earlier
def sentiment_predictor(input):
    input = text_transformation(input)
    transformed_input = cv.transform(input)
    prediction = rfc.predict(transformed_input)
    expression_check(prediction)

input1 = ["Worst laptop I have ever seen"]


input2 = ["Synapse is good"]

sentiment_predictor(input1)
sentiment_predictor(input2)

Input statement has Negative Sentiment.


Input statement has Positive Sentiment.
