Workshop - NLP - Ipynb - Colaboratory

The document shows how to build a sentiment analysis model using scikit-learn and NLP techniques like tokenization, lemmatization and CountVectorizer. It loads data, encodes labels, trains a random forest classifier on the transformed text data and evaluates the model on test data.

Uploaded by

andy


#import all necessary libraries

!pip install scikit-plot

from scikitplot.metrics import plot_confusion_matrix
import torch
import pandas as pd
import seaborn as sns
import re
import nltk
nltk.download('wordnet')
nltk.download('stopwords')
nltk.download('omw-1.4')
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score,precision_score,recall_score,confusion_matrix,c

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/p

Collecting scikit-plot
Downloading scikit_plot-0.3.7-py3-none-any.whl (33 kB)
Requirement already satisfied: scipy>=0.9 in /usr/local/lib/python3.7/dist-packages (
Requirement already satisfied: matplotlib>=1.4.0 in /usr/local/lib/python3.7/dist-pac
Requirement already satisfied: joblib>=0.10 in /usr/local/lib/python3.7/dist-packages
Requirement already satisfied: scikit-learn>=0.18 in /usr/local/lib/python3.7/dist-pa
Requirement already satisfied: python-dateutil>=2.1 in /usr/local/lib/python3.7/dist-
Requirement already satisfied: cycler>=0.10 in /usr/local/lib/python3.7/dist-packages
Requirement already satisfied: pyparsing!=2.0.4,!=2.1.2,!=2.1.6,>=2.0.1 in /usr/local
Requirement already satisfied: numpy>=1.11 in /usr/local/lib/python3.7/dist-packages
Requirement already satisfied: kiwisolver>=1.0.1 in /usr/local/lib/python3.7/dist-pac
Requirement already satisfied: typing-extensions in /usr/local/lib/python3.7/dist-pac
Requirement already satisfied: six>=1.5 in /usr/local/lib/python3.7/dist-packages (fr
Requirement already satisfied: threadpoolctl>=2.0.0 in /usr/local/lib/python3.7/dist-
Installing collected packages: scikit-plot
Successfully installed scikit-plot-0.3.7
[nltk_data] Downloading package wordnet to /root/nltk_data...
[nltk_data] Downloading package stopwords to /root/nltk_data...
[nltk_data] Unzipping corpora/stopwords.zip.
[nltk_data] Downloading package omw-1.4 to /root/nltk_data...

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

device

device(type='cuda')

df = pd.read_csv("/content/train.txt",delimiter=';',names=['text','label'])
df
       text                                                  label
0      i didnt feel humiliated                               sadness
1      i can go from feeling so hopeless to so damned...     sadness
2      im grabbing a minute to post i feel greedy wrong      anger
3      i am ever feeling nostalgic about the fireplac...     love
4      i am feeling grouchy                                  anger
...    ...                                                   ...
17995  im having ssa examination tomorrow in the morn...     sadness
17996  i constantly worry about their fight against n...     joy
17997  i feel its important to share this info for th...     joy
17998  i truly feel that if you are passionate enough...     joy
17999  i feel like i just wanna buy any cute make up ...     joy

18000 rows × 2 columns

sns.countplot(df['label'])
# sns.countplot(df.label)

/usr/local/lib/python3.7/dist-packages/seaborn/_decorators.py:43: FutureWarning: Pass
FutureWarning
<matplotlib.axes._subplots.AxesSubplot at 0x7f5baa2f20d0>

df.label.unique()

array(['sadness', 'anger', 'love', 'surprise', 'fear', 'joy'],
      dtype=object)

# def custom_encoder(df):
# df = df.replace(to_replace ="surprise", value =1)
# df = df.replace(to_replace ="love", value =1)
# df = df.replace(to_replace ="joy", value =1)
# df = df.replace(to_replace ="fear", value =0)
# df = df.replace(to_replace ="anger", value =0)
# df = df.replace(to_replace ="sadness", value =0)
# return df

# df['label'] = custom_encoder(df['label'])
def custom_encoder(df):
    # Collapse the six emotions into two classes in place: 1 = positive, 0 = negative.
    # The function modifies its argument and returns None.
    df.replace(to_replace ="surprise", value =1, inplace=True)
    df.replace(to_replace ="love", value =1, inplace=True)
    df.replace(to_replace ="joy", value =1, inplace=True)
    df.replace(to_replace ="fear", value =0, inplace=True)
    df.replace(to_replace ="anger", value =0, inplace=True)
    df.replace(to_replace ="sadness", value =0, inplace=True)

custom_encoder(df['label'])
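
The chain of in-place replace calls above can also be written as a single dictionary lookup that returns a new Series. This is only a minimal sketch, assuming pandas is installed; the names `EMOTION_TO_SENTIMENT` and `encode_labels` are hypothetical and do not appear in the notebook.

```python
import pandas as pd

# Hypothetical mapping mirroring custom_encoder: 1 = positive, 0 = negative.
EMOTION_TO_SENTIMENT = {
    "surprise": 1, "love": 1, "joy": 1,
    "fear": 0, "anger": 0, "sadness": 0,
}

def encode_labels(labels: pd.Series) -> pd.Series:
    """Return a new integer Series; raise if an unknown emotion appears."""
    encoded = labels.map(EMOTION_TO_SENTIMENT)
    if encoded.isna().any():
        unknown = sorted(set(labels[encoded.isna()]))
        raise ValueError(f"unmapped labels: {unknown}")
    return encoded.astype(int)

sample = pd.Series(["sadness", "anger", "love", "surprise", "fear", "joy"])
encoded_sample = encode_labels(sample)
print(encoded_sample.tolist())  # [0, 0, 1, 1, 0, 1]
```

Because this version returns a value instead of mutating its argument, it would be used as `df['label'] = encode_labels(df['label'])`, and a misspelled or new emotion raises an error rather than passing through unencoded.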

sns.countplot(df.label)

/usr/local/lib/python3.7/dist-packages/seaborn/_decorators.py:43: FutureWarning: Pass

FutureWarning
<matplotlib.axes._subplots.AxesSubplot at 0x7f5baa656590>

lm = WordNetLemmatizer()

stops = set(stopwords.words('english'))
# stops

def text_transformation(df_col):
    corpus = []
    for item in df_col:
        new_item = re.sub('[^a-zA-Z]',' ',str(item)) #get rid of symbols($, Rs.) and only
        new_item = new_item.lower()
        new_item = new_item.split()
        new_item = [lm.lemmatize(word) for word in new_item if word not in stops]
        corpus.append(' '.join(str(x) for x in new_item))
    return corpus

corpus = text_transformation(df['text'])
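
To see what the cleaning steps in text_transformation do to a single sentence, here is a dependency-free sketch of the regex, lowercase, split, and stop-word steps. It deliberately omits lemmatization so that it runs without the NLTK downloads; `TOY_STOPS` and `clean_text` are hypothetical names, and the stop list is a tiny stand-in for NLTK's English list.

```python
import re

# Hypothetical tiny stop list; the notebook uses NLTK's full English list.
TOY_STOPS = {"i", "t", "the", "a", "to"}

def clean_text(item):
    """Keep letters only, lowercase, split on whitespace, drop stop words."""
    letters_only = re.sub('[^a-zA-Z]', ' ', str(item))
    tokens = letters_only.lower().split()
    return ' '.join(word for word in tokens if word not in TOY_STOPS)

cleaned = clean_text("I didn't feel humiliated!!")
print(cleaned)  # didn feel humiliated
```

Note that the apostrophe is replaced by a space, so "didn't" becomes the two tokens "didn" and "t" before stop-word filtering.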

cv = CountVectorizer(ngram_range=(1,2))
traindata = cv.fit_transform(corpus)
X = traindata
y = df.label
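
With ngram_range=(1,2), CountVectorizer counts both single words and pairs of adjacent words. A small sketch on a hypothetical two-document corpus (the `toy_` names are not from the notebook) makes the resulting columns visible:

```python
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical toy corpus, already cleaned.
toy_corpus = ["feel humiliated", "feel grouchy"]

toy_cv = CountVectorizer(ngram_range=(1, 2))
toy_matrix = toy_cv.fit_transform(toy_corpus)

# vocabulary_ maps each n-gram to its column; columns follow sorted order.
toy_features = sorted(toy_cv.vocabulary_)
print(toy_features)
# ['feel', 'feel grouchy', 'feel humiliated', 'grouchy', 'humiliated']
print(toy_matrix.toarray().tolist())
# [[1, 0, 1, 0, 1], [1, 1, 0, 1, 0]]
```

Each row is one document and each column one unigram or bigram, which is the sparse matrix shape that rfc.fit receives as X.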

New speaker: Aditya Shah

rfc = RandomForestClassifier(max_features='auto',
                             max_depth=None,
                             n_estimators=500,
                             min_samples_split=5,
                             min_samples_leaf=1)
rfc.fit(X,y)

RandomForestClassifier(min_samples_split=5, n_estimators=500)

test_df = pd.read_csv('/content/test.txt',delimiter=';',names=['text','label'])

X_test,y_test = test_df.text,test_df.label
#encode the labels into two classes , 0 and 1
#custom_encoder works in place and returns None, so its result is not assigned
custom_encoder(y_test)
#pre-processing of text
test_corpus = text_transformation(X_test)
#convert text data into vectors
testdata = cv.transform(test_corpus)
#predict the target
predictions = rfc.predict(testdata)

plot_confusion_matrix(y_test,predictions)
acc_score = accuracy_score(y_test,predictions)
pre_score = precision_score(y_test,predictions)
rec_score = recall_score(y_test,predictions)
print('Accuracy_score: ',acc_score)
print('Precision_score: ',pre_score)
print('Recall_score: ',rec_score)
print("-"*50)
cr = classification_report(y_test,predictions)
print(cr)
Accuracy_score: 0.9615
Precision_score: 0.9616648411829135
Recall_score: 0.9543478260869566
--------------------------------------------------
              precision    recall  f1-score   support

           0       0.96      0.97      0.96      1080
           1       0.96      0.95      0.96       920

    accuracy                           0.96      2000
   macro avg       0.96      0.96      0.96      2000
weighted avg       0.96      0.96      0.96      2000
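
The three printed scores all follow from the four cells of the confusion matrix. The notebook does not print those counts; the values below are inferred from the reported scores and class supports (920 positive, 1080 negative) and should be read as an assumption that is consistent with the output above, not as recorded results.

```python
# Counts inferred from the reported scores and supports (not printed in the notebook).
tp, fn = 878, 42    # class 1 (positive): support 920
tn, fp = 1045, 35   # class 0 (negative): support 1080

accuracy = (tp + tn) / (tp + tn + fp + fn)  # share of all predictions that are correct
precision = tp / (tp + fp)                  # share of predicted positives that are correct
recall = tp / (tp + fn)                     # share of actual positives that are found

print('Accuracy_score: ', accuracy)
print('Precision_score: ', precision)
print('Recall_score: ', recall)
```

precision_score and recall_score report the positive class (label 1) by default, which is why they match the row for class 1 in the classification report.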

def expression_check(prediction_input):
    if prediction_input == 0:
        print("Input statement has Negative Sentiment.")
    elif prediction_input == 1:
        print("Input statement has Positive Sentiment.")
    else:
        print("Invalid Statement.")

# function to take the input statement and perform the same transformations we did earlier
def sentiment_predictor(input):
    input = text_transformation(input)
    transformed_input = cv.transform(input)
    prediction = rfc.predict(transformed_input)
    expression_check(prediction)

input1 = ["Worst laptop I have ever seen"]

input2 = ["Synapse is good"]

sentiment_predictor(input1)
sentiment_predictor(input2)

Input statement has Negative Sentiment.

Input statement has Positive Sentiment.
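
sentiment_predictor relies on the global cv and rfc and prints a single message, so it only suits one statement at a time. One possible alternative, sketched here under assumptions, bundles the vectorizer and classifier in a scikit-learn Pipeline and returns one label per input. The toy training data, the smaller forest, and the names `toy_model` and `predict_sentiments` are hypothetical, and the sketch assumes the text has already been cleaned.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.pipeline import make_pipeline

# Hypothetical toy data, already cleaned; 1 = positive, 0 = negative.
toy_texts = ["feel happy today", "love sunny morning",
             "feel sad lonely", "hate waiting angry"]
toy_labels = [1, 1, 0, 0]

# The pipeline applies the same vectorizer at fit time and at predict time.
toy_model = make_pipeline(
    CountVectorizer(ngram_range=(1, 2)),
    RandomForestClassifier(n_estimators=50, random_state=0),
)
toy_model.fit(toy_texts, toy_labels)

def predict_sentiments(texts):
    """Return one 'Positive' or 'Negative' string per input text."""
    names = {0: "Negative", 1: "Positive"}
    return [names[int(p)] for p in toy_model.predict(texts)]

results = predict_sentiments(["feel happy", "feel sad"])
print(results)
```

Returning a list instead of printing lets the caller handle any number of statements and keeps the prediction logic separate from how the result is displayed.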
