0% found this document useful (0 votes)

19 views

Amazon Assignment Ex

The document discusses performing sentiment analysis on Amazon fine food reviews. It describes the dataset containing over 500,000 reviews with 10 features each. It then performs preprocessing steps like filtering reviews with a score of 3, partitioning scores above 3 as positive and below 3 as negative, dropping duplicate entries, and transforming the text into count vectors for analysis.

Uploaded by

Ram R

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

19 views

Amazon Assignment Ex

Uploaded by

Ram R

Available Formats

Download as PDF, TXT or read online on Scribd

You are on page 1/ 11

Sentiment Classification on Amazon Fine Food Reviews

Objective : Objective of this exercise is to perform the sentiment analysis on Amazon Fine Food Reviews.

Data Analysis : Amazon Fine Food Review is a large data set with around 568K reviews given by the customers on the Amazon food
products. Each review has got the following 10 features.

1. ID
2. ProductId - Unique Identifier on the product.
3. UserId - Unique Identifier for the user.
4. ProfileName
5. HelpfulnessNumerator - Number of users who found the review helpful.
6. HelpfulnessDenominator - Number of users who indicated whether the review they found helpful.
7. Score - Rating between 1 to 5.
8. Time - Timestamp for the review.
9. Summary - Brief summary of the review.
10. Text - Text of the review.

In [4]:
import os
os.getcwd()
os.chdir(r"C:\Users\Sujatha\Applied AI Course\Ramesh\Data")
os.getcwd()

Out[4]:
'C:\\Users\\Sujatha\\Applied AI Course\\Ramesh\\Data'

In [2]:
%matplotlib inline

import sqlite3
import pandas as pd
import numpy as np
import nltk
import string
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.feature_extraction.text import TfidfTransformer
from sklearn.feature_extraction.text import TfidfVectorizer

from sklearn.feature_extraction.text import CountVectorizer

from sklearn.metrics import confusion_matrix
from sklearn import metrics
from sklearn.metrics import roc_curve, auc
from nltk.stem.porter import PorterStemmer

In [3]:

#Reading the SQL data using SQLite

con = sqlite3.connect('./amazon-fine-food-reviews/database.sqlite')

filtered_data = pd.read_sql_query(""" SELECT * FROM Reviews where Score != 3 """, con)

filtered_data = filtered_data[0:3500]

In [4]:

filtered_data.head()

Out[4]:

Id ProductId UserId ProfileName HelpfulnessNumerator HelpfulnessDenominator Score Time Summary

Id ProductId UserId ProfileName HelpfulnessNumerator HelpfulnessDenominator Score Time Summary
Id ProductId UserId ProfileName HelpfulnessNumerator HelpfulnessDenominator Score Time Summary

Good
0 1 B001E4KFG0 A3SGXH7AUHU8GW delmartian 1 1 5 1303862400 Quality
Dog Food

Not as
1 2 B00813GRG4 A1D87F6ZCVE5NK dll pa 0 0 1 1346976000
Advertised

Natalia
Corres "Delight"
2 3 B000LQOCH0 ABXLMWJIXXAIN 1 1 4 1219017600
"Natalia says it all
Corres"

Cough
3 4 B000UA0QIQ A395BORC6FGVXV Karl 3 3 2 1307923200
Medicine

Michael D.
4 5 B006K2ZZ7K A1UQRSCLF8GW1T Bigham "M. 0 0 5 1350777600 Great taffy
Wassir"

I limited my data to be 3500 reviews only as my machine configuration is small and is not supporting for large data.

In [5]:
filtered_data = filtered_data[0:3500]
filtered_data.shape

Out[5]:
(3500, 10)

The Score above 3 is taken as Positive and below 3 is taken as negative and equal to 3 is neglected.

In [6]:

def partition(y):
if y > 3:
return "postive"
return "negative"

In [7]:
actualscore = filtered_data["Score"]
positivenegative = actualscore.map(partition)
filtered_data['Score'] = positivenegative

In [8]:
filtered_data.shape

Out[8]:
(3500, 10)

In [9]:
filtered_data.head()

Out[9]:

Id ProductId UserId ProfileName HelpfulnessNumerator HelpfulnessDenominator Score Time

Good
0 1 B001E4KFG0 A3SGXH7AUHU8GW delmartian 1 1 postive 1303862400 Quality
Dog Foo

Not as
1 2 B00813GRG4 A1D87F6ZCVE5NK dll pa 0 0 negative 1346976000
Advertis

Natalia
Corres "Delight"
2 3 B000LQOCH0 ABXLMWJIXXAIN 1 1 postive 1219017600
"Natalia says it a
Corres"

Cough
3 4 B000UA0QIQ A395BORC6FGVXV Karl 3 3 negative 1307923200
Medicine

Michael D.
4 5 B006K2ZZ7K A1UQRSCLF8GW1T Bigham "M. 0 0 postive 1350777600 Great ta
Wassir"

In [10]:
display = pd.read_sql_query("""SELECT * FROM Reviews where score!=3 and UserId like '%AR5J8U%' ord
er by ProductID""",con)

In [11]:
display.head()

Out[11]:

Id ProductId UserId ProfileName HelpfulnessNumerator HelpfulnessDenominator Score Time

Geetha
0 78445 B000HDL1RQ AR5J8UI46CURR 2 2 5 1199577600
Krishnan

Geetha
1 138317 B000HDOPYC AR5J8UI46CURR 2 2 5 1199577600
Krishnan
Krishnan
Id ProductId UserId ProfileName HelpfulnessNumerator HelpfulnessDenominator Score Time

Geetha
2 138277 B000HDOPYM AR5J8UI46CURR 2 2 5 1199577600
Krishnan

Geetha
3 73791 B000HDOPZG AR5J8UI46CURR 2 2 5 1199577600
Krishnan

Geetha
4 155049 B000PAQ75C AR5J8UI46CURR 2 2 5 1199577600
Krishnan

In [12]:
sorted_data = filtered_data.sort_values("ProductId",axis = 0,ascending = True)

In [13]:
#Duplication of entries.
final = sorted_data.drop_duplicates(subset = {"UserId","ProfileName","Time",'Text'}, keep = "first"
,inplace = False)

In [14]:
final.shape

Out[14]:
(3491, 10)

In [15]:

(final['Id'].size*1.0)/(filtered_data['Id'].size*1.0)

Out[15]:
0.9974285714285714

In [16]:

final = final[final.HelpfulnessNumerator <= final.HelpfulnessDenominator]

final.shape

Out[16]:
(3491, 10)

In [17]:
final['Score'].value_counts()

Out[17]:
postive 2909
negative 582
Name: Score, dtype: int64

In [18]:
count_vect = CountVectorizer()
final_counts = count_vect.fit_transform(final['Text'].values)

In [19]:
final_counts.shape

Out[19]:
(3491, 11010)

In [20]:
import re
i=0
for sent in final['Text'].values:
if(len(re.findall("<.*?",sent))):
print(i)
print(sent)
break;
i+=1

0
Why is this $[...] when the same product is available for $[...] here? http://www.amazon.com/VICTOR-FLY-MAGNET-BAIT-REFILL/dp/B00004RBDY The Victor M380 and
M502 traps are unreal, of course -- total fly genocide. Pretty stinky, but only right nearby.

In [21]:
import nltk
nltk.download('stopwords')

[nltk_data] Downloading package stopwords to

[nltk_data] C:\Users\Sujatha\AppData\Roaming\nltk_data...
[nltk_data] Package stopwords is already up-to-date!

Out[21]:
True

In [22]:

import string
from nltk.corpus import stopwords
from nltk.stem import PorterStemmer
from nltk.stem.wordnet import WordNetLemmatizer

stop = set(stopwords.words('english')) #set of stopwords

sno = nltk.stem.SnowballStemmer('english') #initialising the snowball stemmer

print(stop)

{'their', 'own', 'just', "wasn't", 'hadn', 'has', 'each', 'i', 'by', "haven't", 'again',
'yourselves', "hasn't", "she's", 's', 'why', "should've", 'will', 'ma', 'into', 'him', 'its', 'are
', 'off', 'd', 'his', 'where', 'mustn', 'didn', 'an', 'both', "that'll", 'doesn', 'such', 'were',
'with', 'ain', "aren't", 'do', 'himself', 'being', 'is', 'any', 'our', 'was', 'on', 'needn', 'for'
, 'which', 'these', 'few', 'between', "wouldn't", 'ours', 'all', 'they', 'or', 'very', 'if',
'now', 'doing', 'other', 'after', 'won', "weren't", "you've", 'ourselves', 'have', 'yours', 'then'
, 'whom', "mightn't", 'he', 'hers', "hadn't", 'what', 'she', 'those', 'yourself', 'in', 'but', 'as
', 'above', 'while', 'below', "you'd", 'from', 'most', 'up', 'me', 'when', 'themselves', 'wasn', "
didn't", 'myself', 'be', 'her', 't', 'because', 'itself', 'o', 'your', 're', 'haven', 'how',
'not', 'shan', 'this', 'can', 'same', 'm', 'theirs', 'been', 'more', 'some', 'shouldn', 'my',
'during', "shouldn't", 'so', "couldn't", 'we', 'mightn', 'hasn', 'through', 'wouldn', 'until', "yo
u're", "you'll", "it's", 'out', 'should', 'that', 'only', 'weren', 'a', 'once', "mustn't", 'aren',
'having', "needn't", 'it', 'herself', 've', "shan't", 'before', 'who', "won't", "isn't", 'and', 't
he', 'to', 'nor', 'than', 'against', 'under', 'over', 'had', 'y', 'them', 'll', 'at', 'about', 'th
ere', 'don', 'am', 'you', "doesn't", 'further', 'no', 'down', 'too', 'isn', 'here', 'couldn',
'of', 'did', "don't", 'does'}

Cleaning the data like special characters,html tags etc..

In [23]:
def cleanhtml(sentence):
cleanr = re.compile('<.*?')
cleantext = re.sub(cleanr, ' ', sentence)
return cleantext

def cleanpunc(sentence):
cleaned = re.sub(r'[?|!|\|"|#]',' ',sentence)
cleaned = re.sub(r'[.|,|)|(|\|/]',' ',cleaned)
return cleaned

print(sno.stem('tasty'))

tasti

In [24]:
i=0
strl = ' '
final_string = []
all_positive_words = [ ]
all_negative_words = [ ]
s = ' '
for sent in final['Text'].values:
filtered_sentence = []
sent = cleanhtml(sent)
for w in sent.split():
for cleaned_words in cleanpunc(w).split():
if(((cleaned_words.isalpha()& (len(cleaned_words)>2)))):
if(cleaned_words.lower() not in stop):
s = (sno.stem((cleaned_words.lower()))).encode('utf8')
filtered_sentence.append(s)
if(final['Score'].values)[i]== 'positive':
all_positive_words.append(s)
if(final['Score'].values)[i]== 'negative':
all_negative_words.append(s)
else:
continue
else:
continue
strl = b" ".join(filtered_sentence)
final_string.append(strl)
i+=1

In [25]:
final['CleanedText'] = final_string

final.head()

Out[25]:

Id ProductId UserId ProfileName HelpfulnessNumerator HelpfulnessDenominator Score Time

2546 2774 B00002NCJC A196AJHU9EASJN Alex Chaffee 0 0 postive 1282953600

2547 2775 B00002NCJC A13RRPGE79XFFH reader48 0 0 postive 1281052800

Id B00002Z754
1145 1244 ProductId A3B8RCEI0FXFI6
UserId ProfileName
B G Chase HelpfulnessNumerator
10 HelpfulnessDenominator
10 Score 962236800
postive Time

1146 1245 B00002Z754 A29Z5PI9BW2PU3 Robbie 7 7 postive 961718400

Glenna E.
Bauer
2941 3203 B000084DVR A3DKGXWUEP1AI2 3 3 postive 1163030400
"Puppy
Mum"

In [26]:

con = sqlite3.connect('final_sqlite')
c = con.cursor()
con.text_factory = str
final.to_sql('Reviews',con, if_exists='replace', index=True, index_label=None, chunksize=None,
dtype=None)

Bigram
In [27]:

#Bigrams
count_vect = CountVectorizer(ngram_range = (1,2))
final_bigram_counts = count_vect.fit_transform(final['Text'].values)
final_bigram_counts.get_shape()

Out[27]:
(3491, 111023)

In [30]:
#t-SNE for Bigrams data.
from sklearn.manifold import TSNE
import seaborn as sn

model = TSNE(n_components = 2, random_state = 0)

data_3500 = final_bigram_counts[0:4500,:]
data_3500 = data_3500.toarray()
lbl1 = final['Score']
lbl1 = lbl1[0:4500]
tsne_data =model.fit_transform(data_3500)
tsne_data = np.vstack((tsne_data.T, lbl1)).T

tsne_df = pd.DataFrame(data=tsne_data, columns=("Dim_1", "Dim_2", "lbl1"))

sn.FacetGrid(tsne_df, hue="lbl1", size=6).map(plt.scatter, 'Dim_1', 'Dim_2').add_legend()
plt.show()
TF IDF
In [31]:

#TF IDF

tf_idf_vect = TfidfVectorizer(ngram_range = (1,2))

final_tf_idf = tf_idf_vect.fit_transform(final['Text'])
final_tf_idf.shape

Out[31]:
(3491, 111023)

In [32]:

#t-SNE for TF IDF data.

from sklearn.manifold import TSNE
import seaborn as sn
model = TSNE(n_components = 2, random_state = 0)
data_3500 = final_tf_idf[0:3000,:]
data_3500 = data_3500.toarray()
lbl1 = final['Score']
lbl1 = lbl1[0:3000]
tsne_data =model.fit_transform(data_3500)
tsne_data = np.vstack((tsne_data.T, lbl1)).T
#print(lbl1.shape)
#print(data_3500.shape)
#print(tsne_data.shape)
tsne_df = pd.DataFrame(data=tsne_data, columns=("Dim_1", "Dim_2", "lbl1"))
sn.FacetGrid(tsne_df, hue="lbl1", size=6).map(plt.scatter, 'Dim_1', 'Dim_2').add_legend()
plt.show()
In [33]:

import gensim
i= 0
list_of_sent = []
for sent in final['Text']:
filtered_sentence = []
sent = cleanhtml(sent)
for w in sent.split():
for cleaned_words in cleanpunc(w).split():
if(cleaned_words.isalpha()):
filtered_sentence.append(cleaned_words.lower())
else:
continue
list_of_sent.append(filtered_sentence)
print(final['Text'].values[0])
print(list_of_sent[0])

C:\Users\Sujatha\Anaconda3\lib\site-packages\gensim\utils.py:1197: UserWarning: detected Windows;

aliasing chunkize to chunkize_serial
warnings.warn("detected Windows; aliasing chunkize to chunkize_serial")

Why is this $[...] when the same product is available for $[...] here? http://www.amazon.com/VICTOR-FLY-MAGNET-BAIT-REFILL/dp/B00004RBDY The Victor M380 and
M502 traps are unreal, of course -- total fly genocide. Pretty stinky, but only right nearby.
['why', 'is', 'this', 'when', 'the', 'same', 'product', 'is', 'available', 'for', 'here', 'br',
'www', 'amazon', 'com', 'dp', 'br', 'br', 'victor', 'and', 'traps', 'are', 'unreal', 'of',
'course', 'total', 'fly', 'genocide', 'pretty', 'stinky', 'but', 'only', 'right', 'nearby']

Word2Vec
In [34]:

from gensim.models import Word2Vec

w2v_model = gensim.models.Word2Vec(list_of_sent,min_count = 5,size = 30,workers =4)
w2v_model

Out[34]:
<gensim.models.word2vec.Word2Vec at 0x1e5e6dd5b38>

In [35]:

words=list(w2v_model.wv.vocab)
print(len(words))

10307

In [36]:
from sklearn.manifold import TSNE
import seaborn as sn

#words1 = gensim.W2VTransformer(final['Text'])
model = TSNE(n_components = 2, random_state = 0)

count_vect = CountVectorizer()
w2v = count_vect.fit_transform(words)
#w2v.get_shape()
#data_3500 = w2v[0:50,:]
#data_3500 = data_3500.toarray()

w2v.shape

Out[36]:
(10307, 10281)

In [42]:
In [42]:
#t-SNE for Word2Vec data.
from sklearn.manifold import TSNE
import seaborn as sn

model = TSNE(n_components = 2, random_state = 0)

data_3500 = w2v[0:3200,:]
data_3500 = data_3500.toarray()

lbl1 = final['Score']
lbl1 = lbl1[0:3200]

tsne_data =model.fit_transform(data_3500)
#print(tsne_data.shape)
tsne_data = np.vstack((tsne_data.T, lbl1)).T

#print(lbl1.shape)
#print(data_3500.shape)
#print(tsne_data.shape )

tsne_df = pd.DataFrame(data=tsne_data, columns=("Dim_1", "Dim_2", "lbl1"))

sn.FacetGrid(tsne_df, hue="lbl1", size=6).map(plt.scatter, 'Dim_1', 'Dim_2').add_legend()
plt.show()

In [ ]:
#list_of_sent = list_of_sent[0:10000]
#list_of_sent.size

Average Word2Vec
In [43]:

sent_vectors = []; # the avg-w2v for each sentence/review is stored in this list
for sent in list_of_sent: # for each review/sentence
#print(sent)
sent_vec = np.zeros(30) # as word vectors are of zero length
cnt_words =0; # num of words with a valid vector in the sentence/review
for word in sent:# for each word in a review/sentence
try:
vec = w2v_model.wv[word]
sent_vec1 = sent_vec + vec
cnt_words += 1
except:
pass
sent_vec /= cnt_words
sent_vec /= cnt_words
sent_vectors.append(sent_vec)

In [51]:

from sklearn.manifold import TSNE

import seaborn as sn

model = TSNE(n_components = 2, random_state = 0)

#data_3500 = w2v[0:2700,:]
data_35 = sent_vectors[0:3200]

lbl1 = final['Score']
lbl1 = lbl1[0:3200]

tsne_data =model.fit_transform(data_35)
#print(tsne_data.shape)
tsne_data = np.vstack((tsne_data.T, lbl1)).T
print(tsne_data.shape)

tsne_df = pd.DataFrame(data=tsne_data, columns=("Dim_1", "Dim_2", "lbl1"))

sn.FacetGrid(tsne_df, hue="lbl1", size=6).map(plt.scatter, 'Dim_1', 'Dim_2').add_legend()
plt.show()

(3200, 3)

Software Testing Principles and Practices
100% (1)
Software Testing Principles and Practices
674 pages
Glove
100% (1)
Glove
10 pages
Functional Specification Document Template
No ratings yet
Functional Specification Document Template
10 pages
Amazon Food Review Notes
No ratings yet
Amazon Food Review Notes
37 pages
03 Amazon Fine Food Reviews Analysis - KNN
No ratings yet
03 Amazon Fine Food Reviews Analysis - KNN
71 pages
28.1 - 28.16 Real World Problem - Predict Rating Given Product Reviews On Amazon
No ratings yet
28.1 - 28.16 Real World Problem - Predict Rating Given Product Reviews On Amazon
19 pages
11 Amazon Fine Food Reviews Analysis - Truncated SVD - WIP - Jupyter Notebook
No ratings yet
11 Amazon Fine Food Reviews Analysis - Truncated SVD - WIP - Jupyter Notebook
22 pages
T SNE Visualization of Amazon Reviews With Polarity Based Color Coding+
No ratings yet
T SNE Visualization of Amazon Reviews With Polarity Based Color Coding+
29 pages
Text Analytics - Capstone Project
No ratings yet
Text Analytics - Capstone Project
19 pages
Beginner's Guide To Data Cleaning and Feature Extraction in NLP - by Enes Gokce - Towards Data Science
No ratings yet
Beginner's Guide To Data Cleaning and Feature Extraction in NLP - by Enes Gokce - Towards Data Science
20 pages
Sentiment Analysis On Amazon Fine Food Reviews by Using Linear Machine Learning Models
No ratings yet
Sentiment Analysis On Amazon Fine Food Reviews by Using Linear Machine Learning Models
6 pages
British_Airways_Forage_Report
No ratings yet
British_Airways_Forage_Report
12 pages
basenlp
No ratings yet
basenlp
5 pages
9 Feature Engineering Text Data
No ratings yet
9 Feature Engineering Text Data
7 pages
Apply Naive Bayes To Amazon Reviews (M)
No ratings yet
Apply Naive Bayes To Amazon Reviews (M)
6 pages
Sentiment Analysis
No ratings yet
Sentiment Analysis
4 pages
Natural Language Processing
No ratings yet
Natural Language Processing
22 pages
Aped For Fake News
No ratings yet
Aped For Fake News
6 pages
Adithiyaa BR 23MBA0018 SMA DA Text Mining PDF
No ratings yet
Adithiyaa BR 23MBA0018 SMA DA Text Mining PDF
6 pages
NLP LAB_MANUAL (1)
No ratings yet
NLP LAB_MANUAL (1)
33 pages
Information Retrival
No ratings yet
Information Retrival
43 pages
1_5089492269589857342(1)
No ratings yet
1_5089492269589857342(1)
7 pages
Untitled28.ipynb - Colaboratory
No ratings yet
Untitled28.ipynb - Colaboratory
16 pages
Q 3
No ratings yet
Q 3
2 pages
Ir Practical Manual 2
No ratings yet
Ir Practical Manual 2
24 pages
AI Phash3
No ratings yet
AI Phash3
11 pages
vertopal.com_Nokia Positive and Negative TM
No ratings yet
vertopal.com_Nokia Positive and Negative TM
8 pages
Python CA 4
No ratings yet
Python CA 4
9 pages
4aeee7-Ba25-Ff2e-30d7-63d306a7270 Open Ai Playground Example Prompts - Google Sheets
No ratings yet
4aeee7-Ba25-Ff2e-30d7-63d306a7270 Open Ai Playground Example Prompts - Google Sheets
8 pages
NLP Tushar
No ratings yet
NLP Tushar
21 pages
NLP_Sentimental_Analysis__1736351356
No ratings yet
NLP_Sentimental_Analysis__1736351356
32 pages
IR practical
No ratings yet
IR practical
24 pages
IR
No ratings yet
IR
12 pages
Truncated SVD
No ratings yet
Truncated SVD
27 pages
Sentiment Analysis With NLP Deep Learning
No ratings yet
Sentiment Analysis With NLP Deep Learning
8 pages
notebookai
No ratings yet
notebookai
22 pages
AI Lab - Manual - 136
No ratings yet
AI Lab - Manual - 136
17 pages
Section 6 - Jupyter Notebook
No ratings yet
Section 6 - Jupyter Notebook
11 pages
NLP Manual
No ratings yet
NLP Manual
21 pages
Problem Statement
No ratings yet
Problem Statement
10 pages
Ment Analysis Text Classification
No ratings yet
Ment Analysis Text Classification
9 pages
IR Pract
No ratings yet
IR Pract
7 pages
R002 KrishAhuja BDA Lab9.Ipynb - Colab
No ratings yet
R002 KrishAhuja BDA Lab9.Ipynb - Colab
3 pages
ASTW RA03 PracticalManual
No ratings yet
ASTW RA03 PracticalManual
18 pages
Detailed Report
No ratings yet
Detailed Report
6 pages
Pricing Mercari
No ratings yet
Pricing Mercari
41 pages
Code
No ratings yet
Code
6 pages
4. Chapter 8 Text Analytics
No ratings yet
4. Chapter 8 Text Analytics
42 pages
Py_Qus1
No ratings yet
Py_Qus1
3 pages
Quiz 2
No ratings yet
Quiz 2
11 pages
Report On - Social Media Research Topic Modeling
No ratings yet
Report On - Social Media Research Topic Modeling
26 pages
Amazon Sentiment Analysis Documentation
No ratings yet
Amazon Sentiment Analysis Documentation
4 pages
NLP Soc
No ratings yet
NLP Soc
15 pages
Lab2 IR
No ratings yet
Lab2 IR
16 pages
Lab Manual
No ratings yet
Lab Manual
10 pages
Arnav MLlab04
No ratings yet
Arnav MLlab04
7 pages
NLP Cheatsheet
100% (2)
NLP Cheatsheet
18 pages
(Assignment 1 & 2) Regular Expression
No ratings yet
(Assignment 1 & 2) Regular Expression
3 pages
ML Week10.1
No ratings yet
ML Week10.1
5 pages
Amazon-Fine-Food-Review - K-Means, Agglomerative & DBSCAN Clustering
No ratings yet
Amazon-Fine-Food-Review - K-Means, Agglomerative & DBSCAN Clustering
79 pages
Agglomerative Clustering
No ratings yet
Agglomerative Clustering
81 pages
EB05 PivotTableCalculationsLayout
No ratings yet
EB05 PivotTableCalculationsLayout
564 pages
Unit 1.1(Application_Security)
No ratings yet
Unit 1.1(Application_Security)
27 pages
Lesson 3 Database
No ratings yet
Lesson 3 Database
21 pages
RHCE (RHEL7) Lab Step 3
No ratings yet
RHCE (RHEL7) Lab Step 3
6 pages
What Is Azure Active Directory - PDF
No ratings yet
What Is Azure Active Directory - PDF
1,447 pages
Stack:: Note 3: Stack and Queue Concept in Data Structure For Application
No ratings yet
Stack:: Note 3: Stack and Queue Concept in Data Structure For Application
7 pages
System Requirements Specification Document
No ratings yet
System Requirements Specification Document
5 pages
Database Management Systems: CS/B.TECH (CSE) /SEM-5/CS-502/2011-12
No ratings yet
Database Management Systems: CS/B.TECH (CSE) /SEM-5/CS-502/2011-12
7 pages
Snowflake SnowPro Core Certification Exam Questions - Page 25 of 27 - SkillCertPro
No ratings yet
Snowflake SnowPro Core Certification Exam Questions - Page 25 of 27 - SkillCertPro
1 page
Sales Cloud Study Guide
No ratings yet
Sales Cloud Study Guide
10 pages
Testing Automation and Tools
No ratings yet
Testing Automation and Tools
9 pages
Class Item K of BOM in Variants
No ratings yet
Class Item K of BOM in Variants
14 pages
Core (1 2 3)
No ratings yet
Core (1 2 3)
80 pages
Report EDITED Jurassic Park
No ratings yet
Report EDITED Jurassic Park
6 pages
Sekurak Offline 2 Final
No ratings yet
Sekurak Offline 2 Final
65 pages
ERP - Procurement Cloud - I V2-1
No ratings yet
ERP - Procurement Cloud - I V2-1
3 pages
E-Library Management
No ratings yet
E-Library Management
27 pages
EclipseLink JPA
No ratings yet
EclipseLink JPA
4 pages
E MudhraDownload Hard
No ratings yet
E MudhraDownload Hard
17 pages
Komal Gupta Resumes
No ratings yet
Komal Gupta Resumes
2 pages
Kinjal Mistry Resume (CS) PDF
No ratings yet
Kinjal Mistry Resume (CS) PDF
2 pages
Practical Security For Agile and DevOps - Watermark
No ratings yet
Practical Security For Agile and DevOps - Watermark
236 pages
ITIL Sample Resume 2
No ratings yet
ITIL Sample Resume 2
8 pages
Global App Testing - The Ultimate QA Testing Handbook
No ratings yet
Global App Testing - The Ultimate QA Testing Handbook
71 pages
2021 0035 Niagara4 Brochure PDF
No ratings yet
2021 0035 Niagara4 Brochure PDF
8 pages
HP UX Reference
No ratings yet
HP UX Reference
692 pages
BMIT5103 Full Version Study Guide PDF
100% (1)
BMIT5103 Full Version Study Guide PDF
83 pages
Swift Services Factsheet Usage and Removal 0
No ratings yet
Swift Services Factsheet Usage and Removal 0
2 pages

Amazon Assignment Ex

Uploaded by

Amazon Assignment Ex

Uploaded by

Sentiment Classification on Amazon Fine Food Reviews

from sklearn.feature_extraction.text import CountVectorizer

#Reading the SQL data using SQLite

filtered_data = pd.read_sql_query(""" SELECT * FROM Reviews where Score != 3 """, con)

Id ProductId UserId ProfileName HelpfulnessNumerator HelpfulnessDenominator Score Time Summary

Id ProductId UserId ProfileName HelpfulnessNumerator HelpfulnessDenominator Score Time

Id ProductId UserId ProfileName HelpfulnessNumerator HelpfulnessDenominator Score Time

final = final[final.HelpfulnessNumerator <= final.HelpfulnessDenominator]

[nltk_data] Downloading package stopwords to

stop = set(stopwords.words('english')) #set of stopwords

Cleaning the data like special characters,html tags etc..

Id ProductId UserId ProfileName HelpfulnessNumerator HelpfulnessDenominator Score Time

2546 2774 B00002NCJC A196AJHU9EASJN Alex Chaffee 0 0 postive 1282953600

2547 2775 B00002NCJC A13RRPGE79XFFH reader48 0 0 postive 1281052800

1146 1245 B00002Z754 A29Z5PI9BW2PU3 Robbie 7 7 postive 961718400

model = TSNE(n_components = 2, random_state = 0)

tsne_df = pd.DataFrame(data=tsne_data, columns=("Dim_1", "Dim_2", "lbl1"))

tf_idf_vect = TfidfVectorizer(ngram_range = (1,2))

#t-SNE for TF IDF data.

C:\Users\Sujatha\Anaconda3\lib\site-packages\gensim\utils.py:1197: UserWarning: detected Windows;

from gensim.models import Word2Vec

model = TSNE(n_components = 2, random_state = 0)

tsne_df = pd.DataFrame(data=tsne_data, columns=("Dim_1", "Dim_2", "lbl1"))

from sklearn.manifold import TSNE

model = TSNE(n_components = 2, random_state = 0)

tsne_df = pd.DataFrame(data=tsne_data, columns=("Dim_1", "Dim_2", "lbl1"))

You might also like