0% found this document useful (0 votes)

71 views22 pages

Sentiment Analysis - Comparing Algorithms Accuracy

The document discusses comparing the accuracy of different algorithms for sentiment analysis of tweets. It introduces importing necessary libraries in Python and extracting tweet data from Twitter using APIs. The data is then loaded and cleaned before analyzing sentiment using various algorithms and evaluating their performance.

Uploaded by

Michael O. Adegbenro

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

71 views22 pages

Sentiment Analysis - Comparing Algorithms Accuracy

Uploaded by

Michael O. Adegbenro

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

You are on page 1/ 22

2/7/22, 5:06 PM Sentiment Analysis - Comparing Algorithms Accuracy

Author: Adegbenro Michael Olusola:

Note:
From the machine learning point of view, raw text is useless. Only if we manage to transform
it into meaningful numbers, can we feed it into our machine-learning algorithms such as clustering.
The same is true for more mundane operations on text,
such as similarity measurement

This project can pull data from Tweeter but to do that you need to request for your own API keys
specified below (I removed mine):

my_api_key = "xxxxxxxxx"
my_api_secret = "yyyyyyy"

If you don't have API keys already, you may use "Raw Data" which i pulled from tweeter using:

You can specifiy amount of tweets you want to pull. Here I pulled 100

Import Necessary Libraries

In [4]:
import pandas as pd

import numpy as np

import re

import string

from nltk.corpus import stopwords

from wordcloud import WordCloud,STOPWORDS

from nltk.stem.porter import PorterStemmer

import nltk

from nltk.corpus import stopwords

import matplotlib

import matplotlib.pyplot as plt

from pandas.plotting import scatter_matrix

%matplotlib inline

from nltk.stem import WordNetLemmatizer

import seaborn as sns

sns.set(style="white",color_codes=True)

from sklearn.metrics import classification_report

from sklearn.model_selection import train_test_split

from sklearn.preprocessing import LabelEncoder,StandardScaler

sns.set(font_scale=1.5)

from nltk.sentiment.vader import SentimentIntensityAnalyzer as SIA

from sklearn.linear_model import LogisticRegression

from sklearn.model_selection import train_test_split

from textblob import TextBlob

from sklearn.metrics import confusion_matrix

from sklearn.metrics import classification_report

from sklearn.metrics import accuracy_score

from sklearn.metrics import precision_score

from sklearn.metrics import recall_score

from sklearn.metrics import f1_score

from sklearn.linear_model import LinearRegression,Ridge,Lasso,ElasticNet

from sklearn.tree import DecisionTreeRegressor

from sklearn import metrics

from nltk import word_tokenize

from wordcloud import WordCloud,STOPWORDS

stopword = set(stopwords.words('english'))

import tweepy as tw

localhost:8888/nbconvert/html/Documents/IT Courses/Machine Learning/Refrence folders/Natural Langage Processing/Sentiment Analysis/ Sentime… 1/22

2/7/22, 5:06 PM Sentiment Analysis - Comparing Algorithms Accuracy
import warnings

warnings.filterwarnings('ignore')

from matplotlib.axes._axes import _log as matplotlib_axes_logger

matplotlib_axes_logger.setLevel('ERROR')

Extract Data fromm Twitter

remove this cell if you don't have API key

In [ ]:
my_api_key = "xxxxxxxxxxxxxxxxxxx"

my_api_secret = "xxxxxxxxxxxxxxxxxxxxxxxx"

# authenticate

auth = tw.OAuthHandler(my_api_key, my_api_secret)

api = tw.API(auth, wait_on_rate_limit=True)

search_query = "#Ottawa -filter:retweet"

# tweets = tw.Cursor(api.search_tweets,q=search_query,lang="en",since="2015-09-16").item

tweets = tw.Cursor(api.search_tweets,q=search_query,lang="en").items(500)

# store the API responses in a list

tweets_copy = []

for tweet in tweets:

tweets_copy.append(tweet)

print("Total Tweets fetched:", len(tweets_copy))

# intialize the dataframe

data= pd.DataFrame()

# populate the dataframe

for tweet in tweets_copy:

hashtags = []

try:

for hashtag in tweet.entities["hashtags"]:

hashtags.append(hashtag["text"])

text = api.get_status(id=tweet.id, tweet_mode='extended').full_text

except:

pass

data = data.append(pd.DataFrame({'user_name': tweet.user.name,'ID': tweet.id_str,

'user_location': tweet.user.location,

'user_description': tweet.user.description,

'user_verified': tweet.user.verified,

'date': tweet.created_at,

'text': text,

'language': tweet.lang,

'favourites-count': tweet.favorite_count,

'author': tweet.user.screen_name,

'retweet-count': tweet.retweet_count,

'hashtags': [hashtags if hashtags else None],

'source': tweet.source}))

Total Tweets fetched: 500

Affter importing all libraries above, run this cell to load data.

localhost:8888/nbconvert/html/Documents/IT Courses/Machine Learning/Refrence folders/Natural Langage Processing/Sentiment Analysis/ Sentime… 2/22

2/7/22, 5:06 PM Sentiment Analysis - Comparing Algorithms Accuracy

In [2]: #load data from local drive

data = pd.read_csv("Raw Data.csv")

data.head()

Out[2]: Unnamed:
user_name ID user_location user_description user_verified
0

I'm a mushroom
Alexandria spore floating 2022-0
0 0 Horatio 1490817580352819203 False
,MN around Central 22:39:43+0
M...

Moscow, 2022-0
1 0 jeremy t 1490817570399666180 NaN False
Russia 22:39:41+0

Marie 2022-0
2 0 1490817565362327552 NaN NaN False
williams 22:39:40+0

el ch'val a 2022-0
3 0 1490817548153196546 NaN Ti Tannant False
coukse 22:39:36+0

#Christian | #Gay
Honk Honk Toronto, 2022-0
4 0 1490817547926708225 | #Torontonian False
🚛🚚🛻!!! Ontario 22:39:36+0
Philippians 4:13

In [ ]:
print(data.skew())

#If the value is closer to zero, then it shows less skew.

In [5]:
def clean_text(text):

'''Make text lowercase, remove text in square brackets,remove links,remove punctuat

and remove words containing numbers.'''

text = str(text).lower()

text = re.sub('\[.*?\]', '', text)

text = re.sub('https?://\S+|www\.\S+', '', text)

text = re.sub('<.*?>+', '', text)

text = re.sub('[%s]' % re.escape(string.punctuation), '', text)

text = re.sub('\n', '', text)

text = re.sub('\w\d\w', '', text)

return text

def process_tweets(tweet):

#tokenizing words

tokens = word_tokenize(tweet)

#Removing Stop Words

final_tokens = [w for w in tokens if w not in stopword]

#reducing a word to its word stem

localhost:8888/nbconvert/html/Documents/IT Courses/Machine Learning/Refrence folders/Natural Langage Processing/Sentiment Analysis/ Sentime… 3/22

2/7/22, 5:06 PM Sentiment Analysis - Comparing Algorithms Accuracy
wordLemm = WordNetLemmatizer()

finalwords=[]

for w in final_tokens:

if len(w)>1:

word = wordLemm.lemmatize(w)
finalwords.append(word)

return ' '.join(finalwords)

#Apply to relevant columns

data["user_description"] = data["user_description"].apply(lambda x:clean_text(x))

data["user_name"] = data["user_name"].apply(lambda x:clean_text(x))

data["text"] = data["text"].apply(lambda x:clean_text(x))

data['text'] = data["text"].apply(lambda x: process_tweets(x))

In [6]:
# Now we have cleaned data for three features: user_description, text, and user_name

# Although, we don't need more than text to perform our analysis

pd.DataFrame(data).head()

Out[6]:
user_name ID user_location user_description user_verified date

im a mushroom
rt po
Alexandria spore floating 2022-02-07
0 horatio 1490817580352819203 False ma
,MN around central 22:39:43+00:00
mi...

rt v
Moscow, 2022-02-07
0 jeremy t 1490817570399666180 False worke
Russia 22:39:41+00:00

rt k
marie 2022-02-07
0 1490817565362327552 False
williams 22:39:40+00:00

🖕
el chval a 2022-02-07
0 1490817548153196546 ti tannant False
coukse 22:39:36+00:00 🎶truc

christian gay rt v
honk honk Toronto, 2022-02-07
0 1490817547926708225 torontonian False worke
🚛🚚🛻 Ontario 22:39:36+00:00
philippians

In [7]:
data.to_csv("Clean Data.csv")

data.head()

Out[7]:
user_name ID user_location user_description user_verified date

im a mushroom
rt po
Alexandria spore floating 2022-02-07
0 horatio 1490817580352819203 False ma
,MN around central 22:39:43+00:00
mi...

localhost:8888/nbconvert/html/Documents/IT Courses/Machine Learning/Refrence folders/Natural Langage Processing/Sentiment Analysis/ Sentime… 4/22

2/7/22, 5:06 PM Sentiment Analysis - Comparing Algorithms Accuracy

user_name ID user_location user_description user_verified date

rt v
Moscow, 2022-02-07
0 jeremy t 1490817570399666180 False worke
Russia 22:39:41+00:00

rt k
marie 2022-02-07
0 1490817565362327552 False
williams 22:39:40+00:00

🖕
el chval a 2022-02-07
0 1490817548153196546 ti tannant False
coukse 22:39:36+00:00 🎶truc

christian gay rt v
honk honk Toronto, 2022-02-07
0 1490817547926708225 torontonian False worke
🚛🚚🛻 Ontario 22:39:36+00:00
philippians

Vader Sentiment Analysis

VADER sentimental analysis relies on a dictionary that maps lexical features to emotion intensities
known as sentiment scores. The sentiment score of a text can be obtained by summing up each
word's intensity in the text.

For example,- Words like 'love,' 'enjoy,' 'happy,' 'like' all convey a positive sentiment. Also, VADER is
intelligent enough to understand these words' basic context, such as "did not love" as a negative
statement. It also understands the emphasis of capitalization and punctuation, such as "ENJOY."

In [8]:
## Added "Sentiment" column and categorized in positive, negative and neutral

In [9]:
sid = SIA()

data['Sentiments'] = data['text'].apply(lambda x: sid.polarity_scores(' '.joi

data['Positive Sentiment'] = data['Sentiments'].apply(lambda x: x['pos']+1*(10**-6))

data['Neutral Sentiment'] = data['Sentiments'].apply(lambda x: x['neu']+1*(10**-6))

data['Negative Sentiment'] = data['Sentiments'].apply(lambda x: x['neg']+1*(10**-6))

In [10]:
# drop sentiments column... not needed

data.drop(columns=['Sentiments'],inplace=True)

data.head()

Out[10]:
user_name ID user_location user_description user_verified date

im a mushroom
rt po
Alexandria spore floating 2022-02-07
0 horatio 1490817580352819203 False ma
,MN around central 22:39:43+00:00
mi...

localhost:8888/nbconvert/html/Documents/IT Courses/Machine Learning/Refrence folders/Natural Langage Processing/Sentiment Analysis/ Sentime… 5/22

2/7/22, 5:06 PM Sentiment Analysis - Comparing Algorithms Accuracy

user_name ID user_location user_description user_verified date

rt v
Moscow, 2022-02-07
0 jeremy t 1490817570399666180 False worke
Russia 22:39:41+00:00

rt k
marie 2022-02-07
0 1490817565362327552 False
williams 22:39:40+00:00

🖕
el chval a 2022-02-07
0 1490817548153196546 ti tannant False
coukse 22:39:36+00:00 🎶truc

christian gay rt v
honk honk Toronto, 2022-02-07
0 1490817547926708225 torontonian False worke
🚛🚚🛻 Ontario 22:39:36+00:00
philippians

In [11]:
#Number of Words

data['Number of Words'] =data.text.apply(lambda x:len(x.split(' ')))

#Average Word Length

data['Mean Word Length'] = data.text.apply(lambda x:np.round(np.mean([len(w) for w in x

data.head()

Out[11]:
user_name ID user_location user_description user_verified date

im a mushroom
rt po
Alexandria spore floating 2022-02-07
0 horatio 1490817580352819203 False ma
,MN around central 22:39:43+00:00
mi...

rt v
Moscow, 2022-02-07
0 jeremy t 1490817570399666180 False worke
Russia 22:39:41+00:00

rt k
marie 2022-02-07
0 1490817565362327552 False
williams 22:39:40+00:00

🖕
el chval a 2022-02-07
0 1490817548153196546 ti tannant False
coukse 22:39:36+00:00 🎶truc

christian gay rt v
honk honk Toronto, 2022-02-07
0 1490817547926708225 torontonian False worke
🚛🚚🛻 Ontario 22:39:36+00:00
philippians

In [12]:
# WordCloud using atual clean data

#allWords = ' '.join( [cmts for cmts in data.text])

#wordCloud = WordCloud(width = 500, height = 300, random_state = 21, max_font_size = 11

localhost:8888/nbconvert/html/Documents/IT Courses/Machine Learning/Refrence folders/Natural Langage Processing/Sentiment Analysis/ Sentime… 6/22

2/7/22, 5:06 PM Sentiment Analysis - Comparing Algorithms Accuracy

#plt.imshow(wordCloud, interpolation= 'bilinear')

#plt.axis('off')

#plt.show

Sentimental Analysis

Polarity and Subjectivity

In starting with the analysis we will create the new columns namely Polarity and Subjectivity and
acquire the very values of each comment. Polarity ranges from -1 to 1 and measures how positive or
negative a comment is. It simply means emotions expressed in a sentence. Subjectivity expresses
some personal feelings, views, or beliefs. A subjective sentence may not express any sentiment.

In [13]:
# get subjectivity

def getSubjectivity(txt):

return TextBlob(txt).sentiment.subjectivity

# get polarity

def getPolarity(txt):

return TextBlob(txt).sentiment.polarity

#Columns

data['Subjectivity'] = data['text'].apply(getSubjectivity)

data['Polarity'] = data['text'].apply(getPolarity)

data.head()

Out[13]:
user_name ID user_location user_description user_verified date

im a mushroom
rt po
Alexandria spore floating 2022-02-07
0 horatio 1490817580352819203 False ma
,MN around central 22:39:43+00:00
mi...

rt v
Moscow, 2022-02-07
0 jeremy t 1490817570399666180 False worke
Russia 22:39:41+00:00

rt k
marie 2022-02-07
0 1490817565362327552 False
williams 22:39:40+00:00

🖕
el chval a 2022-02-07
0 1490817548153196546 ti tannant False
coukse 22:39:36+00:00 🎶truc

christian gay rt v
honk honk Toronto, 2022-02-07
0 1490817547926708225 torontonian False worke
🚛🚚🛻 Ontario 22:39:36+00:00
philippians

In [14]:
# function to compute analysis

def getAnalysis(score):

localhost:8888/nbconvert/html/Documents/IT Courses/Machine Learning/Refrence folders/Natural Langage Processing/Sentiment Analysis/ Sentime… 7/22

2/7/22, 5:06 PM Sentiment Analysis - Comparing Algorithms Accuracy

if score < 0 :

return 'Negative'

elif score == 0:

return 'Neutral'

else:

return
'Positive'

data['Analysis'] = data['Polarity'].apply(getAnalysis)

In [15]:
data.head()

Out[15]:
user_name ID user_location user_description user_verified date

im a mushroom
rt po
Alexandria spore floating 2022-02-07
0 horatio 1490817580352819203 False ma
,MN around central 22:39:43+00:00
mi...

rt v
Moscow, 2022-02-07
0 jeremy t 1490817570399666180 False worke
Russia 22:39:41+00:00

rt k
marie 2022-02-07
0 1490817565362327552 False
williams 22:39:40+00:00

🖕
el chval a 2022-02-07
0 1490817548153196546 ti tannant False
coukse 22:39:36+00:00 🎶truc

christian gay rt v
honk honk Toronto, 2022-02-07
0 1490817547926708225 torontonian False worke
🚛🚚🛻 Ontario 22:39:36+00:00
philippians

5 rows × 21 columns

In [16]:
# % Percentages:

pcomments = data[data.Analysis == 'Positive']

pcomments = pcomments['text']

print('Positive: ' +str(round((pcomments.shape[0]/data.shape[0])*100, 1))+ '%')

ncomments = data[data.Analysis == 'Negative']

ncomments = ncomments['text']

print('Negative: ' +str(round((ncomments.shape[0]/data.shape[0])*100, 1))+ '%')

nucomments = data[data.Analysis == 'Neutral']

nucomments = nucomments['text']

print('Nuetral: ' +str(round((nucomments.shape[0]/data.shape[0])*100, 1))+ '%')

Positive: 28.4%

localhost:8888/nbconvert/html/Documents/IT Courses/Machine Learning/Refrence folders/Natural Langage Processing/Sentiment Analysis/ Sentime… 8/22

2/7/22, 5:06 PM Sentiment Analysis - Comparing Algorithms Accuracy

Negative: 33.6%

Nuetral: 38.0%

In [17]:
# the below function will create a word cloud

def wordcloud_draw(data, color = 'black'):

words = ' '.join(data)

cleaned_word = " ".join([word for word in words.split()

if 'http' not in word # double check for nay links

and not word.startswith('#') # removing hash tags

and word != 'rt'

])

wordcloud = WordCloud(stopwords=STOPWORDS, # using stopwords provided by Word cloud

background_color=color,

width=2500,

height=2000

).generate(cleaned_word)

# using matplotlib to display the images in notebook itself.

plt.figure(1,figsize=(5, 7))

plt.imshow(wordcloud)

plt.axis('off')

plt.show()

In [18]:
wordcloud_draw(data.text, 'black')

In [19]:
print("Positive words are", pcomments.count())

wordcloud_draw(pcomments, 'black')

Positive words are 142

localhost:8888/nbconvert/html/Documents/IT Courses/Machine Learning/Refrence folders/Natural Langage Processing/Sentiment Analysis/ Sentime… 9/22

2/7/22, 5:06 PM Sentiment Analysis - Comparing Algorithms Accuracy

In [20]:
print("Negative words are", ncomments.count())

wordcloud_draw(ncomments)

Negative words are 168

In [21]:
print("Neutral words are", nucomments.count())

wordcloud_draw(nucomments, 'black')

Neutral words are 190

In [22]:
# Value Count

data['Analysis'].value_counts

localhost:8888/nbconvert/html/Documents/IT Courses/Machine Learning/Refrence folders/Natural Langage Processing/Sentiment Analysis/ Sentim… 10/22

2/7/22, 5:06 PM Sentiment Analysis - Comparing Algorithms Accuracy

# Plot

plt.title('Sentiment Analysis')

plt.xlabel('Sentiment')

plt.ylabel('Counts')

data['Analysis'].value_counts().plot(kind= 'bar')

plt.show()

More on sentiment analysis: https://www.projectpro.io/article/sentiment-analysis-project-ideas-

with-source-code/518

Check Analysis Accuracy

In [23]:
data.isnull().sum()

user_name 0

Out[23]:
ID 0

user_location 0

user_description 0

user_verified 0

date 0

text 0

language 0

favourites-count 0

author 0

retweet-count 0

hashtags 112

source 0

Positive Sentiment 0

Neutral Sentiment 0

Negative Sentiment 0

Number of Words 0

Mean Word Length 0

Subjectivity 0

localhost:8888/nbconvert/html/Documents/IT Courses/Machine Learning/Refrence folders/Natural Langage Processing/Sentiment Analysis/ Sentim… 11/22

2/7/22, 5:06 PM Sentiment Analysis - Comparing Algorithms Accuracy
Polarity 0

Analysis 0

dtype: int64

In [24]:
data.shape

(500, 21)
Out[24]:

In [25]:
data.dropna(inplace=True)

data.isnull().sum()

user_name 0

Out[25]:
ID 0

user_location 0

user_description 0

user_verified 0

date 0

text 0

language 0

favourites-count 0

author 0

retweet-count 0

hashtags 0

source 0

Positive Sentiment 0

Neutral Sentiment 0

Negative Sentiment 0

Number of Words 0

Mean Word Length 0

Subjectivity 0

Polarity 0

Analysis 0

dtype: int64

In [26]:
data.shape

(388, 21)
Out[26]:

In [27]:
data.columns

Index(['user_name', 'ID', 'user_location', 'user_description', 'user_verified',

Out[27]:
'date', 'text', 'language', 'favourites-count', 'author',

'retweet-count', 'hashtags', 'source', 'Positive Sentiment',

'Neutral Sentiment', 'Negative Sentiment', 'Number of Words',

'Mean Word Length', 'Subjectivity', 'Polarity', 'Analysis'],

dtype='object')

In [28]:
# drop irrelevant data

data = data.drop(['user_name', 'ID','language', 'author','Positive Sentiment',

'Neutral Sentiment', 'Negative Sentiment', 'Number of Words','Polarity',

'Mean Word Length','hashtags'], axis=1)

In [29]:
# check data types and encode object type

data.dtypes

localhost:8888/nbconvert/html/Documents/IT Courses/Machine Learning/Refrence folders/Natural Langage Processing/Sentiment Analysis/ Sentim… 12/22

2/7/22, 5:06 PM Sentiment Analysis - Comparing Algorithms Accuracy

user_location object

Out[29]:
user_description object

user_verified bool

date datetime64[ns, UTC]

text object

favourites-count int64

retweet-count int64

source object

Subjectivity float64

Analysis object

dtype: object

In [30]:
enco = LabelEncoder()

data['user_location'] = enco.fit_transform(data['user_location'])

data['user_description'] = enco.fit_transform(data['user_description'])

data['user_verified'] = enco.fit_transform(data['user_verified'])

data['text'] = enco.fit_transform(data['text'])

data['date'] = enco.fit_transform(data['date'])

data['source'] = enco.fit_transform(data['source'])

data['Analysis'] = enco.fit_transform(data['Analysis'])

In [31]:
data.head()

Out[31]: favourites- retweet-

user_location user_description user_verified date text source Subjectivity
count count

0 7 127 0 318 94 0 1889 2 0.000000

0 0 0 0 317 78 0 790 1 0.357143

0 0 241 0 316 117 0 0 2 0.000000

0 94 176 0 315 73 0 21 4 0.000000

0 132 91 0 314 61 0 3492 1 0.400000

In [32]:
X = data.drop(["Analysis"], axis=1)

y= data.Analysis

In [33]:
from sklearn.decomposition import PCA

pca = PCA(n_components=3)

fit = pca.fit(X)

fit.explained_variance_ratio_

print(fit.components_)

[[ 3.74714044e-05 2.79055233e-03 -2.44727802e-06 -5.05181496e-03

-9.07495858e-04 -2.10093852e-04 9.99982911e-01 3.53673754e-05

3.97336839e-06]

[ 1.55773098e-01 9.59450987e-01 5.36402446e-05 2.34913945e-01

-7.54200772e-04 -8.93911860e-04 -1.49736983e-03 -6.07769299e-04

4.75840743e-05]

[ 3.68001614e-02 2.31975421e-01 1.55584824e-05 -9.71914540e-01

localhost:8888/nbconvert/html/Documents/IT Courses/Machine Learning/Refrence folders/Natural Langage Processing/Sentiment Analysis/ Sentim… 13/22

2/7/22, 5:06 PM Sentiment Analysis - Comparing Algorithms Accuracy
-1.31315953e-02 3.33579373e-03 -5.56999131e-03 8.29295615e-04

-3.09667478e-05]]

In [34]:
x_train,x_test,y_train,y_test=train_test_split(X,y,test_size=0.10,random_state=1)

In [35]:
#Feature Scaling/Standardize (not important step but it boost accuracy)

from sklearn.preprocessing import StandardScaler

sc = StandardScaler()

x_train = sc.fit_transform(x_train)

x_test = sc.transform(x_test)

In [36]:
print (x_train.shape, y_train.shape)

print (x_test.shape, y_test.shape)

(349, 9) (349,)

(39, 9) (39,)

In [64]:
#Gaussian Naive Bayes model

from sklearn.naive_bayes import GaussianNB # import library

classifier = GaussianNB() # initilaise

classifier.fit(x_train,y_train) # fit train dataset

y_pred = classifier.predict(x_test) # predict

# check for accuracy

from sklearn.metrics import confusion_matrix, accuracy_score

cm = confusion_matrix(y_test, y_pred)

print(cm)

accuracy_score(y_test, y_pred)

accuracy = classifier.score(x_test, y_test)

Gaussian_Naive_Bayes = ("Gaussian_Naive_Bayes Accuracy: {:.2f}%".format(accuracy*100))

# put prediction and actual values side-by-side

pd.DataFrame(data={'predictions': y_pred, 'actual': y_test}).head()

[[15 0 0]

[ 0 15 0]

[ 9 0 0]]

Out[64]: predictions actual

0 0 2

0 0 0

0 1 1

0 0 0

In [65]:
#DecisionTree Classifier

from sklearn.tree import DecisionTreeClassifier

localhost:8888/nbconvert/html/Documents/IT Courses/Machine Learning/Refrence folders/Natural Langage Processing/Sentiment Analysis/ Sentim… 14/22

2/7/22, 5:06 PM Sentiment Analysis - Comparing Algorithms Accuracy
classifier = DecisionTreeClassifier(criterion = 'entropy', random_state = 0)

classifier.fit(x_train, y_train)

y_predict = classifier.predict(x_test)

# check for accuracy

from sklearn.metrics import confusion_matrix, accuracy_score

cm = confusion_matrix(y_test, y_predict)
print(cm)

accuracy_score(y_test, y_predict)
accuracy = classifier.score(x_test, y_test)

DecisionTree=("DecisionTree Accuracy: {:.2f}%".format(accuracy*100))

# put prediction and actual values side-by-side

pd.DataFrame(data={'predictions': y_predict, 'actual': y_test}).head()

[[14 0 1]

[ 0 15 0]

[ 0 0 9]]

Out[65]: predictions actual

0 2 2

0 0 0

0 1 1

0 0 0

In [66]:
#K-Nearest Neighbors (K-NN)

from sklearn.neighbors import KNeighborsClassifier

classifier = KNeighborsClassifier(n_neighbors = 5, metric = 'minkowski', p = 2)

classifier.fit(x_train, y_train)

# check for accuracy

from sklearn.metrics import confusion_matrix, accuracy_score

cm = confusion_matrix(y_test, y_predict)
print(cm)

accuracy_score(y_test, y_predict)
accuracy = classifier.score(x_test, y_test)

K_Nearest_Neighbor=("K_Nearest_Neighbor Accuracy: {:.2f}%".format(accuracy*100))

# put prediction and actual values side-by-side

pd.DataFrame(data={'predictions': y_predict, 'actual': y_test}).head()

[[14 0 1]

[ 0 15 0]

[ 0 0 9]]

Out[66]: predictions actual

0 2 2

0 0 0

0 1 1

localhost:8888/nbconvert/html/Documents/IT Courses/Machine Learning/Refrence folders/Natural Langage Processing/Sentiment Analysis/ Sentim… 15/22

2/7/22, 5:06 PM Sentiment Analysis - Comparing Algorithms Accuracy

predictions actual

0 0 0

In [67]:
#Kernel SVM

from sklearn.svm import SVC

classifier = SVC(kernel = 'rbf', random_state = 0)

classifier.fit(x_train, y_train)

# check for accuracy

from sklearn.metrics import confusion_matrix, accuracy_score

cm = confusion_matrix(y_test, y_predict)
print(cm)

accuracy_score(y_test, y_predict)
accuracy = classifier.score(x_test, y_test)

Kernel_SVM=("Kernel_SVM Accuracy: {:.2f}%".format(accuracy*100))

# put prediction and actual values side-by-side

pd.DataFrame(data={'predictions': y_predict, 'actual': y_test}).head()

[[14 0 1]

[ 0 15 0]

[ 0 0 9]]

Out[67]: predictions actual

0 2 2

0 0 0

0 1 1

0 0 0

In [68]:
#Logistic Regression

from sklearn.linear_model import LogisticRegression

classifier = LogisticRegression(random_state = 0)

classifier.fit(x_train, y_train)

# check for accuracy

from sklearn.metrics import confusion_matrix, accuracy_score

cm = confusion_matrix(y_test, y_predict)
print(cm)

accuracy_score(y_test, y_predict)
accuracy = classifier.score(x_test, y_test)

Logistic_Regression=("Logistic_Regression Accuracy: {:.2f}%".format(accuracy*100))

[[14 0 1]

[ 0 15 0]

[ 0 0 9]]

In [69]:
#Random Tree Regression

from sklearn.ensemble import RandomForestClassifier

localhost:8888/nbconvert/html/Documents/IT Courses/Machine Learning/Refrence folders/Natural Langage Processing/Sentiment Analysis/ Sentim… 16/22

2/7/22, 5:06 PM Sentiment Analysis - Comparing Algorithms Accuracy
classifier = RandomForestClassifier(n_estimators = 10, criterion = 'entropy', random_st
classifier.fit(x_train, y_train)

# check for accuracy

from sklearn.metrics import confusion_matrix, accuracy_score

cm = confusion_matrix(y_test, y_predict)
print(cm)

accuracy_score(y_test, y_predict)
accuracy = classifier.score(x_test, y_test)

Random_Tree_Regression=("Random_Tree_Regression Accuracy: {:.2f}%".format(accuracy*100)

# put prediction and actual values side-by-side

pd.DataFrame(data={'predictions': y_predict, 'actual': y_test}).head()

[[14 0 1]

[ 0 15 0]

[ 0 0 9]]

Out[69]: predictions actual

0 2 2

0 0 0

0 1 1

0 0 0

In [70]:
print(Gaussian_Naive_Bayes)

print(DecisionTree)

print(Kernel_SVM)

print(Logistic_Regression)
print(Random_Tree_Regression)

Gaussian_Naive_Bayes Accuracy: 76.92%

DecisionTree Accuracy: 97.44%

Kernel_SVM Accuracy: 92.31%

Logistic_Regression Accuracy: 92.31%

Random_Tree_Regression Accuracy: 94.87%

In [71]:
from sklearn.decomposition import PCA

pca = PCA(n_components=2)

pca.fit_transform(X)

x_pca = pca.transform(X)

In [72]:
plt.figure(figsize=(9,6))

plt.scatter(x_pca[:,0],x_pca[:,1],c=y,cmap='viridis')

plt.xlabel('First Principal Component')

plt.ylabel('Second Principal Component')

Text(0, 0.5, 'Second Principal Component')

Out[72]:

localhost:8888/nbconvert/html/Documents/IT Courses/Machine Learning/Refrence folders/Natural Langage Processing/Sentiment Analysis/ Sentim… 17/22

2/7/22, 5:06 PM Sentiment Analysis - Comparing Algorithms Accuracy

In [45]:
from sklearn.metrics import confusion_matrix, accuracy_score

def plot_confusion_matrix(cm, classes,

normalize=False,

title='Confusion matrix',

cmap=plt.cm.Blues):

"""

This function prints and plots the confusion matrix.

Normalization can be applied by setting `normalize=True`.

"""

plt.imshow(cm, interpolation='nearest', cmap=cmap)

plt.title(title)

plt.colorbar()

tick_marks = np.arange(len(classes))

plt.xticks(tick_marks, classes, rotation=45)

plt.yticks(tick_marks, classes)

if normalize:

cm = cm.astype('float') / cm.sum(axis=1)[:, np.newaxis]

print("Normalized confusion matrix")

else:

print('Confusion matrix, without normalization')

print(cm)

thresh = cm.max() / 2.

for i, j in itertools.product(range(cm.shape[0]), range(cm.shape[1])):

plt.text(j, i, cm[i, j],

horizontalalignment="center",

color="white" if cm[i, j] > thresh else "black")

plt.tight_layout()

plt.ylabel('True label')

plt.xlabel('Predicted label')

localhost:8888/nbconvert/html/Documents/IT Courses/Machine Learning/Refrence folders/Natural Langage Processing/Sentiment Analysis/ Sentim… 18/22

2/7/22, 5:06 PM Sentiment Analysis - Comparing Algorithms Accuracy
# Compute confusion matrix

cnf_matrix = confusion_matrix(y_test, y_pred)

In [57]:
import itertools

plt.figure(figsize=(7,5))

plot_confusion_matrix(cnf_matrix, classes=['1','2','3'],title='Confusion matrix, withou

accuracy_score(y_test, y_pred)

accuracy = classifier.score(x_test, y_test)

print()

print("Accuracy: {:.2f}%".format(accuracy*100))

Confusion matrix, without normalization

[[16 0 2]

[ 0 0 17]

[ 3 0 1]]

Accuracy: 89.74%

In [47]:
plt.figure(figsize=(20,7))

sns.heatmap(data.corr(), annot = True)

<AxesSubplot:>
Out[47]:

localhost:8888/nbconvert/html/Documents/IT Courses/Machine Learning/Refrence folders/Natural Langage Processing/Sentiment Analysis/ Sentim… 19/22

2/7/22, 5:06 PM Sentiment Analysis - Comparing Algorithms Accuracy

In [48]:
print(classification_report(y_test,y_pred))

precision recall f1-score support

0 0.84 0.89 0.86 18

1 0.00 0.00 0.00 17

2 0.05 0.25 0.08 4

accuracy 0.44 39

macro avg 0.30 0.38 0.32 39

weighted avg 0.39 0.44 0.41 39

In [49]:
data.columns

Index(['user_location', 'user_description', 'user_verified', 'date', 'text',

Out[49]:
'favourites-count', 'retweet-count', 'source', 'Subjectivity',

'Analysis'],

dtype='object')

In [50]:
from sklearn.metrics import mean_squared_error,r2_score

from math import sqrt

In [51]:
classifier = LogisticRegression(random_state = 0)

classifier.fit(x_train, y_train)

coeff = list(classifier.coef_[0])

labels = list(data[['user_location', 'user_description', 'user_verified', 'date', 'text

'favourites-count', 'retweet-count', 'source',

'Analysis']])

features = pd.DataFrame()

features['Features'] = labels

features['importance'] = coeff

features.sort_values(by=['importance'], ascending=True, inplace=True)

features['positive'] = features['importance'] > 0

features.set_index('Features', inplace=True)

localhost:8888/nbconvert/html/Documents/IT Courses/Machine Learning/Refrence folders/Natural Langage Processing/Sentiment Analysis/ Sentim… 20/22

2/7/22, 5:06 PM Sentiment Analysis - Comparing Algorithms Accuracy
features.importance.plot(kind='barh', figsize=(11, 6),color = features.positive.map({Tr
plt.xlabel('Importance')

Text(0.5, 0, 'Importance')
Out[51]:

In [56]:
import statsmodels.formula.api as smf

model = smf.ols("Analysis ~ user_location+user_description+text+source", data = data).f

print(model.summary())

OLS Regression Results

==============================================================================

Dep. Variable: Analysis R-squared: 0.040

Model: OLS Adj. R-squared: 0.030

Method: Least Squares F-statistic: 3.963

Date: Mon, 07 Feb 2022 Prob (F-statistic): 0.00366

Time: 16:26:12 Log-Likelihood: -405.00

No. Observations: 383 AIC: 820.0

Df Residuals: 378 BIC: 839.7

Df Model: 4

Covariance Type: nonrobust

====================================================================================

coef std err t P>|t| [0.025 0.975]

------------------------------------------------------------------------------------

Intercept 0.5010 0.161 3.110 0.002 0.184 0.818

user_location 0.0006 0.001 0.854 0.393 -0.001 0.002

user_description -0.0003 0.000 -0.748 0.455 -0.001 0.000

text 0.0053 0.001 3.796 0.000 0.003 0.008

source -0.0322 0.028 -1.146 0.253 -0.088 0.023

==============================================================================

Omnibus: 35.208 Durbin-Watson: 1.964

Prob(Omnibus): 0.000 Jarque-Bera (JB): 31.702

Skew: 0.631 Prob(JB): 1.31e-07

Kurtosis: 2.374 Cond. No. 780.

==============================================================================

Notes:

[1] Standard Errors assume that the covariance matrix of the errors is correctly specifi
ed.

localhost:8888/nbconvert/html/Documents/IT Courses/Machine Learning/Refrence folders/Natural Langage Processing/Sentiment Analysis/ Sentim… 21/22

2/7/22, 5:06 PM Sentiment Analysis - Comparing Algorithms Accuracy

In [ ]:

localhost:8888/nbconvert/html/Documents/IT Courses/Machine Learning/Refrence folders/Natural Langage Processing/Sentiment Analysis/ Sentim… 22/22

Twitter Project2
No ratings yet
Twitter Project2
339 pages
Web Scrapping
100% (1)
Web Scrapping
5 pages
ChatGPT Twitter Sentiment Analyzer
No ratings yet
ChatGPT Twitter Sentiment Analyzer
50 pages
Twitter Sentiment Analysis (NLP) : This Photo CC By-Nc
100% (1)
Twitter Sentiment Analysis (NLP) : This Photo CC By-Nc
18 pages
Tweepy Functions (2)
No ratings yet
Tweepy Functions (2)
34 pages
Tweepy Functions
No ratings yet
Tweepy Functions
49 pages
Twitter Sentiment Analysis in Python
0% (1)
Twitter Sentiment Analysis in Python
9 pages
Data Science Project
No ratings yet
Data Science Project
34 pages
Part C - Assignment No. 2 Mini-Project On Twitter
No ratings yet
Part C - Assignment No. 2 Mini-Project On Twitter
7 pages
Analyzing Social Media Data in Python Chapter2
No ratings yet
Analyzing Social Media Data in Python Chapter2
30 pages
Cyberbullying Tweet Recognition Project 1677256740
No ratings yet
Cyberbullying Tweet Recognition Project 1677256740
17 pages
Russia Vs Ukraine Tweets Analysis
No ratings yet
Russia Vs Ukraine Tweets Analysis
20 pages
Sample_1
No ratings yet
Sample_1
22 pages
Business Data Management DISC 325
No ratings yet
Business Data Management DISC 325
21 pages
EXP5
No ratings yet
EXP5
15 pages
INDEXReport Ayush (1)
No ratings yet
INDEXReport Ayush (1)
38 pages
Sentiment Analysis On User-Generated Tweets
No ratings yet
Sentiment Analysis On User-Generated Tweets
15 pages
Sentiment Analysis On Twitter Data Using Machine Learning Algorithms in Python
No ratings yet
Sentiment Analysis On Twitter Data Using Machine Learning Algorithms in Python
14 pages
Unit - Iv - Mining Social Web
No ratings yet
Unit - Iv - Mining Social Web
13 pages
Manual de Mantenimiento Detector de Trazas
100% (2)
Manual de Mantenimiento Detector de Trazas
203 pages
NLP - (1) (1) .Ipynb - Colab
No ratings yet
NLP - (1) (1) .Ipynb - Colab
10 pages
Advance Data Mining Assignment
No ratings yet
Advance Data Mining Assignment
10 pages
2.1 Analysing Social Media in Python
No ratings yet
2.1 Analysing Social Media in Python
21 pages
Twitter Analysis
No ratings yet
Twitter Analysis
5 pages
640848728-Part-C-Assignment-No-2-Mini-project-on-Twitter-1
No ratings yet
640848728-Part-C-Assignment-No-2-Mini-project-on-Twitter-1
9 pages
SMA EXP 09 CODE PRINT
No ratings yet
SMA EXP 09 CODE PRINT
5 pages
DSDM Unit4
No ratings yet
DSDM Unit4
31 pages
NLP - Twitter Sentiment Analysis With Tensorflow - Sebastian Correa - Medium
No ratings yet
NLP - Twitter Sentiment Analysis With Tensorflow - Sebastian Correa - Medium
13 pages
Mlds5 Code
No ratings yet
Mlds5 Code
7 pages
Twitter API
No ratings yet
Twitter API
6 pages
Anand Institute of Higher Technology Department of Computer Science and Engineering ACADEMIC YEAR: 2018-19 Mini Project Report
No ratings yet
Anand Institute of Higher Technology Department of Computer Science and Engineering ACADEMIC YEAR: 2018-19 Mini Project Report
9 pages
Sentiment Analysis On Twitter Data Using Machine Learning Algorithms in Python
No ratings yet
Sentiment Analysis On Twitter Data Using Machine Learning Algorithms in Python
15 pages
document-dsbda-codes-for-mini-project
No ratings yet
document-dsbda-codes-for-mini-project
9 pages
Notes
No ratings yet
Notes
6 pages
Activity. Streaming Twitter Data
No ratings yet
Activity. Streaming Twitter Data
4 pages
Lab No 6 - Twitter - Neuro
No ratings yet
Lab No 6 - Twitter - Neuro
2 pages
Part C - Assignment No. 2 Mini-Project On Twitter
No ratings yet
Part C - Assignment No. 2 Mini-Project On Twitter
7 pages
Twitter Sentiment Analysis
No ratings yet
Twitter Sentiment Analysis
13 pages
CS5228 Project 2 Twitter Sentiment Analysis Group No.: 29: 1 Problem Statement
No ratings yet
CS5228 Project 2 Twitter Sentiment Analysis Group No.: 29: 1 Problem Statement
15 pages
Template For The First Slide of PPT Presentation1
No ratings yet
Template For The First Slide of PPT Presentation1
18 pages
AminaRahmanK DL Lab5
No ratings yet
AminaRahmanK DL Lab5
11 pages
Twitter Sentiment Analysis Using Python
No ratings yet
Twitter Sentiment Analysis Using Python
21 pages
Twitter Python Assignment
No ratings yet
Twitter Python Assignment
8 pages
VLSI Programs
No ratings yet
VLSI Programs
159 pages
174e-Boiler VFD Panel-01 (For Ib, FD and Pa Fan) - As Built
No ratings yet
174e-Boiler VFD Panel-01 (For Ib, FD and Pa Fan) - As Built
23 pages
Cybersource Credit Card Services Using The Simple Order API
No ratings yet
Cybersource Credit Card Services Using The Simple Order API
93 pages
Module3 ProgrammingConcepts
No ratings yet
Module3 ProgrammingConcepts
36 pages
Analyzing Social Media Data in Python Chapter1
No ratings yet
Analyzing Social Media Data in Python Chapter1
21 pages
Individual Assignment #1: Data Source and Libraries
No ratings yet
Individual Assignment #1: Data Source and Libraries
2 pages
Simplify LAN segmentation with ISE and Cisco DNA Center
No ratings yet
Simplify LAN segmentation with ISE and Cisco DNA Center
37 pages
Mcs Uat Form
No ratings yet
Mcs Uat Form
15 pages
Import Tweepy
No ratings yet
Import Tweepy
4 pages
Week 6 Windows GDI - Text Output
No ratings yet
Week 6 Windows GDI - Text Output
29 pages
ArduCAM ESP32 UNO DS
No ratings yet
ArduCAM ESP32 UNO DS
17 pages
FND123 Assignment Week 2
No ratings yet
FND123 Assignment Week 2
10 pages
Summary
No ratings yet
Summary
35 pages
5th Sem QN Paper 2022
No ratings yet
5th Sem QN Paper 2022
15 pages
Sentiment Analysis Python
No ratings yet
Sentiment Analysis Python
3 pages
Senti bp1
No ratings yet
Senti bp1
2 pages
Assignment / Tugasan HBMS4903 Assessment in Early Childhood Music Education/ January 2022 Semester
No ratings yet
Assignment / Tugasan HBMS4903 Assessment in Early Childhood Music Education/ January 2022 Semester
11 pages
Spiced Academy Data Science Curriculum
No ratings yet
Spiced Academy Data Science Curriculum
15 pages
Belize Xir p8600 Acc Ss Af4-06-019 100512
No ratings yet
Belize Xir p8600 Acc Ss Af4-06-019 100512
4 pages
Delixi Industrial-Control Product Type Selection Manual: Delixi Hangzhou Inverter Co.,Ltd
No ratings yet
Delixi Industrial-Control Product Type Selection Manual: Delixi Hangzhou Inverter Co.,Ltd
14 pages
Whole House Water Softener Iom
No ratings yet
Whole House Water Softener Iom
32 pages
NS2 Simulation Tutorial
No ratings yet
NS2 Simulation Tutorial
14 pages
Basic Tweet Preprocessing in Python: 1. Hashtag Extraction Using Regex
No ratings yet
Basic Tweet Preprocessing in Python: 1. Hashtag Extraction Using Regex
2 pages
Swisscom Stands Up New Gen AI Banking Use Case Within 12 Weeks With MongoDB _ Case Study _ MongoDB
No ratings yet
Swisscom Stands Up New Gen AI Banking Use Case Within 12 Weeks With MongoDB _ Case Study _ MongoDB
6 pages
Implementationofcyclic Sust
No ratings yet
Implementationofcyclic Sust
11 pages
Signage Solution
No ratings yet
Signage Solution
1 page
Engineer Computer Graphics
No ratings yet
Engineer Computer Graphics
4 pages
1 Introduction PDF
No ratings yet
1 Introduction PDF
60 pages
Twitter Scraping Streamlit - Py
No ratings yet
Twitter Scraping Streamlit - Py
2 pages
Permutation (Part 2)
No ratings yet
Permutation (Part 2)
3 pages
1 PB
No ratings yet
1 PB
11 pages
1 s2.0 S1877050920308218 Main
No ratings yet
1 s2.0 S1877050920308218 Main
8 pages
Difference Between Binary and ASCII
No ratings yet
Difference Between Binary and ASCII
1 page
Ispring Suite Max: An Extremely Fast Elearning Course Authoring Tool For Teams
No ratings yet
Ispring Suite Max: An Extremely Fast Elearning Course Authoring Tool For Teams
3 pages
Flyer DTS N62 Series
No ratings yet
Flyer DTS N62 Series
2 pages
Market Trends - Assessment 1 - v6.2
No ratings yet
Market Trends - Assessment 1 - v6.2
5 pages
Essential n8n Playbook
From Everand
Essential n8n Playbook
Leandro Calado
No ratings yet
Angular Generative AI: Building an intelligent CV enhancer with Google Gemini
From Everand
Angular Generative AI: Building an intelligent CV enhancer with Google Gemini
Abdelfattah Ragab
No ratings yet
Stripe Integration in Angular: A Step-by-Step Guide to Creating Payment Functionality
From Everand
Stripe Integration in Angular: A Step-by-Step Guide to Creating Payment Functionality
Abdelfattah Ragab
No ratings yet
C# Interview Questions, Answers, and Explanations: C Sharp Certification Review
From Everand
C# Interview Questions, Answers, and Explanations: C Sharp Certification Review
equitypress
4.5/5 (3)
Fresher PyQt5: A Beginner’s Guide to PyQt5
From Everand
Fresher PyQt5: A Beginner’s Guide to PyQt5
Edward Chang
No ratings yet
Python Advanced Programming: The Guide to Learn Python Programming. Reference with Exercises and Samples About Dynamical Programming, Multithreading, Multiprocessing, Debugging, Testing and More
From Everand
Python Advanced Programming: The Guide to Learn Python Programming. Reference with Exercises and Samples About Dynamical Programming, Multithreading, Multiprocessing, Debugging, Testing and More
Marcus Richards
No ratings yet
ETHICAL HACKING GUIDE-Part 3: Comprehensive Guide to Ethical Hacking world
From Everand
ETHICAL HACKING GUIDE-Part 3: Comprehensive Guide to Ethical Hacking world
POONAM DEVI
No ratings yet
Microsoft Visual Basic Interview Questions: Microsoft VB Certification Review
From Everand
Microsoft Visual Basic Interview Questions: Microsoft VB Certification Review
Equity Press
No ratings yet
Inspiring Powershell Articles
From Everand
Inspiring Powershell Articles
Murat Yildirimoglu
No ratings yet
Advanced C++ Interview Questions You'll Most Likely Be Asked
From Everand
Advanced C++ Interview Questions You'll Most Likely Be Asked
Vibrant Publishers
No ratings yet
Spring Boot Intermediate Microservices: Resilient Microservices with Spring Boot 2 and Spring Cloud
From Everand
Spring Boot Intermediate Microservices: Resilient Microservices with Spring Boot 2 and Spring Cloud
Jens Boje
No ratings yet

Sentiment Analysis - Comparing Algorithms Accuracy

Uploaded by

Sentiment Analysis - Comparing Algorithms Accuracy

Uploaded by

2/7/22, 5:06 PM Sentiment Analysis - Comparing Algorithms Accuracy

Author: Adegbenro Michael Olusola:

Import Necessary Libraries

from nltk.corpus import stopwords

from wordcloud import WordCloud,STOPWORDS

from nltk.stem.porter import PorterStemmer

from nltk.corpus import stopwords

import matplotlib.pyplot as plt

from pandas.plotting import scatter_matrix

from nltk.stem import WordNetLemmatizer

import seaborn as sns

from sklearn.metrics import classification_report

from sklearn.model_selection import train_test_split

from sklearn.preprocessing import LabelEncoder,StandardScaler

from nltk.sentiment.vader import SentimentIntensityAnalyzer as SIA

from sklearn.linear_model import LogisticRegression

from sklearn.model_selection import train_test_split

from textblob import TextBlob

from sklearn.metrics import confusion_matrix

from sklearn.metrics import classification_report

from sklearn.metrics import accuracy_score

from sklearn.metrics import precision_score

from sklearn.metrics import recall_score

from sklearn.metrics import f1_score

from sklearn.linear_model import LinearRegression,Ridge,Lasso,ElasticNet

from sklearn.tree import DecisionTreeRegressor

from sklearn import metrics

from nltk import word_tokenize

from wordcloud import WordCloud,STOPWORDS

localhost:8888/nbconvert/html/Documents/IT Courses/Machine Learning/Refrence folders/Natural Langage Processing/Sentiment Analysis/ Sentime… 1/22

from matplotlib.axes._axes import _log as matplotlib_axes_logger

Extract Data fromm Twitter

auth = tw.OAuthHandler(my_api_key, my_api_secret)

api = tw.API(auth, wait_on_rate_limit=True)

search_query = "#Ottawa -filter:retweet"

# store the API responses in a list

for tweet in tweets:

print("Total Tweets fetched:", len(tweets_copy))

# intialize the dataframe

# populate the dataframe

for tweet in tweets_copy:

for hashtag in tweet.entities["hashtags"]:

text = api.get_status(id=tweet.id, tweet_mode='extended').full_text

data = data.append(pd.DataFrame({'user_name': tweet.user.name,'ID': tweet.id_str,

'hashtags': [hashtags if hashtags else None],

Total Tweets fetched: 500

localhost:8888/nbconvert/html/Documents/IT Courses/Machine Learning/Refrence folders/Natural Langage Processing/Sentiment Analysis/ Sentime… 2/22

In [2]: #load data from local drive

data = pd.read_csv("Raw Data.csv")

#If the value is closer to zero, then it shows less skew.

'''Make text lowercase, remove text in square brackets,remove links,remove punctuat

text = re.sub('\[.*?\]', '', text)

text = re.sub('https?://\S+|www\.\S+', '', text)

text = re.sub('<.*?>+', '', text)

text = re.sub('[%s]' % re.escape(string.punctuation), '', text)

text = re.sub('\n', '', text)

text = re.sub('\w*\d\w*', '', text)

#Removing Stop Words

final_tokens = [w for w in tokens if w not in stopword]

#reducing a word to its word stem

localhost:8888/nbconvert/html/Documents/IT Courses/Machine Learning/Refrence folders/Natural Langage Processing/Sentiment Analysis/ Sentime… 3/22

return ' '.join(finalwords)

#Apply to relevant columns

data["user_description"] = data["user_description"].apply(lambda x:clean_text(x))

data["user_name"] = data["user_name"].apply(lambda x:clean_text(x))

data["text"] = data["text"].apply(lambda x:clean_text(x))

data['text'] = data["text"].apply(lambda x: process_tweets(x))

# Although, we don't need more than text to perform our analysis

localhost:8888/nbconvert/html/Documents/IT Courses/Machine Learning/Refrence folders/Natural Langage Processing/Sentiment Analysis/ Sentime… 4/22

user_name ID user_location user_description user_verified date

Vader Sentiment Analysis

data['Sentiments'] = data['text'].apply(lambda x: sid.polarity_scores(' '.joi

data['Neutral Sentiment'] = data['Sentiments'].apply(lambda x: x['neu']+1*(10**-6))

data['Negative Sentiment'] = data['Sentiments'].apply(lambda x: x['neg']+1*(10**-6))

localhost:8888/nbconvert/html/Documents/IT Courses/Machine Learning/Refrence folders/Natural Langage Processing/Sentiment Analysis/ Sentime… 5/22

user_name ID user_location user_description user_verified date

text = re.sub('\w\d\w', '', text)