Natural Language Processing Assignment


NATURAL LANGUAGE PROCESSING ASSIGNMENT

TWITTER SENTIMENT ANALYSIS

PROBLEM STATEMENT:

The objective of this task is to detect tweets that contain hateful or
violence-provoking words. We say a tweet contains hate speech if it
contains provocative comments against a religion, caste, or region. Our
task is to separate such tweets from all other tweets.

SOURCE CODE :

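import pandas as pd

# on_bad_lines='skip' drops malformed rows; it replaces the older
# error_bad_lines=False argument, which was removed in pandas 2.0
data = pd.read_csv('Sentiment Analysis Dataset.csv', on_bad_lines='skip')
data.columns = ['id', 'label', 'source', 'text']
data.head(2)
data = data.drop(['id', 'source'], axis=1)  # keep only label and text
data.head(10)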

PREPROCESSING:

1. TOKENIZATION:

# split each cleaned tweet on whitespace into a list of tokens
tokenized_tweet = combi['tidy_tweet'].apply(lambda x: x.split())
tokenized_tweet.head()

2. STEMMING:

from nltk.stem.porter import PorterStemmer

stemmer = PorterStemmer()
# reduce every token to its stem, e.g. "playing" -> "play"
tokenized_tweet = tokenized_tweet.apply(
    lambda x: [stemmer.stem(i) for i in x])
tokenized_tweet.head()

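3. PUNCTUATION REMOVAL:

# replace everything except letters and '#' with a space;
# regex=True is required since pandas 2.0, where str.replace
# treats the pattern as a literal string by default
combi['tidy_tweet'] = combi['tidy_tweet'].str.replace(
    "[^a-zA-Z#]", " ", regex=True)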
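
The cleaning step works on the raw text, so in practice it would run
before tokenization and stemming. The sketch below shows that order
using only the standard library. The sample tweet and the function
names clean_tweet and tokenize are illustrative assumptions, not part
of the assignment's dataset or code.

```python
import re

def clean_tweet(text):
    """Replace every character that is not a letter or '#' with a space."""
    return re.sub(r"[^a-zA-Z#]", " ", text)

def tokenize(text):
    """Split on whitespace; runs of spaces produce no empty tokens."""
    return text.split()

# Made-up example tweet, for illustration only
sample = "@user I can't believe this!! #angry 2day"
tidy = clean_tweet(sample)   # same length as the input, symbols become spaces
tokens = tokenize(tidy)
print(tokens)
```

Because each removed character becomes a space, words joined by
punctuation (such as "can't") are split into separate tokens, while
hashtags are kept intact.
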
NAIVE BAYES CLASSIFICATION:

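The Naive Bayes classifier is a classification algorithm based on
Bayes' theorem. The theorem gives a way to calculate the posterior
probability, that is, the probability of an event A given that some
related evidence B has been observed: P(A|B) = P(B|A) P(A) / P(B).
The classifier is called "naive" because it assumes that the features
(here, the words of a tweet) are independent of one another given the
class.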
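
As a rough illustration of the idea, the following is a from-scratch
sketch of a multinomial Naive Bayes classifier with add-one (Laplace)
smoothing. It is not the assignment's code (the program that follows
fits a LogisticRegression model); the toy tweets, the labels, and the
function names train_naive_bayes and predict are all assumptions made
for the example.

```python
import math
from collections import Counter, defaultdict

def train_naive_bayes(docs, labels):
    """Estimate log priors and Laplace-smoothed log likelihoods from token lists."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)   # per-class word frequencies
    vocab = set()
    for tokens, label in zip(docs, labels):
        word_counts[label].update(tokens)
        vocab.update(tokens)
    log_prior = {c: math.log(n / len(labels)) for c, n in class_counts.items()}
    log_likelihood = {}
    for c in class_counts:
        total = sum(word_counts[c].values())
        # add-one smoothing so unseen words never get zero probability
        log_likelihood[c] = {
            w: math.log((word_counts[c][w] + 1) / (total + len(vocab)))
            for w in vocab
        }
    return log_prior, log_likelihood, vocab

def predict(tokens, log_prior, log_likelihood, vocab):
    """Return the class with the highest posterior score (log prior + log likelihoods)."""
    scores = {}
    for c in log_prior:
        scores[c] = log_prior[c] + sum(
            log_likelihood[c][w] for w in tokens if w in vocab
        )
    return max(scores, key=scores.get)

# Toy, made-up training data: 1 = hateful, 0 = not hateful
docs = [["hate", "you"], ["hate", "them", "all"], ["love", "you"], ["nice", "day"]]
labels = [1, 1, 0, 0]
model = train_naive_bayes(docs, labels)
print(predict(["hate", "all"], *model))
print(predict(["love", "day"], *model))
```

Working with log probabilities turns the product of per-word
likelihoods into a sum, which avoids numerical underflow on longer
tweets.
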

PROGRAM:

from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import f1_score

# bow is assumed to be the bag-of-words feature matrix built earlier
# (not shown here); its first 31962 rows belong to the training set
train_bow = bow[:31962,:]
test_bow = bow[31962:,:]

# splitting data into training and validation set
xtrain_bow, xvalid_bow, ytrain, yvalid = train_test_split(
    train_bow, train['label'], random_state=42, test_size=0.3)

lreg = LogisticRegression()
lreg.fit(xtrain_bow, ytrain)  # training the model

prediction = lreg.predict_proba(xvalid_bow)  # predicting on the validation set
# label is 1 if the predicted probability is >= 0.3, else 0
prediction_int = prediction[:,1] >= 0.3
prediction_int = prediction_int.astype(int)  # np.int was removed in NumPy 1.24

f1_score(yvalid, prediction_int)  # calculating f1 score

Output: 0.53
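
To make the thresholding and scoring steps concrete, here is a minimal
standard-library sketch of how a 0.3 cut-off turns probabilities into
labels and how the F1 score is derived from them. The probabilities
and labels are made up for illustration, and the helper names
apply_threshold and f1 are assumptions; the program above relies on
sklearn's f1_score instead.

```python
def apply_threshold(probabilities, threshold=0.3):
    """Label a sample 1 when its positive-class probability reaches the threshold."""
    return [1 if p >= threshold else 0 for p in probabilities]

def f1(y_true, y_pred):
    """F1 = 2 * precision * recall / (precision + recall) for the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0   # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Made-up probabilities and true labels, for illustration only
probs = [0.10, 0.35, 0.80, 0.25, 0.40]
y_true = [0, 1, 1, 1, 0]
y_pred = apply_threshold(probs)
score = f1(y_true, y_pred)
print(y_pred, round(score, 3))
```

A threshold below the usual 0.5 labels more tweets as positive, which
tends to raise recall at the cost of precision; this can help when the
positive class is rare.
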

SCORE CALCULATION:

test_pred = lreg.predict_proba(test_bow)
test_pred_int = test_pred[:,1] >= 0.3
test_pred_int = test_pred_int.astype(int)  # np.int was removed in NumPy 1.24
test['label'] = test_pred_int
submission = test[['id','label']]
submission.to_csv('sub_lreg_bow.csv', index=False)

The score is 0.567.


RESULT:
Sentiment analysis is an interesting application of Natural Language
Processing for drawing automated conclusions about text. It is used in
social media trend analysis and, sometimes, for marketing purposes.
Writing a sentiment analysis program in Python is not difficult,
because many ready-to-use libraries are now available, which makes the
task much easier. This program illustrates how such an application
works.
