Natural Language Processing Assignment

The document describes a Twitter sentiment analysis project that aims to identify tweets containing hate speech or violence using machine learning techniques. It outlines preprocessing steps like tokenization, stemming, and punctuation removal. A naive Bayes classifier is then trained on the preprocessed tweet data and evaluated using a test set, achieving an F1 score of 0.53. The model is then used to classify tweets in a test set, achieving a score of 0.567. In conclusion, the document discusses how sentiment analysis can be applied to social media trend analysis and marketing using Python libraries.

Uploaded by

kuymancho

NATURAL LANGUAGE PROCESSING ASSIGNMENT

TWITTER SENTIMENT ANALYSIS

PROBLEM STATEMENT:

The objective of this task is to detect tweets that contain hateful or
violence-provoking words. We say a tweet contains hate speech if it
includes provoking comments against a religion, caste or region. Our
task is to separate tweets of this type from all other tweets.

SOURCE CODE :

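import pandas as pd

# on_bad_lines='skip' replaces error_bad_lines=False, which was removed in pandas 2.0
data = pd.read_csv('Sentiment Analysis Dataset.csv', on_bad_lines='skip')
data.columns = ['id', 'label', 'source', 'text']
data.head(2)
data = data.drop(['id', 'source'], axis=1)  # keep only the label and the tweet text
data.head(10)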

PREPROCESSING:

1. TOKENIZATION:

tokenized_tweet = combi['tidy_tweet'].apply(lambda x: x.split())
tokenized_tweet.head()
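
The tokenization step can be tried on plain strings without pandas. The sketch below is only an illustration, not the assignment's own code: the two sample tweets and the helper names clean_tweet and tokenize are hypothetical. It first applies the same character filter used for the tidy_tweet column (keep letters and '#', turn everything else into a space) and then splits on whitespace.

```python
import re

# Hypothetical sample tweets; the assignment's dataset is not reproduced here.
tweets = [
    "@user I can't believe this!!! #angry",
    "Lovely day at the beach :) #sun",
]

def clean_tweet(text):
    """Replace every character except letters and '#' with a space."""
    return re.sub(r"[^a-zA-Z#]", " ", text)

def tokenize(text):
    """Split on whitespace, mirroring the x.split() call used above."""
    return text.split()

tokens = [tokenize(clean_tweet(t)) for t in tweets]
for row in tokens:
    print(row)
```

Because split() with no argument collapses runs of whitespace, the extra spaces left behind by the character filter do not produce empty tokens.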

2. STEMMING:

from nltk.stem.porter import PorterStemmer

stemmer = PorterStemmer()
tokenized_tweet = tokenized_tweet.apply(lambda x: [stemmer.stem(i) for i in x])  # stemming
tokenized_tweet.head()

3. PUNCTUATION REMOVAL:

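# regex=True is required in pandas 2.0+, where str.replace matches literally by default
combi['tidy_tweet'] = combi['tidy_tweet'].str.replace("[^a-zA-Z#]", " ", regex=True)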
NAIVE BAYES CLASSIFICATION:

The Naive Bayes classifier is a classification algorithm based on
Bayes' theorem. The theorem gives a way to calculate a type of
probability called the posterior probability: the probability of an
event A occurring given some known background evidence B, that is,
P(A|B) = P(B|A) P(A) / P(B).
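
To make the idea concrete, here is a minimal sketch of a multinomial Naive Bayes classifier over tokenized text, written with the standard library only. It is an illustration under stated assumptions, not the program used in this assignment: the toy documents, the labels (1 for hateful, 0 otherwise) and the function names train_naive_bayes and predict are all hypothetical. It scores each class by its log prior plus the log likelihood of each token, using add-one (Laplace) smoothing so that a token unseen in one class does not zero out the score.

```python
import math
from collections import Counter, defaultdict

def train_naive_bayes(docs, labels):
    """Count documents per class and tokens per class from tokenized docs."""
    class_counts = Counter(labels)
    token_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in zip(docs, labels):
        token_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, token_counts, vocab

def predict(tokens, model):
    """Return the class with the highest log posterior (up to a constant)."""
    class_counts, token_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in class_counts.items():
        score = math.log(n_docs / total_docs)  # log prior P(class)
        total_tokens = sum(token_counts[label].values())
        for tok in tokens:
            if tok not in vocab:
                continue  # tokens never seen in training are ignored
            # add-one smoothed estimate of P(token | class)
            count = token_counts[label][tok]
            score += math.log((count + 1) / (total_tokens + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical toy data: 1 = hateful, 0 = not hateful.
docs = [
    ["love", "this", "day"],
    ["happy", "sunny", "day"],
    ["hate", "those", "people"],
    ["hate", "this", "violence"],
]
labels = [0, 0, 1, 1]

model = train_naive_bayes(docs, labels)
print(predict(["hate", "people"], model))
print(predict(["happy", "day"], model))
```

Working in log space avoids numerical underflow when many small probabilities are multiplied, which matters once documents contain more than a handful of tokens.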

PROGRAM:

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import f1_score

# Note: the model fitted below is logistic regression, not Naive Bayes.
# `bow` (bag-of-words feature matrix) and `train` are assumed to be defined earlier.
train_bow = bow[:31962, :]
test_bow = bow[31962:, :]

# splitting data into training and validation set
xtrain_bow, xvalid_bow, ytrain, yvalid = train_test_split(
    train_bow, train['label'], random_state=42, test_size=0.3)

lreg = LogisticRegression()
lreg.fit(xtrain_bow, ytrain)  # training the model

prediction = lreg.predict_proba(xvalid_bow)  # predicting on the validation set
prediction_int = prediction[:, 1] >= 0.3  # 1 if the predicted probability is >= 0.3, else 0
prediction_int = prediction_int.astype(int)  # np.int was removed in NumPy 1.24

f1_score(yvalid, prediction_int)  # calculating f1 score

Output: 0.53
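
The thresholding and scoring steps above can be reproduced by hand. The sketch below is an illustration only: the labels and probabilities are made-up values, and f1_at_threshold is a hypothetical helper, not a scikit-learn function. It converts probabilities to 0/1 predictions at a chosen cut-off and computes the F1 score as the harmonic mean of precision and recall.

```python
def f1_at_threshold(y_true, probabilities, threshold=0.3):
    """Threshold probabilities into 0/1 predictions and return the F1 score."""
    preds = [1 if p >= threshold else 0 for p in probabilities]
    tp = sum(1 for t, p in zip(y_true, preds) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, preds) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, preds) if t == 1 and p == 0)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Made-up validation labels and predicted probabilities for class 1.
y_true = [1, 0, 1, 0, 0, 1]
probs = [0.80, 0.35, 0.32, 0.10, 0.05, 0.20]

f1_low = f1_at_threshold(y_true, probs, threshold=0.3)
f1_default = f1_at_threshold(y_true, probs, threshold=0.5)
print(round(f1_low, 3), round(f1_default, 3))
```

On this toy data the 0.3 cut-off recovers one more positive example than 0.5 at the cost of one false positive. A lower cut-off of this kind is often chosen when the positive class is rare, which is presumably the reason for using 0.3 here, although the assignment does not say so.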

SCORE CALCULATION:

test_pred = lreg.predict_proba(test_bow)
test_pred_int = test_pred[:,1] >= 0.3
test_pred_int = test_pred_int.astype(int)  # np.int was removed in NumPy 1.24
test['label'] = test_pred_int
submission = test[['id','label']]
submission.to_csv('sub_lreg_bow.csv', index=False)

The score is 0.567.


RESULT:
Sentiment analysis is an interesting application of Natural Language
Processing for drawing automated conclusions about text. It is used in
social media trend analysis and, sometimes, for marketing purposes.
Writing a sentiment analysis program in Python is not difficult,
because many ready-to-use libraries are now available. This program
illustrates how such an application works.
