
International Journal of Scientific Research in Engineering and Management (IJSREM)

Volume: 05 Issue: 05 | May - 2021 ISSN: 2582-3930

YOUTUBE COMMENTS SENTIMENT ANALYSIS


Ritika Singh #1, Ayushka Tiwari *2
1. Assistant Professor, Department of CSE, SRM Institute of Science and Technology, Ghaziabad
2. Department of CSE, SRM Institute of Science and Technology, Ghaziabad

ABSTRACT

Over time, textual information has increased exponentially, opening up potential research within the fields of machine learning (ML) and natural language processing (NLP). Sentiment analysis of YouTube comments is a very interesting topic nowadays. While many of these videos have a significant number of user comments and reviews, little work has been done so far in extracting trends from these comments due to their low information consistency and quality. In this paper we perform sentiment analysis on YouTube comments related to popular topics using machine learning techniques and algorithms. We demonstrate that an analysis of the sentiments to spot their trends, seasonality and forecasts can provide a transparent picture of the influence of real-world events on public sentiment. Results show that the trends in users' sentiments are well correlated with the real-world events associated with the respective keywords. A further purpose of this research is to help researchers identify quality research papers through sentiment analysis. In this research, sentiment analysis of YouTube comments using citation sentences is carried out on an existing annotated corpus. The corpus consists of 1500 citation sentences. Noise was removed using different data normalization rules in order to clean the comments in the corpus. To perform classification on this data set we developed a system in which six different machine learning algorithms are implemented: Naïve Bayes (NB), Support Vector Machine (SVM), Logistic Regression (LR), Decision Tree (DT), K-Nearest Neighbor (KNN) and Random Forest (RF). The accuracy of the system is then evaluated using different evaluation metrics, e.g. F-score and Accuracy score.

Keywords— Sentiment analysis; citations; machine learning; classification

1. INTRODUCTION

In this work, we collect data from public YouTube comments and measure the attitude of users towards the aspects of a video which they describe in text. Sentiment analysis is useful for quickly forming an overall picture from a large amount of text data and helps in understanding users' opinions. Sentiment analysis, also referred to as opinion mining, means identifying the positive, negative and neutral opinions, views, attitudes, impressions, emotions and feelings expressed in a text.

Current YouTube usage statistics indicate the approximate scale of the site: at the time of this writing there are more than 1 billion unique users viewing video content, watching over 6 billion hours of video each month. YouTube also accounts for 20% of web traffic and 10% of total internet traffic.

YouTube provides many social mechanisms to gauge user opinion about a video, by means of voting, rating, favourites, sharing, negative comments, etc. It is important to note that YouTube provides more than just video sharing; beyond uploading and viewing videos, users can subscribe to video channels and interact with other users through comments.


YouTube is generally composed of implicit and explicit user-to-user interaction. This user-to-user social aspect of YouTube (the YouTube social network) has been cited as a key differentiating factor compared to other traditional content providers.

Text analytics is the analysis of "unstructured" data contained in natural language text using various methods, machine learning tools and techniques. Text analysis offers a very low-cost method to gauge public opinion.

In this research work, we have performed sentiment analysis of public comments by using an annotated corpus consisting of citation sentences. The corpus is made up of 1500 citation sentences and is annotated using a set of rules that assign a polarity to each citation sentence. We have developed a system based on six different machine learning algorithms: Naïve Bayes, Support Vector Machine, Logistic Regression, Decision Tree, K-Nearest Neighbor and Random Forest. The accuracy of the classification algorithms has been evaluated using different evaluation measures, e.g. F-score and Accuracy score, to assess the classification system's correctness. To improve the system's performance, we have used different feature selection techniques such as lemmatization, n-grams, tokenization, and stop-word and punctuation removal.

2. RELATED WORK

Several researchers have performed sentiment analysis of social networks like Twitter and YouTube. These works deal with comments, tweets and other metadata collected from the social network profiles of users or of public events, which are collected and analyzed to obtain significant and interesting insights about the usage of these social network websites by the general public. The work most closely associated with ours is by Siersdorfer et al. They analyzed more than 6 million comments collected from 67,000 YouTube videos to identify the connection between comments, views, comment ratings and topic categories. The authors show promising results in predicting the comment ratings of new, unrated comments by building prediction models from the already rated comments. Pang, Lee and Vaithyanathan performed sentiment analysis on 2053 movie reviews collected from the Internet Movie Database (IMDb). They examined the hypothesis that sentiment analysis can be treated as a special case of topic-based text classification. Their work showed that standard machine learning techniques such as Naive Bayes or Support Vector Machines (SVMs) outperform manual classification techniques that involve human intervention. However, the accuracy of sentiment classification falls short of the accuracy of ordinary topic-based text categorization using the same machine learning techniques. They reported that the simultaneous presence of positive and negative expressions (thwarted expectations) within the reviews makes it difficult for the machine learning techniques to accurately predict the emotions.

Another work on YouTube comments was done by Smita Shree and Josh Brolin, where the authors proposed an unsupervised lexicon-based approach to detect the sentiment polarity of user comments on YouTube. They adopted a knowledge-driven approach and prepared a social-media list of terms and phrases expressing user sentiment and opinion. However, their results also showed that recall of negative sentiment is poorer compared to the positives, which may be due to the wide linguistic variation used in expressing frustration and dissatisfaction.

Other works have performed sentiment analysis of social networks like Twitter to show that there exists a relationship between the moods of individuals and the outcome of events in the social, political, cultural and economic spheres. Another study on social media sentiment analysis was carried out by A. Kowcika et al.


In their paper they propose a system which is able to gather useful information from the Twitter website and efficiently perform sentiment analysis of tweets regarding the smartphone war. The system uses an efficient scoring scheme for predicting the user's age, the user's gender is predicted using a well-trained Naïve Bayes classifier, and a sentiment classifier model labels each tweet with a sentiment. Krisztian Balog et al. similarly proposed a way to gather useful information from the Twitter website and efficiently perform sentiment analysis of tweets regarding the smartphone war, using an efficient rating system for predicting the user's age. The paper "Twitter Sentiment Analysis: The Good the Bad and the OMG!" by Efthymios Kouloumpis et al. deals with the utility of linguistic features for detecting the sentiment of Twitter messages. They evaluate the usefulness of existing lexical resources as well as features that capture information about the informal and creative language used in micro-blogging. Another sentiment analysis of web text was done on blog posts by Gilad Mishne et al.

One of the most prominent works in web page classification was done by Daniele Riboni in the paper "Feature Selection for Web Page Classification" [44]. They conducted various experiments on a corpus of 8000 documents belonging to 10 Yahoo! categories using Kernel Perceptron and Naive Bayes classifiers. Their experiments show the usefulness of dimensionality reduction and of a new structure-oriented weighting technique. They also introduce a new method for representing linked pages using local information, which makes hypertext categorization feasible for real-time applications.

Other classification works include the one by Eibe Frank et al. [46]. In their paper they propose an appropriate correction by adjusting attribute priors. This correction can be implemented as another data normalization step, and they show that it can significantly improve the area under the ROC curve. They also show that the modified version of multinomial Naive Bayes (MNB) is very closely related to the simple centroid-based classifier, and they compare the two methods empirically.

Another work on the sentiment analysis of social media uses a multimodal approach, discussed in the paper by Diana Maynard et al. [47]. They examine a specific use case, which is to assist archivists in selecting material for inclusion in an archive of social media for preserving community memories, moving towards structured preservation around semantic categories. The textual approach they take is rule-based and builds on a variety of subcomponents, taking into consideration issues inherent in social media such as noisy ungrammatical text, use of swear words, sarcasm, etc.

[1] Athar, A. (2014). Sentiment analysis of scientific citations (No. UCAM-CL-TR-856). University of Cambridge, Computer Laboratory. The author used NB and SVM classifiers and computed the accuracies of the system using an F-score; the macro F-score using uni-grams reported in that work is 48 percent.

[2] Pang, B., Lee, L., Vaithyanathan, S. (2002, July). Thumbs up? Sentiment classification using machine learning techniques. The authors used labelled data for the purpose of classification and preferred the supervised learning approach; for classification, the Naïve Bayes classifier is used. In this work, they used a dataset of movie reviews.

[3] Sentiment analysis and opinion mining (Liu, 2012): Sentiment analysis and opinion mining is the field of study that analyses people's opinions, sentiments, evaluations, attitudes, and emotions from written language. It is one of the most active research areas in natural language processing and is also widely studied in data pre-processing, Web mining, and text mining.


[4] Deep Learning for Hate Speech Detection in Tweets, by Pinkesh Badjatiya (IIIT-H), Shashank Gupta (IIIT-H), Manish Gupta (Microsoft) and Vasudeva Varma (IIIT-H) (June 1st, 2017): One of the most useful applications of sentiment classification models is the detection of hate speech. Recently, there have been numerous reports of the tough lives of content moderation staff. Their experiments on a benchmark dataset of 16K annotated tweets show that such deep learning methods outperform state-of-the-art char/word n-gram methods.

[5] Mehmood, K., Essam, D., Shafi, K. (2018, July). Sentiment Analysis System for Roman Urdu. In Science and Information Conference (pp. 29-42). Springer, Cham. They used their own data set based on Urdu reviews related to movies, politics, mobiles, dramas and miscellaneous domains, extracted using scrapers as well as manually. The data set was then classified using different supervised learning classifiers and their results were compared with each other.

3. METHODOLOGY

The purpose of the methodology is defined in this section, and our methodology is depicted in Fig. 1. First of all, we used the annotated dataset. We used the Python-based machine learning library Scikit-Learn for implementing the system. Scikit-Learn is a well-known machine learning library that is tightly integrated with the Python language and provides an easy-to-use interface.

First, our system reads the data stored in a file in Tab-Separated Values (TSV) format. After reading, a pre-processing phase is applied to clean and prepare the data for the machine learning algorithms. Text data cannot be given directly to machine learning algorithms; it has to be converted into a suitable form. Using the Scikit-Learn module named CountVectorizer, the text data is first converted into numeric format, producing a matrix of token counts. The data is then ready for the machine learning algorithms: 60% of the data is split off randomly to train the classifier and 40% is used for testing the classifier's accuracy.

We perform our experiments in two phases. Firstly, we apply only n-gram features (lengths 1-3) to the data and compute accuracies using the F-score and Accuracy score. Secondly, in order to improve the accuracy scores, we apply other features (stop-word and punctuation removal, lemmatization, etc.) along with the n-grams and then compute the accuracies again. The latter approach helps to reduce the noise and complexity of the data. Thirty iterations of each experiment were conducted to compute average results, and a total of six experiments were performed. After computing the accuracies of each phase, we select the feature that gives the best result and determine which classifier is better in a specific scenario.

Fig. 1. Step by step flow of the system.
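For illustration, the vectorization and split described above can be sketched as follows. This is a minimal sketch only: the example sentences, labels and variable names are invented placeholders, not the paper's actual corpus.

# Minimal sketch of the count-vectorization and 60/40 split described in this section;
# the sentences and labels below are invented placeholders.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split

sentences = ["this video is great", "worst explanation ever", "the audio is okay",
             "loved the editing", "too many ads, very annoying"]
labels = ["positive", "negative", "neutral", "positive", "negative"]

# phase-1 features: a matrix of token counts over n-grams of length 1 to 3
vectorizer = CountVectorizer(ngram_range=(1, 3))
X = vectorizer.fit_transform(sentences)

# 60% of the data to train the classifier, 40% to test its accuracy
X_train, X_test, y_train, y_test = train_test_split(X, labels, train_size=0.6, random_state=0)
print(X_train.shape, X_test.shape)

In the second phase, the same vectorization and split would be repeated after the additional cleaning steps of Section 3.2 (stop-word and punctuation removal, lemmatization), so that the two sets of scores can be compared.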


3.1 EVALUATION METRICS

The evaluation of any research product decides the status and quality of that specific research work. This section briefly describes the metrics used to evaluate the sentiment analysis system we developed. The performance of the sentiment analysis system is evaluated by computing the accuracy of the classification results given by the system. The accuracy of the system is reported using measures that include the F-score and the Accuracy score.

In our evaluation phase, we have calculated both the Macro-F score and the Micro-F score, where FP is considered a type-1 error (false positive) and FN a type-2 error (false negative). The F-score is commonly used and is the harmonic mean of precision and recall, F = 2 x (precision x recall) / (precision + recall).

3.2 DATA PRE-PROCESSING

The corpus used for sentiment analysis classification is prepared as follows. The data set is comprised of a total of 1500 citation sentences annotated as positive, negative or neutral after applying the annotation rules. From the total citation sentences, 60% were chosen randomly for training the classifier and the remaining 40% were used for testing the classifier. The data set was cleaned to obtain the highest accuracy of the system.

A. Features Selection
For the sake of developing a system for sentiment analysis, different features are provided by ML. We have used various features, e.g. lemmatization, n-grams, stop words and term-document frequency, to evaluate the classifier's accuracy. The evaluation results are presented later.

B. Lemmatization
Lemmatization is a process of normalizing the inflected forms of words. Homographic words cause ambiguity that disturbs search accuracy, and this ambiguity may also occur due to inflectional word forms. For instance, words like "Talking", "Talks" and "Talked" are inflected forms of the word "Talk". The processes of lemmatization and stemming are similar up to minor differences, and the benefits of both approaches are the same. We have applied only lemmatization and avoided stemming due to the problems of the stemming process: stemming is worthwhile for short retrieval lists, while our system has to deal with a large data set and long processing lists, so we did not apply stemming. Stemming normalizes inflected words by keeping different variations of words along with their derivation process.

C. Stop Words and Punctuation
English text contains a lot of meaningless and non-informative words called stop words. These are not required for classification because their presence just increases the size of the data. So we applied a stop-word removal technique in order to cleanse the data for better and more efficient classification. Several research works support removing stop words from the data set to reduce the dimensionality of the data.
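As an illustration of the feature-selection steps above, the following minimal sketch applies stop-word and punctuation removal followed by lemmatization using NLTK; the input sentence is an invented example, not a sentence from our corpus.

# Sketch of lemmatization plus stop-word and punctuation removal with NLTK;
# the example sentence is invented for illustration.
import string
import nltk
nltk.download('stopwords')
nltk.download('wordnet')
nltk.download('omw-1.4')
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer

lemmatizer = WordNetLemmatizer()
stop_words = set(stopwords.words('english'))

sentence = "He was talking about the talks they talked through yesterday!"

# split on whitespace, strip punctuation and lowercase each token
tokens = [w.strip(string.punctuation).lower() for w in sentence.split()]

# drop stop words and empty tokens, then map inflected forms to their lemma
cleaned = [lemmatizer.lemmatize(w, pos='v') for w in tokens if w and w not in stop_words]

print(cleaned)  # "talking", "talks" and "talked" all normalize to "talk"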


3.3 ALGORITHMS USED

This work uses six machine learning techniques (classification classifiers) for the task of sentiment analysis; the modelling of each technique is briefly discussed below. After pre-processing and feature selection, the very next step is to apply the classification algorithms. Many text classifiers have been proposed in the literature. We have used six machine learning algorithms: Naïve Bayes (NB), Support Vector Machine (SVM), Logistic Regression (LR), Decision Tree (DT), K-Nearest Neighbor (KNN) and Random Forest (RF).

a) Naïve Bayes: Naïve Bayes is one of the most popular classification algorithms due to its simplicity and effectiveness. The classifier works according to Bayes' theorem; it is a probabilistic classifier that uses class and feature probabilities for the purpose of classification. The benefit of using Naïve Bayes for text classification is that it needs a comparatively small dataset for training. Removal of numerals, foreign words, HTML tags and special symbols yields the set of words, and this pre-processing produces word-category pairs for the training set. Consider a word 'y' from the test set (unlabelled word set) and a window of n words (x1, x2, ..., xn) from a document. The conditional probability of the data point 'y' belonging to the category of the n words from the training set is given by:

P(y | x1, x2, ..., xn) = P(y) x ∏_{i=1}^{n} P(xi | y) / P(x1, x2, ..., xn)

b) Support Vector Machine: In the world of machine learning, one supervised learning algorithm that achieves strong improvements on a variety of tasks, particularly in the case of analysing sentiments, is the Support Vector Machine classifier. SVM algorithms make excellent classifiers because they remain accurate in prediction even as the data becomes more complex.

c) Decision Tree: The decision tree classifier is widely used and analysed in various fields of text classification. Its popularity is based on the nature of its classification rules, which make it interesting for NLP researchers. The tree is constructed by selecting data from the data set randomly. Its advantages are: understandable prediction rules are created from the training data; it builds trees quickly; it builds short trees; and it only needs enough attributes until all data is classified. Its disadvantages are: data may be over-fitted or over-classified if a small sample is tested; only one attribute at a time is tested for making a decision; and it does not handle numeric attributes and missing values well. To prevent overfitting, we optimize the hyper-parameters of the Decision Tree such as max_features, min_samples_split, max_depth, etc.

d) Random Forest: Previous work has highlighted the importance of the random forest classifier, compared its performance with other classifiers, and claimed that the random forest algorithm provides efficient and discriminative classification; as a result, it is considered an interesting classifier.

e) K-Nearest Neighbour: KNN is a simple and efficient classifier. It is called a lazy learner because its training phase consists of nothing more than storing all the training examples; consequently KNN requires a lot of memory for storing the training values. Essentially, for a given value of K the algorithm finds the K nearest neighbours of an unseen data point and then assigns to that point the class that has the highest number of data points among the K neighbours.
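As a concrete illustration of the Naïve Bayes posterior in (a) above, the following toy sketch evaluates P(y | x1, ..., xn) for three sentiment classes; the priors and word likelihoods are made-up numbers for the example, not estimates from our corpus.

# Toy evaluation of P(y | x1..xn) = P(y) * prod_i P(xi | y) / P(x1..xn);
# all probabilities below are invented for illustration.
priors = {"positive": 0.4, "negative": 0.3, "neutral": 0.3}          # P(y)
likelihood = {                                                        # P(word | y)
    "positive": {"good": 0.20, "method": 0.05},
    "negative": {"good": 0.02, "method": 0.05},
    "neutral":  {"good": 0.05, "method": 0.10},
}

words = ["good", "method"]            # the window x1..xn from a test sentence

# unnormalized scores: P(y) * product of P(xi | y)
scores = {}
for label, prior in priors.items():
    score = prior
    for w in words:
        score *= likelihood[label].get(w, 1e-6)   # tiny value for unseen words
    scores[label] = score

# divide by the evidence P(x1..xn) so the posteriors sum to 1
evidence = sum(scores.values())
posteriors = {label: s / evidence for label, s in scores.items()}
print(posteriors)                      # the class with the largest posterior is predicted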


3.4 FEATURES USED

Word clouds or tag clouds are graphical representations of word frequency that give greater prominence to words appearing more frequently in a source text. The larger a word appears in the visual, the more common the word was in the comments. This type of visualization can assist evaluators with exploratory textual analysis by identifying words that frequently appear in a set of interviews, documents, or other text. It can also be used for communicating the most salient points or themes in the reporting stage.

3.5 SOURCE CODE AND IMPLEMENTATION

# importing libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# cleaning the dataset (removing stopwords, stemming)
# 'dataset' is assumed to be a pandas DataFrame already loaded from the TSV file
# described in Section 3, with the raw text in a 'Review' column
import re
import nltk
nltk.download('stopwords')
from nltk.corpus import stopwords
from nltk.stem.porter import PorterStemmer

corpus = []
for i in range(0, 1000):
    review = re.sub('[^a-zA-Z]', ' ', dataset['Review'][i])
    review = review.lower()
    review = review.split()
    ps = PorterStemmer()
    all_stopwords = stopwords.words('english')
    all_stopwords.remove('not')
    review = [ps.stem(word) for word in review if not word in set(all_stopwords)]
    review = ' '.join(review)
    corpus.append(review)

# Creating the Bag of Words model
from sklearn.feature_extraction.text import CountVectorizer
cv = CountVectorizer(max_features = 1500)
X = cv.fit_transform(corpus).toarray()
y = dataset.iloc[:, -1].values

# Splitting the dataset into the Training set and Test set
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.20, random_state = 0)

# Training the Naive Bayes, SVM, K-NN, Decision Tree and Random Forest models on the dataset
from sklearn.naive_bayes import GaussianNB
classifier = GaussianNB()
classifier.fit(X_train, y_train)

from sklearn.svm import SVC
classifier = SVC(kernel = 'linear', random_state = 0)
classifier.fit(X_train, y_train)


from sklearn.linear_model import LogisticRegression
classifier = LogisticRegression(random_state = 0)
classifier.fit(X_train, y_train)

from sklearn.neighbors import KNeighborsClassifier
classifier = KNeighborsClassifier(n_neighbors = 5, metric = 'minkowski', p = 2)
classifier.fit(X_train, y_train)

from sklearn.tree import DecisionTreeClassifier
classifier = DecisionTreeClassifier(criterion = 'entropy', random_state = 0)
classifier.fit(X_train, y_train)

from sklearn.ensemble import RandomForestClassifier
classifier = RandomForestClassifier(n_estimators = 10, criterion = 'entropy', random_state = 0)
classifier.fit(X_train, y_train)

# finally, making the confusion matrix of the classification models
# (predictions must be computed from the fitted classifier before evaluation)
y_pred = classifier.predict(X_test)
from sklearn.metrics import confusion_matrix, accuracy_score
cm = confusion_matrix(y_test, y_pred)
print(cm)
accuracy_score(y_test, y_pred)

Here we create word cloud visualizations of the comments in our dataset. Basically, a word cloud is a data visualization technique for representing text data in which the size of each word indicates its frequency or importance. Significant textual data points can be highlighted using a word cloud. For the visualization of positive sentiments in a word cloud we store all the comments having polarity 1 in comments_positive. Here is the source code:

# 'comments' is assumed to be a DataFrame of comments with 'polarity' and
# 'comment_text' columns
!pip install wordcloud
from wordcloud import WordCloud          # import needed for the WordCloud class
comments_positive = comments[comments['polarity'] == 1]
total_comments = ''.join(comments_positive['comment_text'])
wordcloud = WordCloud(width = 1000, height = 500,
                      stopwords = set(stopwords.words('english'))).generate(total_comments)
plt.figure(figsize = (15, 5))
plt.imshow(wordcloud)
plt.axis('off')

4. RESULTS

Different machine learning algorithms were used for the classification, and the evaluation metrics described above were used to validate the system. A detailed description of the experimental results using the evaluation metrics is given in Table I and Table II. In these tables the terms A1, B1 and C1 denote simple unigram, bigram and trigram features, while A2, B2 and C2 denote the application of unigram, bigram and trigram features along with the other features. Table I shows that overall DT using n-grams gives the best macro F-score, while RF is best in the case of the micro average. LR is also best overall in the micro average without applying extra features. Uni-grams support better performance for LR and DT, while uni-grams combined with the other features contribute significantly to the performance of NB, KNN and RF. DT gives better performance in the case of uni-grams, bi-grams and tri-grams, whereas LR performance is significant in the case of uni-grams only.


KNN outperforms in the case of n-grams combined with the other features and gives the worst performance without them, while RF performs much like KNN. Overall, uni-gram, bi-gram and tri-gram features without other features perform best, with unigrams in first position. Table II shows that overall SVM, LR and RF performed best, with the highest accuracy scores. N-grams contribute significantly to the performance of NB; SVM gives the best accuracy using uni-grams; LR performance is significant in the case of bi-grams and tri-grams; and KNN outperforms in the case of n-grams without other features but gives the worst performance with the other features. Overall, uni-gram, bi-gram and tri-gram features without other features perform best and give significant accuracy scores.

Table I. F-scores

Table II. Accuracy scores
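To show in outline how the comparisons behind Table I and Table II can be organized, the sketch below loops over the three n-gram settings (A1, B1, C1) and the six classifiers, reporting Accuracy, Macro-F and Micro-F. It is a sketch only: the file name, column names and classifier hyper-parameters are assumptions, and the A2/B2/C2 runs would additionally apply the pre-processing of Section 3.2.

# Sketch of the evaluation loop behind Tables I and II; file name, column names and
# hyper-parameters are assumptions for illustration.
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, f1_score
from sklearn.naive_bayes import MultinomialNB
from sklearn.svm import SVC
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.ensemble import RandomForestClassifier

classifiers = {
    "NB": MultinomialNB(),
    "SVM": SVC(kernel="linear"),
    "LR": LogisticRegression(max_iter=1000),
    "DT": DecisionTreeClassifier(),
    "KNN": KNeighborsClassifier(n_neighbors=5),
    "RF": RandomForestClassifier(n_estimators=10),
}
ngram_settings = {"A1 (unigram)": (1, 1), "B1 (bigram)": (2, 2), "C1 (trigram)": (3, 3)}

# assumed layout: one citation sentence and one label per tab-separated line
data = pd.read_csv("citation_sentences.tsv", sep="\t", names=["sentence", "label"])

for setting, ngram_range in ngram_settings.items():
    X = CountVectorizer(ngram_range=ngram_range).fit_transform(data["sentence"])
    X_tr, X_te, y_tr, y_te = train_test_split(X, data["label"], train_size=0.6, random_state=0)
    for name, clf in classifiers.items():
        y_pred = clf.fit(X_tr, y_tr).predict(X_te)
        print(setting, name,
              "Accuracy=%.3f" % accuracy_score(y_te, y_pred),
              "Macro-F=%.3f" % f1_score(y_te, y_pred, average="macro"),
              "Micro-F=%.3f" % f1_score(y_te, y_pred, average="micro"))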


Figure II. Results of the word cloud visualizations on our dataset.

5. CONCLUSION

In this research work, we presented a sentiment analysis system for YouTube comments. We have used different machine learning classifiers, namely NB, SVM, DT, LR, KNN and RF, along with different features to process the data and optimize the classification results. Experiments were performed on the data set, which is partitioned into training and testing sets according to the ratio of 80:20. Accuracies of the classifiers are computed using various evaluation metrics such as the F-score and Accuracy score. The results show that SVM performs better than the other classifiers, and after SVM, Naïve Bayes performs well. In the case of the macro average, the performance of the SVM classifier is best when computing the F-score and accuracy measures, while the random forest is best in the case of the micro average. Uni-gram, bi-gram and tri-gram features performed very well and helped the classifiers achieve the highest accuracy scores. We also used the n-grams approach together with the lemmatization process to reduce the data dimensions.

In this paper, we have implemented six classifiers: NB, SVM, LR, DT, KNN and RF. We computed the accuracies using the evaluation metrics F-score and Accuracy score and compared them with the base system. Our results showed significant improvement: in the case of Naïve Bayes using the uni-gram feature we achieved a micro-F of 87%, while the base system reported a micro-F of 78%, so our results are approximately 9% better. We achieved a macro-F of 49% by reducing the data dimensions using the lemmatization process and the stop-word removal mechanism. Based on bi-gram and tri-gram features our system achieved the same result of micro-F = 87%; in our case, the micro-F based on bi-gram and tri-gram features increased by 11%. We thus improved our results to a maximum of micro-F = 87% and macro-F = 49%, which shows the significant improvement of our work.


6. REFERENCES

[1] Athar, A. (2014). Sentiment analysis of scientific citations (No. UCAM-CL-TR-856). University of Cambridge, Computer Laboratory.
[2] Athar, A., Teufel, S. (2012, July). Detection of implicit citations for sentiment detection. In Proceedings of the Workshop on Detecting Structure in Scholarly Discourse (pp. 18-26).
[3] Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., ... & Vanderplas, J. (2011). Scikit-learn: Machine learning in Python. Journal of Machine Learning Research.
[4] Poria, S., Cambria, E., Gelbukh, A., Bisio, F., Hussain, A. (2015). Sentiment data flow analysis by means of dynamic linguistic patterns.
[5] Turney, P. D., Mohammad, S. M. (2014). Experiments with three approaches to recognizing lexical entailment.
[6] Parvathy, G., Bindhu, J. S. (2016). A probabilistic generative model for mining cybercriminal networks from online social media: a review.
[7] Qazvinian, V., & Radev, D. R. (2010, July). Identifying non-explicit citing sentences for citation-based summarization. In Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics (pp. 555-564). Association for Computational Linguistics.
[8] Socher, R. (2016). Deep learning for sentiment analysis (invited talk). In Proceedings of the 7th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis.
[9] Sobhani, P., Mohammad, S., Kiritchenko, S. (2016). Detecting stance in tweets and analyzing its interaction with sentiment. In Proceedings of the 5th Joint Conference on Lexical and Computational Semantics.
[10] Saif, H., He, Y., & Alani, H. (2012, November). Semantic sentiment analysis of Twitter. In International Semantic Web Conference (pp. 508-524). Springer, Berlin, Heidelberg.
[11] Dashtipour, K., Poria, S., Hussain, A., Cambria, E., Hawalah, A. Y., Gelbukh, A., Zhou, Q. (2016). Multilingual sentiment analysis: state of the art and independent comparison of techniques.
[12] Kouloumpis, E., Wilson, T., Moore, J. (2011). Twitter Sentiment Analysis: The Good the Bad and the OMG! In Proceedings of the Fifth International Conference on Weblogs and Social Media (ICWSM).
[13] Cambria, E., White, B. (2014). Jumping NLP curves: a review of natural language processing research.
[14] Mohammad, S. M., Zhu, X., Kiritchenko, S., Martin, J. (2015). Sentiment, emotion, purpose, and style in electoral tweets.
