Abstract—This paper proposes an application of sentiment analysis that works on the principle of machine learning. The proposed application provides a comparative analysis of web series and movies of different genres over a particular time period on the basis of viewer sentiment. Data is fetched from Twitter through API keys and Twitter access tokens. Movies and web series from 2017 to 2019 in four different genres were taken, and sentiment analysis was performed on each web series and movie, giving results in the form of positive reviews and negative reviews. A popular hashtag was determined for each movie and web series. The tweet count for each movie or web series was set to 3000. A table was formed for each genre containing the name of each movie and web series together with its percentage of positive sentiments and its percentage of negative sentiments. Each genre was represented graphically to analyze the results. A combined analysis was performed after calculating the average percentage of positive and negative sentiment across all the movies and web series of each genre, and this combined analysis was also represented graphically to analyze the final results. From these results, the application concludes whether movies or web series of a particular genre were more liked by viewers in 2017-19.

Keywords—Sentiment analysis; web series; movies; twitter; reviews

I. Introduction

Sentiment analysis is an area of research that deals with the emotions and sentiments of individuals. Data from a particular platform is analyzed, and the results are reported as positive data and negative data. In essence, it analyzes the sentiments or feelings of people all over the world on a particular social networking platform and summarizes the output by categorizing the data as positive or negative. Sentiment analysis is widely applied to voice-of-the-customer materials such as reviews and survey responses. It builds on natural language processing (NLP), a field of computer science and linguistics concerned with the interaction between computers and the natural languages of humans. Sentiment analysis is done to fetch and analyze social media data, categorize the fetched data after performing the operations, and then give the final output of all the data in summary form.

Social media sentiment analysis is a rich source of knowledge and may offer insights that help to:
1. Determine promotional strategy.
2. Improve success during a campaign.
3. Improve electronic messaging.
4. Improve the services offered to clients.

A. Types of Sentiment Analysis

1) Manual Processing
Manual processing means analyzing data manually. After data is collected from a platform, analysis is performed by hand on every piece of data, and the result is also generated by hand. This approach is time consuming, and because the data is processed manually it is difficult to analyze a large amount of data.

2) Keyword Processing
Keyword processing means data is analyzed through keywords. For example, data containing words like "extraordinary" and "love" will be analyzed as positive sentiment, and data containing words like "awful" and "detest" will be analyzed as negative sentiment.

3) NLP (Natural Language Processing)
NLP acts as a bridge between computer language and human language. The analysis is done through a lexicon dictionary, which is a collection of data containing a large amount of information. This dictionary helps in classifying the data or sentiments as positive or negative.

Nowadays NLP and machine learning are used together to perform sentiment analysis. These techniques have made analyzing data easier. They make it possible to process a large amount of data in one go, so the results of the analysis are more accurate.
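The keyword processing approach described in Section I-A can be illustrated with a minimal Python sketch. It is not the code used in this work: the function name classify_by_keywords is hypothetical, the two keyword sets contain only the four example words mentioned in the text, and returning "neutral" on a tie is an added assumption.

```python
import re

# Example keywords taken from the text; a real keyword list would be far larger.
POSITIVE_KEYWORDS = {"extraordinary", "love"}
NEGATIVE_KEYWORDS = {"awful", "detest"}


def classify_by_keywords(text):
    """Label a text by counting keyword hits (hypothetical helper)."""
    # Lowercase the text and split it into alphabetic word tokens.
    words = re.findall(r"[a-z']+", text.lower())
    positive_hits = sum(word in POSITIVE_KEYWORDS for word in words)
    negative_hits = sum(word in NEGATIVE_KEYWORDS for word in words)
    if positive_hits > negative_hits:
        return "positive"
    if negative_hits > positive_hits:
        return "negative"
    # Assumption: texts with no keywords, or a tie, are treated as neutral.
    return "neutral"
```

Because only exact keyword matches count, such a classifier misses negation and context, which is one reason lexicon and machine learning approaches are preferred.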
Authorized licensed use limited to: University of Durham. Downloaded on May 16,2020 at 02:22:50 UTC from IEEE Xplore. Restrictions apply.
B. Working of Sentiment Analysis
1. Get terms: Reduce each review (text) to a list of words (NLP concept of tokenization).
2. Filtering: Remove words that add no value for sentiment analysis, such as is, among, yet, and, it, that (NLP concept of stop-word handling).
3. Base words: Convert all words to their root word.
4. Negation detection: Recognize the negation context.
5. Feature generation: Use the words extracted from a review as features that indicate the positivity or negativity of that review.
6. Statistical classifier: Train a machine learning classifier to predict positivity.

Section I introduces the concept, types, and working of sentiment analysis. Section II discusses papers related to sentiment analysis from 2016 to 2019; each paper is described briefly, followed by its merits and demerits. Section III explains the proposed application using a flowchart and a step-by-step description. Section IV analyzes and discusses the results, and Section V concludes the paper with future scope.

II. Literature Survey

Many different approaches have been used to analyze data and perform sentiment analysis. Work has been done in many different languages, in many different fields, and in many different branches. Some of the fields in which sentiment analysis has played an important role are:
1. Social Media Monitoring
2. Brand Monitoring
3. Customer Feedback
4. Customer Support

Some of the previous works on sentiment analysis are described below.

Charu Nanda et al. in 2018 [1] proposed a strategy to perform sentiment analysis on reviews of Hindi movies. Hindi SentiWordNet is used for the analysis. Various machine learning algorithms are used and then compared with each other. The merits and demerits of this article are presented in Table I.

TABLE I. Merits and Demerits of [1]
MERIT: Machine learning algorithms were used and accuracy was measured. Algorithms like Support Vector Machine (SVM), Naive Bayes and Random Forest were used. Because Random Forest was used, the work achieved good accuracy.
DEMERIT: Only Hindi words were used for analysis. Only two sentiments were shown and neutral cases were avoided. Dataset was taken from various platforms.

Suhariyanto et al. in 2018 [2] proposed a method to predict the sentiment of a movie by taking data from Rotten Tomatoes and combining the sentiment from SentiWordNet with the score of the website. The basic idea was to filter movies. The merits and demerits of this article are presented in Table II.

TABLE II. Merits and Demerits of [2]
MERIT: The method helped in overcoming the flaw of SentiWordNet. The data was balanced, which helped in generating good results. The F-measure was better than that of other methods by 0.97.
DEMERIT: Only 300 data items were taken, which is too few to obtain an accurate result. Data was taken only from Rotten Tomatoes; it could have been taken from more platforms like IMDB.

Tirath Prasad Sahu and Sanjeev Ahuja in 2016 [3] proposed a strategy to classify movies on various scales and to extract features that affect the polarity of movie reviews. The merits and demerits of this article are presented in Table III.

TABLE III. Merits and Demerits of [3]
MERIT: Sentiments were classified on a scale ranging from 0 to 40. The accuracy of the analysis was good, as it came out to be 88.95%.
DEMERIT: Only 1 dataset was chosen, due to which there was a lack of grammar, which created issues in performing the analysis.

Nazma Iqbal and Tanveer Ahsan in 2018 [4] proposed an idea to perform sentiment analysis by combining words and finding the best combination so that the polarity of an opinion can be measured. The merits and demerits of this article are presented in Table IV.

TABLE IV. Merits and Demerits of [4]
MERIT: Combinations were fed to various machine learning algorithms like Naive Bayes, Support Vector Machine (SVM) and maximum entropy. 4 factors were used for evaluation: recall, precision, accuracy and F-score. The method increased the accuracy by 2-5%.
DEMERIT: Only 2 datasets were used: the Stanford Twitter Sentiment 140 dataset and the IMDB movie review dataset. No neutral sentiment. The center of attention was not determined and only the overall review was calculated.

Rasika Wankhede and Prof. A.N. Thakare in 2017 [5] proposed an idea to perform analysis on movies through a movie review dataset taken from the 'Times of India'. Classification of movie review was done
10th International Conference on Cloud Computing, Data Science & Engineering (Confluence) 391
by taking positive, negative and neutral polarities. Features were extracted and analyzed by estimating the information gain of each feature. The merits and demerits of this article are presented in Table V.

TABLE V. Merits and Demerits of [5]
MERIT: Random Forest classification was used, which gave 90% accuracy. A lexical approach like SentiWordNet was used, and a comparison with other research papers was also done.
DEMERIT: Didn't use deep concepts of NLP for better polarity. Other platforms could have been used to extract the dataset for more accuracy. More sentiments could have been added, like anger, sad, happy, etc.

Tejaswini M. Untawale and Prof. G. Choudhari in 2019 [6] proposed an idea to perform sentiment analysis that compared 2 of the best algorithms for sentiment analysis, Naive Bayes and Random Forest, for measuring the sentiments of a large amount of data. The merits and demerits of this article are presented in Table VI.

TABLE VI. Merits and Demerits of [6]
MERIT: Naive Bayes and Random Forest have been utilized for feature selection and execution.
DEMERIT: Although Naive Bayes has been used, its behavior in terms of time and memory has not been shown properly. Linguistic semantics has not been used, whereas it could be used to understand human expression.

Abinash Tripathy et al. in 2016 [7] proposed a method for classification of sentiment reviews using the n-gram machine learning approach. Algorithms like Support Vector Machine (SVM), Naive Bayes, maximum entropy and stochastic gradient descent have been used on movie reviews. The merits and demerits of this article are presented in Table VII.

TABLE VII. Merits and Demerits of [7]
MERIT: 4 scales/factors were used to evaluate accuracy. Various combinations like unigram, bigram and trigram were studied to give better results.
DEMERIT: Only one dataset, the IMDB dataset, was taken; more websites could have been used to extract additional datasets.

Mais Yasen and Sara Tedmori in 2019 [8] proposed a tokenization method to convert the string into a word vector and stemming to extract the root word. The IMDB data set was used. The merits and demerits of this article are presented in Table VIII.

TABLE VIII. Merits and Demerits of [8]
MERIT: The proposed model was evaluated on 8 different classifiers. Random Forest performed best whereas Ripper Rule Learning performed worst.
DEMERIT: Didn't use NLP. Only one language was involved, whereas more languages should have been involved for more results.

Jayashree Jagdale and Dr. Emmanuel M in 2019 [9] proposed a method that uses a hybrid corrective critic neural network (CCNN) for sentiment analysis of movie reviews from community media. The merits and demerits of this article are presented in Table IX.

TABLE IX. Merits and Demerits of [9]
MERIT: Hashtags, caps, emoticons and sentiment lexicons were used in the dataset. Output was developed with the help of metrics. CCNN showed better results than the SVM, NB and PSO-NN algorithms.
DEMERIT: The classification was done only on the basis of positive and negative sentiments. The neutral class was not included; it could be included for better results.

Alaa F. Alsaqer and Sreela Sasi in 2017 [10] focused on the operators of RapidMiner to improve the accuracy of summarization in sentiment analysis. The merits and demerits of this article are presented in Table X.

TABLE X. Merits and Demerits of [10]
MERIT: Two models were used: the Aylien Text Analysis Extension and the Text Processing Extension.
DEMERIT: Data from social media platforms was not extracted, whereas it should have been so that the algorithm could be tested correctly.

III. Proposed Application

A. Proposed Application Description

Step 1: Movies and web series from 2017 to 2019 in 4 different genres were taken. The 4 genres are Crime, Horror, Comedy, and Romance. Data for 10 movies and 10 web series of each genre was collected.
Step 2: Popular hashtags for each movie and each web series were selected online in order to access the Twitter data through these hashtags. To access Twitter data, API keys and Twitter access tokens were generated through the Twitter developer portal. To get these
tokens and API keys we had to seek permission from Twitter, tell them why we needed the API keys, and explain the project to them.
Step 3: Code was written to perform sentiment analysis. The language used for writing the code was Python. The code took a popular hashtag for each movie and web series, and the count of tweets used to determine the sentiments of each movie or web series was set to 3000. The hashtag for each movie and web series was passed to the code, and sentiments were recorded as positive and negative sentiments.
Step 4: A table was formed for each genre containing the name of each movie and web series together with its percentage of positive sentiments and its percentage of negative sentiments.
Step 5: The average of positive sentiments and the average of negative sentiments were calculated over the movies of each genre and over the web series of each genre.
Step 6: A comparison was carried out within each genre on the basis of positive and negative sentiments. The comparison was done to determine whether movies or web series of a particular genre were liked more by viewers from 2017 to 2019.
Step 7: On the basis of the analysis done through graphs and tables, the final result was generated, which concluded whether movies or web series of a particular genre were liked more by viewers from 2017 to 2019.

B. Proposed Application Flowchart

[Flowchart: YEAR (2017-2019) -> 4 GENRES (CRIME, HORROR, COMEDY, ROMANCE) -> WEB SERIES (10 IN NUMBER) and MOVIES (10 IN NUMBER) -> FINDING FAMOUS HASHTAGS FOR EACH WEB SERIES AND MOVIE AND ACCESSING TWITTER DATA -> PERFORMING SENTIMENT ANALYSIS]
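Steps 3 and 4 of the application description can be sketched in Python as follows. This is an illustrative sketch only, not the code written for this work: the lexicon, stop-word list, negation list, and the function names tokenize, score_tweet and sentiment_percentages are all assumptions. It scores tweets against a tiny lexicon instead of a trained classifier, skips root-word conversion, and takes already-fetched tweet texts as input, so no Twitter API call is shown.

```python
import re

# Tiny illustrative word lists (assumptions, not the data used in the paper).
LEXICON = {"love": 1, "great": 1, "good": 1, "awful": -1, "boring": -1, "bad": -1}
STOP_WORDS = {"is", "among", "yet", "and", "it", "that", "the", "a", "this", "was"}
NEGATIONS = {"not", "no", "never"}


def tokenize(text):
    """Reduce a tweet to a list of lowercase word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def score_tweet(text):
    """Drop stop words, flip the next sentiment word after a negation, sum scores."""
    tokens = [token for token in tokenize(text) if token not in STOP_WORDS]
    score = 0
    negate = False
    for token in tokens:
        if token in NEGATIONS:
            negate = True
            continue
        polarity = LEXICON.get(token, 0)
        if polarity != 0:
            score += -polarity if negate else polarity
            negate = False  # the negation applies to one sentiment word only
    return score


def sentiment_percentages(tweets):
    """Return (positive %, negative %) over all tweets; the remainder is neutral."""
    if not tweets:
        return 0.0, 0.0
    scores = [score_tweet(tweet) for tweet in tweets]
    positive = sum(score > 0 for score in scores)
    negative = sum(score < 0 for score in scores)
    total = len(tweets)
    return round(100 * positive / total, 2), round(100 * negative / total, 2)
```

Calling sentiment_percentages on the tweets collected for one hashtag would give one row of a genre table. Because neutral tweets count toward the total, the two percentages need not add up to 100.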
TABLE XI. Positive and Negative Reviews of Web Series and Movies of Comedy Genre
Web series | +ve % | -ve % | Movies | +ve % | -ve %
Comicstaan (2018) | 53.84 | 12.82 | Chhichhore (2019) | 48.0 | 4.0
Laakhon mein ek (2017) | 22.22 | 11.11 | Khandaani shafakhana (2019) | 15.38 | 3.84
Immature (2019) | 30.12 | 27.71 | Jhootha kahin ka (2019) | 33.33 | 1.32
Made in heaven (2019) | 32.81 | 9.37 | Made in china (2019) | 43.90 | 19.51
Kota factory (2019) | 29.41 | 9.80 | Lootcase (2019) | 30.30 | 10.60
College romance (2018) | 41.75 | 18.68 | Dream girl (2019) | 21.21 | 6.06
Yeh meri family (2018) | 52.63 | 10.52 | Badhaai ho (2018) | 33.33 | 25.0
Engineering girls (2018) | 51.38 | 11.11 | Padman (2017) | 50 | 8.33
Sarabhai vs sarabhai 2 (2017) | 47.22 | 5.55 | De de pyaar de (2019) | 30.55 | 5.55
Unmarried (2018) | 48.48 | 13.63 | Luka chupi (2019) | 15.0 | 5.0

TABLE XII. Positive and Negative Reviews of Web Series and Movies of Romance Genre
Web series | +ve % | -ve % | Movies | +ve % | -ve %
Baarish (2019) | 18.88 | 5.55 | Kalank (2019) | 25.75 | 15.15
It happened in hong kong (2018) | 29.03 | 22.58 | Shaadi mein zaroor aana (2017) | 50.0 | 2.34
Spotlight 2 (2018) | 16.66 | 4.76 | Love per square foot (2018) | 98.11 | 0.37
Puncch beat (2019) | 27.71 | 21.68 | Kabir singh (2019) | 21.25 | 11.25
Broken but beautiful (2018) | 84.71 | 13.28 | Photograph (2019) | 35.18 | 3.70
Haq se (2018) | 17.33 | 6.66 | Loveyatri (2018) | 43.13 | 13.72
Fuh se fantasy (2019) | 2.37 | 0.31 | Notebook (2019) | 27.55 | 6.12
Time out (2017) | 53.93 | 28.21 | Meri pyaari bindu (2017) | 36.36 | 9.09
Side hero (2018) | 39.68 | 11.11 | The sky is pink (2019) | 42.85 | 40.81
Flames (2018) | 37.14 | 17.14 | Malaal (2019) | 41.09 | 6.84

TABLE XIII. Positive and Negative Reviews of Web Series and Movies of Horror Genre
Web series | +ve % | -ve % | Movies | +ve % | -ve %
Ghoul (2018) | 20.93 | 9.30 | Mushkil (2019) | 10.97 | 3.65
Tantra (2018) | 8.0 | 6.0 | Khamoshi (2019) | 8.97 | 12.82
High priestess (2019) | 87.69 | 4.61 | Number game (2018) | 23.33 | 63.33
Sheitaan (2018) | 1.23 | 0.97 | Pari (2018) | 9.41 | 2.35
Rain (2018) | 11.68 | 42.85 | Sanjana (2018) | 21.87 | 14.58
Zakhmi (2018) | 9.47 | 4.21 | Ghost house (2017) | 29.68 | 35.93
Dobaara (2017) | 8.0 | 2.66 | Amavas (2019) | 13.93 | 2.32
Untouchable (2018) | 32.60 | 13.04 | The past (2018) | 30.58 | 61.17
Typewriter (2019) | 36.66 | 15.55 | 2001 dead one (2018) | 33.33 | 25.0
Psysho (2018) | 0.93 | 22.22 | Tumbbad (2018) | 32.78 | 9.83

TABLE XIV. Positive and Negative Reviews of Web Series and Movies of Crime Genre
Web series | +ve % | -ve % | Movies | +ve % | -ve %
Delhi crime (2019) | 25.47 | 12.72 | Article 15 (2019) | 25.67 | 4.05
Criminal justice (2019) | 27.27 | 45.45 | Prassthanam (2019) | 43.58 | 0.57
Hostages (2019) | 18.33 | 20.0 | Badla (2019) | 26.92 | 1.92
City of dreams (2019) | 44.0 | 24.0 | Sonchiria (2019) | 36.23 | 10.14
Mirzapur (2018) | 35.86 | 2.17 | End counter (2019) | 36.58 | 18.29
Scared game (2018) | 47.05 | 28.23 | Black Bud (2018) | 33.80 | 52.11
Bard of blood (2019) | 50 | 7.14 | Baazaar (2018) | 13.11 | 6.55
The investigation (2019) | 35.71 | 27.14 | Section 375 (2019) | 43.33 | 16.66
The family man (2019) | 47.45 | 20.33 | Setters (2019) | 44.04 | 9.52
The final call (2019) | 43.28 | 14.92 | Racket (2018) | 26.47 | 23.52
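The averaging and comparison described in Steps 5 and 6 can be sketched as follows, using the comedy genre values from Table XI as input. The function names average_sentiment and more_liked are hypothetical, and deciding "more liked" from the average positive percentage alone is a simplifying assumption.

```python
from statistics import mean

# (positive %, negative %) pairs for the comedy genre, copied from Table XI.
COMEDY_WEB_SERIES = [
    (53.84, 12.82), (22.22, 11.11), (30.12, 27.71), (32.81, 9.37), (29.41, 9.80),
    (41.75, 18.68), (52.63, 10.52), (51.38, 11.11), (47.22, 5.55), (48.48, 13.63),
]
COMEDY_MOVIES = [
    (48.0, 4.0), (15.38, 3.84), (33.33, 1.32), (43.90, 19.51), (30.30, 10.60),
    (21.21, 6.06), (33.33, 25.0), (50.0, 8.33), (30.55, 5.55), (15.0, 5.0),
]


def average_sentiment(rows):
    """Return the mean positive and negative percentages, rounded to 2 places."""
    positive = mean(pos for pos, _ in rows)
    negative = mean(neg for _, neg in rows)
    return round(positive, 2), round(negative, 2)


def more_liked(web_series_rows, movie_rows):
    """Compare average positive percentages (assumed criterion for 'more liked')."""
    web_positive, _ = average_sentiment(web_series_rows)
    movie_positive, _ = average_sentiment(movie_rows)
    if web_positive > movie_positive:
        return "web series"
    if movie_positive > web_positive:
        return "movies"
    return "equal"
```

On the Table XI values, the average positive percentage is about 40.99 for comedy web series and 32.10 for comedy movies, which agrees with the finding that comedy web series were more liked than comedy movies.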
A. Analysis of Positive and Negative Reviews
1. Comedy
The graph shown below represents the percentage of positive reviews and the percentage of negative reviews of web series and movies of the comedy genre, drawn from the data in Table XI.

Fig. 2. Analysis of positive and negative reviews of the comedy genre

2. Romance
The graph shown below represents the percentage of positive reviews and the percentage of negative reviews of web series and movies of the romance genre, drawn from the data in Table XII.

[Figure legend: web series positive, web series negative, movies positive, movies negative; x-axis: 1 to 10]

Fig. 4. Analysis of positive and negative reviews of the horror genre

4. Crime
The graph shown below represents the percentage of positive reviews and the percentage of negative reviews of web series and movies of the crime genre, drawn from the data in Table XIV.

Fig. 5. Analysis of positive and negative reviews of the crime genre
[Figure legend: positive web series, positive movies, negative web series, negative movies]

Fig. 6. Graphical representation of average positive and negative reviews of web series and movies of all genres

The proposed application performs sentiment analysis on reviews of web series and movies of different genres from 2017 to 2019. As per the analysis of average positive reviews in Table XV, web series of the comedy genre are more liked by the audience than movies of the comedy genre. In the romance genre, movies are more liked by the audience than web series. In the horror genre, movies and web series are equally liked by the audience. In the crime genre, web series are slightly more liked by the audience than movies.

As per the analysis of average negative reviews in Table XV, web series of the comedy genre are more disliked by the audience than movies of the comedy genre. In the romance genre, movies are less disliked by the audience than web series, but the difference is small. In the horror genre, movies are more disliked by the audience than web series, and the difference is quite high. In the crime genre, web series are more disliked by the audience than movies, and the difference is quite high.

V. Conclusion and Future Scope

The paper proposed an application that performs sentiment analysis on reviews of web series and movies of different genres from 2017 to 2019. The application works on the principle of machine learning. It provides a comparative analysis of the positive and negative reviews of web series and movies of different genres using Twitter data. Data was fetched from Twitter through API keys and Twitter access tokens. A total of 4 genres were taken: comedy, romance, horror, and crime. Each genre is represented graphically for a better understanding of the results. The combined analysis, performed after calculating the average percentage of positive and negative sentiment across all the movies and web series of each genre, shows that during 2017-2019 web series of the comedy and crime genres were more liked by the audience than movies, romantic movies were more liked than romantic web series, and horror movies and web series were equally liked. Web series of the comedy, romance and crime genres were also more disliked by the audience than the movies of these genres, whereas horror web series were less disliked than horror movies.

The presented work leaves considerable scope for future analysis. The dataset can be enlarged, and more genres can be incorporated. The paper analyzes only positive and negative reviews; the remaining reviews, which are neither positive nor negative, can be processed further using new ideas.

References
[1] Charu Nanda, Mohit Dua and Garima Nanda, "Sentiment Analysis on Movies Reviews in Hindi Language using Machine Learning", IEEE, 2018 (ICCSP), Chennai, India, 3-5 April 2018.
[2] Suhariyanto, Ari Firmanto, Riyanarto Sarno, "Prediction of Movie Sentiment based on Reviews and Score on Rotten Tomatoes using SentiWordnet", IEEE, 2018 International Seminar on Application for Technology of Information and Communication, Semarang, Indonesia, 21-22 September 2018.
[3] Tirath Prasad Sahu, Sanjeev Ahuja, "Sentiment Analysis of Movie Reviews: A study on Feature Selection & Classification Algorithms", IEEE, 2016 International Conference on Microelectronics, Computing and Communications (MicroCom), Durgapur, India, 23-25 Jan 2016.
[4] Nazma Iqbal, Afifa Mim Chowdhury and Tanveer Ahsan, "Enhancing the Performance of Sentiment Analysis by Using Different Feature Combinations", IEEE, 2018 (IC4ME2), Rajshahi, Bangladesh, 8-9 Feb 2018.
[5] Rasika Wankhede, A.N. Thakare, "Design Approach for Accuracy in Movies Reviews Using Sentiment Analysis", IEEE, vol. 1, 2017 (ICECA), Coimbatore, India, 20-22 April 2017.
[6] Tejaswini M. Untawale, G. Choudhari, "Implementation of Sentiment Classification of Movie Reviews by Supervised Machine Learning Approaches", IEEE, 2019 3rd (ICCMC), Erode, India, 27-29 March 2019.
[7] Abinash Tripathy, Ankit Agrawal, Santanu Kumar Rath, "Classification of sentiment reviews using n-gram machine learning approach", Expert Systems with Applications, Volume 57, 15 September 2016, Pages 117-126.
[8] Mais Yasen, Sara Tedmori, "Movies Reviews Sentiment Analysis and Classification", IEEE, 2019 (JEEIT), Amman, Jordan, 10 April 2019.
[9] Jayashree Jagdale, M Emmanuel, "Hybrid Corrective Critic Neural Network for Sentiment Classification in Community Media", 2019 3rd (ICECA), 2019.
[10] Alaa F. Alsaqer, Sreela Sasi, "Movie review summarization and sentiment analysis using rapidminer", IEEE, 2017 International Conference on Networks & Advances in Computational Technologies (NetACT), Thiruvananthapuram, India, 20-22 July 2017.