Mtech Thesis 2020-22
MASTER OF TECHNOLOGY
in
COMPUTER SCIENCE & ENGINEERING
by
Gurpreet Kaur Grewal
(200060705004)
Under the supervision of
To the
College Of Engineering Roorkee (COER), Roorkee
October, 2022
CANDIDATE’S DECLARATION
I hereby declare that the work presented in the thesis titled
“Sentiment Analysis of Article 370”, submitted by me in partial fulfilment of
the requirements for the award of the degree of Master of Technology (M. Tech.) in the
Department of Computer Science & Engineering, Uttarakhand Technical
University, is an authentic record of my own work carried out under the supervision
of Prof. Mrs. Supriya Shukla, Department of Computer Science and
Engineering, College Of Engineering Roorkee.
Date:
Gurpreet Kaur Grewal
M. Tech(CSE)
Enrolment No.: 200060705004
College Of Engineering Roorkee, Roorkee
Approved By:
Dr. Taresh Singh
Head Of Department
(Computer Science & Engineering)
College Of Engineering Roorkee, Roorkee
CERTIFICATE
I hereby submit that the work presented in the thesis titled
“Sentiment Analysis of Article 370”, in fulfilment of the requirements for the award of the degree of
Master of Technology in Computer Science, is a record of my own work
carried out under the supervision of Mrs. Supriya Shukla.
1.3 ABSTRACT
After weeks of deliberation on the situation in Jammu &
Kashmir, the Narendra Modi government finally made its decision public. The
Home Minister, Mr. Amit Shah, announced in Parliament that the special status granted
under Article 370 would be revoked.
The announcement triggered a great deal of activity, especially on Twitter, where
people share their views and opinions. In this thesis I describe how we
can analyse what people are sharing on Twitter about a particular topic. On the basis of Twitter
data we can report the positive and negative reactions of people and their
thinking. Using this technique, we can better understand whether people regard the decision as good or bad.
Python is a simple yet powerful, high-level, interpreted and dynamic programming language,
which is well suited to processing natural language data.
The goal of this thesis is to classify Twitter data into positive or negative comments by
using different supervised machine learning classifiers on data collected for different Indian
political parties, and to determine which political party is performing best in the public's view.
ACKNOWLEDGEMENT
I would like to thank my guide, Mrs. Supriya Shukla, Assistant Professor, Computer Science
and Engineering Department, College Of Engineering Roorkee, Roorkee, for helping me
complete my work and submit my thesis. I am very thankful to Dr. Taresh, HOD,
Computer Science Engineering Department, College Of Engineering Roorkee, Roorkee, for
setting good standards for his students and encouraging them from time to time.
Last but not least, I would like to thank my parents for their years of unyielding love and
encouragement. They wanted the best for me and I admire their sacrifice and determination.
Date:
Gurpreet Kaur Grewal
Enrolment No.: 200060705004
DISSERTATION APPROVAL SHEET
This is to certify that the dissertation titled
TWITTER SENTIMENT ANALYSIS ARTICLE 370
By
Table Of Contents
Declaration
Certificate
Abstract
Acknowledgement
Approval Sheet
Table of Contents
List of Abbreviations
Chapter 1
1.9.5 Applications
1.10 Literature review
1.11 Motivation
1.12 Problem Statement
1.13 Training Data
1.14 Data Storage
1.15 Objective
1.16 Implementation Details
1.17 Classifier Accuracy for Training Data
1.18 Conclusion
1.19 Future Scope
LIST OF ABBREVIATIONS
NLTK: Natural Language Toolkit
NLP: Natural Language Processing
NB: Naïve Bayes
SVM: Support Vector Machines
MAP: Maximum A Posteriori
BJP: Bharatiya Janata Party
AAP: Aam Aadmi Party
INC: Indian National Congress
API: Application Programming Interface
LIST OF FIGURES
Figure 1: Hyper-plane in SVM
Figure 2: Applications
Figure 3: Twitter Analysis
Figure 4: Positive tweets of BJP in different Indian states
Figure 5: Example of reactions
Figure 6: Process to classify tweets using the built classifier
Figure 7: Code of Execution
Figure 8: Reviews
Figure 9: Data Storage for import
Figure 10: Code for extracting features from tweets
Figure 11: Classifiers accuracy for training data
Figure 12: Sentiment Analysis for BJP, AAP and INC in 2016
CHAPTER 1
INTRODUCTION
This chapter introduces Sentiment Analysis, Python and the
Natural Language Toolkit (NLTK). It then states the objective of the thesis, explains
why sentiment analysis is needed, and describes how Sentiment Analysis is applied in
daily life.
1.9.1 Introduction to Sentiment Analysis
Sentiment Analysis is the process of collecting and analysing data that expresses
people's personal feelings, reviews and thoughts. Sentiment analysis is often called
opinion mining because it mines the important features from people's opinions.
Sentiment Analysis is performed using machine learning techniques, such
as statistical models, together with Natural Language Processing (NLP) to extract
features from large volumes of data.
Twitter is a microblogging platform where anyone can read or write short
messages called tweets. The quantity of data gathered on Twitter is very
large. This data is unstructured and written in natural language. Twitter
Sentiment Analysis is the process of retrieving tweets on a particular topic and
predicting the emotions they express.
Sentiment Analysis has various applications. It is domain-centred, i.e.
results from one domain cannot be applied directly to another domain. Sentiment Analysis is used in
many real-life situations: to gather reviews of products or movies, to assess the financial
reports of a company, and for prediction or marketing. It is used to infer the opinions of
people on social media by analysing the feelings or thoughts they express in the form of
text.
Figure 2: Applications
1.9.5 APPLICATIONS
Customer Support: It helps in knowing whether a decision is seen as good or bad,
so that further analysis can be done. It also helps the government take future decisions
according to people's demands.
Hospital
Bus service
Movie review
Hotel review: Reviews from people who have stayed at a hotel tell it
whether its services were good or bad, which helps it improve future services.
Company Product review
Figure 3: Twitter Analysis
Much research has been done on the subject of sentiment analysis in
the past. Most research on sentiment analysis depends on machine learning
algorithms, whose focus is to determine whether a given text is in favour of or against a subject. Recent
research in this area applies sentiment analysis to user-generated data from
websites such as Facebook, Twitter and Amazon.
1.10 LITERATURE REVIEW
1. One line of work examines how social media platforms reveal people's opinions and
emotions, and explains how Twitter offers an
advantage in politics during elections. The hashtag is also
used for text classification, since it expresses emotions in words.
2. Another approach reduces the number of tweets in the
training set, which is then passed to Support Vector Machine and Naïve Bayes
classification algorithms to determine the polarity of tweets.
3. A multistage classification approach was used, in which an entity classification stage receives
general tweets and relates them to individual candidates for a fair comparison.
4. The common approach found in almost all the related research
consists of data collection using the Twitter API, pre-processing of data, filtering of data
and so on.
Many researchers have proposed systems based on location.
According to them, Sentiment Analysis is carried out using Natural Language Processing
(NLP) and machine learning algorithms. On Twitter, a tweet's location
field can easily be accessed by a script, so tweets from a particular
location can be gathered to identify patterns and sequences. They studied many applications
of location-based sentiment analysis using a data source from which data for
different locations can be extracted very easily.
1.12 PROBLEM STATEMENT
Sentiment analysis is very necessary in today’s world, as people are always affected by the
thinking and opinions of other people. The outcome of sentiment analysis is the classification of
natural-language text into classes such as ‘+’, ‘-’ and neutral. In today’s world, if anyone wants to buy
a product, watch a movie or cast a vote, that person first wants to know other people's
reviews, reactions and opinions about that product, movie or candidate on social media websites
like Twitter, Facebook, Tumblr, etc.
The main objective of the thesis is to perform sentiment analysis on Indian political
parties such as BJP, INC and AAP, so that people's opinions about these parties' progress,
workers, policies, etc. are monitored.
The methodology followed is as follows:
• A thorough study of existing approaches and techniques in the field of sentiment analysis.
• Collection of relevant data from Twitter with the help of the Twitter API.
• Pre-processing of the data collected from Twitter so that it is suitable for mining.
• Building a classifier based on different supervised machine learning techniques.
• Training and testing the built classifier using large datasets.
• Computing the results of the different classifiers using the dataset collected from Twitter.
• Comparing the results of the classifiers and plotting a graph that shows the trend of ‘+’ and ‘-’
sentiment for various political parties.
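The classifier-building, training and testing steps above can be illustrated with a minimal sketch. It is not the implementation used in this thesis: it is a from-scratch multinomial Naïve Bayes in plain Python, with add-one smoothing, whitespace tokenization and a handful of hand-labelled tweets that are all assumptions made for illustration. The predicted label is the maximum a posteriori (MAP) class.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase a tweet and split it on whitespace."""
    return text.lower().split()


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)      # tweets per class
        self.word_counts = defaultdict(Counter)  # word counts per class
        self.vocab = set()
        for text, label in zip(texts, labels):
            for word in tokenize(text):
                self.word_counts[label][word] += 1
                self.vocab.add(word)
        return self

    def predict(self, text):
        """Return the class with the highest log-posterior (the MAP class)."""
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            score = math.log(n_docs / total_docs)  # log prior
            total_words = sum(self.word_counts[label].values())
            for word in tokenize(text):
                if word not in self.vocab:
                    continue  # ignore words never seen in training
                count = self.word_counts[label][word] + 1
                score += math.log(count / (total_words + len(self.vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def accuracy(model, texts, labels):
    """Fraction of texts whose predicted label equals the true label."""
    correct = sum(model.predict(t) == y for t, y in zip(texts, labels))
    return correct / len(labels)


# Hypothetical hand-labelled tweets, for illustration only.
train_texts = [
    "great decision good for the people",
    "happy with this good policy",
    "bad decision terrible for the people",
    "sad and angry about this bad policy",
]
train_labels = ["+", "+", "-", "-"]
test_texts = ["good policy", "terrible decision"]
test_labels = ["+", "-"]

model = NaiveBayes().fit(train_texts, train_labels)
print(accuracy(model, test_texts, test_labels))
```

With real data, the same train and test split would be repeated for each classifier (for example Naïve Bayes and SVM) so that their accuracies can be compared.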
Figure 7: Code Of Execution
Figure 8: Reviews
Figure 9: Data Storage for import
Table: Removed and modified content
1.15 OBJECTIVE
1) The main objective of Sentiment Analysis in this thesis is to gauge people's feedback
on the government's decision regarding Article 370.
2) Our basic aim is to analyse whether the decision on the Article followed the
NICE principle, where:
N = Need, I = Interest, C = Concern, E = Expectation
Implementation details
The steps for implementation of Sentiment Analysis are:
Load Twitter API
Load Word Dictionaries
Search Twitter feeds
Getting text from feed
Defining text cleaning functions
Cleaning and splitting Twitter feeds
Analysing Twitter feeds
Plotting high frequency negative or positive words
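The last step in the list, finding and plotting high-frequency positive or negative words, can be sketched as follows. This is a minimal illustration using only the Python standard library. The word lists, the sample tweets and the helper names `top_sentiment_words` and `text_bar_chart` are hypothetical, and a plain-text bar chart stands in for a real plotting library.

```python
from collections import Counter

# Hypothetical word lists and tweets, for illustration only.
positive_words = {"good", "great", "happy"}
negative_words = {"bad", "sad", "angry"}
tweets = [
    "good decision great day",
    "bad move sad day",
    "good good news",
]


def top_sentiment_words(tweets, lexicon, n=3):
    """Return the n most frequent words in the tweets that appear in the lexicon."""
    counts = Counter(
        word
        for tweet in tweets
        for word in tweet.lower().split()
        if word in lexicon
    )
    return counts.most_common(n)


def text_bar_chart(pairs):
    """Render (word, count) pairs as a plain-text bar chart, one row per word."""
    return "\n".join(f"{word:<8}{'#' * count} {count}" for word, count in pairs)


top_positive = top_sentiment_words(tweets, positive_words)
top_negative = top_sentiment_words(tweets, negative_words)
print(text_bar_chart(top_positive))
print(text_bar_chart(top_negative))
```

The (word, count) pairs could equally be passed to any charting tool to draw a bar graph.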
1.16 STEPS IN DETAIL
1) Load Twitter API
The first step is to register on the Twitter application developer portal and obtain
authorization.
You need: Consumer Key
Consumer Secret Key
Access Key
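One way to keep these keys out of the source code is to read them from environment variables. The sketch below is only an assumption about how this could be organised. The variable names such as `TWITTER_CONSUMER_KEY` and the `TwitterCredentials` class are hypothetical, and no Twitter client library is called.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class TwitterCredentials:
    """Holds the three keys listed above."""
    consumer_key: str
    consumer_secret: str
    access_key: str


def load_credentials(env=os.environ):
    """Read the keys from a mapping of environment variables (names are hypothetical)."""
    names = {
        "consumer_key": "TWITTER_CONSUMER_KEY",
        "consumer_secret": "TWITTER_CONSUMER_SECRET",
        "access_key": "TWITTER_ACCESS_KEY",
    }
    missing = [var for var in names.values() if not env.get(var)]
    if missing:
        raise KeyError("missing credentials: " + ", ".join(missing))
    return TwitterCredentials(**{field: env[var] for field, var in names.items()})


# Demonstration with placeholder values instead of real keys.
demo_env = {
    "TWITTER_CONSUMER_KEY": "ck",
    "TWITTER_CONSUMER_SECRET": "cs",
    "TWITTER_ACCESS_KEY": "ak",
}
creds = load_credentials(demo_env)
print(creds.consumer_key)
```

Failing early with a clear message when a key is missing avoids a less obvious authorization error later, at search time.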
2) Load Word Dictionaries
The next step is to load the sets of positive and negative sentiment words into the
working directory. The words are then assigned to variables as positive or negative.
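A minimal sketch of loading such a word list in Python is shown below. It assumes one word per line and treats lines starting with `;` as comments. The file name `positive-words.txt` and the helper `load_word_list` are hypothetical, and a temporary file stands in for the real dictionary.

```python
import tempfile
from pathlib import Path


def load_word_list(path):
    """Read one word per line, skipping blank lines and ';' comment lines."""
    words = set()
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        word = line.strip().lower()
        if word and not word.startswith(";"):
            words.add(word)
    return words


# Demonstration with a temporary file in place of a real dictionary file.
with tempfile.TemporaryDirectory() as folder:
    pos_path = Path(folder) / "positive-words.txt"
    pos_path.write_text("; sample header\ngood\nGreat\n\n", encoding="utf-8")
    positive_words = load_word_list(pos_path)

print(sorted(positive_words))
```

Storing the words in a set keeps the later membership test for each tweet word fast. The negative list would be loaded the same way into a second variable.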
3) Search Twitter feeds
The next step is to define the Twitter search term and assign it to a variable. The number of
tweets to be extracted is assigned to another variable. The time needed to perform
the Twitter search and extraction is affected by this number. A slow internet
connection or a complex search query results in additional delay.
4) Getting text from feed
Twitter feeds contain a large number of extra fields and data. We use the getText()
function to extract only the text field. The function is applied to each of
the tweets:
sustaintweetT = lapply(tweet, function(t) t$getText())
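For readers following along in Python, the same idea can be sketched with a list comprehension. The tweet records below are hypothetical dictionaries, not real API output, and `get_texts` is an illustrative name.

```python
# Hypothetical tweet records; real API responses carry many more fields.
tweets = [
    {"id": 1, "lang": "en", "text": "Article 370 decision announced today"},
    {"id": 2, "lang": "en", "text": "Not sure about this move"},
]


def get_texts(tweets):
    """Keep only the text field of each tweet, mirroring lapply with getText() in R."""
    return [tweet["text"] for tweet in tweets]


tweet_texts = get_texts(tweets)
print(tweet_texts)
```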
5) Defining text cleaning functions
In this step, we write a function that executes all the commands needed to clean the text:
removing punctuation, special characters, etc. The function also converts uppercase characters
to lowercase using the tolower() command. To use this command safely, we
write an error-catching function and embed it in the code of the text-cleaning
function.
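A Python sketch of such a cleaning function is given below, assuming that links, @mentions, digits and punctuation should all be dropped. The regular expressions and the names `safe_lower` and `clean_tweet` are illustrative choices, not the exact rules used in this work.

```python
import re


def safe_lower(text):
    """Lowercase text, returning None instead of raising on non-string input."""
    try:
        return text.lower()
    except AttributeError:
        return None


def clean_tweet(text):
    """Lowercase a tweet and strip links, @mentions, digits and punctuation."""
    text = safe_lower(text)
    if text is None:
        return None
    text = re.sub(r"https?://\S+", " ", text)  # remove links
    text = re.sub(r"@\w+", " ", text)          # remove @mentions
    text = re.sub(r"[^a-z\s]", " ", text)      # remove digits, punctuation, symbols
    return re.sub(r"\s+", " ", text).strip()   # collapse repeated whitespace


print(clean_tweet("RT @user: Article 370 is GREAT!!! https://t.co/abc"))
```

Here `safe_lower` plays the role of the error-catching wrapper around the lowercase conversion, so one malformed record does not stop the whole cleaning run.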
6) Cleaning and splitting Twitter feeds
In this step we clean all the tweets and split them into words, and the resulting feeds are stored in
a list object called sentiment analysis.
7) Analysing Twitter feeds
Here we get to the actual task of analysing the feeds. We compare
the stored tweet text with the word dictionaries and retrieve all the matching words.
To do this, we first define a function that counts all the positive and
negative words that match our dictionaries.
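The counting function can be sketched in Python as follows. The net score (positive matches minus negative matches) and the mapping of that score to ‘+’, ‘-’ or neutral are assumptions made for illustration, and the word lists are hypothetical.

```python
# Hypothetical word dictionaries, for illustration only.
positive_words = {"good", "great", "happy"}
negative_words = {"bad", "sad", "angry"}


def score_tweet(text, positive, negative):
    """Return (positive matches, negative matches, net score) for one cleaned tweet."""
    words = text.lower().split()
    pos = sum(word in positive for word in words)
    neg = sum(word in negative for word in words)
    return pos, neg, pos - neg


def label(score):
    """Map a net score to '+', '-' or 'neutral'."""
    if score > 0:
        return "+"
    if score < 0:
        return "-"
    return "neutral"


pos, neg, net = score_tweet(
    "good decision but bad timing and sad news", positive_words, negative_words
)
print(pos, neg, net, label(net))
```

Summing the labels over all tweets on a topic gives the counts of positive and negative reactions that the later graphs are built from.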
1.17 RESULTS AND DECLARATION
Figure 12: Sentiment Analysis for BJP, AAP and INC for April 2016
1.18 CONCLUSION
The thesis helps us analyse large amounts of data. The data is gathered
through the Twitter streaming API. The collected data is analysed and, based on the
resulting score, we assess users' emotions. We can also visualize users'
opinions towards other products in the market by drawing them in the form of a graph, such as a bar
graph.
1.19 FUTURE SCOPE
Some future directions for our research work are:
• A parser can be embedded into the system for better results.
• A web-based application can be built on this work in the future.
• The system can be improved to deal with sentences that have multiple meanings.
• The number of classification categories can be increased to obtain better results.
• Work can begin on multi-language support.