CCL MiniProject
Mini Project
Analysing Social Media Reaction on Political Issues using Machine
Learning
Abstract:
Twitter is currently one of the most popular social media platforms. It enables its users to post
their thoughts on any subject, usually as short, length-limited messages. The large number of
Twitter users has made the platform a valuable source of data for analysing people's behaviour and
tendencies in reacting to a particular political issue. Unfortunately, textual postings are difficult to
analyse because the dimensionality of the data is too high to cluster directly. One therefore needs to
find an appropriate method that clusters Twitter postings with acceptable results. This study
presents the clustering of Twitter users based on the most common words they use when
reacting to a trending political issue. A comparative study of hierarchical clustering and
k-means clustering is presented and discussed, together with the word trends, or
main topics, of the issue as shown by histogram and word cloud.
List of Abbreviations:
AI: Artificial Intelligence
DFD: Data Flow Diagram
Et al.: And Others
ML: Machine Learning
Scikit-learn: Python Library for Machine Learning
UML: Unified Modelling Language
1. INTRODUCTION
1.1 Introduction
Social media has a massive user base that shares its thoughts on particular topics,
which makes it one of the most valuable sources of data for analysing human behaviour and
trends at a given time and place. Social media postings range from individual facts and
opinions to facts and opinions cited from the news. Twitter is considered an appropriate site
for obtaining a dataset of what people share publicly, thanks to its large number of users and
the volume of data they produce.
Clustering was chosen as the method because the target variable is not set or labelled.
Each target therefore has to be identified with the help of unsupervised machine
learning, in this case clustering. The texts will be clustered
based on the most common words shared in the users' tweets.
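The idea of clustering unlabelled tweets by their most common words can be sketched as follows. This is a minimal illustration, not the project's actual pipeline: the sample tweets, the cap of 50 words (max_features) and the choice of two clusters are all assumptions made for the example.

```python
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical sample tweets; a real run would load the collected postings.
tweets = [
    "the new tax reform is unfair to workers",
    "workers protest the unfair tax reform",
    "election debate tonight on live television",
    "watch the election debate live tonight",
]

# Turn each tweet into word counts, keeping only the most common words.
# max_features=50 is an assumed cap, not a value taken from the study.
vectorizer = CountVectorizer(stop_words="english", max_features=50)
X = vectorizer.fit_transform(tweets)

# No labels are supplied: k-means assigns each tweet to a cluster itself.
# n_clusters=2 is an assumption that suits this toy data only.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = kmeans.fit_predict(X)

print(sorted(vectorizer.vocabulary_))
for tweet, label in zip(tweets, labels):
    print(label, tweet)
```

The cluster numbers themselves are arbitrary; what matters is which tweets end up sharing a number.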
Key Features:
i. Grouping similar data points together and discovering underlying patterns using
K-means.
ii. A most-frequent-words plot that shows the words which appear in relation
to a particular topic.
iii. A word cloud that shows the most frequent words regarding the issue.
The words that appear in the word cloud are the main topics or trends
among Twitter users.
iv. A dendrogram that shows the relationships between similar sets of data.
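Two of the features listed above, the most frequent words and the dendrogram, can be sketched with the standard library and SciPy. The tweets, the small stop-word set, the cut-off of ten words and the user labels are hypothetical values chosen for illustration; Ward linkage is one possible choice and is not confirmed as the method used in this project.

```python
from collections import Counter

import numpy as np
from scipy.cluster.hierarchy import dendrogram, fcluster, linkage

# Hypothetical sample tweets, one per user.
tweets = [
    "the new tax reform is unfair to workers",
    "workers protest the unfair tax reform",
    "election debate tonight on live television",
    "watch the election debate live tonight",
]
stop_words = {"the", "is", "to", "on"}  # tiny illustrative stop list

# Tokenise and count how often each word appears across all tweets.
tokens = [[w for w in t.split() if w not in stop_words] for t in tweets]
freq = Counter(w for doc in tokens for w in doc)
top_words = [word for word, _ in freq.most_common(10)]  # assumed cut-off

# One row per user, one column per frequent word (1.0 if the user used it).
X = np.array([[1.0 if w in doc else 0.0 for w in top_words] for doc in tokens])

# Agglomerative (hierarchical) clustering with Ward linkage.
Z = linkage(X, method="ward")
labels = fcluster(Z, t=2, criterion="maxclust")  # cut the tree into 2 groups

# no_plot=True returns the dendrogram layout without needing a display.
user_names = [f"user{i}" for i in range(len(tweets))]
tree = dendrogram(Z, no_plot=True, labels=user_names)

print(freq.most_common(4))
print(labels)
print(tree["ivl"])  # leaf order, left to right, in the dendrogram
```

Dropping no_plot=True draws the tree with Matplotlib instead of only computing its layout.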
1.2 Motivation
Politics, in general, is the platform by which people create, maintain, and change the laws
that govern their lives. As a result, conflict and collaboration are inextricably connected
in politics. The presence of conflicting views, competing expectations,
competing needs, and competing interests is expected to result in conflict over the rules
under which people live.
Hardware Requirements:
Processor: Intel Core i3 or higher
RAM: 8 GB
Libraries Development:
Scikit-learn (sklearn) is a widely used and robust library for machine learning in Python.
It provides a selection of efficient tools for machine learning and statistical modelling,
including classification, regression, clustering and dimensionality reduction, via a
consistent interface in Python. This library, which is largely written in Python, is built
upon NumPy, SciPy and Matplotlib.
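The consistent interface mentioned above means that vectorizers, dimensionality reducers and clusterers are all driven through the same fit, fit_transform and fit_predict calls. The sketch below, on hypothetical tweets, reduces TF-IDF features to two dimensions and then compares k-means with hierarchical (agglomerative) clustering. The number of components and clusters are assumptions for this toy data, not settings taken from the study.

```python
from sklearn.cluster import AgglomerativeClustering, KMeans
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import adjusted_rand_score

# Hypothetical sample tweets.
docs = [
    "the new tax reform is unfair to workers",
    "workers protest the unfair tax reform",
    "election debate tonight on live television",
    "watch the election debate live tonight",
]

# Step 1: text to TF-IDF features (one column per word).
tfidf = TfidfVectorizer(stop_words="english").fit_transform(docs)

# Step 2: dimensionality reduction; n_components=2 is an assumed value.
reduced = TruncatedSVD(n_components=2, random_state=0).fit_transform(tfidf)

# Step 3: two clustering methods, called through the same fit_predict method.
km_labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(reduced)
hc_labels = AgglomerativeClustering(n_clusters=2, linkage="ward").fit_predict(reduced)

# Adjusted Rand index: 1.0 means both methods produced the same grouping.
agreement = adjusted_rand_score(km_labels, hc_labels)
print(tfidf.shape, reduced.shape, agreement)
```

On real Twitter data the two methods will not necessarily agree, which is what makes the comparison informative.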
Programming Language:
Python is a long-established and very popular language created by Guido van Rossum and
first released in 1991. It is open source and is used for web and Internet development
(with frameworks such as Django, Flask, etc.), scientific and numeric computing (with the
help of libraries such as NumPy, SciPy, etc.), software development, and much more.
NumPy:
NumPy is a package that defines a multi-dimensional array object and associated fast math
functions that operate on it. It also provides simple routines for linear algebra and Fourier
transforms, and sophisticated random-number generation. NumPy replaces both Numeric
and Numarray.
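A short sketch of the NumPy features named above: the array object, linear algebra, Fourier transform and random-number routines. The word-count matrix and the signal are made-up values for illustration only.

```python
import numpy as np

# Multi-dimensional array: rows are users, columns are word counts (made up).
counts = np.array([[2.0, 0.0, 1.0],
                   [0.0, 3.0, 1.0]])

# Linear algebra: cosine similarity between the two users' count vectors.
norms = np.linalg.norm(counts, axis=1)
unit = counts / norms[:, np.newaxis]
cosine = unit @ unit.T

# Fourier transform of a small made-up signal.
signal = np.array([1.0, 0.0, -1.0, 0.0])
spectrum = np.fft.fft(signal)

# Random-number generation with a seeded generator for reproducibility.
rng = np.random.default_rng(seed=0)
sample = rng.integers(0, 10, size=5)

print(cosine)
print(spectrum)
print(sample)
```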
Pandas:
Pandas is a Python package providing fast, flexible, and expressive data structures
designed to make working with “relational” or “labelled” data both easy and intuitive. It
aims to be the fundamental high-level building block for doing practical, real-world data
analysis in Python. Additionally, it has the broader goal of becoming the most powerful
and flexible open-source data analysis/manipulation tool available in any language.
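As a brief illustration of working with labelled data in Pandas, the sketch below stores a few hypothetical tweets in a DataFrame, then counts words per user and across the whole dataset. The column names and the sample rows are assumptions made for the example.

```python
import pandas as pd

# Hypothetical tweets; the column names are chosen for this example only.
df = pd.DataFrame({
    "user": ["a", "b", "a", "c"],
    "text": [
        "tax reform now",
        "election debate tonight",
        "unfair tax",
        "debate live",
    ],
})

# Number of words in each tweet.
df["n_words"] = df["text"].str.split().str.len()

# Total words posted by each user.
per_user = df.groupby("user")["n_words"].sum()

# Frequency of each word across all tweets.
word_counts = df["text"].str.split().explode().value_counts()

print(per_user)
print(word_counts.head())
```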
Matplotlib:
Matplotlib is a visualization library in Python for 2D plots of arrays. It
is a multi-platform data visualization library built on NumPy arrays and designed to work
with the broader SciPy stack. It was introduced by John Hunter in 2002.
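A minimal sketch of a word-frequency bar chart with Matplotlib is given below. The words and counts are made-up values, and the non-interactive Agg backend is selected as an assumption so that the example also runs on a machine without a display.

```python
import io

import matplotlib
matplotlib.use("Agg")  # assumed headless environment; omit on a desktop
import matplotlib.pyplot as plt

# Made-up word frequencies for illustration.
words = ["tax", "reform", "debate", "election"]
counts = [12, 9, 7, 5]

fig, ax = plt.subplots(figsize=(6, 3))
bars = ax.bar(words, counts)
ax.set_xlabel("Word")
ax.set_ylabel("Frequency")
ax.set_title("Most frequent words")
fig.tight_layout()

# Render to an in-memory PNG; passing a file path to savefig works as well.
buffer = io.BytesIO()
fig.savefig(buffer, format="png")
plt.close(fig)
```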
Flask:
Flask is a web framework written in Python that provides developers with tools to build
web applications. It is based on the Werkzeug WSGI toolkit and the Jinja templating engine.
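A minimal Flask application might look like the sketch below. The /top-words route, its JSON format and the limit of three words are hypothetical and are not taken from the project's code. Flask's built-in test client is used so that the example runs without starting a server.

```python
from collections import Counter

from flask import Flask, jsonify, request

app = Flask(__name__)


@app.route("/top-words", methods=["POST"])
def top_words():
    """Return the three most frequent words in the posted tweets."""
    payload = request.get_json(silent=True) or {}
    tweets = payload.get("tweets", [])
    counts = Counter(word for tweet in tweets for word in tweet.lower().split())
    return jsonify(top=counts.most_common(3))


# Exercise the route in-process with Flask's test client.
client = app.test_client()
response = client.post("/top-words", json={"tweets": ["tax reform now", "unfair tax"]})
result = response.get_json()
print(response.status_code, result)
```

For local development, the same app object can be served with the flask run command.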
Render:
Render is a unified cloud platform for building and running apps and websites, with free TLS
certificates, a global CDN, DDoS protection, private networks, and automatic deploys from
Git.
2. On Render
• Create a new Web Service
• Connect it to the GitHub repository
3. Environment Setup
4. Deployed Status
Marks:
R1          R2          R3          Total        Sign
(3 Marks)   (5 Marks)   (7 Marks)   (15 Marks)

R1          R2          Total        Sign
(5 Marks)   (5 Marks)   (10 Marks)