Detection of Fake Profiles On Twitter Us
ICMED 2021
Sarangam Kodati1*, Kumbala Pradeep Reddy2, Sreenivas Mekala3, PL Srinivasa Murthy4, P Chandra Sekhar Reddy5
1Associate Professor, Department of CSE, Teegala Krishna Reddy Engineering College, Hyderabad, Telangana.
2Associate Professor, Department of CSE, CMR Institute of Technology, Hyderabad, Telangana.
3Associate Professor, Department of IT, Sreenidhi Institute of Science and Technology, Hyderabad, Telangana.
4Professor, CSE Department, Institute of Aeronautical Engineering, Hyderabad, Telangana.
5Professor, CSE Department, Gokaraju Rangaraju Institute of Engineering and Technology, Hyderabad.
© The Authors, published by EDP Sciences. This is an open access article distributed under the terms of the Creative Commons Attribution License 4.0
(http://creativecommons.org/licenses/by/4.0/).
E3S Web of Conferences 309, 01046 (2021) https://doi.org/10.1051/e3sconf/202130901046
can also be used to befriend a person in order to stalk them. Another big issue associated with fake accounts is the data overload they cause [5]. With fake accounts numbering in the millions, it has become impossible to detect them manually. Fortunately, advances in digital technology can help considerably in this situation. Methods such as machine learning can make the stratification process easier and more accurate [6]. This project uses a machine learning model to classify social media accounts as genuine or fake.

Spammers send spam URLs and spam tweets as a social engineering attack strategy, and Twitter is an ideal arena for the proliferation of irregular spam accounts. Researchers have developed a model from the simulated impact of defamatory actions; this method detects and recovers the faulty profiles. A number of fake spam profiles are present on the Twitter network, which makes it difficult to provide security and privacy to normal users. Identifying spam profiles is therefore one of the key parts of this research, since it improves the safety of real users.

2. Role of Machine Learning in Detection of Fake Profiles

Over the last twenty years, the social networking phenomenon has seen enormous improvement. A number of social networks have introduced different online services that attract huge numbers of users. The growth in the number of users depends on the credibility of information on Online Social Networks (OSNs) [7]. Online social networks have become part of everyone's social life in the present generation, technology usage has increased widely, and these networks now play an important role in modern society. Social networks serve millions of users all over the world. Facebook and Twitter are two social networks with especially high levels of user interaction, and they can strongly affect daily life [16]. The creation of large numbers of fake accounts is the major problem of OSNs. These fake accounts do not correspond to the real profiles of humans. Spam, web rating and fake news are some examples of such fakes [8]. OSN operators currently expend various resources on detecting fake accounts, which are then closed. Almost 46% of users operate their Twitter account on mobile phones only [9]. Tweets can also be published by sending SMS text messages and e-mails. Twitter messages are limited to 140 characters and can be exchanged and published directly from smartphones using a wide array of Web-based services [10]. Twitter maintains a large number of users. These social sites help people maintain better social lives, but they also have disadvantages. Online harassment, privacy, trolling, potential for misuse and fake account creation are some of the issues of social networks [11]. We will implement machine learning algorithms to predict whether an account is controlled by a fake user.

Supervised learning and unsupervised learning are two types of machine learning methods. In supervised learning, the input data is mapped to the desired output using a labeled training set. In unsupervised learning, no labeled examples are provided, and nothing is known about the output during the learning process. The input data of supervised learning is called training data, and each item has a known result or label such as spam/not-spam [12].

A training process prepares the model by having it make predictions and correcting them when they are wrong. The training process stops once the model achieves the desired accuracy level on the training data. The main aim of the machine learning method is to detect fake profiles with the trained algorithm [13]. The training data contains particulars of a person such as gender, age and friends list. Fake profiles are detected or predicted from these particulars, and data security on social networking sites is thereby enhanced. Naïve Bayes (NB), Decision Tree (DT) and Support Vector Machine (SVM) are the machine learning algorithms used in the proposed approach. An analysis of account activities is also provided from the prediction result [14].

Researchers have so far made use of traditional machine learning algorithms such as random forest, naive Bayes, SVM and decision tree. These methods are incapable of doing feature selection on their own, so the researcher has to study the relation between the features and the target variable in order to decide which features to consider and which to reject. Another drawback is their inability to adapt to changing patterns in the input dataset, which can make them insufficient at times. Changing patterns in the input can cause them to give incorrect results and thus reduce accuracy, so these methods require constant monitoring. A further major issue is that they do not perform well if the dataset is too large or is unstructured. This makes traditional methods highly unsuitable for real-life scenarios, where the data is mostly unstructured and often very large [17]. Owing to the drawbacks of the traditional methods, it has become necessary to explore advanced algorithms such as deep neural networks [15,18,20].

3. Twitter Fake Profile Detection Using SVM

The fake profile detection model for Twitter presented in this paper uses machine learning. Training and testing are the two main stages of the machine learning framework. Fig. 1 shows the block diagram of the proposed detection model for Twitter fake profile detection using SVM.
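
The training and testing stages described above can be illustrated with a small sketch. The code below is not the paper's implementation: it is a minimal linear soft-margin SVM trained by batch subgradient descent on the regularized hinge loss, written with the standard library only. The two features, the toy values, the label convention (+1 fake, -1 genuine) and the hyperparameters `lam`, `lr` and `epochs` are all assumptions made for illustration.

```python
from typing import List, Sequence, Tuple


def decision_value(w: Sequence[float], b: float, x: Sequence[float]) -> float:
    """Signed score w.x + b of one sample."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b


def train_linear_svm(
    X: Sequence[Sequence[float]],
    y: Sequence[int],
    lam: float = 0.01,
    lr: float = 0.1,
    epochs: int = 500,
) -> Tuple[List[float], float]:
    """Minimise (lam/2)*||w||^2 + mean hinge loss by batch subgradient descent."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w = [lam * wj for wj in w]  # gradient of the L2 penalty
        grad_b = 0.0
        for xi, yi in zip(X, y):
            if yi * decision_value(w, b, xi) < 1.0:  # margin violated: hinge is active
                for j in range(d):
                    grad_w[j] -= yi * xi[j] / n
                grad_b -= yi / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict(w: Sequence[float], b: float, x: Sequence[float]) -> int:
    """Return +1 (fake) or -1 (genuine)."""
    return 1 if decision_value(w, b, x) >= 0.0 else -1


# Hypothetical, hand-made features per account (not taken from the paper's dataset):
# (followers-to-friends ratio scaled to [0, 1], fraction of tweets containing a URL)
train_X = [
    (0.90, 0.10), (0.80, 0.20), (0.70, 0.10), (0.85, 0.15),  # genuine
    (0.10, 0.90), (0.20, 0.80), (0.15, 0.70), (0.10, 0.95),  # fake
]
train_y = [-1, -1, -1, -1, 1, 1, 1, 1]

# Training stage: fit the separating hyperplane on labeled accounts.
w, b = train_linear_svm(train_X, train_y)

# Testing stage: evaluate on held-out accounts the model has not seen.
test_X = [(0.95, 0.05), (0.05, 0.90)]
test_y = [-1, 1]
test_pred = [predict(w, b, x) for x in test_X]
accuracy = sum(p == t for p, t in zip(test_pred, test_y)) / len(test_y)
print("test predictions:", test_pred, "accuracy:", accuracy)
```

A library implementation with kernel support would normally be preferred in practice; the hand-written trainer is used here only to keep the sketch self-contained and to make the hinge-loss update visible.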
Fig. 1: Framework of Twitter Fake Profile Detection Using SVM

3.1 Twitter data collection
Collecting Twitter data is the first step of this method. For research purposes, publicly available data or Twitter data streamed through the API is used.

3.2 Feature selection
Feature selection is performed on the collected dataset in this step. Spam account detection uses different feature parameters, some of which are useless, so only the useful features are selected from the extracted features. The effectiveness of spam account detection depends on the selected features. The estimated threshold value is 0.8, and feature pairs whose correlation with the class variable falls below this level are eliminated using Spearman's Rank-Order correlation.

A total of 11 sets of correlated feature pairs are selected as the output of this step. The features selected by the relevance analysis step are used as the inputs of the redundancy analysis step. If two features are completely correlated, they are said to be redundant to each other, but in reality determining redundancy is not straightforward when one feature is correlated with a set of features.

Hence, redundant features are eliminated using the Markov Blanket technique. In a Bayesian network, the Markov Blanket of a node A, MB(A), consists of A's parents, its children and the other parents of its children. In a Markov random field, the Markov Blanket of a node is the set of its neighboring nodes. Applying the Markov blanket to the correlated feature pairs, MB(Fi) and MB(Fj), yields two output sets of non-redundant features in two different versions. These redundancy

3.4 Spearman's Rank-Order Correlation
Spearman's Rank-Order Correlation is the most used feature selection method. It measures the direction and strength of the monotonic relationship between two quantitative variables X and Y. If X and Y are independent, the measure is 0; in general the value lies between -1 and +1 and indicates the direction and level of correlation. The algorithm outputs the correlation coefficient of every variable, represented in the form of a table.

3.5 Multiple Linear Regression
Linear regression models describe the relationship between the independent (predictor) variables as input and the dependent (response) variable as output. Simple linear regression considers two variables, x and y, where one variable depends on the other, as shown in equation 1:
y = a + b x (1)

where a is a constant and b is the regression coefficient. Multiple linear regression considers two or more independent variables to predict the value of the dependent variable, as shown in equation 2:
y = a + b x_1 + c x_2 + d x_3 (2)

A dataset with one dependent variable and 16 independent variables is used in the multiple linear regression. Multicollinearity arises in multiple linear regression when multiple factors are correlated not only with the response variable but also with each other. Multicollinearity increases the standard errors of the coefficients, making some variables appear statistically insignificant and others significant. Out of 16 predictions total 12 predictions
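
The Spearman measure of Section 3.4 and the 0.8 threshold of Section 3.2 can be sketched as follows. This is an illustrative sketch rather than the paper's procedure: Spearman's coefficient is computed in the standard way as the Pearson correlation of the ranks (average ranks for ties), and feature pairs whose absolute coefficient reaches the threshold are flagged as candidates for the redundancy analysis, which is also the kind of predictor-to-predictor correlation behind multicollinearity. The feature names, the toy values and the helper `correlated_pairs` are hypothetical.

```python
import math
from itertools import combinations
from typing import Dict, List, Sequence, Tuple


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation; returns 0.0 for a constant input (a simplification)."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0.0 or vb == 0.0:
        return 0.0
    return cov / math.sqrt(va * vb)


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rank-order correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def correlated_pairs(
    features: Dict[str, Sequence[float]], threshold: float = 0.8
) -> List[Tuple[str, str, float]]:
    """Feature pairs with |rho| >= threshold (0.8 is the value quoted in Section 3.2)."""
    flagged = []
    for (name_a, col_a), (name_b, col_b) in combinations(features.items(), 2):
        rho = spearman(col_a, col_b)
        if abs(rho) >= threshold:
            flagged.append((name_a, name_b, rho))
    return flagged


# Hypothetical per-account features (toy values, not from the paper's dataset).
features = {
    "followers": [10, 200, 35, 5000, 80, 12],
    "friends": [15, 180, 40, 900, 100, 30],
    "account_age_days": [300, 50, 700, 400, 20, 900],
}

flagged = correlated_pairs(features)
for name_a, name_b, rho in flagged:
    print(f"{name_a} ~ {name_b}: rho = {rho:.3f}")
```

Because the coefficient depends only on ranks, the followers and friends columns are flagged even though their raw values differ by orders of magnitude: they increase together monotonically.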