
E3S Web of Conferences 309, 01046 (2021) https://doi.org/10.1051/e3sconf/202130901046
ICMED 2021

Detection of Fake Profiles on Twitter Using Hybrid SVM Algorithm

Sarangam Kodati1*, Kumbala Pradeep Reddy2, Sreenivas Mekala3, PL Srinivasa Murthy4, P Chandra Sekhar Reddy5

1Associate Professor, Department of CSE, Teegala Krishna Reddy Engineering College, Hyderabad, Telangana.
2Associate Professor, Department of CSE, CMR Institute of Technology, Hyderabad, Telangana.
3Associate Professor, Department of IT, Sreenidhi Institute of Science and Technology, Hyderabad, Telangana.
4Professor, CSE Department, Institute of Aeronautical Engineering, Hyderabad, Telangana.
5Professor, CSE Department, Gokaraju Rangaraju Institute of Engineering and Technology, Hyderabad.

Abstract: Online social networks (OSNs) are an emerging communication medium that lets huge numbers of users establish and manage social relationships. The rapid growth of OSNs, and the large amount of personal data held about their subscribers, has attracted attackers, who use these networks to carry out malicious activities, share false news, and even steal personal data. Twitter is one of the biggest microblogging social networking platforms, with more than half a billion tweets posted daily, most of which are associated with malicious activity. Analyzing who is encouraging threats on social networks requires classifying the profiles of social network users. Various classification methods already exist for detecting fake profiles on social networks, but their classification accuracy needs to improve. This paper therefore focuses on machine learning algorithms and proposes the detection of fake profiles on Twitter using a hybrid Support Vector Machine (SVM) algorithm. The machine learning based hybrid SVM algorithm classifies Twitter accounts, including bots, as fake or genuine after dimension reduction and feature selection techniques have been applied. The proposed hybrid SVM algorithm uses a smaller number of features and correctly classifies 98% of the accounts.

* Corresponding author: [email protected]

© The Authors, published by EDP Sciences. This is an open access article distributed under the terms of the Creative Commons Attribution License 4.0 (http://creativecommons.org/licenses/by/4.0/).

1. Introduction
The growing popularity of social media platforms has not only benefited people but also caught the attention of scammers. On one hand, social media brings people together; on the other, it has created a sheltered space for fraudsters to carry out a number of illegal activities. The absence of any authentication process has made it easy for anyone to create a fake account. This works to the advantage of scammers and encourages them to use fake accounts for illegal activities, since there is a good chance that the account holder will not get caught. As a result, the popularity of fake accounts has increased. These phony accounts can be operated either by humans or by bots. The use of such phantom accounts to impersonate someone in the hope of defaming them has become a common issue. At times these accounts serve the bigger purpose of posing as a trusted acquaintance to obtain personal information from a person. The information obtained can then be used to carry out phishing attacks [2].

People often use these dummy accounts to spread fake news, which in the worst case can cause riot-like conditions. Some people use fake accounts to spread hate, which can be directed at a certain race, religion, or country, or often at a particular person [3]. This has increased the incidence of cyberbullying, leading to a rise in cases of depression and anxiety among teenagers. Social media platforms have also seen an increase in the number of accounts that offer services or products in exchange for money [19]. Most of these accounts are fake, however, and as a result thousands of people are sold fake products and promised fake services. Sometimes fake accounts are used by companies to build hype for their bad products and services [4]. It is not only scammers: many influencers also use fake bot followers to appear popular, which helps them gain more offers from companies asking them to publicise their products. At times, fake accounts

can also be used to befriend a person in order to stalk them. Another big issue associated with fake accounts is the data overload they create [5]. With fake accounts numbering in the millions, it has become impossible to detect them manually. Fortunately, advances in digital technology can help considerably in this situation. Methods such as machine learning can make the classification process much easier and more accurate [6]. This project uses a machine learning model to classify social media accounts as genuine or fake.

Spammers rely on social engineering as their attack strategy when sending spam URLs and spam tweets. Twitter provides an ideal arena for the proliferation of such irregular spam accounts. Researchers have developed a model from the simulated impact of defamatory actions, and this method detects and recovers faulty profiles. A large number of fake spam profiles are present on the Twitter network, which makes it difficult to provide security and privacy to normal users. Identifying spam profiles is therefore one of the key parts of this research, as it improves the safety of real users.

2. Role of Machine Learning in Detection of Fake Profiles

Over the last twenty years, the social networking phenomenon has grown enormously. Many social networks have been introduced, offering different online services that attract huge numbers of users. The growing number of users depends on the credibility of information on Online Social Networks (OSNs) [7]. Online social networks have become part of everyone's social life in the present generation, technology usage has increased widely, and social networks play an important role in modern society, serving millions of users all over the world. Facebook and Twitter are two social networks with especially high levels of user interaction, and they can strongly affect daily life [16]. The creation of large numbers of fake accounts is the major problem of OSNs. These fake accounts do not correspond to the real profiles of humans. Spam, web rating, and fake news are some examples of such fakes [8]. OSN operators currently expend various resources on detecting fake accounts, which are then closed. Almost 46% of users operate their Twitter account on mobile phones only [9]. Tweets can be published by sending SMS text messages and e-mails. Twitter limits a message to 140 characters, and messages can be exchanged and published on Twitter directly from smartphones using a wide array of Web-based services [10]. Twitter maintains a large number of users. These social sites help people maintain better social lives, but they also have disadvantages. Online harassment, privacy concerns, trolling, potential for misuse, and fake account creation are some of the issues with social networks [11]. We implement machine learning algorithms to predict whether an account is controlled by a fake user.

Supervised learning and unsupervised learning are the two types of machine learning methods. In supervised learning, a labeled training set is used to estimate a mapping from the input data to the desired output. In unsupervised learning, no labeled examples are provided, and the output is unknown during the learning process. The input data in supervised learning is called training data, and each instance carries a known label or result, such as spam/not-spam [12].

A training process prepares the model by having it make predictions when required and correcting them when they are wrong. The training process stops once the model achieves the desired level of accuracy on the training data. Detecting fake profiles with the trained machine learning algorithm is the main aim of the machine learning method [13]. The training data contains particulars of a person such as gender, age, and friends list. Fake profiles are detected or predicted from these particulars, which enhances data security on social networking sites. Naïve Bayes (NB), Decision Tree (DT), and Support Vector Machine (SVM) are the machine learning algorithms used. An analysis of account activity is also derived from the prediction results [14].

Researchers have so far used traditional machine learning algorithms such as random forest, naive Bayes, SVM, and decision tree. These methods cannot perform feature selection on their own, so the researcher has to study the relationship between the features and the target variable to decide which features to consider and which to reject. Another drawback is their inability to adapt to changing patterns in the input dataset, which can make them insufficient at times. These methods therefore require constant monitoring, because changing patterns in the input can cause them to give incorrect results, reducing accuracy. A further major issue is that they do not perform well if the dataset is too large or unstructured. This makes traditional methods highly unsuitable for real-life scenarios, where the data is mostly unstructured and often very large [17]. Owing to these drawbacks, it has become necessary to explore advanced algorithms such as deep neural networks [15,18,20].

3. Twitter Fake Profile Detection Using SVM
The fake profile detection model for Twitter presented in this paper is based on machine learning. Training and testing are the two main stages of a machine learning framework. Fig. 1 shows the block diagram of the proposed detection model for Twitter fake profile detection using SVM.
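The training and testing stages presuppose that the labeled accounts are divided into a training portion and a held-out testing portion. A minimal sketch of such a hold-out split follows, assuming a shuffled 70/30 division (the ratio the Results section reports for the MIB dataset). The function name `holdout_split`, the fixed seed, and the toy `accounts` list are illustrative assumptions, not part of the paper.

```python
import random


def holdout_split(samples, train_fraction=0.7, seed=0):
    """Shuffle labeled samples and split them into (train, test) lists."""
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must be strictly between 0 and 1")
    shuffled = list(samples)  # copy so the caller's list is left untouched
    random.Random(seed).shuffle(shuffled)  # seeded for a repeatable split
    cut = int(round(len(shuffled) * train_fraction))
    return shuffled[:cut], shuffled[cut:]


# Hypothetical labeled accounts: (feature_vector, label), 1 = fake, 0 = genuine.
accounts = [([i, i % 3], i % 2) for i in range(10)]
train, test = holdout_split(accounts)
```

Fixing the seed keeps the split reproducible, so that different classifiers can be compared on the same testing portion.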


Fig. 1: Framework of Twitter Fake Profile Detection Using SVM

3.1 Twitter data collection
Collecting Twitter data is the first step of this method. For research purposes, either publicly available data or data streamed from the Twitter API is used.

3.2 Feature selection
In this step, feature selection is performed on the collected dataset. Spam account detection uses various feature parameters, some of which are useless, so only the useful features are selected from those extracted. The effectiveness of spam account detection depends on the selected features. Using Spearman's Rank-Order correlation with an estimated threshold of 0.8, feature pairs whose correlation with the class variable falls below this level are eliminated.

A total of 11 sets of correlated feature pairs are selected as the output of this step. The features selected by the relevance analysis step are used as the inputs to the redundancy analysis step. If two features are completely correlated, they are said to be redundant to each other, but in reality determining redundancy is not straightforward when one feature is correlated with a set of features.

Hence, redundant features are eliminated using the Markov Blanket technique. In a Bayesian network, the Markov blanket of a node A, MB(A), consists of A's parents, its children, and the other parents of its children. In a Markov random field, the Markov blanket of a node is its set of neighboring nodes. Applying the Markov blanket to the correlated feature pairs, MB(Fi) and MB(Fj), yields two output sets of non-redundant features in two different versions. The features output by this redundancy step are then used in the classification section. Two feature sets are selected, in which 1 represents the contributing features and 0 represents the ignored features. For training, a small set of samples is labeled as spam or non-spam. The labeling is done either by spam filtering services or manually. Spam filtering lets through only spam-free instances, which are labeled non-spam, and eliminates the spam-affected instances, which are labeled spam.

3.3 Principal Component Analysis (PCA)
The dimensions of the feature vectors are reduced using the dimension reduction technique of principal component analysis (PCA). PCA discovers the features that describe the data most efficiently and strips away unnecessary features by assigning them lower weights, so that they do not affect the mining process. In this work, 10 of the 16 principal components are selected, and these 10 components cover 92% of the data.

3.4 Spearman's Rank-Order Correlation
Spearman's Rank-Order Correlation is the most used feature selection method. It measures the strength and direction of the monotonic relationship between two quantitative variables X and Y. The coefficient takes values between -1 and +1, whose sign and magnitude indicate the direction and level of correlation, and it is 0 if X and Y are independent. The output of this algorithm is the correlation coefficient of every variable, presented in the form of a table.

3.5 Multiple Linear Regression
Linear regression models describe the relationship between a dependent variable (the response output) and one or more independent variables (the predictor inputs). Simple linear regression considers two variables, x and y, where one depends on the other, as shown in equation 1:
y = a + bx (1)

where a is a constant and b is the regression coefficient. Multiple linear regression considers two or more independent variables to predict the value of the dependent variable, as shown in equation 2:
y = a + b x_1 + c x_2 + d x_3 (2)

The multiple linear regression here uses a dataset with one dependent variable and 16 independent variables. Multiple linear regression raises the problem of multicollinearity, in which the factors are correlated not only with the response variable but also with each other. Multicollinearity inflates the standard errors of the coefficients, making some variables appear statistically insignificant and others significant. Out of the 16 predictors, a total of 12 predictors


are retained after removing the redundant variables.

3.6 Support Vector Machine (SVM)
The wrapper method is the most commonly used feature selection model. In this method, a learning model evaluates different feature subsets, and the subset with the highest predictive performance is selected. All subsets can be enumerated by bit manipulation: a set F of n features has 2^n subsets. For instance, the set {1, 2, 3} has 2^3 = 8 subsets. This method yields the best-performing feature set, but it is computationally intensive for a large feature space. The baseline dataset has 16 features, so 2^16 - 1 = 65,535 subsets are possible, excluding the empty subset.

After the machine learning based detection models have been trained with labeled samples, testing identifies the class of a particular data instance. In data analytics, machine learning is used to develop complex models and algorithms that make predictions by themselves. The performance of the detection model determines the prediction accuracy. The detection model of a spam account detection system may consist of a single classifier or a group of classifiers. Hybrid SVM (Support Vector Machine) is used to develop these classifiers. The stability of the data mining algorithms, the different labeled instances, and the proper selection of features are the factors that influence the performance of the detection model.

3.7 Results Evaluation
Detection rate, accuracy, false negatives, true positives, F-measure, precision, and recall are among the parameters used to evaluate the performance of detection models.

4. Results
Once the proposed SVM classifier had been trained, its effectiveness was evaluated. A confusion matrix describes the performance of the classification model. The most fundamental terms used with a confusion matrix for a binary classifier are:
• True-positive (TP): the number of accounts correctly identified as Faked.
• False-positive (FP): the number of accounts incorrectly identified as Faked.
• True-negative (TN): the number of accounts correctly identified as Trusted.
• False-negative (FN): the number of accounts incorrectly identified as Trusted.
These can be further used to find the following metrics to determine the effectiveness of each model:

Precision: Precision is the ratio of true positives to all instances predicted as positive. It is defined in Eq. 3 as follows:
\[ \text{Precision} = \frac{\text{True Positives}}{\text{True Positives} + \text{False Positives}} \tag{3} \]

Recall: Recall is the ratio of true positives to the total number of actual positives. It is defined in Eq. 4 as follows:
\[ \text{Recall} = \frac{\text{True Positives}}{\text{True Positives} + \text{False Negatives}} \tag{4} \]

Accuracy: Accuracy measures the proportion of accounts in the corpus that are correctly identified.
\[ \text{Accuracy} = \frac{TP + TN}{TP + FP + TN + FN} \]

The MIB datasets are used. They contain fake and legitimate Twitter accounts and were published by the Institute of Informatics and Telematics (IIT) of the Italian National Research Council (CNR). The dataset includes 11,737 Twitter accounts and is divided into two parts: a training set with 70% of the data and a testing set with 30% of the data. All feature subsets in the dataset are then trained and tested using the proposed hybrid SVM technique. Table 1 shows the precision and recall of different ML techniques, namely Logistic Regression, random forest (RF), SVM, and the proposed hybrid SVM, in predicting identity deception by humans on Twitter.

Table 1: Comparative Analysis of Different Classification Techniques

Technique             Precision   Recall
RF                    58.52%      63.21%
Logistic Regression   83.51%      86.63%
SVM                   73.89%      76.98%
Hybrid SVM            98.16%      85.87%

These results show how the different supervised machine learning models, such as SVM and the proposed hybrid SVM, perform in predicting identity deception by humans on Twitter. At best, an accuracy of 98.16% was achieved by the proposed hybrid SVM machine learning model.
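The confusion-matrix terms and the precision, recall, and accuracy ratios defined in this section can be computed directly from true and predicted labels. The sketch below is illustrative only: the function names and the toy label lists are assumptions, with 1 standing for a Faked account and 0 for a Trusted one.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (TP, FP, TN, FN) for a binary classifier."""
    tp = fp = tn = fn = 0
    for truth, guess in zip(y_true, y_pred):
        if guess == positive:
            if truth == positive:
                tp += 1  # fake account flagged as fake
            else:
                fp += 1  # trusted account flagged as fake
        else:
            if truth == positive:
                fn += 1  # fake account passed as trusted
            else:
                tn += 1  # trusted account passed as trusted
    return tp, fp, tn, fn


def precision(tp, fp):
    """TP / (TP + FP); defined as 0.0 when nothing was predicted positive."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(tp, fn):
    """TP / (TP + FN); defined as 0.0 when there are no actual positives."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def accuracy(tp, fp, tn, fn):
    """(TP + TN) / (TP + FP + TN + FN); defined as 0.0 for empty input."""
    total = tp + fp + tn + fn
    return (tp + tn) / total if total else 0.0


# Toy labels, invented for illustration (not taken from the MIB dataset).
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
tp, fp, tn, fn = confusion_counts(y_true, y_pred)
```

Reporting all three values together is useful because a classifier can score high on precision while missing many fake accounts, which only recall reveals.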

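Section 3.6 notes that all feature subsets can be enumerated by bit manipulation. One way to sketch that idea is to let each bit of an integer mask mark whether a feature is included. The names `feature_subsets` and `best_subset`, the feature names, and the toy scoring function are hypothetical stand-ins for a real wrapper evaluation, which would train and score a classifier on each subset.

```python
def feature_subsets(features):
    """Yield every non-empty subset of `features`, one per bitmask 1 .. 2**n - 1."""
    n = len(features)
    for mask in range(1, 1 << n):
        # Bit i of the mask decides whether features[i] belongs to this subset.
        yield [features[i] for i in range(n) if (mask >> i) & 1]


def best_subset(features, score):
    """Wrapper-style search: return the subset with the highest score."""
    return max(feature_subsets(features), key=score)


subsets = list(feature_subsets([1, 2, 3]))
count_16 = (1 << 16) - 1  # non-empty subsets of a 16-feature set

# Hypothetical features and a toy score that rewards two "useful" features
# and charges a small penalty per feature used.
useful = {"followers_count", "friends_count"}
toy_score = lambda subset: len(useful & set(subset)) - 0.1 * len(subset)
chosen = best_subset(["followers_count", "friends_count", "lang"], toy_score)
```

Because the number of subsets doubles with every added feature, exhaustive enumeration of this kind is only practical for small feature sets such as the 16 features discussed here.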
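The relevance step of Section 3.2 keeps features whose Spearman correlation with the class variable reaches the 0.8 threshold. A minimal pure-Python sketch follows under stated assumptions: ties receive average ranks, the coefficient is computed as the Pearson correlation of the ranks, and the absolute value is compared with the threshold. The feature names and values are invented for illustration and are not taken from the MIB dataset.

```python
def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1  # extend the run of tied values
        mean_rank = (i + j) / 2 + 1  # positions are 0-based, ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either input is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sum((a - mean_x) ** 2 for a in x) ** 0.5
    sd_y = sum((b - mean_y) ** 2 for b in y) ** 0.5
    return cov / (sd_x * sd_y) if sd_x and sd_y else 0.0


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def select_features(columns, target, threshold=0.8):
    """Keep features whose |rho| with the class variable reaches the threshold."""
    return [name for name, col in columns.items()
            if abs(spearman(col, target)) >= threshold]


# Hypothetical data: class label 1 = fake, 0 = genuine.
labels = [0, 0, 0, 1, 1, 1]
columns = {
    "followers_count": [5, 9, 12, 300, 450, 800],
    "tweet_hour": [3, 17, 8, 11, 5, 14],
}
selected = select_features(columns, labels)
```

Comparing the absolute value treats strong negative correlations as relevant too; if only positive association with the class should count, the `abs` call can be dropped.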

Fig. 2: Detection Accuracy of Different Machine Learning Techniques

5. Conclusion
The use of machine learning techniques is increasing day by day. Using datasets that contain fake profiles efficiently removes the difficulty of finding fake profiles. This paper describes a technique for detecting fake accounts created by humans. A new hybrid classification algorithm is introduced for detecting fake profiles efficiently on social networks. Multiple linear regression is used to find the values for the trained SVM model, and Spearman's Rank-Order correlation is used to reduce the feature vector. Remarkable accuracy is then obtained with the correlated feature set. The correlation technique selects the best features and removes redundancy. The result analysis shows that the proposed hybrid SVM achieved an accuracy of 98% in fake profile detection on Twitter and performed better and more efficiently than the other existing machine learning techniques.

References
1. Koyel Chakraborty, Siddhartha Bhattacharyya, Rajib Bag, "A Survey of Sentiment Analysis from Social Media Data", IEEE Transactions on Computational Social Systems, Volume: 7, Issue: 2, (2020)
2. Muhammad Adil, Rahim Khan, M. Ahmad Nawaz Ul Ghani, "Preventive Techniques of Phishing Attacks in Networks", 2020 3rd International Conference on Advancements in Computational Sciences (ICACS), (2020)
3. Fatih Cagatay Akyon, M. Esat Kalfaoglu, "Instagram Fake and Automated Account Detection", 2019 Innovations in Intelligent Systems and Applications Conference (ASYU), (2019)
4. Ranojoy Barua, Rajdeep Maity, Dipankar Minj, Tarang Barua, Ashish Kumar Layek, "F-NAD: An Application for Fake News Article Detection using Machine Learning Techniques", 2019 IEEE Bombay Section Signature Conference (IBSSC), (2019)
5. Ebtihal A. Hassan, Farid Meziane, "A Survey on Automatic Fake News Identification Techniques for Online and Socially Produced Data", 2019 International Conference on Computer, Control, Electrical, and Electronics Engineering (ICCCEEE), (2019)
6. Estée Van Der Walt, Jan Eloff, "Using Machine Learning to Detect Fake Identities: Bots vs Humans", IEEE Access, Volume: 6, (2018)
7. Naeimeh Laleh, Barbara Carminati, Elena Ferrari, "Risk Assessment in Social Networks Based on User Anomalous Behaviors", IEEE Transactions on Dependable and Secure Computing, Volume: 15, Issue: 2, (2018)
8. Md. Arafatur Rahman, Vitaliy Mezhuyev, Md Zakirul Alam Bhuiyan, S. M. Nazmus Sadat, Siti Aishah Binti Zakaria, Nadia Refat, "Reliable Decision Making of Accepting Friend Request on Online Social Networks", IEEE Access, Volume: 6, (2018)
9. Myo Myo Swe, Nyein Nyein Myo, "Fake Accounts Detection on Twitter Using Blacklist"
10. Estée Van Der Walt, Jan Eloff, "Using Machine Learning to Detect Fake Identities: Bots vs Humans", IEEE Access, Volume: 6, (2018)
11. Nafiseh Sedaghat, Mahmood Fathy, Mohammad Hossein Modarressi, Ali Shojaie, "Combining Supervised and Unsupervised Learning for Improved miRNA Target Prediction", IEEE/ACM Transactions on Computational Biology and Bioinformatics, Volume: 15, Issue: 5, (2018)
12. Naman Singh, Tushar Sharma, Abha Thakral, Tanupriya Choudhury, "Detection of Fake Profile in Online Social Networks Using Machine Learning", 2018 International Conference on Advances in Computing and Communication Engineering (ICACCE), (2018)
13. Simon Fong, Yan Zhuang, Jiaying He, "Not every friend on a social network can be trusted: Classifying imposters using decision trees", The First International Conference on Future Generation Communication Technologies, (2012)
14. Huaizu Jiang, Jinjun Wang, Yihong Gong, Na Rong, Zhenhua Chai, Nanning Zheng, "Online Multi-Target Tracking With Unified Handling of Complex Scenarios", IEEE Transactions on Image Processing, Volume: 24, Issue: 11, (2015)
15. Kui Wu, Xuancong Wang, Nina Zhou, AiTi Aw, Haizhou Li, "Joint Chinese word segmentation and punctuation prediction using deep recurrent neural network for social media data", 2015 International Conference on Asian Language Processing (IALP), (2015)
16. Raghunadha Reddy, T., Vishnu Vardhan, B., Vijayapal Reddy, P, "A survey on Authorship Profiling techniques", International Journal of Applied Engineering Research, 11 (5), (2016), pp. 3092-3102.
17. Swaraja K, "Medical image region based watermarking for secured telemedicine", Multimedia Tools and Applications, 77 (21), (2018), pp. 28249-28280.
18. Kumar, P., Singhal, A., Mehta, S., Mittal, A, "Real-time moving object detection algorithm on high-resolution videos using GPUs", Journal of Real-Time Image Processing, 11 (1), (2016), pp. 93-109.


19. Kumar, S.K., Reddy, P.D.K., Ramesh, G., Maddumala, V.R., "Image transformation technique using steganography methods using LWT technique", Traitement du Signal, (2019)
20. Mahalle, G., Salunke, O., Kotkunde, N., Gupta, A.K., Singh, S.K, "Neural network modeling for anisotropic mechanical properties and work hardening behavior of Inconel 718 alloy at elevated temperatures", Journal of Materials Research and Technology, 8 (2), (2019), pp. 2130-2140.
