Techniques to Detect Spammers in Twitter- A Survey

Article in International Journal of Computer Applications · December 2013


DOI: 10.5120/14877-3279

International Journal of Computer Applications (0975 – 8887)
Volume 85 – No 10, January 2014

Techniques to Detect Spammers in Twitter- A Survey


Monika Verma
Ph.D. Scholar
Department of Computer Science
PEC University of Technology
Chandigarh, India

Divya, Ph.D
Associate Professor
Department of Computer Science
PEC University of Technology, Chandigarh, India

Sanjeev Sofat, Ph.D
Professor
Department of Computer Science
PEC University of Technology, Chandigarh, India

ABSTRACT
With the rapid growth of social networking sites for communicating, sharing, storing and managing significant information, these sites are attracting cybercriminals who misuse the Web to exploit vulnerabilities for illicit benefit. Forged online accounts crop up every day. Impersonators, phishers, scammers and spammers appear all the time in Online Social Networks (OSNs) and are hard to identify. Spammers are users who send unsolicited messages to a large audience with the intention of advertising a product, luring victims to click on malicious links, or infecting users' systems, purely for the purpose of making money. A lot of research has been done to detect spam profiles in OSNs. In this paper we review the existing techniques for detecting spam users in the Twitter social network. Features for the detection of spammers can be user based, content based, or both. The current study provides an overview of the methods, the features used, the detection rates and the limitations (if any) of approaches for detecting spam profiles, mainly in Twitter.

Categories and Subject Descriptors
[General Literature]: Introductory and Survey
[Social Networks]: Security

General Terms
User based features, Content based features, Accuracy, Spam profiles, Malicious users.

Keywords
Online Social Networks (OSNs), Twitter, Spammers, Legitimate users.

1. INTRODUCTION
According to Boyd et al. [5], a social networking site allows its users to (a) construct a profile, (b) befriend a list of other users, and (c) analyze and traverse their own and others' lists of friends. These Online Social Networks (OSNs) use Web 2.0 technology, which allows users to interact with each other. Social networking sites are growing rapidly and changing the way people keep in contact with each other. In less than 8 years, these sites have shifted from a niche of online activity to a phenomenon in which millions of internet users are engaged. Online communities bring people with the same interests together, which makes it easier for them to keep in contact with others.

Social networking sites [5] started with sixdegrees.com in 1997, followed by makeoutclub.com in 2000. Sixdegrees.com and other such sites could not survive for long and soon disappeared, but newer sites like MySpace, LinkedIn, Bebo, Orkut and Twitter became successful. Facebook, the best known of these sites, was launched in 2004 [5] and gained great popularity worldwide. With larger user databases, OSNs are becoming more interesting targets for spammers and malicious users. Spam can take different forms on social web sites and is not easy to detect. Anyone who is familiar with the Internet has faced spam of some sort, be it e-mail spam or spam on forums, newsgroups etc. Spam [18] is defined as the use of electronic messaging systems to send unsolicited bulk messages. With the rise of OSNs, these networks have become a platform for spreading spam. Spammers intend to post advertisements of products to unrelated users. Some spammers post URLs of phishing websites, which are used to steal users' sensitive data.

Many papers have been published on the detection of spam profiles in OSNs, but so far no review paper consolidating the existing research has been published in this field. Our paper aims to provide a review of the academic research and work done in this field by various researchers and to highlight future research directions. In this paper the techniques available for detection of spammers in Twitter are presented along with their analysis and comparison. The paper is structured as follows: Section 2 describes the methodology used to carry out this review; security issues in OSNs are briefly described in Section 3; Section 4 presents the definition of spammers and their motives; an introduction to Twitter and its threats is given in Section 5; Section 6 covers the motivation behind this survey paper; Section 7 covers the attributes that can be used for detection; Section 8 reviews the work done by various researchers with a comparative analysis; Section 9 gives research directions for new researchers; finally, Section 10 concludes the review.

2. METHODOLOGY
This survey of existing methods for detecting spam profiles in OSNs follows a systematic review with a principled approach, in which major research databases for Computer Science (IEEE Xplore, ACM Digital Library, SpringerLink, Google Scholar and ScienceDirect) were searched for the topic concerned. We focussed only on papers published after 2009, as the concept of social networks came into existence only in 1997 [1] and became popular only later. Facebook was then launched in 2004 [1] and became very popular. It therefore took some time for people to become familiar with these networks for communication, and hence for attacks on these networks to emerge. The search of the five databases mentioned above returned over 60 papers. Papers reviewed for this survey were selected after reading the titles and abstracts of all the papers, and only those found suitable for the present study were chosen. Papers whose titles and abstracts concerned spam message detection and other irrelevant topics were excluded, so that finally a total of 21 papers were selected for review. The papers have mainly been categorized on the basis of the features used to detect spammers.
Through this paper we are trying to compile a list of the social networking papers on detection of spam profiles in Twitter that we have read. The list may likely be incomplete, but gives


shape to the current research surrounding social network spammer detection. After going through this survey, new researchers can easily evaluate what work has been done, in which year, and how the present work can be extended to make spam detection more accurate. Wherever appropriate, we have detailed the methodology followed, the dataset used, the features for detection of spammers and the accuracy of the techniques used by the various authors.
In particular, the papers cover how spammers engage with social network users, their implications and existing techniques to detect these spammers.

3. SECURITY ISSUES IN OSNs
Online Social Networking sites (OSNs) are vulnerable to security and privacy issues because of the amount of user information processed by these sites each day. Users of social networking sites are exposed to various attacks:
1) Viruses - spammers use social networks as a platform [19] to spread malicious data to users' systems.
2) Phishing attacks - users' sensitive information is acquired by impersonating a trustworthy third party [30].
3) Spammers - send spam messages to the users of social networks [11].
4) Sybil (fake) attack - the attacker obtains multiple fake identities and pretends to be genuine in the system in order to harm the reputation of honest users in the network [20].
5) Social bots - a collection of fake profiles created to gather users' personal data [32].
6) Clone and identity theft attacks - attackers create a profile of an already existing user in the same network or across different networks in order to fool the cloned user's friends [23]. If victims accept the friend requests sent by these cloned identities, the attackers are able to access their information.
These attacks consume extra resources from users and systems.

4. TYPES OF SPAMMERS
Spammers are malicious users who contaminate the information presented by legitimate users and in turn pose a risk to the security and privacy of social networks. Spammers belong to one of the following categories [22]:
1. Phishers: users who behave like normal users in order to acquire personal data of other genuine users.
2. Fake Users: users who impersonate the profiles of genuine users to send spam content to the friends of that user or to other users in the network.
3. Promoters: users who send malicious links of advertisements or other promotional links to others so as to obtain their personal information.

Motives of Spammers:
a) Disseminate pornography
b) Spread viruses
c) Phishing attacks
d) Compromise system reputation

5. TWITTER AS AN OSN
5.1 Introduction
Twitter is a social network service launched on March 21, 2006 [14] and has 500 million active users [14] to date who share information. Twitter uses a chirping bird as its logo, hence the name Twitter. Users can access it to exchange frequent information called 'tweets', which are messages of up to 140 characters that anyone can send or read. These tweets are public by default and visible to all those who are following the tweeter. Users share these tweets, which may contain news, opinions, photos, videos, links and messages. The following is the standard terminology used in Twitter and relevant to our work:
• Tweets [3]: A message on Twitter with a maximum length of 140 characters.
• Followers & Followings [3]: Followers are the users who follow a particular user, and followings are the users whom that user follows.
• Retweet [3]: A tweet that has been reshared with all followers of a user.
• Hashtag [3]: The # symbol is used to tag keywords or topics in a tweet to make it easily identifiable for search purposes.
• Mention [3]: Tweets can include replies and mentions of other users by preceding their usernames with the @ sign.
• Lists [3]: Twitter provides a mechanism to group the users you follow into lists.
• Direct Message [3]: Also called a DM, this represents Twitter's direct messaging system for private communication amongst users.

As per Twitter policy [16], indicators of spam profiles are metrics such as following a large number of users in a short period of time (see footnote 1), posts consisting mainly of links, use of popular hashtags (#) when posting unrelated information, or repeatedly posting other users' tweets as your own. There is a provision for users to report spam profiles to Twitter by posting a tweet to @spam. However, Twitter policy [16] gives no clear indication of whether automated processes look for these conditions or whether the administrators rely on user reporting, although it is believed that a combined approach is used.

5.2 Threats on Twitter
1. Spammed Tweets [13]: Twitter allows its users to post tweets of at most 140 characters, but cybercriminals have found a way to use this limitation to their advantage by creating short but compelling tweets with links to promotions for free vouchers, job advertisements or other promotions.
2. Malware downloads [13]: Twitter has been used by cybercriminals to spread posts with links to malware download pages. FAKEAV and backdoor [13] applications are examples of Twitter worms that sent direct messages, and even malware that affected both Windows and Mac operating systems. The most notorious social media malware is KOOBFACE [13], which targeted both Twitter and Facebook.
3. Twitter bots [13]: Cybercriminals tend to use Twitter to manage and control botnets. These botnets control users' accounts and pose a threat to their security and privacy.

6. Social Implications of OSNs
Along with the usual problems like spamming, phishing attacks, malware infections, social bots, viruses etc., the greater challenge that social networking sites present for users is to keep private data secure and confidential.

Footnote 1: According to Twitter policy [17], if the number of followings of an account exceeds 2,000, this number is limited by the number of the account's followers.
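
The terminology and policy indicators of Section 5.1 (hashtags, mentions, posts that consist mainly of links, repeated posting of the same text) can be made concrete with a small sketch. This is only an illustration under stated assumptions, not a method taken from any of the surveyed papers: the regular expressions, the function names `tweet_entities` and `policy_indicators`, and the rule that two tweets count as duplicates when they match after removing @mentions are all hypothetical choices.

```python
import re
from collections import Counter

# Hypothetical patterns for illustration; real tweet parsing has more edge cases.
HASHTAG_RE = re.compile(r"#(\w+)")
MENTION_RE = re.compile(r"@(\w+)")
URL_RE = re.compile(r"(?:https?://|www\.)\S+", re.IGNORECASE)


def tweet_entities(text):
    """Return the hashtags, mentions and URLs found in one tweet text."""
    return {
        "hashtags": HASHTAG_RE.findall(text),
        "mentions": MENTION_RE.findall(text),
        "urls": URL_RE.findall(text),
    }


def policy_indicators(tweets):
    """Summarise link-heavy and repetitive posting over a list of tweet texts."""
    if not tweets:
        return {"link_fraction": 0.0, "duplicate_fraction": 0.0, "avg_hashtags": 0.0}
    entities = [tweet_entities(t) for t in tweets]
    with_links = sum(1 for e in entities if e["urls"])
    total_hashtags = sum(len(e["hashtags"]) for e in entities)
    # Assumption: tweets are duplicates if equal after dropping @mentions,
    # lower-casing and collapsing whitespace.
    normalised = [" ".join(MENTION_RE.sub("", t).lower().split()) for t in tweets]
    repeats = sum(count - 1 for count in Counter(normalised).values())
    n = len(tweets)
    return {
        "link_fraction": with_links / n,
        "duplicate_fraction": repeats / n,
        "avg_hashtags": total_hashtags / n,
    }


if __name__ == "__main__":
    sample = [
        "Win a free voucher http://spam.example/a @alice",
        "Win a free voucher http://spam.example/a @bob",
        "Lovely morning #sunrise",
    ]
    print(policy_indicators(sample))
```

Such indicators are only raw signals; any threshold for flagging an account would have to be chosen and validated on labelled data.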


The purpose of social networking sites is to make information easily available and accessible to others. Regrettably, cybercriminals use this publicly available information to carry out targeted attacks. Once attackers gain access to one of a user's accounts, they can easily find a way to dig out more information and use it to access the user's other accounts and the accounts of their friends.

6. MOTIVATION BEHIND REVIEW
Because of the ease of sharing information and of keeping in sync with ongoing topics, social networks have become a target for spammers. Detecting such malicious users in OSNs is difficult, as spammers are well aware of the techniques available to detect them. OSNs provide a perfect platform for spammers to disguise themselves as genuine users and try to get malicious posts clicked by normal users for the sake of making money. Detecting such users, in order to make the network secure and keep users' private information confidential, is therefore an important topic being investigated by various researchers. This paper should help researchers to swiftly review the work that has been done in this area.

7. FEATURES DISTINGUISHING SPAMMERS & NON-SPAMMERS IN TWITTER
Table 1 lists the publications reviewed in this paper and the category of features used for detection of spam profiles in Twitter. The features on the basis of which spam and non-spam profiles are differentiated are user based or content based. User based features are properties of the profile and the behaviour of the user in the social network; content based features are properties of the text posted by users.

Table 1. Features for the detection of spam profiles
Attributes used for detection of spam profiles:
User based features: demographic features like profile details, number of followers, number of followings, followers/following ratio, reputation, age of account, avg. time between tweets, posting time behaviour, idle hours, tweet frequency etc. [33,12,34,3,26]
Content based features: number of hashtags (#), number of URLs in tweets, @mentions, retweets, spam words, HTTP links, trending topics, duplicate tweets etc. [33,7,11,25]
User based and content based both: [1,22,24,27,29,2,4]
Any other feature like graphical distance, graph connectivity: Markov clustering method, URL rate, interaction rate, social relations, social activities, graph based features, neighbor based features, automation based features [21,9,28,33,23,6]

Role of the above features for spam profile detection as per Twitter policy [16]:
1. Number of followers - spammers have few followers.
2. Number of followings - spammers tend to follow a large number of users.
3. Followers/Following ratio - this ratio is less than 1 for spammers.
4. Reputation - defined as the ratio of followers to the sum of followers and followings. Spammers have reputation < 1; since they follow many users and have few followers (items 1 and 2), the value is typically low.
5. Age of account - computed from the current date and the account creation date. Spammers generally have new accounts, so this feature takes a low value for spammers.
6. Avg. time between posts - spammers post more tweets in a short period of time in order to gain others' attention.
7. Posting time behaviour - spammers tend to post on a fixed time schedule, for example early in the morning or late at night when genuine users do not use the social networking site.
8. Idle hours - spammers keep sending messages, so they have fewer idle hours.
9. Tweet frequency - spammers post tweets more frequently and at odd times to get the attention of other users.
10. No. of hashtags (#) - spammers tweet multiple unrelated updates on the most mentioned topics on Twitter, using # to lure legitimate users to read their tweets.
11. No. of URLs - spammers' tweets contain a large number of URLs of malicious sites.
12. @mentions - spammers use a large number of @usernames of unknown users in their tweets so as to avoid being detected.
13. Retweets - retweets are the replies to any tweet using the @RT symbol, and spammers use a large number of @RT in their tweets.
14. Spam words - spammers' tweets mainly consist of spam words.
15. HTTP links - tweets that contain a large number of www or http:// links are posted by spammers.
16. Duplicate tweets - spammers tend to post duplicate tweets with different @usernames in the tweets.

8. EXISTING METHODS FOR DETECTION OF SPAM PROFILES IN TWITTER
Different techniques have been used by researchers to find spam profiles in various OSNs. We focus only on the work that has been done to identify spammers in Twitter, as it is not only a social communication medium but is also used to share and spread information related to trending topics in real time. Table 2 summarizes the papers reviewed regarding the detection of spammers in Twitter.

Table 2. Outline of techniques used for the detection of spammers
Author | Metrics Used | Methodology Used | Dataset Used | Results
Alex Hai Wang [1] | Graph based and content based | Compared Naive Bayesian, Neural Network, SVM and Decision Tree | Validated on 500 Twitter users with 20 recent tweets | Naive Bayesian giving highest accuracy: 93.5%
Lee et al. [22] | User based | Compared Decorate, SimpleLogistic, FT, LogitBoost, RandomSubSpace, Bagging, J48, LibSVM | Validated on 1000 Twitter users | Decorate giving highest accuracy: 88.98%
Benevenuto et al. [7] | User based and content based | SVM | Validated on 1065 Twitter users | Accuracy 87.6% (with user based and content based features) and accuracy 84.5% (with only user based features)
Gee et al. [12] | User based | Compared Naive Bayesian, SVM | Validated on 450 Twitter users with 200 recent tweets | Accuracy: 89.6%
McCord et al. [24] | User based and content based | Compared Random Forest, SVM, Naive Bayesian, K-NN | Validated on 1000 Twitter users with 100 recent tweets | Random Forest giving highest accuracy: 95.7%
Lin et al. [28] | URL rate, interaction rate | J48 | Validated on 400 Twitter users | Precision: 86%
Amit A. et al. [2] | Introduced 15 new features | Compared Random Forest, Decision Tree, Decorate, Naive Bayesian | Validated on 31,808 Twitter users | Accuracy: 93.6%
Chakraborty et al. [4] | User based, content based | Compared Random Forest, SVM, Naive Bayesian, Decision Tree | Trained on 5000 Twitter users with 200 recent tweets | SVM giving highest accuracy: 89%
Yang et al. [6] | 18 features (8 existing & 10 new features introduced) | Compared Random Forest, Decision Tree, Decorate, Naive Bayesian | Validated on two datasets: 5000 users and then 3500 users with 40 recent tweets | Bayesian giving highest accuracy: 88.6%

Significant work was done by Alex Hai Wang [1] in 2010, using user based as well as content based features for detection of spam profiles. A spam detection prototype system was proposed to identify suspicious users in Twitter. A directed social graph model was proposed to explore the "follower" and "friend" relationships. Based on Twitter's spam policy, content-based and user-based features were used to facilitate spam detection with a Bayesian classification algorithm. Classic evaluation metrics were used to compare the performance of various traditional classification methods (Decision Tree, Support Vector Machine (SVM), Naive Bayesian and Neural Networks), and the Bayesian classifier was judged the best in terms of performance. Over a crawled dataset of 2,000 users and a test dataset of 500 users, the system achieved an accuracy of 93.5% and a precision of 89%. A limitation of this approach is that it was tested on a very small dataset of 500 users, considering only their 20 most recent tweets.

Lee et al. [22] deployed social honeypots consisting of genuine profiles that detected suspicious users; a bot collected evidence of the spam by crawling the profile of the user sending the unwanted friend requests and hyperlinks in MySpace and Twitter. Profile features such as posting behaviour, content and friend information were used to develop a machine learning classifier for identifying spammers. After analysis, profiles of users who sent unsolicited friend requests to these social honeypots in MySpace and Twitter were collected. The LIBSVM classifier was used for identification of spammers. One good point of the approach is that it was validated on two different dataset combinations: once with 10% spammers + 90% non-spammers and again with 10% non-spammers + 90% spammers. A limitation of the approach is that a small dataset was used for validation.

Benevenuto et al. [7] detected spammers on the basis of tweet content and user based features. The tweet content attributes used are: number of hashtags per number of words in each tweet, number of URLs per word, number of words of each tweet, number of characters of each tweet, number of URLs in each tweet, number of hashtags in each tweet, number of numeric characters that appear in the text, number of users mentioned in each tweet, and number of times the tweet has been retweeted. The fraction of tweets containing URLs, the fraction of tweets that contain spam words, and the average number of words that are hashtags in the tweets are the characteristics that differentiate spammers from non-spammers. A dataset of 54 million Twitter users was crawled, with 1065 users manually labelled as spammers and non-spammers. A supervised machine learning scheme, an SVM classifier, was used to distinguish between spammers and non-spammers. The detection accuracy of the system is 87.6%, with only 3.6% of non-spammers misclassified.

Twitter allows its users to report spam users by sending a message to "@spam". Gee et al. [12] utilized this feature and detected spam profiles using a classification technique. Normal user profiles were collected using the Twitter API and spam profiles were collected from "@spam" in Twitter. The collected data was represented in JSON and then converted to matrix form using the CSV format, with users as rows and features as columns. The CSV files were then used to train a Naive Bayes algorithm, which gave a 27% error rate, and an SVM algorithm, which gave an error rate of 10%. The spam profile detection accuracy is 89.3%. Limitations of this approach are that the features used for detection are not very technical and that the precision is low, i.e. 89.3%; it has been suggested that aggressive deployment of any system should be done only if precision is more than 99%.

McCord et al. [24] used user based features such as number of friends and number of followers, and content based features such as number of URLs, replies/mentions, retweets and hashtags of the collected database. The classifiers Random Forest, Support Vector Machine (SVM), Naive Bayesian and K-Nearest Neighbour were used to identify spam profiles in Twitter. The method was validated on 1000 users, with 95.7% precision and 95.7% accuracy using the Random Forest classifier, which gives the best results, followed by the SMO, Naive Bayesian and K-NN classifiers. Limitations of this approach are: for the dataset considered, the reputation feature gives wrong results, i.e. it is not able to differentiate spammers from non-spammers; an unbalanced dataset was used, so Random Forest gives the best results, as this classifier is generally used for unbalanced datasets; and the approach was validated on a small dataset.

Lin et al. [28] detected long-surviving spam accounts in Twitter on the basis of two features: URL rate and interaction rate. Most of the papers have used many features for detection of spam accounts, such as number of followers, number of followings, followers/following ratio, tweet content, number of hashtags, URL links etc. According to this paper, however, these features are not very effective in detecting spammers, so only simple yet effective features, namely URL rate and interaction rate, were used for detection. URL rate is the number of tweets with a URL / total number of tweets, and interaction rate is the number of tweets interacting / total number of tweets. 26,758


accounts were crawled using the Twitter API and 816 long-surviving accounts were analysed with the J48 classifier, achieving 86% precision. A limitation of the approach is that only two features were used for spam profile detection; if spammers keep a low URL rate and a low interaction rate, this technique will not work as intended.

According to Amit A. et al. [2] there are two types of spammer detection techniques: user centric, based on features related to the user such as the followers/following ratio, and URL centric, which depends on detecting malicious URLs. The approach in this paper is a hybrid that considers both types of features. 15 new features were proposed to detect spammers, along with an alert system to detect spam tweets. Tweet campaigns and techniques used by spammers were also studied. Two datasets from Twitter were used, one with 500K users and another with 110,789 users. The new features are as follows. Bait oriented features identify the techniques used by spammers to lure victims to click on malicious links, such as number of mentions, mentions to non-followers, hijacking trends and intersection with famous trends. Behavioral features include variance in tweet interval, variance in number of tweets per unit time, ratio of variance in tweet interval to variance in number of tweets per unit time, and tweeting sources. URL features include duplicate URLs, duplicate domain names and IP/domain ratio. Content entropy features include dissimilarity of tweet content, similarity between tweets, and URL and tweet similarity. Profile features include follower/following ratio and the profile's description language dissimilarity. All these features were collected from malicious users as well as benign users and were then given to four supervised learning algorithms (Decision Tree, Random Forest, Bayes Network and Decorate) using the Weka tool. 93.6% of spammers were detected with a false positive rate of 1.8%, with the Decorate classifier giving the best results. This technique has been shown to outperform Twitter's spammer detection policy. However, it was tested on only 31,808 users, whereas Twitter has millions of users.

Chakraborty et al. [4] proposed a system to detect abusive users who post abusive content, including harmful URLs, porn URLs and phishing links, and who divert regular users and harm the privacy of social networks. The algorithm has two steps: the first is to check the profile of a user sending a friend request to another user for abusive content, and the second is to check the similarity of the two profiles. After these two steps the system recommends whether the user should accept the friend request or not. It was tested on a Twitter dataset of 5000 users collected with the REST API. The features considered for differentiating abusive and non-abusive users are profile based, content based and timing based. Classifiers such as SVM, Decision Tree, Random Forest and Naive Bayesian were used. SVM outperforms the other classifiers, and the model performs with an accuracy of 89%.

Yang et al. [6] utilized new features for the detection of spammers in Twitter. Various techniques used by spammers for evasion are discussed. Ten new detection features were proposed: three graph-based features, three neighbor-based features, three automation-based features and one timing-based feature. These features are difficult as well as expensive to evade, because they rely on characteristics that spammers do not normally manipulate to avoid detection, and evading them requires more money, resources and time. A total of 18 features (8 existing and 10 newly introduced) were used for detection and were tested using classifiers such as Random Forest, Decision Tree, Decorate and Bayesian Network. The Bayesian classifier performs best, with an accuracy of 88.6%. Limitations of this approach are that very little data was crawled and that only a particular type of spammer is detected, with a low detection rate that is a lower bound on the spammers present in the dataset.

9. RESEARCH DIRECTIONS
During the survey it became apparent that a lot of work has been done on detecting spam profiles in different OSNs. Improvements can still be made to obtain a better detection rate by using a different technique and by covering more, and more robust, features as deciding parameters. The following conclusions are drawn from the survey:

1. Twitter has millions of active users and this number is constantly increasing, yet almost all the authors have used a very small testing dataset to evaluate their approach. There is therefore a need to increase the size of the testing dataset used to assess the performance of any approach.
2. Need to develop a multivariate model.
3. Need to develop a method that can detect all kinds of spammers.
4. Need to test the approaches on different combinations of spammers and non-spammers.

10. CONCLUSION
Many methods have been developed and used by various researchers to find spammers in different social networks. From the papers reviewed it can be concluded that most of the work has been done using classification approaches such as SVM, Decision Tree, Naive Bayesian and Random Forest. Detection has been done on the basis of user based features, content based features, or a combination of both. A few authors also introduced new features for detection. All the approaches have been validated on very small datasets and have not even been tested with different combinations of spammers and non-spammers. A combination of features for detection of spammers has shown better performance in terms of accuracy, precision, recall etc. than using only user based or only content based features.

11. REFERENCES
[1] Alex Hai Wang, Don't Follow Me: Spam Detection in Twitter, Proceedings of the 2010 International Conference on Security and Cryptography (SECRYPT), Pages 1-10, 26-28 July 2010, IEEE.

[2] Amit A. Amleshwaram, Narasimha Reddy, Sandeep Yadav, Guofei Gu, Chao Yang, CATS: Characterizing Automation of Twitter Spammers, Texas A&M University, 2013, IEEE.

[3] Anshu Malhotra, Luam Totti, Wagner Meira Jr., Ponnurangam Kumaraguru, Virgilio Almeida, Studying User Footprints in Different Online Social Networks, International Conference on Advances in Social Networks Analysis and Mining, 2012, IEEE/ACM.

[4] Ayon Chakraborty, Jyotirmoy Sundi, Som Satapathy, SPAM: A Framework for Social Profile Abuse Monitoring.

[5] Boyd, Ellison, N. B. (2007), Social network sites: Definition, history, and scholarship, Journal of Computer-Mediated Communication, 13(1), article 11, http://jcmc.indiana.edu/vol13/issue1/boyd.ellison.html

[6] Chao Yang, Robert Chandler Harkreader, Guofei Gu, Die Free or Live Hard? Empirical Evaluation and New Design for Fighting Evolving Twitter Spammers, RAID'11 Proceedings of the 14th International Conference on Recent Advances in Intrusion Detection, Pages 318-337, 2011, Springer-Verlag Berlin, Heidelberg, ACM.

[7] Fabricio Benevenuto, Gabriel Magno, Tiago Rodrigues, and Virgilio Almeida, Detecting Spammers on Twitter, CEAS 2010 Seventh annual Collaboration, Electronic messaging, Anti Abuse and Spam Conference, July 2010, Washington, US.

[8] Fact Sheet 35: Social Networking Privacy: How to be Safe, Secure and Social.

[9] Faraz Ahmed, Muhammad Abulaish, SMIEEE, An MCL-Based Approach for Spam Profile Detection in Online Social Networks, IEEE 11th International Conference on Trust, Security and Privacy in Computing and Communications, 2012.

[10] Georgios Kontaxis, Iasonas Polakis, Sotiris Ioannidis and Evangelos P. Markatos, Detecting Social Network Profile Cloning, 3rd International Workshop on Security and Social Networking, 2011, IEEE.

[11] Gianluca Stringhini, Christopher Kruegel, Giovanni Vigna, Detecting Spammers on Social Networks, University of California, Santa Barbara, Proceedings of the 26th Annual Computer Security Applications Conference, ACSAC ’10, Austin, Texas USA, pages 1-9, Dec. 6-10, 2010, ACM.

[12] Grace Gee, Hakson Teh, Twitter Spammer Profile Detection, 2010.

[13] http://about-threats.trendmicro.com/us/webattack - Information regarding Twitter threats.

[14] http://en.wikipedia.org/wiki/Twitter - Information of Twitter.

[15] http://expandedramblings.com/index.php/march-2013-by-the-numbers-a-few-amazing-twitter-stats - Regarding statistics of Twitter.

[16] http://help.twitter.com/forums/26257/entries/1831 - The Twitter Rules.

[17] http://twittnotes.com/2009/03/, 2000-following-limit-on-twitter.html - The 2000 Following Limit Policy on Twitter.

[18] http://www.spamhaus.org/consumer/definition - Spam Definition.

[19] J. Baltazar, J. Costoya, and R. Flores, “The real face of koobface: The largest web 2.0 botnet explained,” Trend Micro Threat Research, 2009.

[20] J. Douceur, “The sybil attack,” Peer-to-peer Systems, pp. 251–260, 2002; D. Irani, M. Balduzzi, D. Balzarotti, E. Kirda, and C. Pu, “Reverse social engineering attacks in online social networks,” Detection of Intrusions and Malware, and Vulnerability Assessment, pp. 55–74, 2011.

[21] Jonghyuk Song, Sangho Lee and Jong Kim, Spam Filtering in Twitter using Sender-Receiver Relationship, RAID'11 Proceedings of the 14th International Conference on Recent Advances in Intrusion Detection, vol. 6961, Pages 301-317, 2011, Springer, Heidelberg, ACM.

[22] Kyumin Lee, James Caverlee, Steve Webb, Uncovering Social Spammers: Social Honeypots + Machine Learning, Proceeding of the 33rd international ACM SIGIR conference on Research and development in information retrieval, 2010, Pages 435–442, ACM, New York (2010).

[23] Leyla Bilge, Thorsten Strufe, Davide Balzarotti, Engin Kirda, All Your Contacts Are Belong to Us: Automated Identity Theft Attacks on Social Networks, International World Wide Web Conference Committee (IW3C2), WWW 2009, April 20–24, 2009, Madrid, Spain, ACM.

[24] M. McCord, M. Chuah, Spam Detection on Twitter Using Traditional Classifiers, ATC’11, Banff, Canada, Sept 2-4, 2011, IEEE.

[25] Manuel Egele, Gianluca Stringhini, Christopher Kruegel, and Giovanni Vigna, COMPA: Detecting Compromised Accounts on Social Networks.

[26] Marcel Flores, Aleksandar Kuzmanovic, Searching for Spam: Detecting Fraudulent Accounts via Web Search, LNCS 7799, pp. 208–217, 2013. Springer-Verlag Berlin Heidelberg 2013.

[27] Mauro Conti, Radha Poovendran, Marco Secchiero, FakeBook: Detecting Fake Profiles in On-line Social Networks, IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining, 2012.

[28] Po-Ching Lin, Po-Min Huang, A Study of Effective Features for Detecting Long-surviving Twitter Spam Accounts, Advanced Communication Technology (ICACT), 15th International Conference on 27-30 Jan. 2013, IEEE.

[29] Sangho Lee and Jong Kim, WARNINGBIRD: Detecting Suspicious URLs in Twitter Stream, 19th Network and Distributed System Security Symposium (NDSS), San Diego, California, USA, February 5-8, 2012.

[30] T. Jagatic, N. Johnson, M. Jakobsson, and F. Menczer, “Social phishing,” Communications of the ACM, vol. 50, no. 10, pp. 94–100, 2007.

[31] Vijay A. Balasubramaniyan, Arjun Maheswaran, Viswanathan Mahalingam, Mustaque Ahamad, H. Venkateswaran, A Crow or a Blackbird? Using True Social Network and Tweeting Behavior to Detect Malicious Entities in Twitter, 2002, ACM.

[32] Y. Boshmaf, I. Muslukhov, K. Beznosov, and M. Ripeanu, “The socialbot network: when bots socialize for fame and money,” in Proceedings of the 27th Annual Computer Security Applications Conference. ACM, 2011, pp. 93–102.

[33] Yin Zhu, Xiao Wang, Erheng Zhong, Nathan N. Liu, He Li, Qiang Yang, Discovering Spammers in Social Networks, Proceedings of the Twenty-Sixth AAAI Conference on Artificial Intelligence.

[34] Zhi Yang, Christo Wilson, Xiao Wang, Tingting Gao, Ben Y. Zhao, and Yafei Dai, Uncovering Social Network Sybils in the Wild, Proceedings of the 11th ACM/USENIX Internet Measurement Conference (IMC’11), 2011.
[35]

IJCATM : www.ijcaonline.org
