A Survey of Methods For Spotting Spammers On Twitter

Published in International Journal of Trend in Scientific Research and Development (ijtsrd), ISSN: 2456-6470, Volume-7 | Issue-3, June 2023, URL: https://www.ijtsrd.com/papers/ijtsrd57439.pdf

International Journal of Trend in Scientific Research and Development (IJTSRD)

Volume 7 Issue 3, May-June 2023 Available Online: www.ijtsrd.com e-ISSN: 2456 – 6470

A Survey of Methods for Spotting Spammers on Twitter


Hareesha Devi, Pankaj Verma, Ankit Dhiman
Department of Computer Science, Arni University Kathgarh, Indora, Himachal Pradesh, India

ABSTRACT
Social networking sites' explosive expansion as a means of information sharing, communication, storage, and management has attracted hackers who abuse the Web to take advantage of security flaws for their own nefarious ends. Every day, forged internet accounts are compromised. Online social networks (OSNs) are rife with impersonators, phishers, scammers, and spammers who are difficult to spot. Users who send unsolicited communications to a large audience with the objective of advertising a product, enticing victims to click on harmful links, or infecting users' systems for financial gain are known as spammers. Many studies have been conducted to identify spam profiles in OSNs. In this paper, we discuss the methods currently in use to identify spam Twitter users. User-based features, content-based features, or a combination of both can be used to identify spammers. The paper gives a summary of the traits, methodologies, detection rates, and restrictions (if any) of methods for identifying spam profiles, primarily on Twitter.

KEYWORDS: Twitter, legitimate users, online social networks (OSNs), spammers

How to cite this paper: Hareesha Devi | Pankaj Verma | Ankit Dhiman "A Survey of Methods for Spotting Spammers on Twitter" Published in International Journal of Trend in Scientific Research and Development (ijtsrd), ISSN: 2456-6470, Volume-7 | Issue-3, June 2023, pp. 694-702, URL: www.ijtsrd.com/papers/ijtsrd57439.pdf

Copyright © 2023 by author(s) and International Journal of Trend in Scientific Research and Development Journal. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (CC BY 4.0) (http://creativecommons.org/licenses/by/4.0)

INTRODUCTION
A social networking site, according to Boyd et al. [5], enables users to (a) create a profile, (b) befriend a list of other users, and (c) examine and navigate their own and other users' buddy lists. Through the use of Web 2.0 technology, these online social networks (OSNs) enable user interaction. These social networking sites are expanding quickly and altering how individuals communicate with one another. In less than 8 years, these websites have transformed from a specialised area of online activity into a phenomenon that attracts millions of internet users. Online communities bring people with similar interests together, making it simpler for them to stay in touch with one another. Sixdegrees.com was the first social networking site to launch in 1997, and makeoutclub.com followed in 2000. Sixdegrees.com eventually shut down, while newer websites like MySpace, LinkedIn, Bebo, Orkut, Twitter, etc. found success. Facebook, a very well-known website, was introduced in 2004 [5] and rapidly rose to fame throughout the globe. OSNs' greater user numbers make them more appealing targets for spammers and malevolent users. On social media websites, spam can take many forms and is difficult to identify. Anyone who has used the Internet has encountered spam of some kind, whether in emails, forums, newsgroups, etc. Spam [18] is defined as the practise of sending unsolicited bulk messages over electronic messaging systems. OSNs have grown in popularity and are now used as a platform for spam distribution. Spammers want to send product ads to users who are not connected to them. Some spammers post URLs that lead to phishing websites where users' sensitive information is stolen; such URLs typically have a short lifespan and quickly fade. The detection of spam profiles in OSNs has been the subject of numerous papers. However, no review paper that consolidates the available research has yet been published in this sector. The purpose of our paper is to examine the academic research that has been done in this area by various scholars and to highlight potential directions for future research. The methods for identifying spammers on Twitter have been researched, compared, and presented in this study. The format of this paper is as follows: the approach used to conduct this review is described in Section 2, which is followed by a
@ IJTSRD | Unique Paper ID – IJTSRD57439 | Volume – 7 | Issue – 3 | May-June 2023 Page 694
briefing on security issues in OSNs in Section 3. Spammers are defined in Section 4 along with their motivations; the introduction to Twitter and its risks is given in Section 5; the purpose of this survey study is covered in Section 6; the properties that can be used for detection purposes are covered in Section 7; a comparative examination of the research produced by various researchers is reviewed in Section 8; research recommendations for new researchers are given in Section 9; and the review is concluded in Section 10.

METHODOLOGY
The current methods for detecting spam profiles in OSNs were surveyed by conducting a systematic review using a principled approach and searching the major computer science research databases, IEEE Xplore, ACM Digital Library, Springer Link, Google Scholar, and Science Direct, for relevant topics. We concentrated exclusively on studies published after 2009, since social networks were not conceptualised until 1997 [1] and only afterwards did they gain widespread acceptance. Then, in 2004 [1], Facebook was introduced, and it quickly gained popularity. It took some time for people to become accustomed to using these networks for communication, and only then did these networks come under attack. Over 60 papers were found after searching the five major databases mentioned above. After reviewing all of the paper titles and abstracts, the papers to be reviewed for this survey were chosen. Only papers that were deemed appropriate for the current investigation were selected. In total, 21 papers were chosen for evaluation after publications with titles and abstracts relating to spam message detection and other unrelated areas were eliminated. The papers have been categorised largely by the criteria they use to identify spammers.

Through this paper, we attempt to assemble a list of the social networking papers we have read about identifying spam accounts on Twitter. The list is probably not comprehensive, but it lends shape to the ongoing study of identifying social network spammers. After reading this survey, new researchers will find it simple to assess what research has been done, when, and how the current work may be expanded to improve spam detection. Wherever appropriate, we included details on the methodology used, the dataset used, the features for spammer detection, and the efficacy of the strategies employed by different authors.

The papers discuss, in particular, the ramifications of spammers' interactions with members of social networks, as well as current methods for identifying them.

SECURITY ISSUES IN OSNs
Online social networking sites (OSNs) are susceptible to security and privacy problems due to the volume of user data that these sites process daily. Social networking site users are vulnerable to a range of attacks:
1. Viruses - spammers utilise social networks as a distribution channel [19] for dangerous files to infect users' systems.
2. Phishing attacks - by pretending to be a reliable third party, attackers obtain users' sensitive information [30].
3. Spam - users of social networks are bombarded with spam messages by spammers [11].
4. Sybil (fake) attacks - an attacker creates a number of false identities and poses as a real user in the system to undermine the reputation of trustworthy users in the network [20].
5. Social bots - groups of fictitious personas made to capture user information [32].
6. Cloning and identity-theft attacks - perpetrators construct a profile of an already-existing user on the same network or across many networks in an effort to deceive the cloned user's friends [23]. Attackers gain access to victims' information if the victims accept the friend requests sent by these cloned identities. These attacks overextend both users and systems.

TYPES OF SPAMMERS
The fraudulent users known as spammers put social networks' security and users' privacy at risk by tainting the data shared by legitimate users. Spammers fall into one of the following categories [22]:
1. Phishers - people who act normally but are actually out to steal the personal information of other real users.
2. Fake users - users who spoof real users' profiles in order to distribute spam to their friends or other network users.
3. Promoters - people who spread harmful links in advertisements or other promotional materials in an effort to collect others' personal data.

Spammers' motivations: to promote pornography, spread malware, launch phishing attacks, and harm the reputation of the system.

TWITTER AS AN OSN
Introduction
Twitter is a social networking website with 500 million active users [14] as of today who share information. It was first introduced on March 21, 2006 [14]. Twitter's logo is a chirping bird, hence the name of the website. Users access it to exchange frequent updates called "tweets", messages of up to 140 characters that anyone can send or read. These tweets are public by default and visible to all those who follow the tweeter. Tweets may contain news, opinions, photos, videos, links, and messages. The following standard Twitter terminology is relevant to our work:

Tweet [3]: A Twitter message that is no longer than 140 characters.
Followers and Followings [3]: Followers are the users who follow a specific user, while Followings are the people that a user is following.
Retweet [3]: A tweet that has been forwarded to a user's entire following.
Hashtags [3]: The # sign is used to annotate keywords or subjects in a tweet so that search engines may quickly find them.
Mention [3]: Replies and mentions of other users can be included in tweets by using the @ sign in front of their usernames.
Lists [3]: Twitter offers a tool for grouping the people you follow into lists.
Direct Message [3]: Also known as a DM, Twitter's mechanism for users to communicate privately.

According to Twitter policy [16], signs of spam profiles include metrics like following a lot of users quickly, posting mostly links, using popular hashtags (#) when posting unrelated information, and repeatedly posting other users' tweets as your own. By tweeting to @spam, users have the option to report spammy profiles to Twitter. However, the Twitter policy [16] does not make it clear whether administrators use user reports or automated processes to look for these behaviours, though it is assumed that both approaches are used.

Threats on Twitter
1. Spammed tweets [13]: Twitter only allows users to post tweets of at most 140 characters, but despite this restriction, cybercriminals have discovered how to make the most of it by creating succinct but compelling tweets that include links to promotions for free vouchers, job postings, or other offers.
2. Malware downloads [13]: Cybercriminals have shared tweets with links to websites where malware can be downloaded. Twitter worms that transmitted direct messages, and malware that attacked both Windows and Mac operating systems, include the FAKEAV and backdoor [13] programmes. KOOBFACE [13], a piece of social media malware that attacked both Facebook and Twitter, has the worst reputation.
3. Twitter bots [13]: Online criminals frequently utilise Twitter to run and manage botnets. These botnets threaten the security and privacy of users by controlling their accounts.

Social Implications of OSNs
In addition to the typical issues that social networking sites bring for users, such as spamming, phishing assaults, malware infestations, social bots, viruses, etc., the biggest challenge is maintaining the security and confidentiality of private data. Twitter policy states that once an account follows more than 2,000 users, the number of further accounts it may follow is constrained by the number of followers the account has.

Social networking websites are created with the intention of making information readily available and accessible to others. But, tragically, cybercriminals exploit this readily accessible information to launch focused assaults. Attackers can easily find a means to gain access to a user's account in order to gather more information, and then use that information to gain access to the user's other accounts and the accounts of their friends.

MOTIVATING REVIEW
Social networks have been a target for spammers due to the simplicity of information sharing and the ability to stay up to date with current subjects. It can be challenging to identify such fraudulent individuals in OSNs because spammers are well aware of the methods available to identify them. To collect money, spammers can utilise OSNs as the ideal platform to pose as legitimate users and attempt to convince innocent users to click on harmful posts. How to identify such people, in order to safeguard the network and protect users' private information, is the most crucial area being researched by numerous experts. Researchers will find this paper to be of great assistance in quickly evaluating the work that has been done in this field.

FEATURES DISTINGUISHING SPAMMERS & NON-SPAMMERS IN TWITTER
The papers analysed in this study are shown in Table 1, along with the type of features that were utilised to identify spam Twitter profiles. Spam and non-spam profiles can be distinguished by either user-based or content-based characteristics. In any social network, user-based features are characteristics of the user's profile and behaviour, whereas content-based features are characteristics of the text that users publish.
Table 1 Features for the detection of spam profiles
Attributes used for detection of spam profiles:
- User-based features: demographic and behavioural information such as profile information, follower and following counts, followers-to-followings ratio, reputation, account age, average time between tweets, posting habits, idle hours, tweet frequency, etc. [33,12,34,3,26]
- Content-based features: the number of hashtags (#), the number of URLs in tweets, @mentions, retweets, spam terms, HTTP links, trending topics, duplicate tweets, etc. [33,7,11,25]
- Both user-based and content-based features [1,22,24,27,29,2,4]
- Other features, such as graph connectedness or pictorial distance: graph-based features, neighbour-based features, interaction-based features, social links, social activities, and the Markov clustering method [21,9,28,33,23,6]
Function of the aforementioned features in identifying spam profiles in accordance with Twitter rules [16]:
1. Number of followers - spammers have fewer followers.
2. Number of followings - spammers frequently follow a lot of users.
3. Followers/Followings ratio - spammers have a ratio of less than 1.
4. Reputation - the ratio of followers to the total of followers and followings. Spammers have a low reputation.
5. Account age - calculated from the current date and the account's creation date. Since spammers typically create fresh accounts, a low account age is a spam signal.
6. Average time between posts - in order to attract attention, spammers send out more tweets quickly.
7. Posting-time behaviour - spammers frequently post at predetermined times, such as early in the morning or late at night when real users aren't using social media.
8. Idle hours - spammers keep sending messages, so they have little idle time.
9. Tweet frequency - to attract other users' attention, spammers tweet more frequently and at unusual hours.
10. Number of hashtags (#) - spammers attach numerous unrelated updates to the most popular topics on Twitter to entice genuine users to read their tweets.
11. URLs - spammers frequently tweet a large number of URLs to dangerous websites.
12. @mentions - in order to avoid being found, spammers use as many @usernames of unknown individuals as possible in their tweets.
13. Retweets - replies to a tweet that contain the @RT symbol; spammers frequently utilise @RT in their tweets.
14. Spam words - the majority of spammers' tweets contain spam words.
15. HTTP links - tweets made by spammers contain the most www or http:// strings.
16. Duplicate tweets - spammers frequently post identical tweets, often under many @usernames.
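As a rough illustration, the user-based ratios above can be computed as follows; the function names and the example profile values are hypothetical, not taken from any of the surveyed papers:

```python
from datetime import date

def reputation(followers: int, followings: int) -> float:
    """Reputation = followers / (followers + followings); spammers score low."""
    total = followers + followings
    return followers / total if total else 0.0

def follower_following_ratio(followers: int, followings: int) -> float:
    """Spammers typically score below 1: they follow far more accounts than follow them."""
    return followers / followings if followings else float("inf")

def account_age_days(created: date, today: date) -> int:
    """Account age in days; spammers tend to operate from freshly created accounts."""
    return (today - created).days

# Hypothetical spam-like profile: 10 followers, 1,990 followings.
rep = reputation(10, 1990)                 # 0.005 -- far below a typical user
ratio = follower_following_ratio(10, 1990)
age = account_age_days(date(2023, 5, 1), date(2023, 6, 1))  # 31 days
```

A real detector would compute these over crawled profile records and feed them, with the content-based counts, into a classifier.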
EXISTING METHODS FOR DETECTION OF SPAM PROFILES IN TWITTER
Researchers have employed a variety of strategies to identify spam profiles in distinct OSNs. As Twitter is used to discuss and disseminate information about trending topics in real time, rather than just as a social communication platform, we concentrate primarily on the work that has been done to identify spammers on Twitter. A summary of the papers examined on the identification of spammers on Twitter is shown in Table 2.

In 2010, Alex Hai Wang [1] made significant progress in the area of spam-profile detection using both user- and content-based features. A prototype spam detection system was presented to find suspicious Twitter users. To investigate the "follower" and "friend" relationships, a directed social graph model was put forth. Using a Bayesian classification technique, content-based and user-based features were employed to make spam detection easier in accordance with Twitter's spam policy. The performance of numerous traditional classification techniques, including Decision Trees, Support Vector Machines (SVM), Naive Bayesian, and Neural Networks, was compared using standard evaluation measures, and among them the Bayesian classifier was found to perform the best. The algorithm attained 93.5% accuracy and 89% precision across the 2,000 users in the crawling dataset and the 500 users in the test dataset. This method's limitation is that it was only evaluated on a very small dataset of 500 individuals, taking into account only their 20 most recent tweets.
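Wang's exact pipeline is not reproduced here, but the Bayesian-classifier idea can be sketched with a minimal Naive Bayes over binary features; the feature names and toy training data below are invented for illustration:

```python
import math

class BernoulliNB:
    """Minimal Naive Bayes over binary features with Laplace smoothing."""

    def fit(self, X, y):
        self.classes = sorted(set(y))
        self.prior = {c: y.count(c) / len(y) for c in self.classes}
        n_feat = len(X[0])
        self.p = {}
        for c in self.classes:
            rows = [x for x, label in zip(X, y) if label == c]
            # P(feature = 1 | class), Laplace-smoothed
            self.p[c] = [(sum(r[j] for r in rows) + 1) / (len(rows) + 2)
                         for j in range(n_feat)]
        return self

    def predict(self, x):
        best, best_lp = None, -math.inf
        for c in self.classes:
            lp = math.log(self.prior[c])          # log prior
            for j, v in enumerate(x):
                pj = self.p[c][j]
                lp += math.log(pj if v else 1 - pj)  # log likelihood per feature
            if lp > best_lp:
                best, best_lp = c, lp
        return best

# Toy binary features per user: [posts many links?, duplicate tweets?, low reputation?]
X = [[1, 1, 1], [1, 0, 1], [1, 1, 0], [0, 0, 0], [0, 1, 0], [0, 0, 1]]
y = ["spam", "spam", "spam", "ham", "ham", "ham"]
clf = BernoulliNB().fit(X, y)
```

The same fit/predict interface could then be compared against decision trees, SVMs, and neural networks, as the paper does.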
Lee et al. [22] installed social honeypots made up of real-looking profiles to identify suspicious users; their bot gathered evidence of spam by crawling the profiles of users who sent unsolicited friend requests and URLs on Twitter and MySpace. Profile characteristics such as posting habits, content, and friend information were used to build machine learning classifiers for identifying spammers. Profiles of users who contacted these social honeypots on Twitter and MySpace via unsolicited friend requests were gathered and investigated. Spammers were identified using the LIBSVM classifier. A strong point of the approach is its validation on two separate dataset combinations: 10% spammers + 90% non-spammers and 90% spammers + 10% non-spammers. A drawback is that only small datasets were utilised for validation.
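The two validation mixes Lee et al. report can be built by simple resampling; the record format and user names below are invented for illustration:

```python
import random

def make_mix(spammers, legit, spam_frac, n, seed=0):
    """Sample (user, label) pairs of size n with the given spammer fraction.

    Sampling is with replacement, so the pools may be smaller than n."""
    rng = random.Random(seed)
    n_spam = round(n * spam_frac)
    return ([(s, 1) for s in rng.choices(spammers, k=n_spam)] +
            [(l, 0) for l in rng.choices(legit, k=n - n_spam)])

spammers = [f"spam_user_{i}" for i in range(50)]
legit = [f"real_user_{i}" for i in range(500)]
mix10 = make_mix(spammers, legit, 0.10, 200)   # 10% spammers + 90% non-spammers
mix90 = make_mix(spammers, legit, 0.90, 200)   # 90% spammers + 10% non-spammers
```

Evaluating one classifier on both mixes, as the paper does, shows whether its accuracy is robust to class imbalance.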
Based on tweet content and user-based attributes, Benevenuto et al. [7] identified spammers. The following tweet content attributes are used: the number of hashtags per word, the number of URLs per word, the number of words per tweet, the number of characters per tweet, the number of hashtags per tweet, the number of numeric characters in the text, the number of users mentioned in each tweet, and the number of times the tweet has been retweeted. The features that set spammers apart from non-spammers include the fraction of tweets that contain URLs, the fraction of tweets that contain spam words, and the average number of hashtags per word in the tweets. 54 million Twitter users were crawled, and 1065 users were manually classified as spammers or non-spammers. Spammers and non-spammers were separated using a supervised machine learning (SVM) classifier. The system's detection accuracy is 87.6%, with only 3.6% of non-spammers incorrectly categorised.
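The per-tweet content attributes of this kind can be approximated with simple counting; the regular expression and the "word starts with # / @" conventions below are illustrative choices, not the authors' exact definitions:

```python
import re

def tweet_content_features(tweet: str) -> dict:
    """Simple per-tweet counts: hashtags, URLs, mentions, words, characters."""
    words = tweet.split()
    n_words = len(words) or 1          # guard against division by zero
    n_hashtags = sum(w.startswith("#") for w in words)
    n_urls = len(re.findall(r"https?://\S+|www\.\S+", tweet))
    n_mentions = sum(w.startswith("@") for w in words)
    return {
        "hashtags_per_word": n_hashtags / n_words,
        "urls_per_word": n_urls / n_words,
        "words": n_words,
        "chars": len(tweet),
        "mentions": n_mentions,
    }

f = tweet_content_features("Win a FREE voucher now http://spam.example #free #win")
```

Averaging these per-tweet dictionaries over a user's timeline yields the per-user feature vector that the classifier consumes.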
Sending a message to "@spam" on Twitter enables users to report spam accounts to the company. Gee et al. [12] took advantage of this mechanism and used a classification technique to find spam profiles. Spam and regular user profiles were gathered via "@spam" and the Twitter API, respectively. The collected data was first represented in JSON and then converted to CSV format as a matrix in which users are rows and features are columns. A Naive Bayes classifier trained on the CSV data gave a 27% error rate; an SVM was then used, with an error rate of 10%. Spam-profile detection accuracy is 89.3%. Limitations of this approach are that the features used for detection are not very technical and that precision, at 89.3%, is relatively low; it has been suggested that aggressive deployment of any such system should be done only if precision exceeds 99%.
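The JSON-to-feature-matrix step described above can be sketched with the standard library; the field names and example records are assumptions, not taken from the paper:

```python
import csv
import io
import json

# Hypothetical crawled profiles, as they might arrive from an API in JSON.
profiles_json = json.dumps([
    {"screen_name": "alice", "followers": 120, "followings": 80, "tweets": 950},
    {"screen_name": "sp4m_bot", "followers": 3, "followings": 1800, "tweets": 40},
])

def json_to_csv(raw: str) -> str:
    """Rows = users, columns = features, as in the matrix Gee et al. describe."""
    rows = json.loads(raw)
    buf = io.StringIO()
    writer = csv.DictWriter(
        buf, fieldnames=["screen_name", "followers", "followings", "tweets"])
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()

csv_text = json_to_csv(profiles_json)
```

The resulting CSV is the training input for the Naive Bayes and SVM runs the paper compares.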
McCord et al. [24] employed content-based features such as the number of URLs, replies/mentions, retweets, and hashtags, as well as user-based features like the number of friends and followers. Spam profiles on Twitter were identified using classifiers including Random Forest, Support Vector Machine (SVM), Naive Bayesian, and K-Nearest Neighbour. The Random Forest classifier, which yields the best results ahead of the SMO, Naive Bayesian, and K-NN classifiers, was validated on 1000 users with 95.7% precision and 95.7% accuracy. This approach's limitations are that the reputation feature gave incorrect results for the considered dataset, failing to distinguish between spammers and non-spammers (likely a consequence of the unbalanced dataset used, Random Forest being the classifier typically applied to unbalanced datasets), and that the approach was only validated on a small sample.
Using two distinct features, URL rate and interaction rate, Lin et al. [28] identified persistent spam accounts on Twitter. Many different indicators, including the number of followers, number of followings, followers-to-followings ratio, tweet content, number of hashtags, URL links, etc., have been utilised by the majority of publications to identify spam accounts. However, according to this study, these features are not very good at spotting spammers, hence only straightforward yet useful features, URL rate and interaction rate, were employed for identification. The URL rate is the ratio of tweets with URLs to all tweets, while the interaction rate is the ratio of tweets that interact with other users to all tweets. The Twitter API was used to crawl 26,758 accounts, and J48 classifier analysis was performed on 816 long-surviving accounts with an accuracy rate of 86%. The approach's limitation is that only two variables were utilised to detect spam profiles; hence, if spammers maintain low URL rates and low interaction rates, the system will not function as planned.
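The two ratios Lin et al. rely on reduce to simple counts over a user's timeline; the tweet representation and the thresholds below are illustrative assumptions, not values from the paper:

```python
def url_rate(tweets):
    """Fraction of the user's tweets containing a URL."""
    return sum("http://" in t or "https://" in t for t in tweets) / len(tweets)

def interaction_rate(tweets):
    """Fraction of the user's tweets that interact with others (mentions/replies)."""
    return sum("@" in t for t in tweets) / len(tweets)

timeline = [
    "Buy now http://x.example",
    "Cheap pills http://y.example",
    "http://z.example limited offer",
    "@friend thanks!",
]
u = url_rate(timeline)          # 3/4 = 0.75
i = interaction_rate(timeline)  # 1/4 = 0.25
is_suspicious = u > 0.5 and i < 0.3   # illustrative thresholds only
```

In the paper these two numbers feed a J48 decision tree rather than hard-coded thresholds, but the intuition (high URL rate, low interaction rate) is the same.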
There are two different kinds of spammer detection systems, according to Amit A. et al. [2]: one is URL-centric, relying on identifying fraudulent URLs, and the other is user-centric, based on features relating to users such as the followers/followings ratio. The method used in this research is a hybrid one that takes both kinds of properties into account. Along with an alert system to identify spam tweets, 15 new features have been proposed to catch spammers. Spammers' tweet campaigns and methods have also been researched. A Twitter dataset with 500K users and another with 110,789 users were both used. Bait-oriented features, which highlight the strategies used by spammers to get victims to click on harmful links, include mentions of non-followers, trend hijacking, and trend intersection with well-known trends. Tweet interval variation, tweet volume variation, the ratio of tweet interval variation to tweet volume variation, and tweeting sources are examples of behavioural features. Duplicate URLs, duplicate domain names, and the IP/domain ratio are examples of URL features. Dissimilarity of tweet content, similarity of tweets, and URL-and-tweet similarity are examples of content entropy features. The followers/followings ratio and the dissimilarity of the profile's description language are profile features. Using the Weka tool, all of these features were gathered from both malicious and benign users and fed into four supervised learning algorithms: Decision Tree, Random Forest, Bayes Network, and Decorate. With the Decorate classifier, which produces the best results, 93.6% of spammers were found. It has been demonstrated that this method performs better than Twitter's own spammer detection strategy. However, this method has only been tested on 31,808 individuals, whereas Twitter hosts millions of users.
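One of the behavioural features mentioned, tweet interval variation, can be sketched as the spread of gaps between consecutive posting times; the timestamps below are invented, and the paper's exact definition may differ:

```python
import statistics

def tweet_interval_variation(timestamps):
    """Population standard deviation of gaps between consecutive tweets (seconds).

    Automated spammers tend to post at near-constant intervals, so a very low
    variation is a bot-like signal."""
    gaps = [b - a for a, b in zip(timestamps, timestamps[1:])]
    return statistics.pstdev(gaps)

bot_times = [0, 60, 120, 180, 240]      # exactly one tweet per minute
human_times = [0, 45, 400, 410, 3600]   # bursty and irregular
bot_var = tweet_interval_variation(bot_times)      # 0.0
human_var = tweet_interval_variation(human_times)
```

A near-zero value flags metronomic, scheduled posting; legitimate timelines show much larger spread.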
A technique to identify abusive users who publish offensive content, including dangerous URLs, pornographic URLs, and phishing links, drive regular users from social networks, and violate their privacy has been presented by Chakraborty et al. [4]. The algorithm has two steps: the first checks a user's profile for offensive content before a friend request is sent to another user, and the second checks the similarity of the two profiles. After these two phases, a recommendation is made as to whether the user should accept the friend invitation. This has been tested with a 5000-user Twitter dataset gathered using the REST API. Timing-, content-, and profile-based criteria are all taken into account when distinguishing between abusive and non-abusive users. SVM, Decision Tree, Random Forest, and Naive Bayesian classifiers have been employed. SVM outperforms all the other classifiers, and the model achieves an accuracy of 89%.
New features were used by Yang et al. [6] to identify spammers on Twitter. A number of evasion strategies used by spammers are discussed. Ten new detection features have been proposed: three graph-based features, three neighbour-based features, three automation-based features, and one timing-based feature. These features are expensive and difficult for spammers to evade, because evading them requires more time, money, and resources than the techniques spammers normally use to avoid detection. With the help of classifiers like Random Forest, Decision Tree, Decorate, and Bayesian Network, 18 features, eight already existing and ten new, have been examined for detection purposes. The Bayesian classifier's accuracy of 88.6% is the best. This method's limitations are that very little data was crawled and that only specific sorts of spammers are found, with a low detection rate.
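A graph-based feature of the kind Yang et al. propose, such as the ratio of bidirectional (mutual-follow) links, is costly to fake because it requires real accounts to follow back. A sketch, with an assumed adjacency representation and invented users:

```python
def bidirectional_link_ratio(user, follows):
    """follows: dict mapping each user to the set of accounts they follow.

    Returns the fraction of this user's followings that also follow back.
    Spammers, who mass-follow strangers, tend to score low."""
    out = follows.get(user, set())
    if not out:
        return 0.0
    mutual = sum(user in follows.get(v, set()) for v in out)
    return mutual / len(out)

follows = {
    "alice": {"bob", "carol"},
    "bob": {"alice"},
    "carol": {"alice"},
    "spammer": {"alice", "bob", "carol", "dave"},
}
r_alice = bidirectional_link_ratio("alice", follows)    # 2/2 = 1.0
r_spam = bidirectional_link_ratio("spammer", follows)   # 0/4 = 0.0
```

Computing this requires crawling the follow graph around each account, which is exactly why such features are expensive for both detectors and evaders.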
RESEARCH DIRECTIONS
During the survey, it became clear that a great deal of work has been done to identify spam profiles in various OSNs. Even so, the detection rate can be improved by varying the methods and using more substantial features as determining factors. The following are a few findings from the survey:
1. Twitter has millions of active users, and this number is growing, yet almost all authors evaluated the effectiveness of their methodology on a relatively limited testing dataset. To properly evaluate the effectiveness of any strategy, the testing dataset must be expanded.
2. A multivariate model must be developed.
3. A technique that can identify all types of spammers must be developed.
4. The methods must be tested using various mixtures of spammers and non-spammers.

Table 2 Outline of techniques used for the detection of spammers

| Author | Metrics Used | Methodology Used | Dataset Used | Results |
|---|---|---|---|---|
| Alex Hai Wang [1] | Graph based and content based | Compared Naive Bayesian, Neural Network, SVM and Decision Tree | Validated on 500 Twitter users with 20 recent tweets | Naive Bayesian gave highest accuracy: 93.5% |
| Lee et al. [22] | User based | Compared Decorate, SimpleLogistic, FT, LogitBoost, RandomSubSpace, Bagging, J48, LibSVM | Validated on 1000 Twitter users | Decorate gave highest accuracy: 88.98% |
| Benevenuto et al. [7] | User based and content based | SVM | Validated on 1065 Twitter users | Accuracy: 87.6% (user based and content based features); 84.5% (only user based features) |
| Gee et al. [12] | User based | Compared Naive Bayesian, SVM | Validated on 450 Twitter users with 200 recent tweets | Accuracy: 89.6% |
| McCord et al. [24] | User based and content based | Compared Random Forest, SVM, Naive Bayesian, K-NN | Validated on 1000 Twitter users with 100 recent tweets | Random Forest gave highest accuracy: 95.7% |
| Lin et al. [28] | URL rate, interaction rate | J48 | Validated on 400 Twitter users | Precision: 86% |
| Amit A. et al. [2] | Introduced 15 new features | Compared Random Forest, Decision Tree, Decorate, Naive Bayesian | Validated on 31,808 Twitter users | Accuracy: 93.6% |
| Chakraborty et al. [4] | User based, content based | Compared Random Forest, SVM, Naive Bayesian, Decision Tree | Trained on 5000 Twitter users with 200 recent tweets | SVM gave highest accuracy: 89% |
| Yang et al. [6] | 18 features (8 existing & 10 newly introduced) | Compared Random Forest, Decision Tree, Decorate, Naive Bayesian | Validated on two datasets: 5000 users and 3500 users with 40 recent tweets | Naive Bayesian gave highest accuracy: 88.6% |
CONCLUSION
Researchers have created and employed a variety of techniques to identify spammers on various social networks. As the publications examined here show, the majority of the work relies on classification approaches such as SVM, Decision Tree, Naive Bayesian, and Random Forest. User-based features, content-based features, or a combination of both have been used for detection, and a few authors additionally introduced new detection features. However, all of the methods were tried with only a single combination of spammers and non-spammers and were validated on very limited datasets. Combining user-based and content-based features has demonstrated better performance, in terms of accuracy, precision, recall, etc., than employing either kind of feature alone.

REFERENCES
[1] Alex Hai Wang, Don't Follow Me: Spam Detection in Twitter, Proceedings of the 2010 International
Conference, Pages 1-10, 26-28 July 2010, IEEE.

[2] Amit A. Amleshwaram, Narasimha Reddy, Sandeep Yadav, Guofei Gu, Chao Yang, CATS: Characterizing Automation of Twitter Spammers, Texas A&M University, 2013, IEEE.

[3] Anshu Malhotra, Luam Totti, Wagner Meira Jr., Ponnurangam Kumaraguru, Virgílio Almeida, Studying User Footprints in Different Online Social Networks, International Conference on Advances in Social Networks Analysis and Mining, 2012, IEEE/ACM.

[5] Ayon Chakraborty, Jyotirmoy Sundi, Som Satapathy, SPAM: A Framework for Social Profile Abuse Monitoring.

[6] Boyd, Ellison, N. B. (2007), Social network sites: Definition, history, and scholarship, Journal of Computer-Mediated Communication, 13(1), article 11, http://jcmc.indiana.edu/vol13/issue1/boyd.ellison.html

[7] Chao Yang, Robert Chandler Harkreader, Guofei Gu, Die Free or Live Hard? Empirical Evaluation and New Design for Fighting Evolving Twitter Spammers, RAID'11 Proceedings of the 14th International Conference on Recent Advances in Intrusion Detection, Pages 318-337, 2011, Springer-Verlag Berlin, Heidelberg, ACM.

[8] Fabricio Benevenuto, Gabriel Magno, Tiago Rodrigues, and Virgilio Almeida, Detecting Spammers on Twitter, CEAS 2010 Seventh Annual Collaboration, Electronic Messaging, Anti-Abuse and Spam Conference, July 2010, Washington, US.

[9] Fact Sheet 35: Social Networking Privacy: How to be Safe, Secure and Social.

[10] Faraz Ahmed, Muhammad Abulaish, SMIEEE, An MCL-Based Approach for Spam Profile Detection in Online Social Networks, IEEE 11th International Conference on Trust, Security and Privacy in Computing and Communications, 2012.

[11] Georgios Kontaxis, Iasonas Polakis, Sotiris Ioannidis and Evangelos P. Markatos, Detecting Social Network Profile Cloning, 3rd International Workshop on Security and Social Networking, 2011, IEEE.

[12] Gianluca Stringhini, Christopher Kruegel, Giovanni Vigna, Detecting Spammers on Social Networks, University of California, Santa Barbara, Proceedings of the 26th Annual Computer Security Applications Conference, ACSAC '10, Austin, Texas USA, pages 1-9, Dec. 6-10, 2010, ACM.

[13] Grace Gee, Hakson Teh, Twitter Spammer Profile Detection, 2010.

[14] http://about-threats.trendmicro.com/us/webattack - Information regarding Twitter threats.

[15] http://en.wikipedia.org/wiki/Twitter - Information on Twitter.

[16] http://expandedramblings.com/index.php/march-2013-by-the-numbers-a-few-amazing-twitter-stats - Regarding statistics of Twitter.

[17] http://help.twitter.com/forums/26257/entries/1831 - The Twitter Rules.

[18] http://twittnotes.com/2009/03/2000-following-limit-on-twitter.html - The 2000 Following Limit Policy on Twitter.

[19] http://www.spamhaus.org/consumer/definition - Spam Definition.

[20] J. Baltazar, J. Costoya, and R. Flores, "The real face of Koobface: The largest Web 2.0 botnet explained," Trend Micro Threat Research, 2009.

[21] J. Douceur, "The sybil attack," Peer-to-Peer Systems, pp. 251–260, 2002.

[22] D. Irani, M. Balduzzi, D. Balzarotti, E. Kirda, and C. Pu, "Reverse social engineering attacks in online social networks," Detection of Intrusions and Malware, and Vulnerability Assessment, pp. 55–74, 2011.

[23] Jonghyuk Song, Sangho Lee and Jong Kim, Spam Filtering in Twitter using Sender-Receiver Relationship, RAID'11 Proceedings of the 14th International Conference on Recent Advances in Intrusion Detection, vol. 6961, Pages 301-317, 2011, Springer, Heidelberg, ACM.

[24] Kyumin Lee, James Caverlee, Steve Webb, Uncovering Social Spammers: Social Honeypots + Machine Learning, Proceedings of the 33rd International ACM SIGIR Conference on Research and Development in Information Retrieval, Pages 435–442, ACM, New York, 2010.
[25] Leyla Bilge, Thorsten Strufe, Davide Balzarotti, Engin Kirda, All Your Contacts Are Belong to Us: Automated Identity Theft Attacks on Social Networks, International World Wide Web Conference Committee (IW3C2), WWW 2009, April 20–24, 2009, Madrid, Spain, ACM.

[27] M. McCord, M. Chuah, Spam Detection on Twitter Using Traditional Classifiers, ATC'11, Banff, Canada, Sept 2-4, 2011, IEEE.

[28] Manuel Egele, Gianluca Stringhini, Christopher Kruegel, and Giovanni Vigna, COMPA: Detecting Compromised Accounts on Social Networks.

[29] Marcel Flores, Aleksandar Kuzmanovic, Searching for Spam: Detecting Fraudulent Accounts via Web Search, LNCS 7799, pp. 208–217, Springer-Verlag Berlin Heidelberg, 2013.

[30] Mauro Conti, Radha Poovendran, Marco Secchiero, FakeBook: Detecting Fake Profiles in On-line Social Networks, IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining, 2012.

[31] Po-Ching Lin, Po-Min Huang, A Study of Effective Features for Detecting Long-surviving Twitter Spam Accounts, Advanced Communication Technology (ICACT), 15th International Conference, 27-30 Jan. 2013, IEEE.

[32] Sangho Lee and Jong Kim, WARNINGBIRD: Detecting Suspicious URLs in Twitter Stream, 19th Network and Distributed System Security Symposium (NDSS), San Diego, California, USA, February 5-8, 2012.

[33] T. Jagatic, N. Johnson, M. Jakobsson, and F. Menczer, "Social phishing," Communications of the ACM, vol. 50, no. 10, pp. 94–100, 2007.

[34] Vijay A. Balasubramaniyan, Arjun Maheswaran, Viswanathan Mahalingam, Mustaque Ahamad, H. Venkateswaran, A Crow or a Blackbird? Using True Social Network and Tweeting Behavior to Detect Malicious Entities in Twitter, 2002, ACM.

[35] Y. Boshmaf, I. Muslukhov, K. Beznosov, and M. Ripeanu, "The socialbot network: when bots socialize for fame and money," in Proceedings of the 27th Annual Computer Security Applications Conference, ACM, 2011, pp. 93–102.

[36] Yin Zhu, Xiao Wang, Erheng Zhong, Nathan N. Liu, He Li, Qiang Yang, Discovering Spammers in Social Networks, Proceedings of the Twenty-Sixth AAAI Conference on Artificial Intelligence.

[37] Zhi Yang, Christo Wilson, Xiao Wang, Tingting Gao, Ben Y. Zhao, and Yafei Dai, Uncovering Social Network Sybils in the Wild, Proceedings of the 11th ACM/USENIX Internet Measurement Conference (IMC'11), 2011.

[39] Verma, P., Khanday, A. M. U. D., Rabani, S. T., Mir, M. H., & Jamwal, S. (2019). Twitter sentiment analysis on Indian government project using R. Int J Recent Technol Eng, 8(3), 8338-41.

[40] Verma, P., & Jamwal, S. (2020). Mining public opinion on Indian Government policies using R. Int. J. Innov. Technol. Explor. Eng. (IJITEE), 9(3).

[41] Thakur, N., Choudhary, A., & Verma, P. Machine Learning Algorithms: A Systematic Review.

[42] Verma, P., & Mahajan, S. A Systematic Review of Techniques to Spot Spammers on Twitter.

[43] Thakur, M., & Verma, P. A Review of Computer Network Topology and Analysis Examples.

[44] Kumar, A., Guleria, A., & Verma, P. Internet of Things (IoT) and Its Applications: A Survey Paper.