
Spam Review Detection Using Linguistic Methods For Specified User in Twitter



TABLE OF CONTENTS

Chapter No. Title

1. Abstract
2. Introduction
3. Literature Survey
4. Proposed System
5. Architecture
6. Use Case Diagram
7. Algorithm
8. Modules
9. Implementation
10. Result
11. Conclusion
12. References

LIST OF FIGURES

FIGURE NO FIGURE NAME

5.1 SYSTEM ARCHITECTURE
6.1 USE CASE
10.1 ABSTRACT
10.2 TESTING
10.3 DATA COLLECTION
10.4 USER CHECK
10.5 DETECTION

CHAPTER 1

ABSTRACT

• Online review systems play an important role in shaping consumers' behavior and
decision making, which attracts spammers who insert fake reviews to manipulate review
content and ratings. To increase utility and improve the user experience, some online
review systems allow users to form social relationships with each other and encourage
their interactions. Here, we aim to provide an efficient and effective method to identify
review spammers by incorporating social relations, based on two assumptions: people
are more likely to consider reviews from those connected with them as trustworthy, and
review spammers are less likely to maintain a large relationship network with normal
users.
• The contributions are two-fold:
• We elaborate how social relationships can be incorporated into review rating prediction
and propose a trust-based rating prediction model using proximity as trust weight.
• We design a trust-aware detection model based on rating variance which iteratively
calculates user-specific overall trustworthiness scores as the indicator.
• Experiments on a dataset collected from Yelp.com show that the proposed trust-based
prediction achieves higher accuracy than the standard collaborative filtering (CF) method,
and that there is a strong correlation between social relationships and the overall
trustworthiness scores.

Keywords:

Machine Learning; Spam Detection; Scalability; Twitter

CHAPTER 2

INTRODUCTION

What Is A Social Network?


Wikipedia defines a social network service as a service which “focuses on the building and
verifying of online social networks for communities of people who share interests and activities,
or who are interested in exploring the interests and activities of others, and which necessitates
the use of software.”

A report published by OCLC provides the following definition of social networking sites: “Web
sites primarily designed to facilitate interaction between users who share interests, attitudes
and activities, such as Facebook, Mixi and MySpace.”

What Can Social Networks Be Used For?

Social networks can provide a range of benefits to members of an organization:

Support for learning: Social networks can enhance informal learning and support social
connections within groups of learners and with those involved in the support of learning.

Support for members of an organization: Social networks can potentially be used by all
members of an organization, and not just those involved in working with students. Social
networks can help the development of communities of practice.

Engaging with others: Passive use of social networks can provide valuable business
intelligence and feedback on institutional services (although this may give rise to ethical
concerns).

Ease of access to information and applications: The ease of use of many social networking
services can provide benefits to users by simplifying access to other tools and
applications. The Facebook Platform provides an example of how a social networking service
can be used as an environment for other tools.

Common interface: A possible benefit of social networks may be the common interface which
spans work / social boundaries. Since such services are often used in a personal capacity the
interface and the way the service works may be familiar, thus minimizing training and support
needed to exploit the services in a professional context. This can, however, also be a barrier to
those who wish to have strict boundaries between work and social activities.

Opportunities and Challenges


The popularity and ease of use of social networking services have excited institutions with their
potential in a variety of areas. However, effective use of social networking services poses a
number of challenges for institutions, including the long-term sustainability of the services; user
concerns over the use of social tools in a work or study context; a variety of technical issues; and
legal issues such as copyright, privacy, and accessibility.

Institutions would be advised to consider carefully the implications before promoting significant
use of such services.

Scope of the project:

Social media can also be used to spread fake news and malicious links (such as ransomware
and other malware), so users need to be careful. Because social media is so widely used
nowadays, such spam is increasing rapidly. This project focuses on spam detection in social
media (specifically on Twitter) using machine learning techniques.

CHAPTER 3

LITERATURE SURVEY

1) Statistical features-based real-time detection of drifted Twitter spam

Authors: C. Chen, Y. Wang, J. Zhang, Y. Xiang, W. Zhou, and G. Min

Twitter spam has become a critical problem nowadays. Recent works focus on applying
machine learning techniques for Twitter spam detection, which make use of the statistical
features of tweets. In our labeled tweets data set, however, we observe that the statistical
properties of spam tweets vary over time, and thus, the performance of existing machine
learning-based classifiers decreases. This issue is referred to as “Twitter Spam Drift”. In order to
tackle this problem, we first carry out a deep analysis on the statistical features of one million
spam tweets and one million non-spam tweets, and then propose a novel Lfun scheme. The
proposed scheme can discover “changed” spam tweets from unlabeled tweets and incorporate
them into classifier's training process. A number of experiments are performed to evaluate the
proposed scheme. The results show that our proposed Lfun scheme can significantly improve
the spam detection accuracy in real-world scenarios.

2) Automatically identifying fake news in popular Twitter threads

Authors: C. Buntain and J. Golbeck

Information quality in social media is an increasingly important issue, but web-scale data
hinders experts' ability to assess and correct much of the inaccurate content, or "fake news,"
present in these platforms. This project develops a method for automating fake news detection
on Twitter by learning to predict accuracy assessments in two credibility-focused Twitter
datasets: CREDBANK, a crowdsourced dataset of accuracy assessments for events in Twitter,
and PHEME, a dataset of potential rumors in Twitter and journalistic assessments of their
accuracies. We apply this method to Twitter content sourced from BuzzFeed's fake news
dataset and show that models trained against crowdsourced workers outperform models based on
journalists' assessments and models trained on a pooled dataset of both crowdsourced workers
and journalists. All three datasets, aligned into a uniform format, are also publicly available. A
feature analysis then identifies the features that are most predictive for crowdsourced and
journalistic accuracy assessments, the results of which are consistent with prior work. We close
with a discussion contrasting accuracy and credibility and why models of non-experts
outperform models of journalists for fake news detection in Twitter.

3) A model-based approach for identifying spammers in social networks

Authors: F. Fathaliani and M. Bouguessa

We view the task of identifying spammers in social networks from a mixture modeling
perspective, based on which we devise a principled unsupervised approach to detect
spammers. In our approach, we first represent each user of the social network with a feature
vector that reflects its behavior and interactions with other participants. Next, based on the
estimated user feature vectors, we propose a statistical framework that uses the Dirichlet
distribution in order to identify spammers. The proposed approach is able to automatically
discriminate between spammers and legitimate users, while existing unsupervised approaches
require human intervention in order to set informal threshold parameters to detect spammers.
Furthermore, our approach is general in the sense that it can be applied to different online
social sites. To demonstrate the suitability of the proposed method, we conducted experiments
on real data extracted from Instagram and Twitter.

Problem Statement

For real-time spam detection, we extracted 12 lightweight features for tweet
representation. Spam detection was then transformed into a binary classification problem in the
feature space, which can be solved by conventional machine learning techniques. We evaluated
the impact of different factors on spam detection performance, including the spam to non-spam
ratio, feature discretization, training data size, data sampling, time-related data, and the
choice of machine learning technique. The results show that spam tweet detection is still a big
challenge, and that a robust detection technique should take into account the three aspects
of data, features, and model.
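
The step from raw tweets to a feature space can be illustrated with a minimal sketch. The report does not list its 12 features, so the six computed below, along with the names extract_features and spam_ratio, are hypothetical stand-ins chosen only to show the idea of a lightweight, text-only feature vector and of the spam to non-spam ratio.

```python
import re

# Illustrative patterns only; the report's actual 12 features are not specified.
URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")
HASHTAG_RE = re.compile(r"#\w+")


def extract_features(tweet: str) -> dict:
    """Map one tweet's text to a small numeric feature vector (assumed features)."""
    return {
        "num_chars": len(tweet),
        "num_words": len(tweet.split()),
        "num_urls": len(URL_RE.findall(tweet)),
        "num_mentions": len(MENTION_RE.findall(tweet)),
        "num_hashtags": len(HASHTAG_RE.findall(tweet)),
        "num_digits": sum(ch.isdigit() for ch in tweet),
    }


def spam_ratio(labels: list) -> float:
    """Spam to non-spam ratio of a labelled set (1 = spam, 0 = non-spam)."""
    spam = sum(labels)
    non_spam = len(labels) - spam
    return spam / non_spam if non_spam else float("inf")


# Made-up example tweet and labels, for illustration only.
features = extract_features("Win $1000 now! http://x.co/a @bob #free #cash")
ratio = spam_ratio([1, 0, 0, 1, 0])
```

Each tweet then becomes one row of numbers, and any conventional binary classifier can be trained on those rows together with the spam or non-spam labels.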

CHAPTER 4
PROPOSED SYSTEM
• The aim of the project is to detect fake users on Twitter and to present a
framework that classifies the detection approaches into several categories. For
classification, we have identified four means of reporting spammers that can be helpful
in identifying fake user identities. Spammers can be identified based on:
1) Fake content
2) URL-based spam detection
3) Spam detection in trending topics
4) Fake user identification
• The analysis also shows that machine learning-based techniques can be
effective for identifying fake users on Twitter. However, the selection of the most feasible
techniques and methods is highly dependent on the available data.

ADVANTAGES OF PROPOSED SYSTEM

• The study proposes a machine learning methodology evaluated on real-time datasets
with different characteristics and accomplishments.
• The proposed system is more effective and accurate than other existing systems.
• It has been tested with real-time data.

CHAPTER 5

Architecture

5.1 SYSTEM ARCHITECTURE

CHAPTER 6

Use Case Diagram

6.1 USE CASE DIAGRAM

CHAPTER 7

Algorithm

Decision Trees for Classification:

Decision trees are a supervised machine learning technique (that is, the training data
specifies both the input and the corresponding output) in which the data is repeatedly
split according to a certain parameter. A tree is described by two kinds of entities:
decision nodes and leaves. The decision nodes are where the data is split, and the leaves
are the decisions or final outcomes.
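
The two entities can be sketched as small data types. This is an illustration under stated assumptions, not the project's implementation: the class names Leaf and DecisionNode, the feature names, and the thresholds are all hypothetical, and the tree is built by hand rather than learned from data.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Leaf:
    label: str  # the decision, or final outcome


@dataclass
class DecisionNode:
    feature: str                        # feature tested at this node
    threshold: float                    # split point for that feature
    left: Union["DecisionNode", Leaf]   # followed when value <= threshold
    right: Union["DecisionNode", Leaf]  # followed when value > threshold


def predict(node, sample: dict) -> str:
    """Walk from the root through decision nodes until a leaf is reached."""
    while isinstance(node, DecisionNode):
        if sample[node.feature] <= node.threshold:
            node = node.left
        else:
            node = node.right
    return node.label


# Hand-built example tree; the features and thresholds are made up.
tree = DecisionNode(
    "num_urls", 0,
    left=Leaf("non-spam"),
    right=DecisionNode("num_hashtags", 2, left=Leaf("non-spam"), right=Leaf("spam")),
)
```

Classifying a sample is a walk from the root: each decision node tests one feature and passes the sample to one of its children until a leaf supplies the label.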

ID3 Algorithm

• The ID3 algorithm begins with the original set as the root node.
• On each iteration, it goes through every unused attribute of the set and calculates
the entropy, or the information gain, obtained by splitting on that attribute.
• It then selects the attribute with the smallest entropy (or largest information gain).
The set is then split, or partitioned, by the selected attribute to produce subsets of
the data.
• The algorithm continues to recurse on each subset, considering only attributes never
selected before.
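
The steps above can be sketched in a few lines of Python. This is a minimal illustration for categorical attributes, assuming Shannon entropy in bits and a majority-vote leaf when no attributes remain; the function names, the nested-dict tree format, and the toy data are assumptions and are not taken from the project code.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(rows, labels, attr):
    """Entropy of the parent set minus the weighted entropy of its subsets."""
    subsets = {}
    for row, label in zip(rows, labels):
        subsets.setdefault(row[attr], []).append(label)
    remainder = sum(len(s) / len(labels) * entropy(s) for s in subsets.values())
    return entropy(labels) - remainder


def id3(rows, labels, attrs):
    """Return a nested dict tree of the form {attr: {value: subtree_or_label}}."""
    if len(set(labels)) == 1:   # pure subset: emit a leaf
        return labels[0]
    if not attrs:               # no unused attributes: majority-vote leaf
        return Counter(labels).most_common(1)[0][0]
    # Select the unused attribute with the largest information gain.
    best = max(attrs, key=lambda a: information_gain(rows, labels, a))
    remaining = [a for a in attrs if a != best]
    tree = {best: {}}
    # Partition by the selected attribute and recurse on each subset.
    for value in sorted({row[best] for row in rows}):
        idx = [i for i, row in enumerate(rows) if row[best] == value]
        tree[best][value] = id3([rows[i] for i in idx], [labels[i] for i in idx], remaining)
    return tree


# Toy, made-up data: two categorical attributes per tweet.
rows = [
    {"has_url": "yes", "many_hashtags": "yes"},
    {"has_url": "yes", "many_hashtags": "no"},
    {"has_url": "no", "many_hashtags": "yes"},
    {"has_url": "no", "many_hashtags": "no"},
]
labels = ["spam", "spam", "non-spam", "non-spam"]
tree = id3(rows, labels, ["has_url", "many_hashtags"])
```

On this toy data the attribute has_url separates the two classes perfectly, so it is selected at the root and both branches end in leaves.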
