A Project Report
On
FAKE PROFILE IDENTIFICATION IN SOCIAL NETWORKS USING MACHINE LEARNING AND NLP
Submitted in partial fulfillment of the requirements for the award of Degree
BACHELOR OF TECHNOLOGY
in
COMPUTER SCIENCE AND ENGINEERING (AI&ML)
by
P.SHRESHTA REDDY (207R1A66G5)
K.NIKHIL (207R1A66E6)
AYUSH SHRIVASTAV (207R1A66C6)

Under the Guidance of


Dr. S RAO CHINTALAPUDI
Professor and HOD CSE(AI&ML)

DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING (AI&ML)
CMR TECHNICAL CAMPUS
UGC AUTONOMOUS
(Accredited by NAAC, NBA, Permanently Affiliated to JNTUH, Approved by AICTE, New Delhi)
Recognized Under Section 2(f) & 12(B) of the UGC Act, 1956, Kandlakoya (V), Medchal Road, Hyderabad-501401.
2020-2024
DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING (AI&ML)

CERTIFICATE

This is to certify that the project entitled “FAKE PROFILE IDENTIFICATION IN SOCIAL NETWORKS USING MACHINE LEARNING AND NLP” being submitted by
P.SHRESHTA REDDY (207R1A66G5), K.NIKHIL (207R1A66E6) & AYUSH SHRIVASTAV (207R1A66C6) in partial fulfillment of the requirements for the award of the
degree of B.Tech in Computer Science and Engineering (AI&ML) to the Jawaharlal Nehru Technological University Hyderabad, is a record of bonafide work carried out by them under
our guidance and supervision during the year 2023-24.

The results embodied in this thesis have not been submitted to any other University or
Institute for the award of any degree or diploma.

Dr. S Rao Chintalapudi, Professor and HOD CSE(AI&ML)
INTERNAL GUIDE

Dr. S Rao Chintalapudi
HOD CSE(AI&ML)

EXTERNAL EXAMINER

Submitted for viva voce examination held on


ACKNOWLEDGEMENT

Apart from our own efforts, the success of any project depends largely on the
encouragement and guidance of many others. We take this opportunity to express our
gratitude to the people who have been instrumental in the successful completion of this
project.
We take this opportunity to express our profound gratitude and deep regard to
our guide Dr. S Rao Chintalapudi, Associate Professor and HOD CSE(AI&ML), for his
exemplary guidance, monitoring and constant encouragement throughout the project work.
The blessing, help and guidance given by him shall carry us a long way in the journey of
life on which we are about to embark.
We also take this opportunity to express a deep sense of gratitude to the
Project Review Committee (PRC) Dr. G. Vinoda Reddy, Dr. K. Mahesh, N. Sateesh & B.
Mamatha for their cordial support, valuable information and guidance, which helped us in
completing this task through various stages.
We are also thankful to Dr. S Rao Chintalapudi, Head, Department of Computer
Science and Engineering (AI&ML) for providing encouragement and support for completing
this project successfully.
We are obliged to Dr. A. Raji Reddy, Director, for being cooperative throughout
the course of this project. We also express our sincere gratitude to Sri. Ch. Gopal Reddy,
Chairman, for providing excellent infrastructure and a nice atmosphere throughout the course
of this project.
We are grateful for the guidance and support received from all the members of CMR Technical
Campus who contributed to the completion of the project, and for their constant
support and help.
Finally, we would like to take this opportunity to thank our families for their
constant encouragement, without which this project could not have been completed. We
sincerely acknowledge and thank all those who gave support directly and indirectly in the
completion of this project.

P.SHRESHTA REDDY (207R1A66G5)


K.NIKHIL (207R1A66E6)
AYUSH SHRIVASTAV (207R1A66C6)
ABSTRACT

At present, social network sites are part of daily life for most people. Every day, many
people create profiles on social network platforms and interact with others regardless of
location and time. Social network sites not only provide advantages to users but also pose
security risks to users and their information. To analyze who is posing threats in a social
network, we need to classify the users' profiles. From this classification, we can separate the
genuine profiles from the fake profiles on the social network. Various classification methods have
traditionally been used to detect fake profiles on social networks, but the accuracy of fake
profile detection still needs to be improved.

In this project, we propose Machine Learning and Natural Language Processing (NLP)
techniques to improve the accuracy of fake profile detection, using the Support Vector
Machine (SVM) and Naïve Bayes algorithms. By combining these classifiers with NLP-based
analysis of profile content, the project aims not only to detect fake profiles but also to
strengthen the foundations of online security. In this way, the initiative is a proactive step
towards ensuring authenticity, bolstering user trust, and fostering a secure social network
environment for all.
LIST OF FIGURES

FIGURE NO    FIGURE NAME    PAGE NO

3.1    Project Architecture for Fake Profile Identification in Social Networks Using Machine Learning and NLP    12
3.2    Use Case Diagram for Fake Profile Identification in Social Networks Using Machine Learning and NLP    15
3.3    Class Diagram for Fake Profile Identification in Social Networks Using Machine Learning and NLP    17
3.4    Sequence Diagram for Fake Profile Identification in Social Networks Using Machine Learning and NLP    19
4.3    Confusion Matrix    27
4.5    Result Analysis    34
5.1    Main Home Page    37
5.2    Service Provider Login Page    37
5.3    User Register Page    38
5.4    Predict Profile Identification Status Page    38
5.5    Profile Status Prediction Type Ratio Page    39
5.6    Profile Datasets Trained and Tested Results    40

LIST OF TABLES

TABLE NO TABLE NAME PAGE NO

6.3 TEST CASES 44
TABLE OF CONTENTS

ABSTRACT i
LIST OF FIGURES ii
LIST OF TABLES iii
1. INTRODUCTION 1
1.1 PROJECT SCOPE 1
1.2 PROJECT PURPOSE 2
1.3 PROJECT FEATURES 2
2. SYSTEM ANALYSIS 4
2.1 PROBLEM DEFINITION 4
2.2 EXISTING SYSTEM / LITERATURE REVIEW 4
2.2.1 EXISTING SYSTEM 6
2.2.2 LIMITATIONS OF EXISTING SYSTEMS 7
2.3 PROPOSED SYSTEM 7
2.3.1 PROPOSED APPROACH 8
2.3.2 ADVANTAGES OF PROPOSED SYSTEM 9
2.4 HARDWARE & SOFTWARE REQUIREMENTS 9
2.4.1 HARDWARE REQUIREMENTS 9
2.4.2 SOFTWARE REQUIREMENTS 10
3. ARCHITECTURE 11
3.1 PROJECT ARCHITECTURE 12
3.2 USE CASE DIAGRAM 15
3.3 CLASS DIAGRAM 17
3.4 SEQUENCE DIAGRAM 19
4. IMPLEMENTATION 22
4.1 SUPPORT VECTOR MACHINE 23
4.2 NAÏVE BAYES 24
4.3 DATASET DESCRIPTION 25
4.4 PERFORMANCE METRICS 26
4.5 SAMPLE CODE 28
4.6 RESULT ANALYSIS 34
5. SCREENSHOTS 36
6. TESTING 41
6.1 INTRODUCTION TO TESTING 42
6.2 TYPES OF TESTING 42
6.2.1 UNIT TESTING 42
6.2.2 INTEGRATION TESTING 42
6.2.3 FUNCTIONAL TESTING 43
6.3 TEST CASES 44
7. CONCLUSION & FUTURE SCOPE 45
7.1 CONCLUSION 46
7.2 FUTURE SCOPE 46
BIBLIOGRAPHY 47
REFERENCES 48
GITHUB LINK 48
Fake Profile Identification in Social Networks Using Machine Learning and NLP

1. INTRODUCTION

1.1 PROJECT SCOPE


The overall goal of this project is to tackle the persistent challenges in social
networking, with a specific emphasis on strengthening security measures and safeguarding user
privacy across diverse Online Social Networks (OSNs). Because connectivity and online
interaction now shape much of our social life, it has become imperative to address the
vulnerabilities that users face on OSN platforms. The project seeks solutions that go beyond the
limitations of existing security measures, making the online social environment more resilient
and trustworthy.
The scope of this initiative extends across a wide spectrum of OSN services,
encompassing platforms that are interaction-based as well as those predominantly focused on
information dissemination. The project recognizes that each type of OSN poses different
challenges and therefore demands tailored security strategies. These challenges include identity
theft, which remains a persistent threat, and impersonation attacks that undermine the
authenticity of user interactions. By addressing these security concerns, the project seeks to
strengthen the foundations of online social interaction so that users can engage with confidence
and trust.
The primary objective of this project is to develop and implement strategies that
effectively mitigate the risks inherent in online social environments. Through a combination of
encryption techniques, robust authentication mechanisms, and proactive monitoring systems,
the project aims to strengthen security in OSNs. By proactively addressing identity-related risks
and other security challenges, the project aims to contribute to a more secure, resilient, and
user-centric online social networking ecosystem, in which individuals can connect, share, and
interact knowing that their identities and privacy are protected against evolving digital threats.

CMRTC

1.2 PROJECT PURPOSE


The purpose of this project is to establish robust security protocols and privacy
measures tailored to the complex landscape of Online Social Networks (OSNs). Recognizing the
diverse range of platforms in this ecosystem, the project aims to devise comprehensive
strategies that address the vulnerabilities arising from the exposure of personal information.
Online interaction has brought new challenges, particularly the escalating threat of identity
theft. By exploring encryption techniques, secure authentication mechanisms, and proactive
monitoring systems, the project seeks to strengthen the defenses of OSNs against potential
breaches, so that user data remains confidential and shielded from malicious actors.
Beyond data security, this initiative also targets the social challenges that mar the
user experience on OSNs. The project acknowledges the prevalence of online bullying, misuse,
and trolling, which are often exacerbated by the proliferation of false profiles. Through a
multifaceted approach that incorporates user education, content moderation, and stringent
policy enforcement, the project aims to create an online social environment that both protects
user information and provides a positive and secure user experience. By mitigating identity
theft and the negative repercussions of online misconduct, the overarching goal is to make OSNs
spaces where individuals can engage authentically, express themselves freely, and connect with
others without compromising their privacy or well-being.

1.3 PROJECT FEATURES


The project is designed to confront the multifaceted security and privacy
challenges pervasive in Online Social Networks (OSNs). At its core, the initiative prioritizes
mitigating identity theft risks by enhancing user authentication and verification processes.
Strengthening these aspects of user identification creates a more robust barrier against
unauthorized access and malicious activities that exploit personal information. Through careful
attention to authentication protocols, the project aims to give users greater confidence that their
digital identities are protected within OSNs.

Simultaneously, the project places significant emphasis on strengthening
privacy settings within OSNs, recognizing the importance of giving users greater control over
the visibility of their personal information. More sophisticated privacy settings let users tailor
their online presence to their preferences, reducing the risk of unintended information
exposure. Beyond static privacy measures, the project explores dynamic profile analysis based
on user behavior and network locality. This enables a more personalized and adaptive approach
to managing profile information, further contributing to the overall goal of a secure online
social environment.
A pivotal aspect of the project is the detection and mitigation of false profiles,
which enable social engineering, online impersonation, and other malicious activities that erode
the trust and safety of OSNs. By applying machine learning and NLP techniques, the project
seeks to reduce the prevalence of false profiles and the harm they may cause. The initiative also
recognizes the importance of user education and awareness in maintaining a secure online
presence, and therefore aims to develop user education initiatives that raise awareness of
potential security risks and promote best practices. Through this holistic framework, the project
aspires to make online social experiences safer and to cultivate trust and resilience across
diverse platforms.

2. SYSTEM ANALYSIS

System analysis is an important phase in the system development process. The
system is studied in minute detail and analyzed. The system analyst plays the role of an
interrogator and delves deep into the working of the present system. In analysis, a detailed
study of the operations performed by the system and their relationships within and outside the
system is carried out. A key question considered here is, “What must be done to solve the problem?”
The system is viewed as a whole and the inputs to the system are identified. Once analysis is
completed, the analyst has a firm understanding of what is to be done.

2.1 PROBLEM DEFINITION

Fake profile identification, as defined for this project, employs machine learning
algorithms and Natural Language Processing (NLP) to automatically distinguish genuine
accounts from fraudulent ones in Online Social Networks (OSNs). The system analyzes user
profiles and activities, identifying patterns and anomalies that may indicate false profiles.
By examining user behaviors, posting patterns, and engagement metrics, along with linguistic
aspects of user-generated content, the system aims to proactively mitigate risks associated with
social engineering and impersonation. This approach contributes to a safer online
environment, upholding the integrity of digital interactions and enhancing user trust in OSNs.

2.2 EXISTING SYSTEM

Existing systems grapple with the intricate task of striking a balance between user
preferences and efficient dialog management in natural language dialog systems. This challenge
is compounded in LinkedIn profile identification by privacy constraints, which curtail the
effectiveness of conventional methods. The limitations imposed by privacy considerations hinder
the seamless integration of user preferences and pose obstacles to optimal dialog management
strategies within these systems.

Moreover, the hurdles extend to spatio-temporal mining for malicious events,
where selecting an optimal feature set is a critical issue. A feature set that is either too
small or too large directly reduces detection accuracy. This challenge is exacerbated by the
dependence on user-selected features, which makes it hard to ensure a comprehensive set for
identifying malicious content. The need for an extensive feature set clashes with the difficulty
of determining the ideal combination of features, leaving potential gaps in the system's ability
to detect and address malicious events.
Adding to the complexities, the k-Nearest Neighbors (KNN) algorithm, which is
commonly employed in existing systems, proves to be a disadvantage. Its reliance on distance
metrics and the entire dataset for each prediction can result in computational inefficiency and
reduced scalability, especially when dealing with large datasets. This drawback highlights the
need for more efficient algorithms in these systems to enhance their performance and overcome
the challenges associated with user preferences, privacy constraints, and the intricacies of spatio-
temporal mining for malicious events.

2.2.1 EXISTING SYSTEM

K-NEAREST NEIGHBORS (KNN)

In previous systems for fake profile identification, the k-Nearest Neighbors (KNN)
algorithm has been employed by first representing profiles as vectors in a high-dimensional
feature space. Each dimension corresponds to a characteristic of the profile, such as posting
frequency, friend count, or profile completeness. During classification, KNN identifies the k
nearest neighbors of a given profile based on a distance metric (e.g., Euclidean distance) and
assigns a label to the profile based on the majority class among its neighbors. This approach
benefits from simplicity and interpretability, as it directly compares new profiles to existing ones
for classification, without assuming underlying distributions.
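To make the mechanism concrete, here is a minimal from-scratch sketch of KNN classification in Python, the project's implementation language. It assumes each profile has already been reduced to three numeric features scaled to [0, 1] (posting frequency, friend count, profile completeness); the feature values, labels and the helper names `euclidean` and `knn_predict` are hypothetical and do not come from the report's dataset. Scikit-Learn's `KNeighborsClassifier` provides the same behaviour in practice.

```python
import math
from collections import Counter

# Hypothetical training profiles: [posting frequency, friend count, profile completeness],
# each already scaled to the range [0, 1].
TRAIN_X = [
    [0.90, 0.10, 0.20],
    [0.80, 0.05, 0.10],
    [0.95, 0.20, 0.30],
    [0.20, 0.60, 0.90],
    [0.30, 0.70, 0.80],
    [0.10, 0.50, 0.95],
]
TRAIN_Y = ["fake", "fake", "fake", "genuine", "genuine", "genuine"]


def euclidean(a, b):
    """Euclidean distance between two feature vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(train_x, train_y, query, k=3):
    """Label `query` by majority vote among its k nearest training profiles."""
    # Brute force: compute the distance from the query to every training profile.
    distances = sorted(
        (euclidean(x, query), label) for x, label in zip(train_x, train_y)
    )
    # Count the labels of the k closest profiles and return the most common one.
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]


print(knn_predict(TRAIN_X, TRAIN_Y, [0.85, 0.15, 0.20]))  # fake
print(knn_predict(TRAIN_X, TRAIN_Y, [0.20, 0.65, 0.85]))  # genuine
```

Note that every prediction sorts the distances to all training profiles, which is the source of the scalability cost discussed next.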

However, despite its simplicity, KNN may not be the most effective approach for
fake profile identification, for several reasons. Firstly, KNN tends to be computationally
expensive as the dataset grows, since a brute-force prediction requires calculating the distance
to every training point. Additionally, KNN is sensitive to noise and irrelevant features in
the dataset, which can lead to inaccurate classifications. Furthermore, KNN's performance
relies heavily on the choice of the number of neighbors (k) and the distance metric, which may
not be optimal for all datasets. Lastly, KNN may struggle with high-dimensional data, as the
curse of dimensionality can cause the feature space to become sparse, reducing the
effectiveness of distance-based computations. As a result, while KNN offers simplicity and
interpretability, it may not always provide the accuracy and scalability required for robust
fake profile identification systems.
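The sensitivity to label noise and to the choice of k can be seen in a tiny hypothetical example: a single mislabelled "genuine" profile sits among fake ones, and a query right next to it changes label depending on k. The data points and the `classify` helper are illustrative assumptions, not results from this project.

```python
import math
from collections import Counter

# Hypothetical 2-feature profiles: [posting frequency, profile completeness].
# The fourth point is deliberately mislabelled to simulate label noise.
PROFILES = [
    ([0.90, 0.10], "fake"),
    ([0.85, 0.20], "fake"),
    ([0.80, 0.15], "fake"),
    ([0.88, 0.12], "genuine"),  # noisy label
    ([0.20, 0.80], "genuine"),
    ([0.25, 0.75], "genuine"),
    ([0.15, 0.85], "genuine"),
]


def classify(query, k):
    """Majority vote of the k profiles closest to `query` (Euclidean distance)."""
    nearest = sorted(PROFILES, key=lambda item: math.dist(item[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


query = [0.87, 0.12]
print(classify(query, k=1))  # genuine: decided by the single noisy neighbour
print(classify(query, k=3))  # fake: the noisy vote is outvoted 2 to 1
```

A larger k smooths out isolated noise but, if pushed too far, starts pulling in profiles from the other class, which is why k usually has to be tuned per dataset.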

2.2.2 LIMITATIONS OF EXISTING SYSTEMS

The existing systems have the following disadvantages:

• Absence of learning algorithms such as SVM and Naive Bayes.
• Failure to address social networking issues such as privacy concerns, online bullying,
misuse, and trolling.

2.3 PROPOSED SYSTEM

Our proposed system takes a combined machine learning and natural language
processing approach to identifying false profiles within online social networks. At its core,
the system uses the Support Vector Machine (SVM) classifier, which finds the decision boundary
that maximizes the margin between the classes. By mapping profiles into a multidimensional
feature space, SVM sharpens the separation between genuine and fraudulent accounts. This
improves the reliability of the detection process and the confidence with which authentic and
deceptive user profiles are distinguished.
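As a rough illustration of margin maximisation, the sketch below trains a linear soft-margin SVM by stochastic subgradient descent on the regularised hinge loss. It is a from-scratch stand-in, not the project's code: the report does not state the kernel or solver, and in practice one would use Scikit-Learn's `SVC` or `LinearSVC`. Labels use +1 for fake and -1 for genuine; the data and the hyperparameter values (`lam`, `lr`, `epochs`) are hypothetical.

```python
# Hypothetical scaled features: [posting frequency, profile completeness].
X = [[0.90, 0.10], [0.80, 0.20], [0.85, 0.15],   # fake profiles
     [0.20, 0.80], [0.10, 0.90], [0.25, 0.70]]   # genuine profiles
Y = [1, 1, 1, -1, -1, -1]                         # +1 = fake, -1 = genuine


def decision(w, b, x):
    """Signed score w.x + b; its sign gives the predicted class."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b


def train_linear_svm(x_data, y_data, lam=0.01, lr=0.1, epochs=200):
    """Minimise lam/2*||w||^2 + mean hinge loss with per-sample subgradient steps."""
    w = [0.0] * len(x_data[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(x_data, y_data):
            if yi * decision(w, b, xi) < 1:
                # Inside the margin or misclassified: the hinge term is active.
                w = [wj - lr * (lam * wj - yi * xj) for wj, xj in zip(w, xi)]
                b += lr * yi
            else:
                # Correct with enough margin: only the regulariser shrinks w.
                w = [wj - lr * lam * wj for wj in w]
    return w, b


def predict(w, b, x):
    return "fake" if decision(w, b, x) >= 0 else "genuine"


w, b = train_linear_svm(X, Y)
print(predict(w, b, [0.88, 0.12]))  # fake
print(predict(w, b, [0.15, 0.85]))  # genuine
```

Only the points that end up inside the margin (the support vectors) keep pushing the weights, which is what makes the final boundary depend on the hardest-to-separate profiles rather than on the whole dataset.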

Complementing the SVM classifier, our system integrates the Naive Bayes
algorithm to further improve detection accuracy. Despite its simplifying assumption that features
are conditionally independent given the class, Naive Bayes often performs well in practice when
estimating the likelihood that a profile is false from its features. It combines the per-feature
likelihoods with the class prior into a single posterior probability, giving an interpretable
measure of profile authenticity. Our proposed system achieves an accuracy of 82 percent,
surpassing the performance of existing systems. This result indicates that the Support Vector
Machine classifier and the Naive Bayes algorithm together form a robust framework that
improves the accuracy of false profile detection and enhances the overall security of online
social networks.
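The posterior computation can be sketched as a multinomial Naive Bayes over the words of a profile's bio, written from scratch with Laplace (add-one) smoothing and log-probabilities to avoid numeric underflow. The toy bios, the whitespace tokenisation and the function names are assumptions for illustration; Scikit-Learn's `MultinomialNB` is the library equivalent. The joint likelihood of the words is approximated as a product of per-word likelihoods, which is exactly the independence assumption described above.

```python
import math
from collections import Counter, defaultdict

# Hypothetical profile bios with known labels.
BIOS = [
    "win free followers click link",
    "free money click now",
    "follow back free gift",
    "software engineer love hiking",
    "photographer travel coffee",
    "student reading books hiking",
]
LABELS = ["fake", "fake", "fake", "genuine", "genuine", "genuine"]


def train_nb(docs, labels, alpha=1.0):
    """Estimate log P(class) and smoothed log P(word | class)."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    vocab = set()
    for doc, label in zip(docs, labels):
        tokens = doc.lower().split()
        word_counts[label].update(tokens)
        vocab.update(tokens)
    log_priors = {c: math.log(n / len(labels)) for c, n in class_counts.items()}
    log_likelihoods = {}
    for c in class_counts:
        denom = sum(word_counts[c].values()) + alpha * len(vocab)
        log_likelihoods[c] = {
            w: math.log((word_counts[c][w] + alpha) / denom) for w in vocab
        }
    return log_priors, log_likelihoods, vocab


def predict_nb(model, doc):
    """Return the class with the highest log posterior (up to a constant)."""
    log_priors, log_likelihoods, vocab = model
    tokens = [t for t in doc.lower().split() if t in vocab]  # ignore unseen words
    scores = {
        c: log_priors[c] + sum(log_likelihoods[c][t] for t in tokens)
        for c in log_priors
    }
    return max(scores, key=scores.get)


model = train_nb(BIOS, LABELS)
print(predict_nb(model, "click link free gift"))  # fake
print(predict_nb(model, "hiking and coffee"))     # genuine
```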

2.3.1 PROPOSED APPROACH

The proposed approach for fake profile identification using Support Vector Machines
(SVM) and Naive Bayes involves several steps. Firstly, the data preprocessing phase includes
collecting a dataset containing profiles' attributes such as posting frequency, friend count, profile
completeness, and textual information like profile descriptions and status updates. Then, the
dataset is cleaned by removing duplicates, irrelevant features, and handling missing values.
Textual data undergoes tokenization, stop-word removal, and possibly stemming or
lemmatization.
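The text-cleaning steps can be sketched with the Python standard library alone. The stop-word list and the crude suffix-stripping rule below are hypothetical simplifications: a real pipeline would use a full stop-word list and a proper stemmer or lemmatizer, for example the lemmatizer in SpaCy, which the project lists among its libraries.

```python
import re

# Hypothetical, deliberately tiny stop-word list.
STOP_WORDS = {"a", "an", "and", "the", "is", "to", "of", "in", "i", "am", "my", "me", "for", "on"}


def tokenize(text):
    """Lower-case the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def simple_stem(token):
    """Strip one common suffix, keeping at least three characters (not a real stemmer)."""
    for suffix in ("ing", "ers", "er", "es", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text):
    """Tokenize, drop stop words, then stem what remains."""
    return [simple_stem(t) for t in tokenize(text) if t not in STOP_WORDS]


print(preprocess("I am posting links and following accounts"))
# ['post', 'link', 'follow', 'account']
```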

Following preprocessing, feature extraction is conducted to convert the textual and non-textual
attributes into a suitable format for classification. This may involve techniques like TF-IDF
(Term Frequency-Inverse Document Frequency) for text data and normalization or scaling for
numerical features. Once the features are extracted, they are used to train both SVM and Naive
Bayes classifiers separately.
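A minimal sketch of both transformations follows, assuming the text has already been tokenised. It uses the plain TF-IDF weighting tf(t, d) * log(N / df(t)) and min-max scaling to [0, 1]; Scikit-Learn's `TfidfVectorizer` uses a smoothed variant by default, so its numbers will differ. The example tokens and feature values are hypothetical.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Weight each term by its in-document frequency times log(N / document frequency)."""
    n_docs = len(documents)
    doc_freq = Counter()
    for tokens in documents:
        doc_freq.update(set(tokens))  # count each term once per document
    idf = {term: math.log(n_docs / df) for term, df in doc_freq.items()}
    vectors = []
    for tokens in documents:
        counts = Counter(tokens)
        vectors.append({t: (c / len(tokens)) * idf[t] for t, c in counts.items()})
    return vectors


def min_max_scale(rows):
    """Rescale every numeric column to [0, 1]; constant columns become 0."""
    columns = list(zip(*rows))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0 for v, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]


docs = [["free", "followers", "click"], ["free", "gift"], ["free", "hiking", "coffee"]]
vectors = tf_idf(docs)
print(vectors[1])  # "free" appears in every bio, so its weight is 0.0

# Hypothetical numeric features: [posts per day, friend count]
print(min_max_scale([[10, 200], [20, 400], [30, 300]]))
# [[0.0, 0.0], [0.5, 1.0], [1.0, 0.5]]
```

Scaling matters mainly for distance- and margin-based models such as SVM, where an unscaled friend count would otherwise dominate a feature like posting frequency.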

In the training phase, the SVM classifier learns to separate genuine and fake profiles by finding
the hyperplane that maximizes the margin between the two classes in the feature space. On the
other hand, Naive Bayes calculates the probabilities of each feature given the class labels and
uses Bayes' theorem to compute the posterior probability of a profile being genuine or fake.
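For reference, the two training objectives described above can be written in their standard textbook forms. These are the general formulations; the report does not specify the exact variant used (for example the kernel or the value of the penalty C), and the label encoding y_i = +1 for fake, -1 for genuine is an assumption.

```latex
% Soft-margin linear SVM: a small ||w|| means a wide margin 2/||w||;
% C trades margin width against margin violations.
\min_{\mathbf{w},\,b}\; \frac{1}{2}\lVert \mathbf{w} \rVert^{2}
  + C \sum_{i=1}^{n} \max\!\bigl(0,\; 1 - y_i(\mathbf{w}^{\top}\mathbf{x}_i + b)\bigr),
  \qquad y_i \in \{-1, +1\}

% Naive Bayes: Bayes' theorem with features x_1, ..., x_m assumed
% conditionally independent given the class c.
P(c \mid x_1,\dots,x_m) = \frac{P(c)\prod_{j=1}^{m} P(x_j \mid c)}{P(x_1,\dots,x_m)},
\qquad
\hat{c} = \arg\max_{c \in \{\text{genuine},\,\text{fake}\}}
  \Bigl[\log P(c) + \sum_{j=1}^{m} \log P(x_j \mid c)\Bigr]
```

The denominator is the same for both classes, so the decision only needs the prior and the per-feature likelihoods.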

After training, the performance of both classifiers is evaluated using metrics such as accuracy,
precision, recall, and F1 score on a held-out validation set or through cross-validation. This
evaluation helps in selecting the best-performing model for deployment. It's worth noting that
SVM and Naive Bayes have different assumptions and properties, so comparing their
performance provides insights into which algorithm is more suitable for the specific
characteristics of the dataset.
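The four metrics follow directly from counts of true and false positives and negatives. The sketch below computes them from scratch, treating "fake" as the positive class; the labels are hypothetical, and Scikit-Learn's `sklearn.metrics` module (for example `classification_report`) computes the same quantities.

```python
def classification_metrics(y_true, y_pred, positive="fake"):
    """Accuracy, precision, recall and F1, treating `positive` as the target class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Hypothetical held-out labels and predictions.
y_true = ["fake", "fake", "fake", "genuine", "genuine"]
y_pred = ["fake", "fake", "genuine", "fake", "genuine"]
print(classification_metrics(y_true, y_pred))
# accuracy 0.6; precision, recall and F1 are each 2/3
```

Precision tells how many flagged profiles were really fake, while recall tells how many fake profiles were caught; comparing SVM and Naive Bayes on both avoids choosing a model on accuracy alone.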

Once the best-performing model is chosen, it can be deployed for real-time classification of new
profiles. The deployed model takes the profile attributes as input and predicts whether the profile
is genuine or fake. This classification can be used to flag suspicious profiles for further
investigation or to automate actions such as suspending fake accounts.
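One possible way to turn a deployed model's output into the two actions mentioned above is a pair of probability thresholds. Everything in this sketch is hypothetical: the report does not specify thresholds or actions, and `route_profile`, `REVIEW_THRESHOLD` and `SUSPEND_THRESHOLD` are illustrative names. The value `p_fake` is assumed to come from a classifier that outputs probabilities, such as Naive Bayes.

```python
# Hypothetical cut-offs; real values would be tuned on validation data.
REVIEW_THRESHOLD = 0.5
SUSPEND_THRESHOLD = 0.95


def route_profile(p_fake):
    """Map the predicted probability that a profile is fake to an action."""
    if not 0.0 <= p_fake <= 1.0:
        raise ValueError("p_fake must be a probability in [0, 1]")
    if p_fake >= SUSPEND_THRESHOLD:
        return "suspend"           # very confident: automated action
    if p_fake >= REVIEW_THRESHOLD:
        return "flag_for_review"   # suspicious: send to a human moderator
    return "allow"


for p in (0.30, 0.70, 0.97):
    print(p, route_profile(p))
```

Keeping the suspension threshold high limits the damage done by false positives, since genuine users wrongly scored as fake are more likely to land in the review queue than to be suspended.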

Finally, the deployed model should be monitored regularly to ensure its performance remains
satisfactory over time. This may involve retraining the model periodically with updated data and
fine-tuning its hyperparameters to adapt to evolving patterns of fake profile creation and
detection. Additionally, continuous evaluation of the model's performance and incorporating
feedback from users or domain experts can help in further improving the effectiveness of the
fake profile identification system.
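A simple form of this monitoring is to compare accuracy on recently labelled profiles with the accuracy measured at deployment and to trigger retraining when it drops by more than a tolerance. The function name, the tolerance value and the recent labels below are hypothetical; the 0.82 baseline simply reuses the accuracy figure reported for the proposed system.

```python
def needs_retraining(baseline_accuracy, recent_true, recent_pred, tolerance=0.05):
    """Return True when accuracy on recent labelled profiles falls too far below baseline."""
    if not recent_true or len(recent_true) != len(recent_pred):
        raise ValueError("need equal-length, non-empty label lists")
    correct = sum(1 for t, p in zip(recent_true, recent_pred) if t == p)
    recent_accuracy = correct / len(recent_true)
    return baseline_accuracy - recent_accuracy > tolerance


# Hypothetical recent moderator labels versus model predictions.
recent_true = ["fake", "genuine", "fake", "genuine", "fake"]
recent_pred = ["fake", "genuine", "genuine", "fake", "fake"]
print(needs_retraining(0.82, recent_true, recent_pred))  # True: 0.60 is more than 0.05 below 0.82
```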

2.3.2 ADVANTAGES OF THE PROPOSED SYSTEM

The proposed system, implemented using machine learning techniques, offers the following advantages:
• Improved accuracy in fake profile identification
• Comprehensive analysis considering various factors and behaviors
• Adaptive learning capability for staying effective over time
• Increased efficiency through automation
• Minimization of false positives
• Effective handling of diverse textual data with NLP

2.4 HARDWARE & SOFTWARE REQUIREMENTS

2.4.1 HARDWARE REQUIREMENTS:

Hardware interfaces specify the logical characteristics of each interface
between the software product and the hardware components of the system. The
following are the hardware requirements:

• PROCESSOR : i5 or above
• RAM : 4 GB (min)
• HARD DISK : 20 GB
• KEYBOARD : Standard Windows Keyboard
• MOUSE : Two or Three Button Mouse
• MONITOR : SVGA

2.4.2 SOFTWARE REQUIREMENTS:

Software requirements specify the logical characteristics of each interface
and software component of the system. The following are the software
requirements:

• OPERATING SYSTEM : Windows 10
• CODE LANGUAGE : Python
• LIBRARIES : Scikit-Learn, TensorFlow, pandas, SpaCy and NumPy
• FRONT-END : Python
• BACK-END : Django-ORM
• DESIGNING : HTML, CSS, JavaScript
• DATABASE : MySQL
• WEB SERVER : WAMP Server

3. ARCHITECTURE

3.1 PROJECT ARCHITECTURE

This project architecture shows the procedure followed for classification,
from input to final prediction.

Figure 3.1: Project Architecture of Fake Profile Identification in Social Networks Using Machine Learning and NLP

DESCRIPTION
Our methodology begins with data extraction, in which we acquire a diverse and
comprehensive dataset from various online social networks. This dataset lays the groundwork
for an in-depth analysis of false profile identification. In collecting user profiles across
platforms, we aim to capture a rich and varied set of data, ensuring a representative sample
that mirrors the diversity of online social interactions.

Following dataset extraction, a detailed preprocessing phase refines the
collected data for subsequent analysis. At this stage, Natural Language Processing (NLP)
techniques take center stage. Tokenization splits textual content into individual units (tokens),
stop-word removal eliminates common, non-informative words, stemming reduces words to their
root form, and lemmatization maps inflected words to their dictionary form (lemma). This
preprocessing produces a standardized and refined dataset for a more detailed exploration of
user profiles.

Building on the preprocessed dataset, our methodology applies dimensionality
reduction techniques, whose primary objective is to reduce the computational cost associated
with large datasets. Feature extraction methods are applied alongside them to distill the
pertinent information from the dataset for the classification algorithms. This curation of
features keeps the dataset's essential characteristics while making it more manageable, which
enables a more efficient computational analysis.
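The report does not name the dimensionality-reduction technique used. As one simple, hedged example of the idea, the sketch below drops numeric features whose variance does not exceed a threshold, since a feature that is (nearly) constant across profiles cannot help separate genuine from fake accounts. This is feature selection rather than a projection method such as PCA; Scikit-Learn offers both (`VarianceThreshold`, `PCA`). The features and values are hypothetical.

```python
from statistics import pvariance


def variance_threshold(rows, threshold=0.0):
    """Keep only the columns whose population variance exceeds `threshold`."""
    columns = list(zip(*rows))
    keep = [j for j, col in enumerate(columns) if pvariance(col) > threshold]
    reduced = [[row[j] for j in keep] for row in rows]
    return reduced, keep


# Hypothetical features: [has_profile_photo, posting frequency, friend count (hundreds)]
profiles = [[1, 0.5, 3], [1, 0.7, 5], [1, 0.6, 4]]
reduced, kept = variance_threshold(profiles)
print(kept)     # [1, 2]: the constant first column is dropped
print(reduced)  # [[0.5, 3], [0.7, 5], [0.6, 4]]
```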

At the core of our false profile identification system is the Support Vector
Machine (SVM), a robust and versatile classification algorithm. SVM distinguishes between
classes based on the features extracted from the preprocessed data. By maximizing the margin
between genuine and fake profiles in a high-dimensional feature space, SVM contributes to a
precise and reliable classification process, improving the overall accuracy of our system.

Executing the SVM algorithm leads into the evaluation phase, a critical step in
assessing the system's performance. Two parameters, the True Positive Rate (TPR) and the False
Positive Rate (FPR), are central to gauging the quality of the classification. TPR and FPR are
computed over the whole labelled test set rather than for an individual profile: each profile is
labelled genuine or fake by the SVM's decision, and the classifier is considered effective only
when its TPR clearly exceeds its FPR (a TPR equal to the FPR is no better than random guessing).
This evaluation supports the proposed system's ability to identify and mitigate false profiles
within the dynamic landscape of online social networks.
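Both rates come from the confusion matrix of a labelled test set, with "fake" treated as the positive class: TPR is the share of fake profiles that are caught, and FPR is the share of genuine profiles wrongly flagged. The labels below are hypothetical.

```python
def tpr_fpr(y_true, y_pred, positive="fake"):
    """True positive rate (fakes caught) and false positive rate (genuine flagged as fake)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    tpr = tp / (tp + fn) if tp + fn else 0.0
    fpr = fp / (fp + tn) if fp + tn else 0.0
    return tpr, fpr


y_true = ["fake", "fake", "fake", "genuine", "genuine"]
y_pred = ["fake", "fake", "genuine", "fake", "genuine"]
tpr, fpr = tpr_fpr(y_true, y_pred)
print(round(tpr, 3), round(fpr, 3))  # 0.667 0.5
```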

3.2 USE CASE DIAGRAM

The use case diagram has two actors: the service provider and the remote user.
A use case diagram is a graphical depiction of a user's possible interactions
with a system. It shows the various use cases and the different types of users
the system has. Use cases are represented by circles or ellipses, and actors
are usually shown as stick figures.

Figure 3.2: Use Case Diagram for Fake Profile Identification in Social Networks Using Machine Learning and NLP

DESCRIPTION

The use case diagram captures the functionality available to the service
provider and remote user entities. The service provider can train and test on the user profile
datasets, view accuracy results as a bar chart, examine detailed accuracy results, register and
log in to an account, predict profile identification status, view their own profile, view all
predicted profile identification results, find and view specific prediction ratio results, inspect
their own profile identity ratio results, and download the predicted datasets. Together, these
functions let the service provider interact with the whole system, from data access and analysis
to user management.

On the other hand, the remote user class is equipped with methods for registering
and logging into their account, viewing their own profile, predicting profile identification
statuses, and obtaining an overview of all registered remote users. The simplicity of the remote
user's functionality aligns with the limited scope of operations expected from this entity.

The use case diagram also establishes a relationship between the service
provider and the remote user: the service provider can view all registered remote users.
This gives a concise overview of the system, highlighting the essential functionality available
to service providers and remote users and the relationships through which they interact.

3.3 CLASS DIAGRAM

A class diagram is a type of static structure diagram that describes the structure
of a system by showing the system's classes, their attributes, operations (or methods),
and the relationships among objects.

3.3: Class Diagram for Fake Profile Identification in Social Networks Using
Machine Learning and NLP


DESCRIPTION
The class diagram for fake profile identification contains four classes: Service Provider,
Remote User, Register, and Login. The Service Provider class is the core entity responsible for
managing fake profile identification. Its methods cover accessing and manipulating user profile
datasets, viewing accuracy results as bar charts, predicting profile identification statuses,
managing user accounts through registration and login, and analyzing detailed accuracy results.
In this sense it coordinates the processes involved in identifying fraudulent profiles in the
online social network.

Complementing the Service Provider, the Remote User class represents the end-user entity
interacting with the system. While its functionalities are more streamlined, focusing on user-specific
actions such as registration, login, profile viewing, and predicting profile identification statuses, the
Remote User class is essential for the system's user-centric interactions. It acts as the consumer of the
fake profile identification services, benefiting from the predictions made by the system to assess the
legitimacy of profiles encountered within the online social network.

The Register and Login classes handle user authentication and account management: they
support creating new user accounts and logging users into the system. Placing these functions
in dedicated classes keeps the design modular, which makes it easier to maintain and extend.
Overall, the class diagram captures the relationships and responsibilities of the Service
Provider, Remote User, Register, and Login classes that together support fake profile
identification in online social networks.


3.4 SEQUENCE DIAGRAM

A sequence diagram shows object interactions arranged in time sequence. It depicts the
objects involved in the scenario and the sequence of messages exchanged between the objects needed to
carry out the functionality of the scenario. Sequence diagrams are typically associated with use case
realizations in the logical view of the system under development.

SEQUENCE DIAGRAM FOR BUILDING FAKE PROFILE IDENTIFICATION MODEL

Figure 3.4.1: Sequence Diagram for building fake profile identification model Using
Machine Learning and NLP


SEQUENCE DIAGRAM FOR IDENTIFYING FAKE PROFILE USING MACHINE LEARNING

Figure 3.4.2: Sequence Diagram for identifying fake profile Using Machine
Learning and NLP


DESCRIPTION

The sequence diagrams show the flow of messages between the entities in the system: the
service provider, the web server for building fake profile identification, the remote user, and
the web server for identifying fake profiles. The two phases are described below.

The first sequence begins when the Service Provider sends a request to the Web Server for
Building Fake Profile Identification, starting the model-building process. On receiving the
request, the web server retrieves the required datasets and pre-processes the data:
tokenization, stop-word removal, stemming, and lemmatization clean and normalize the text for
the later analysis. The web server then applies dimensionality reduction and feature extraction
to turn the cleaned data into a compact feature representation. Once the data is prepared, the
web server trains a Support Vector Machine (SVM) on the labelled profiles so that it learns to
distinguish genuine profiles from fake ones. The diagram shows this training process as a
sequence of messages exchanged between the Service Provider and the Web Server for Building
Fake Profile Identification.
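The pre-processing steps named above can be illustrated with a small, dependency-free sketch. It is not the project's code: the stop-word list is a tiny hand-picked sample, and crude_stem is a naive suffix stripper standing in for a real stemmer or lemmatizer (for example NLTK's Porter stemmer or WordNet lemmatizer). All function names here are hypothetical.

```python
import re

# Tiny illustrative stop-word list (an assumption; real projects use a fuller list).
STOP_WORDS = {"a", "an", "and", "the", "is", "are", "i", "to", "of", "in", "on", "for", "my", "me", "at"}

# Suffixes stripped by the naive stemmer below, checked in this order.
SUFFIXES = ("ing", "ers", "er", "ed", "es", "s")


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def remove_stop_words(tokens):
    """Drop tokens that appear in the stop-word list."""
    return [t for t in tokens if t not in STOP_WORDS]


def crude_stem(token):
    """Strip the first matching suffix, keeping a stem of at least 3 letters.

    A deliberately naive stand-in for a real stemmer such as Porter's.
    """
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text):
    """Tokenize, remove stop words, then stem each remaining token."""
    return [crude_stem(t) for t in remove_stop_words(tokenize(text))]


print(preprocess("Posting daily updates for my followers"))  # ['post', 'daily', 'updat', 'follow']
```

The crude stems such as "updat" show why real systems prefer a proper stemmer or a lemmatizer, which maps words to dictionary forms instead.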

The second sequence covers identification. A Remote User sends a request to the Web Server
for Identifying Fake Profiles, which retrieves the relevant datasets and pre-processes the
submitted profile with the same NLP techniques used during training. The web server then
applies the trained SVM to predict the identification status of the profile, that is, whether
it is genuine or potentially fake, and returns the result to the Remote User. This exchange
between the Remote User, the Web Server for Identifying Fake Profiles, and the trained SVM is
the core functionality of the system: it lets users make informed decisions about the
authenticity of profiles in the online social network.


4. IMPLEMENTATION


4.1 SUPPORT VECTOR MACHINE

In our project, Support Vector Machines (SVMs) are a key component of the classification
pipeline. We chose SVMs because they separate classes effectively in high-dimensional feature
spaces, which suits the task of distinguishing genuine from fake profiles. The approach begins
with data collection: we gather a dataset of user profile attributes such as posting frequency,
friend count, profile completeness, and textual information like profile descriptions and status
updates. These attributes are the features from which the SVM learns the patterns that separate
genuine from fake profiles.

During training, the SVM learns a decision boundary (a hyperplane) in the feature space that
maximizes the margin between the two classes while penalizing classification errors. Through
iterative optimization, it finds the boundary that best separates genuine and fake profiles
based on their feature representations. SVMs can also learn non-linear decision boundaries
through kernel methods, which implicitly map the features into a higher-dimensional space and
so capture more complex relationships in the data.
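The margin-maximization idea can be made concrete with a linear SVM trained by stochastic subgradient descent on the hinge loss, in the style of the Pegasos algorithm. This is a from-scratch sketch for illustration only; it is not how the project trains its model (the sample code uses scikit-learn's LinearSVC), and the regularization strength lam, the epoch count and the toy data are arbitrary assumptions. A constant feature is appended so the last weight acts as the bias, which means the bias is also regularized, a common simplification.

```python
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def train_linear_svm(X, y, lam=0.01, epochs=100, seed=0):
    """Pegasos-style training of a linear SVM.

    X: list of feature vectors; y: labels in {-1, +1}.
    Approximately minimises  lam/2 * ||w||^2 + mean(max(0, 1 - y_i * w.x_i)).
    """
    rng = random.Random(seed)
    data = [list(x) + [1.0] for x in X]  # constant feature, so the last weight is the bias
    w = [0.0] * len(data[0])
    order = list(range(len(data)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)                     # step size shrinks over time
            violates = y[i] * dot(w, data[i]) < 1     # inside the margin or misclassified
            w = [(1.0 - eta * lam) * wj for wj in w]  # regulariser pulls w toward 0
            if violates:                              # hinge-loss subgradient step
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, data[i])]
    return w


def predict(w, x):
    """Return +1 or -1 depending on which side of the hyperplane x lies."""
    return 1 if dot(w, list(x) + [1.0]) >= 0 else -1


# Toy data: +1 = fake, -1 = genuine (features and labels are made up).
X = [[2.0, 2.0], [3.0, 3.0], [-2.0, -2.0], [-3.0, -3.0]]
y = [1, 1, -1, -1]
w = train_linear_svm(X, y)
print([predict(w, x) for x in X])  # [1, 1, -1, -1]
```

Only points that violate the margin move the weights, which is why the final boundary is determined by the points closest to it (the support vectors).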

In the identification phase, the trained SVM classifies new profiles as genuine or fake. Given
a profile's attributes, the model computes the profile's position relative to the decision
boundary learned during training: profiles on one side are classified as genuine and profiles
on the other side as fake. This lets the system flag suspicious profiles automatically in real
time, supporting a proactive approach to combating fraud on social media platforms and online
communities. Overall, SVMs give our project a robust and accurate classifier for fake profile
identification (see Section 4.6 for the measured results), helping to protect the integrity and
security of online environments.
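For a linear SVM, classifying a new profile reduces to checking the sign of the decision function w·x + b. The sketch below assumes an already-learned weight vector w and bias b; the numbers are invented, and treating the positive side as "fake" is a labeling convention chosen here for illustration. The signed distance shows how far a profile lies from the boundary.

```python
import math


def decision_function(w, b, x):
    """Signed score w.x + b; its sign tells which side of the hyperplane x is on."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def signed_distance(w, b, x):
    """Geometric distance from x to the hyperplane w.x + b = 0, keeping the sign."""
    return decision_function(w, b, x) / math.sqrt(sum(wi * wi for wi in w))


def classify(w, b, x):
    # Convention assumed here: positive side = fake, otherwise genuine.
    return "Fake Profile" if decision_function(w, b, x) > 0 else "Genuine Profile"


w, b = [3.0, 4.0], -5.0  # hypothetical learned parameters
print(classify(w, b, [3.0, 4.0]), signed_distance(w, b, [3.0, 4.0]))  # Fake Profile 4.0
print(classify(w, b, [0.0, 0.0]), signed_distance(w, b, [0.0, 0.0]))  # Genuine Profile -1.0
```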


4.2 NAÏVE BAYES

Our classification framework also includes Naive Bayes, chosen for its simplicity, efficiency,
and effectiveness on text data, which is common in social media profiles. Naive Bayes applies
Bayes' theorem to compute the posterior probability of a class given the features of an
instance. Despite its "naive" assumption that the features are conditionally independent given
the class, it often performs well in practice and needs few computational resources, which
makes it suitable for large-scale fake profile detection.
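In symbols, for a class C (genuine or fake) and profile features x_1, …, x_n, Bayes' theorem and the naive independence assumption give the standard decision rule below. Working with logarithms avoids numerical underflow when many small probabilities are multiplied.

```latex
P(C \mid x_1, \ldots, x_n)
  = \frac{P(C)\, P(x_1, \ldots, x_n \mid C)}{P(x_1, \ldots, x_n)}
  \;\propto\; P(C) \prod_{i=1}^{n} P(x_i \mid C)

\hat{C} = \arg\max_{C \in \{\mathrm{genuine},\, \mathrm{fake}\}}
  \Big[ \log P(C) + \sum_{i=1}^{n} \log P(x_i \mid C) \Big]
```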

To use Naive Bayes, we first prepare a dataset of user profile attributes, including textual
information such as profile descriptions and status updates and non-textual features such as
posting frequency and friend count. Text preprocessing techniques such as tokenization,
stop-word removal, and optionally stemming or lemmatization are applied to the textual data to
extract meaningful features. The classifier then learns from this dataset by estimating the
probability of each feature given each class label (genuine or fake).

During classification, Naive Bayes computes the posterior probability that a profile is genuine
or fake from its attributes. Bayes' theorem combines the prior probability of each class with
the likelihood of the observed features under that class, and the profile is assigned to the
class with the highest posterior probability. Because this probabilistic approach exploits how
features are distributed within each class, it is a valuable tool in our project for improving
the security and integrity of online communities.
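The training and classification steps can be sketched as a multinomial Naive Bayes classifier over word tokens, written in plain Python so that each probability is visible. This is an illustrative sketch, not the project's implementation (the sample code references scikit-learn's MultinomialNB, commented out there); the add-one (Laplace) smoothing, the toy documents and the function names are assumptions, and documents are assumed to be already tokenized.

```python
import math
from collections import Counter, defaultdict


def train_nb(docs, labels, alpha=1.0):
    """Fit multinomial Naive Bayes on tokenized documents.

    alpha is add-alpha (Laplace) smoothing, so a word unseen in one class
    does not force that class's probability to zero.
    """
    vocab = {tok for doc in docs for tok in doc}
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    for doc, label in zip(docs, labels):
        word_counts[label].update(doc)

    log_prior = {c: math.log(n / len(labels)) for c, n in class_counts.items()}
    log_likelihood = {}
    for c in class_counts:
        total = sum(word_counts[c].values()) + alpha * len(vocab)
        log_likelihood[c] = {w: math.log((word_counts[c][w] + alpha) / total) for w in vocab}
    return log_prior, log_likelihood


def predict_nb(model, doc):
    """Return the class with the highest log posterior (up to a shared constant)."""
    log_prior, log_likelihood = model
    scores = {}
    for c in log_prior:
        # Words never seen during training are ignored.
        scores[c] = log_prior[c] + sum(log_likelihood[c][w] for w in doc if w in log_likelihood[c])
    return max(scores, key=scores.get)


docs = [["free", "followers", "click"], ["win", "free", "prize"],
        ["coffee", "lover", "photographer"], ["travel", "and", "coffee"]]
labels = ["fake", "fake", "genuine", "genuine"]
model = train_nb(docs, labels)
print(predict_nb(model, ["free", "prize"]))    # fake
print(predict_nb(model, ["coffee", "lover"]))  # genuine
```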


4.3 DATASET DESCRIPTION

Our dataset, sourced from Kaggle
(https://www.kaggle.com/datasets/whoseaspects/genuinefake-user-profile-dataset), contains
2676 instances and a diverse range of attributes for assessing the authenticity of user
profiles. The attributes include identifiers such as ID, name, and screen name, which uniquely
identify and track individual profiles. Categorical attributes such as default profile, which
indicates whether a user has customized their profile layout, and location, the geographical
information provided by the user, help characterize user behavior. These attributes are useful
features for separating genuine profiles from fake ones because they reflect user engagement
and interaction patterns.

Quantitative attributes such as favorites count, statuses count, followers count, and friends
count measure a user's activity and engagement on the platform. These metrics are useful
signals of a user's level of activity, popularity, and interaction within the platform's
ecosystem. Attributes such as the profile image URL, profile banner URL, and profile background
image URL describe the visual elements of a profile, which may influence how credible it appears
to other users. Taken together, these attributes give a comprehensive view of each profile and
support robust analysis and classification of genuine and fake profiles.
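One possible way to turn these attributes into the numeric vector a classifier expects is sketched below. The field names mirror the attributes listed above, but the exact column names and value formats in the Kaggle CSV (for example "favourites_count", and whether default_profile is stored as "TRUE"/"FALSE") are assumptions, as are the log scaling and the follower/friend balance feature; the project itself may encode features differently.

```python
import math


def profile_features(profile):
    """Map one profile record (a dict) to a list of floats.

    Column names and value formats are assumptions about the CSV, not confirmed.
    """
    def count(key):
        # Missing or empty counts are treated as 0.
        return float(profile.get(key) or 0)

    statuses = count("statuses_count")
    followers = count("followers_count")
    friends = count("friends_count")
    favourites = count("favourites_count")
    description = str(profile.get("description") or "").strip()
    default_profile = str(profile.get("default_profile")).strip().lower() in ("true", "1")

    return [
        math.log1p(statuses),    # activity; log1p tames heavy-tailed counts
        math.log1p(followers),   # popularity
        math.log1p(friends),     # accounts the user follows
        math.log1p(favourites),  # engagement with other users' posts
        math.log1p(followers) - math.log1p(friends),  # follower/friend balance (log ratio)
        1.0 if default_profile else 0.0,              # layout left at the default?
        1.0 if profile.get("location") else 0.0,      # location filled in?
        float(len(description.split())),              # description length in words
    ]


sample = {"statuses_count": "120", "followers_count": 99, "friends_count": 9,
          "default_profile": "FALSE", "description": "coffee lover and photographer",
          "location": "Hyderabad"}
print(profile_features(sample))
```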

Finally, textual attributes such as description and name represent user-generated content and
self-description. The description attribute holds user-written text about themselves or their
interests, while the name attribute is the user's chosen display name. These textual features
give cues about user intent, interests, and authenticity, since fake profiles may contain
generic or misleading information. By combining numerical and textual attributes, the dataset
gives a multifaceted view of user profiles that supports effective classification of genuine and
fake profiles in online communities.

4.4 PERFORMANCE METRICS

ACCURACY

Accuracy is a key performance metric for assessing the overall correctness of the
classifications made by a fake profile identification system. It is the proportion of correctly
identified profiles, whether genuine or fake, among all profiles in the dataset. Accuracy does
not distinguish between false positives (genuine profiles misclassified as fake) and false
negatives (fake profiles misclassified as genuine): both count equally as errors. A high
accuracy score indicates that the system distinguishes well between genuine and fake profiles.
However, accuracy alone may not be sufficient, especially on imbalanced datasets in which one
class is much more prevalent than the other; there, a model that always predicts the majority
class can score highly while missing most of the minority class. In such cases, accuracy should
be complemented with precision, recall, and F1 score to give a more comprehensive assessment of
the system's effectiveness and to expose potential biases or shortcomings in the classification
process.

Accuracy = (Number of correctly classified profiles / Total number of profiles) × 100%

CLASSIFICATION REPORT

The classification report in the context of fake profile identification provides a
comprehensive summary of the model's performance by presenting various evaluation
metrics for each class (genuine and fake profiles). It typically includes metrics such as
precision, recall, F1 score, and support.

Precision: Precision measures the proportion of correctly identified fake profiles among all
profiles classified as fake. It is calculated as the ratio of true positives (correctly identified
fake profiles) to the sum of true positives and false positives (genuine profiles misclassified
as fake). High precision indicates that when the model predicts a profile as fake, it is likely to
be correct.
Precision = True Positives / (True Positives + False Positives)

Recall: Recall, also known as sensitivity, measures the proportion of correctly identified fake
profiles among all actual fake profiles in the dataset. It is calculated as the ratio of true
positives to the sum of true positives and false negatives (fake profiles misclassified as
genuine). High recall indicates that the model effectively captures most of the fake profiles in
the dataset.
Recall = True Positives / (True Positives + False Negatives)

F1 Score: The F1 score is the harmonic mean of precision and recall, providing a balance
between the two metrics. It takes into account both false positives and false negatives and is
calculated as 2 * (precision * recall) / (precision + recall). F1 score ranges from 0 to 1, where
higher values indicate better performance in terms of both precision and recall.
F1 = 2 × (Precision × Recall) / (Precision + Recall)

Support: Support represents the number of occurrences of each class in the dataset. It
provides insights into the distribution of classes and helps interpret the significance of
precision, recall, and F1 score. For instance, if one class has significantly higher support than
the other, it may influence the interpretation of the model's performance metrics.
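The metrics above can be computed directly from lists of actual and predicted labels. The sketch below treats "fake" as the positive class, as the definitions above do; the label strings and the function name are illustrative. scikit-learn's classification_report prints the same precision, recall, F1 and support figures for each class in turn.

```python
def classification_metrics(y_true, y_pred, positive="fake"):
    """Accuracy (in %), precision, recall, F1 and support for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": 100.0 * correct / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "support": tp + fn,  # number of actual positives
    }


y_true = ["fake"] * 4 + ["genuine"] * 4
y_pred = ["fake", "fake", "fake", "genuine", "genuine", "genuine", "genuine", "fake"]
print(classification_metrics(y_true, y_pred))
```

Here one fake profile is missed and one genuine profile is flagged, so precision, recall and F1 all come out at 0.75 and accuracy at 75%.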

CONFUSION MATRIX

The confusion matrix is a fundamental tool in evaluating the performance of
classification models in fake profile identification. It provides a tabular representation that
summarizes the performance of a classification algorithm by comparing predicted class labels
with actual class labels. In the context of fake profile identification, the confusion matrix
comprises two classes: genuine profiles (often labeled as 0) and fake profiles (labeled as 1).
The matrix consists of four quadrants: true positives (TP), false positives (FP), true negatives
(TN), and false negatives (FN).

True positives (TP) represent the number of fake profiles correctly identified by the
model. These are instances where the model correctly predicts a profile as fake when it is
indeed fake. False positives (FP) indicate the number of genuine profiles incorrectly
classified as fake by the model. These occur when the model mistakenly predicts a genuine
profile as fake. True negatives (TN) represent the number of genuine profiles correctly
identified as genuine by the model. These instances occur when the model correctly predicts
a genuine profile as genuine. Lastly, false negatives (FN) denote the number of fake profiles
incorrectly classified as genuine by the model. These occur when the model fails to identify a
fake profile accurately and mistakenly predicts it as genuine.
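Counting the four outcomes is straightforward. The sketch below assumes the labeling described above (0 = genuine, 1 = fake) and lays the matrix out the way scikit-learn's confusion_matrix does, with rows for actual labels and columns for predicted labels, so that for labels (0, 1) the entries read [[TN, FP], [FN, TP]]. The function name and sample labels are illustrative.

```python
def confusion_counts(y_true, y_pred, labels=(0, 1)):
    """Confusion matrix as nested lists: rows = actual label, columns = predicted label."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for actual, predicted in zip(y_true, y_pred):
        matrix[index[actual]][index[predicted]] += 1
    return matrix


# 0 = genuine, 1 = fake (fake is the positive class).
y_true = [0, 0, 1, 1, 1]
y_pred = [0, 1, 1, 0, 1]
(tn, fp), (fn, tp) = confusion_counts(y_true, y_pred)
print(f"TP={tp} FP={fp} TN={tn} FN={fn}")  # TP=2 FP=1 TN=1 FN=1
```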

By analyzing the values in the confusion matrix, one can gain insights into the
strengths and weaknesses of the classification model. For instance, a high number of true
positives and true negatives relative to false positives and false negatives indicates that the
model performs well in accurately identifying both genuine and fake profiles. Conversely, a
higher number of false positives or false negatives may indicate areas where the model's
performance can be improved. The confusion matrix provides a clear and concise
visualization of the classification results, enabling stakeholders to understand the model's
performance and make informed decisions about its effectiveness in fake profile
identification tasks.

4.3: Confusion matrix

4.5 SAMPLE CODE

from django.db.models import Avg
from django.http import HttpResponse
from django.shortcuts import render, redirect

import re
import string

import pandas as pd
import xlwt
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

# Create your views here.

from Remote_User.models import (
    ClientRegister_Model,
    profile_identification_type,
    detection_ratio,
    detection_accuracy,
)

def serviceproviderlogin(request):
    # The service provider account uses fixed credentials (Admin/Admin).
    if request.method == "POST":
        admin = request.POST.get('username')
        password = request.POST.get('password')
        if admin == "Admin" and password == "Admin":
            return redirect('View_Remote_Users')

    return render(request, 'SProvider/serviceproviderlogin.html')


def View_Profile_Identity_Prediction(request):
    # List every stored prediction.
    obj = profile_identification_type.objects.all()
    return render(request, 'SProvider/View_Profile_Identity_Prediction.html', {'objs': obj})

def View_Profile_Identity_Prediction_Ratio(request):
    # Recompute the share (in %) of each prediction label on every request.
    detection_ratio.objects.all().delete()
    total = profile_identification_type.objects.count()

    for kword in ('Genuine Profile', 'Fake Profile'):
        count = profile_identification_type.objects.filter(Prediction=kword).count()
        # Guard against division by zero when no predictions have been stored yet.
        ratio = (count / total) * 100 if total else 0
        if ratio != 0:
            detection_ratio.objects.create(names=kword, ratio=ratio)

    obj = detection_ratio.objects.all()
    return render(request, 'SProvider/View_Profile_Identity_Prediction_Ratio.html', {'objs': obj})

def View_Remote_Users(request):
    obj = ClientRegister_Model.objects.all()
    return render(request, 'SProvider/View_Remote_Users.html', {'objects': obj})


def charts(request, chart_type):
    # Average ratio per prediction label, for the ratio chart.
    chart1 = detection_ratio.objects.values('names').annotate(dcount=Avg('ratio'))
    return render(request, "SProvider/charts.html", {'form': chart1, 'chart_type': chart_type})


def charts1(request, chart_type):
    # Average accuracy per model, for the accuracy chart.
    chart1 = detection_accuracy.objects.values('names').annotate(dcount=Avg('ratio'))
    return render(request, "SProvider/charts1.html", {'form': chart1, 'chart_type': chart_type})


def likeschart(request, like_chart):
    charts = detection_accuracy.objects.values('names').annotate(dcount=Avg('ratio'))
    return render(request, "SProvider/likeschart.html", {'form': charts, 'like_chart': like_chart})


def likeschart1(request, like_chart):
    charts = detection_ratio.objects.values('names').annotate(dcount=Avg('ratio'))
    return render(request, "SProvider/likeschart1.html", {'form': charts, 'like_chart': like_chart})

def Download_Trained_DataSets(request):
    # Export every stored prediction as an Excel (.xls) attachment.
    response = HttpResponse(content_type='application/ms-excel')
    response['Content-Disposition'] = 'attachment; filename="Predicted_Datasets.xls"'
    wb = xlwt.Workbook(encoding='utf-8')
    ws = wb.add_sheet("sheet1")

    # Bold style for the header row, plain style for the data rows.
    header_style = xlwt.XFStyle()
    header_style.font.bold = True
    body_style = xlwt.XFStyle()

    fields = [
        'prof_idno', 'name', 'screen_name', 'statuses_count', 'followers_count',
        'friends_count', 'created_at', 'location', 'default_profile', 'prf_image_url',
        'prf_banner_url', 'prf_bgimg_https', 'prf_text_color', 'profile_image_url_https',
        'prf_bg_title', 'profile_background_image_url', 'description', 'Prf_updated',
        'Prediction',
    ]

    # Row 0 holds the column names.
    for col_num, field in enumerate(fields):
        ws.write(0, col_num, field, header_style)

    # One row per stored prediction, starting at row 1.
    for row_num, my_row in enumerate(profile_identification_type.objects.all(), start=1):
        for col_num, field in enumerate(fields):
            ws.write(row_num, col_num, getattr(my_row, field), body_style)

    wb.save(response)
    return response

def Train_Test_DataSets(request):
    # Retrain the models from scratch and store their test accuracies.
    detection_accuracy.objects.all().delete()

    df = pd.read_csv('Profile_Datasets.csv')

    def clean_text(text):
        '''Make text lowercase, remove text in square brackets, remove links, remove punctuation
        and remove words containing numbers.'''
        text = str(text).lower()  # str() so missing (NaN) cells do not raise
        text = re.sub(r'\[.*?\]', '', text)
        text = re.sub(r'https?://\S+|www\.\S+', '', text)
        text = re.sub(r'<.*?>+', '', text)
        text = re.sub('[%s]' % re.escape(string.punctuation), '', text)
        text = re.sub(r'\n', '', text)
        text = re.sub(r'\w*\d\w*', '', text)
        text = re.sub('"@', '', text)
        text = re.sub('@', '', text)
        text = re.sub('https: //', '', text)
        text = re.sub('\u2014', '', text)  # em dash character
        text = re.sub(r'\n\n', '', text)
        return text

    df['processed_content'] = df['name'].apply(clean_text)

    def apply_results(label):
        if label == 0:
            return 0  # Fake
        elif label == 1:
            return 1  # Genuine

    df['results'] = df['Label'].apply(apply_results)

    cv = CountVectorizer(lowercase=False)

    y = df['results']
    # The features are built from the profile "id" column; the cleaned
    # 'processed_content' column computed above is not used by the models below.
    X = df["id"].apply(str)

    print("X Values")
    print(X)
    print("Labels")
    print(y)

    X = cv.fit_transform(X)

    models = []
    from sklearn.model_selection import train_test_split
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42)
    print(X_train.shape, X_test.shape, y_train.shape)
    print("X_test")
    print(X_test)
    print(X_train)

    # Naive Bayes (disabled)
    # print("Naive Bayes")
    # from sklearn.naive_bayes import MultinomialNB
    # NB = MultinomialNB()
    # NB.fit(X_train, y_train)
    # predict_nb = NB.predict(X_test)
    # naivebayes = accuracy_score(y_test, predict_nb) * 100
    # print("ACCURACY")
    # print(naivebayes)
    # print("CLASSIFICATION REPORT")
    # print(classification_report(y_test, predict_nb))
    # print("CONFUSION MATRIX")
    # print(confusion_matrix(y_test, predict_nb))
    # detection_accuracy.objects.create(names="Naive Bayes", ratio=naivebayes)

    # SVM model
    print("SVM")
    from sklearn import svm
    lin_clf = svm.LinearSVC()
    lin_clf.fit(X_train, y_train)
    predict_svm = lin_clf.predict(X_test)
    svm_acc = accuracy_score(y_test, predict_svm) * 100
    print(svm_acc)
    print("CLASSIFICATION REPORT")
    print(classification_report(y_test, predict_svm))
    print("CONFUSION MATRIX")
    print(confusion_matrix(y_test, predict_svm))
    models.append(('svm', lin_clf))
    detection_accuracy.objects.create(names="SVM", ratio=svm_acc)

    # k-nearest neighbours model
    print("KNeighborsClassifier")
    from sklearn.neighbors import KNeighborsClassifier
    kn = KNeighborsClassifier()
    kn.fit(X_train, y_train)
    knpredict = kn.predict(X_test)
    kn_acc = accuracy_score(y_test, knpredict) * 100
    print("ACCURACY")
    print(kn_acc)
    print("CLASSIFICATION REPORT")
    print(classification_report(y_test, knpredict))
    print("CONFUSION MATRIX")
    print(confusion_matrix(y_test, knpredict))
    models.append(('KNeighborsClassifier', kn))
    detection_accuracy.objects.create(names="KNeighborsClassifier", ratio=kn_acc)

    obj = detection_accuracy.objects.all()
    return render(request, 'SProvider/Train_Test_DataSets.html', {'objs': obj})

4.6 RESULT ANALYSIS

From the classification reports and confusion matrices for SVM and KNeighborsClassifier
models, we can observe distinct performance characteristics between the two classifiers in the
context of fake profile identification.

For the SVM model, the overall accuracy is approximately 82.90%, indicating that it
correctly identifies fake and genuine profiles 82.90% of the time. The precision for class 0
(genuine profiles) is 0.75, suggesting that when the model predicts a profile as genuine, it is
correct 75% of the time. However, the recall for class 1 (fake profiles) is lower at 0.65,
indicating that the model fails to capture 35% of the actual fake profiles. The F1-score, which
balances precision and recall, is 0.79 for class 1, reflecting the overall effectiveness of the
model in identifying fake profiles. The confusion matrix further reveals that out of 883
profiles, the SVM model correctly classifies 732 profiles, with 151 false negatives and no
false positives.


In contrast, the KNeighborsClassifier model reaches a lower overall accuracy of approximately
70.78%, about 12 percentage points below the SVM model. Its precision for class 0 (genuine
profiles) is 0.64, lower than the SVM's 0.75, meaning that more of the profiles it labels as
genuine are actually fake. Its recall for class 1 (fake profiles) is also lower, at 0.40
compared with 0.65 for the SVM, so the KNeighborsClassifier model misses a larger share of the
actual fake profiles. Consequently, its F1-score for class 1 is lower at 0.57. The confusion
matrix shows that out of 883 profiles, the KNeighborsClassifier model correctly classifies 625
profiles, with 258 false negatives and no false positives.
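The headline figures can be cross-checked with a few lines of arithmetic that use only the numbers reported above. Because neither model produced false positives, precision for the fake class is 1.0, so its F1-score depends on recall alone: F1 = 2 × 1.0 × recall / (1.0 + recall). The helper names are illustrative.

```python
TEST_PROFILES = 883  # size of the test split reported above


def accuracy_pct(correct, total=TEST_PROFILES):
    return round(100.0 * correct / total, 2)


def f1(precision, recall):
    return round(2 * precision * recall / (precision + recall), 2)


# With no false positives, every misclassified profile is a false negative.
svm_errors = TEST_PROFILES - 732  # 151
knn_errors = TEST_PROFILES - 625  # 258

print(accuracy_pct(732), accuracy_pct(625))  # 82.9 70.78
print(f1(1.0, 0.65), f1(1.0, 0.40))          # 0.79 0.57
```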

In summary, the SVM model outperforms the KNeighborsClassifier model on every reported
measure: accuracy (82.90% vs. 70.78%), precision for genuine profiles (0.75 vs. 0.64), recall
for fake profiles (0.65 vs. 0.40), and F1-score for fake profiles (0.79 vs. 0.57). Neither
model produced false positives on this test split, so all of their errors are false negatives,
that is, fake profiles accepted as genuine. Reducing these false negatives is therefore the main
area for improvement, and the final choice of model should still reflect the priorities of the
fake profile identification task, such as the relative cost of false positives and false
negatives.

4.5: Result Analysis


5. SCREENSHOTS


5.1: home page

The project's home page is the entry point for users and provides the login form. Users enter
their credentials in the designated fields to access the platform. The interface is kept simple
so that users can log in easily.

5.2: Service Provider login page

The service provider login page lets providers access the system with their credentials,
which they enter in the designated fields. The interface is kept simple so that the login
process is quick.


5.3: user register page

The user registration page allows new users to sign up by entering the required details in
the designated fields. The page is designed to keep the registration process straightforward.

5.4: Predict Profile Identification Status page


The Predict Profile Identification Status page enables users to input details for
identifying whether a profile is genuine or fake. Users provide relevant profile information in
the designated fields, initiating the identification process. With a user-centric design, this
page facilitates an intuitive and informative experience for users seeking to assess the
authenticity of a profile.

5.5: Profile Status Prediction Type Ratio Page

The Profile Status Prediction Type Ratio page presents the ratio of fake to genuine
profiles in a user-friendly format. Users can view a clear representation of the predicted
profile types, fostering insights into the overall distribution. This page enhances user
understanding by providing an informative ratio analysis of identified fake and genuine
profiles.


5.6: Profile Datasets Trained and Tested Results

The Profile Datasets Trained and Tested Results page reports the accuracy achieved by the
algorithm used in our fake profile identification project. It presents the results of the
training and testing phases, giving an overview of the algorithm's performance and allowing
users to assess how effectively it identifies fake profiles.
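Accuracy, as reported on a results page like this one, is normally the fraction of test profiles whose predicted label matches the true label. The following Python sketch computes it together with the confusion-matrix counts for a binary fake/genuine task; the function `evaluate`, the choice of "Fake" as the positive class and the sample labels are assumptions for illustration, not the project's actual evaluation code.

```python
def evaluate(y_true, y_pred, positive="Fake"):
    """Compute accuracy and confusion-matrix counts for a binary fake/genuine task."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)  # fake caught
    tn = sum(t != positive and p != positive for t, p in pairs)  # genuine kept
    fp = sum(t != positive and p == positive for t, p in pairs)  # genuine flagged
    fn = sum(t == positive and p != positive for t, p in pairs)  # fake missed
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    return {"accuracy": accuracy, "tp": tp, "tn": tn, "fp": fp, "fn": fn}


# Hypothetical test-set labels and model predictions.
y_true = ["Fake", "Fake", "Genuine", "Genuine", "Genuine"]
y_pred = ["Fake", "Genuine", "Genuine", "Genuine", "Fake"]
print(evaluate(y_true, y_pred))
# {'accuracy': 0.6, 'tp': 1, 'tn': 2, 'fp': 1, 'fn': 1}
```

Here three of the five predictions are correct, so accuracy is 3/5 = 0.6; the confusion counts additionally show whether errors come from fake profiles being missed (`fn`) or genuine ones being flagged (`fp`).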


6. TESTING

6.1 INTRODUCTION TO TESTING


The purpose of testing is to discover errors. Testing is the process of trying
to discover every conceivable fault or weakness in a work product. It provides a
way to check the functionality of components, subassemblies, assemblies and/or a
finished product. It is the process of exercising software with the intent of ensuring
that the software system meets its requirements and user expectations and does not
fail in an unacceptable manner. There are various types of tests, and each test type
addresses a specific testing requirement.

6.2 TYPES OF TESTING

6.2.1 UNIT TESTING

Unit testing involves designing test cases that validate that the internal
program logic functions properly and that program inputs produce valid
outputs. All decision branches and internal code flow should be validated. It is the
testing of individual software units of the application, and it is done after an
individual unit is completed and before integration. Unit testing is structural
(white-box) testing: it relies on knowledge of the unit's construction and is invasive.
Unit tests perform basic tests at the component level and test a specific business
process, application and/or system configuration. They ensure that each unique path
of a business process performs accurately to the documented specifications and has
clearly defined inputs and expected results.
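As an illustration of a unit test, the sketch below uses Python's standard `unittest` module to check a single text pre-processing function in isolation. The function `preprocess_text` and its small stopword list are hypothetical stand-ins for the NLP pre-processing step mentioned in the conclusion, not the project's actual code.

```python
import re
import unittest

# Hypothetical, deliberately small stopword list used only for this example.
STOPWORDS = {"a", "an", "the", "is", "and", "of", "to", "in"}


def preprocess_text(text):
    """Lowercase the text, keep alphanumeric tokens only, and drop stopwords."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [t for t in tokens if t not in STOPWORDS]


class PreprocessTextTest(unittest.TestCase):
    """Each test exercises one behaviour with a defined input and expected result."""

    def test_lowercases_and_strips_punctuation(self):
        self.assertEqual(preprocess_text("Hello, WORLD!"), ["hello", "world"])

    def test_removes_stopwords(self):
        self.assertEqual(preprocess_text("The profile is a fake"), ["profile", "fake"])

    def test_empty_input_gives_empty_list(self):
        self.assertEqual(preprocess_text(""), [])
```

Saved in a file such as `test_preprocess.py`, the tests run with `python -m unittest test_preprocess`. Because the function is tested on its own, a failure points directly at this unit rather than at the classifier or the web pages that use it.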

6.2.2 INTEGRATION TESTING

Integration tests are designed to test integrated software components to
determine whether they actually run as one program. Integration tests demonstrate that,
although the components were individually satisfactory (as shown by successful
unit testing), the combination of components is correct and consistent. Integration
testing is specifically aimed at exposing the problems that arise from the
combination of components.
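A minimal integration test can check that a registration component and a login component work together: a user registered through one must be able to log in through the other. In the Python sketch below, `UserStore`, `LoginService`, the "Invalid password" message and the password-hashing choice (salted PBKDF2 from the standard `hashlib` module) are all hypothetical; the remaining messages are taken from the test cases in Section 6.3.

```python
import hashlib
import hmac
import os

ITERATIONS = 100_000  # PBKDF2 work factor (illustrative value)


class UserStore:
    """Hypothetical registration component: keeps salted password hashes in memory."""

    def __init__(self):
        self._users = {}

    def register(self, username, password):
        if username in self._users:
            raise ValueError("Username already exists")
        salt = os.urandom(16)
        digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)
        self._users[username] = (salt, digest)

    def lookup(self, username):
        """Return (salt, digest) for a user, or None if the user is unknown."""
        return self._users.get(username)


class LoginService:
    """Hypothetical login component that depends on a UserStore."""

    def __init__(self, store):
        self.store = store

    def login(self, username, password):
        record = self.store.lookup(username)
        if record is None:
            return "The username is not available"
        salt, digest = record
        attempt = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)
        # compare_digest performs a constant-time comparison of the two hashes
        if hmac.compare_digest(attempt, digest):
            return "user home page"
        return "Invalid password"


def test_register_then_login():
    """Integration test: both components must cooperate for a full sign-up/sign-in flow."""
    store = UserStore()
    service = LoginService(store)
    store.register("dhanya", "dhanya@123")
    assert service.login("dhanya", "dhanya@123") == "user home page"
    assert service.login("dhanya", "wrong-password") == "Invalid password"
    assert service.login("XYZ", "anything") == "The username is not available"
```

The test function follows pytest's naming convention, so pytest can collect it, but it can also simply be called directly. Each class could pass its own unit tests while this combined test still fails, for example if the two components hashed passwords differently.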

6.2.3 FUNCTIONAL TESTING

Functional tests provide systematic demonstrations that the functions tested are
available as specified by the business and technical requirements, system
documentation, and user manuals.
Functional testing is centered on the following items:

Valid input: identified classes of valid input must be accepted.

Invalid input: identified classes of invalid input must be rejected.

Functions: identified functions must be exercised.

Output: identified classes of application outputs must be exercised.

Systems/Procedures: interfacing systems or procedures must be invoked.

Organization and preparation of functional tests is focused on requirements, key
functions, or special test cases.
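The valid-input and invalid-input classes listed above can be exercised with a table-driven functional test. The Python sketch below assumes a hypothetical `validate_registration` function that accepts a registration form only if the mail ID is new and well formed and the phone number has exactly 10 digits. The exact-length rule and the e-mail pattern are assumptions; the project's test cases only state that a phone number with more than 10 digits is invalid. The error messages are those listed in the test cases.

```python
import re

# Simple illustrative e-mail pattern (assumption): something@something.something
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def validate_registration(form, existing_emails):
    """Return an error message for an invalid form, or None if it is accepted."""
    if form["email"] in existing_emails:
        return "Account exists with the given Mail ID. Try login"
    if not EMAIL_RE.match(form["email"]):
        return "Invalid Details"
    if not (form["phone"].isdigit() and len(form["phone"]) == 10):
        return "Invalid Details"
    return None


# One row per input class: (description, form, expected result).
CASES = [
    ("valid input is accepted",
     {"email": "new@example.com", "phone": "9876543210"}, None),
    ("existing mail ID is rejected",
     {"email": "taken@example.com", "phone": "9876543210"},
     "Account exists with the given Mail ID. Try login"),
    ("phone with more than 10 digits is rejected",
     {"email": "a@example.com", "phone": "98765432101"}, "Invalid Details"),
    ("non-numeric phone is rejected",
     {"email": "b@example.com", "phone": "98765abcde"}, "Invalid Details"),
    ("malformed e-mail is rejected",
     {"email": "not-an-email", "phone": "9876543210"}, "Invalid Details"),
]


def run_functional_tests():
    """Return the descriptions of any cases whose result differs from the expectation."""
    existing = {"taken@example.com"}
    return [desc for desc, form, expected in CASES
            if validate_registration(form, existing) != expected]


print(run_functional_tests())  # [] when every input class behaves as specified
```

Keeping the cases in a table makes it easy to add one row per newly identified input class without writing new test logic.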


6.3 TEST CASES

Test Case ID | Test Case Name | Input | Expected Output | Actual Output | Test Case Pass/Fail
-------------|----------------|-------|-----------------|---------------|--------------------
1 | User credentials | Username: dhanya, Password: dhanya@123 | It should move to the user home page | It moves to the user home page | Pass
2 | Check username | Username: XYZ (which is invalid) | It shows the error "The username is not available" | It shows the error "The username is not available" | Pass
3 | Creating an account | Username: hello (if the username is already taken) | Gives the error "Username already exists" | Gives the error "Username already exists" | Pass
4 | Registration | Mail ID (already exists) | Shows the message "Account exists with the given Mail ID. Try login" | Shows the message "Account exists with the given Mail ID. Try login" | Pass
5 | Registration details | Invalid phone number (more than 10 digits) | Gives the message "Invalid Details" | Gives the message "Invalid Details" | Pass

6.3: TEST CASES


7. CONCLUSION & FUTURE SCOPE

7.1 CONCLUSION

In this project, we introduced a framework that combines machine learning algorithms
with natural language processing (NLP) techniques. Its primary objective is to detect fake
profiles on social network sites efficiently. The approach analyses several social network
datasets, to which NLP pre-processing techniques are applied to clean the data and prepare
it for analysis. To improve the accuracy of profile classification, the pre-processed data is
then classified with two machine learning algorithms, Support Vector Machine (SVM) and
Naïve Bayes, which carry out the identification of fake profiles within the dataset.
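As a minimal, self-contained illustration of the pre-processing and Naïve Bayes part of this pipeline, the Python sketch below tokenizes profile text (lowercasing, removing punctuation and stopwords) and trains a multinomial Naïve Bayes classifier with Laplace (add-one) smoothing. The stopword list, the names `tokenize` and `MultinomialNB`, and the four training texts are hypothetical; this is not the project's code, and the SVM classifier is omitted.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical, deliberately small stopword list.
STOPWORDS = {"a", "an", "the", "is", "and", "of", "to", "in", "for", "me", "my"}


def tokenize(text):
    """NLP pre-processing: lowercase, keep alphanumeric tokens, remove stopwords."""
    return [t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in STOPWORDS]


class MultinomialNB:
    """Multinomial Naive Bayes over bag-of-words counts with add-one smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        self.total_words = {c: sum(wc.values()) for c, wc in self.word_counts.items()}
        return self

    def predict(self, text):
        n_docs = sum(self.class_counts.values())
        v = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label, n in self.class_counts.items():
            score = math.log(n / n_docs)  # log prior of the class
            for w in tokenize(text):
                if w in self.vocab:  # ignore words never seen in training
                    count = self.word_counts[label][w]
                    score += math.log((count + 1) / (self.total_words[label] + v))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical toy training data (profile descriptions).
TEXTS = [
    "Follow me for free followers and cash prizes",
    "Click the link to win free cash now",
    "Software engineer who enjoys hiking and photography",
    "Student of computer science, love reading books",
]
LABELS = ["Fake", "Fake", "Genuine", "Genuine"]

model = MultinomialNB().fit(TEXTS, LABELS)
print(model.predict("Win free followers now"))                     # Fake
print(model.predict("Computer science student who loves hiking"))  # Genuine
```

In practice a library implementation would normally be used instead, for example scikit-learn's `MultinomialNB` or `LinearSVC` combined with a `TfidfVectorizer`; the hand-written version only makes the log-probability scoring explicit.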

In our project, we applied these techniques to various social network datasets to
evaluate how effectively they identify fake profiles. Combining NLP pre-processing with the
SVM and Naïve Bayes algorithms improved the overall detection accuracy, and the project
reached an accuracy of 82% in distinguishing between genuine and fake profiles. This result
highlights the potential of integrating machine learning and NLP techniques for accurate fake
profile identification on social network platforms, whose content changes constantly.

7.2 FUTURE SCOPE

Future work on Fake Profile Identification in Social Networks Using Machine Learning
and NLP could involve refining the algorithms, exploring deep learning, and incorporating
larger datasets to improve accuracy. Behavioral analysis and anomaly detection methods could
be integrated for a more comprehensive approach, while advances in NLP and in algorithmic
interpretability would make the system more reliable. Ethical considerations in model
development will also become increasingly important for building responsible, privacy-aware
solutions in the evolving digital landscape.


BIBLIOGRAPHY

REFERENCES

[1] Michael Fire et al. (2012). "Strangers Intrusion Detection - Detecting Spammers and Fake
Profiles in Social Networks Based on Topology Anomalies." Human Journal 1(1): 26–39;
Günther, F. and S. Fritsch (2010). "neuralnet: Training of Neural Networks." The R Journal
2(1): 30–38.

[2] Dr. S. Kannan and Vairaprakash Gurusamy, "Preprocessing Techniques for Text Mining,"
05 March 2015.

[3] Shalinda Adikari and Kaushik Dutta, "Identifying Fake Profiles in LinkedIn," PACIS 2014
Proceedings, AISeL.

[4] Z. Halim, M. Gul, N. ul Hassan, R. Baig, S. Rehman, and F. Naz, "Malicious users' circle
detection in social network based on spatiotemporal co-occurrence," in Computer Networks
and Information Technology (ICCNIT), 2011 International Conference on, July, pp. 35–390.

[5] Liu Y, Gummadi K, Krishnamurthy B, Mislove A, "Analyzing Facebook privacy settings:
User expectations vs. reality," in: Proceedings of the 2011 ACM SIGCOMM Conference on
Internet Measurement Conference, ACM, pp. 61–70.

[6] Dataset link: https://www.kaggle.com/datasets/whoseaspects/genuinefake-user-profile-dataset

GITHUB LINK

[1] Project Code GitHub Link: https://github.com/ShreshtaReddy1910/Fake-Profile-Identification-Using-Machine-Learning-and-NLP
