Sample Copy of Major Project Report
Project Report
On
FAKE PROFILE IDENTIFICATION IN SOCIAL
NETWORKS USING MACHINE LEARNING
AND NLP
Submitted in partial fulfillment of the requirements for the award of Degree
BACHELOR OF TECHNOLOGY
in
COMPUTER SCIENCE AND ENGINEERING (AI&ML)
by
P.SHRESHTA REDDY (207R1A66G5)
K.NIKHIL (207R1A66E6)
AYUSH SHRIVASTHAV (207R1A66C6)
CERTIFICATE
The results embodied in this thesis have not been submitted to any other University or
Institute for the award of any degree or diploma.
EXTERNAL EXAMINER
Apart from our own efforts, the success of any project depends largely on the
encouragement and guidance of many others. We take this opportunity to express our
gratitude to the people who have been instrumental in the successful completion of this
project.
We take this opportunity to express our profound gratitude and deep regard to
our guide, Dr. S Rao Chintalapudi, Associate Professor and HOD CSE (AI&ML), for his
exemplary guidance, monitoring and constant encouragement throughout the project work.
The blessing, help and guidance given by him shall carry us a long way in the journey of
life on which we are about to embark.
We also take this opportunity to express a deep sense of gratitude to the
Project Review Committee (PRC) Dr. G. Vinoda Reddy, Dr. K. Mahesh, N. Sateesh & B.
Mamatha for their cordial support, valuable information and guidance, which helped us in
completing this task through various stages.
We are also thankful to Dr. S Rao Chintalapudi, Head, Department of Computer
Science and Engineering (AI&ML) for providing encouragement and support for completing
this project successfully.
We are obliged to Dr. A. Raji Reddy, Director for being cooperative throughout
the course of this project. We also express our sincere gratitude to Sri. Ch. Gopal Reddy,
Chairman for providing excellent infrastructure and a nice atmosphere throughout the course
of this project.
We acknowledge the guidance and support received from all the members of CMR Technical
Campus who contributed to the completion of the project. We are grateful for their constant
support and help.
Finally, we would like to take this opportunity to thank our family for their
constant encouragement, without which this assignment would not have been completed. We
sincerely acknowledge and thank all those who gave support directly and indirectly in the
completion of this project.
At present, social network sites are part of life for most people. Every day,
many people create profiles on social network platforms and interact with others
regardless of location and time. Social network sites not only provide advantages to
their users but also create security risks for the users and their information. To
identify who is propagating threats in a social network, we need to classify the
profiles of its users. Classification separates the genuine profiles from the fake
profiles. Different classification methods already exist for detecting fake profiles
on social networks, but the accuracy of fake profile detection still needs to be
improved.
In this project we propose Machine Learning and Natural Language Processing (NLP)
techniques to improve the accuracy of fake profile detection, using the Support
Vector Machine (SVM) and Naïve Bayes algorithms. By applying these two algorithms,
the project aims not only to detect fake profiles but also to strengthen online
security more broadly. The initiative is a proactive step towards ensuring
authenticity, bolstering user trust, and fostering a secure social network
environment for all.
LIST OF FIGURES
4.3 Confusion Matrix 27
4.5 Result Analysis 34
5.1 Main Home Page 37
6.3 TESTCASES 44
TABLE OF CONTENTS
ABSTRACT i
LIST OF FIGURES ii
LIST OF TABLES iii
1. INTRODUCTION 1
1.1 PROJECT SCOPE 1
1.2 PROJECT PURPOSE 2
1.3 PROJECT FEATURES 2
2. SYSTEM ANALYSIS 4
2.1 PROBLEM DEFINITION 4
2.2 EXISTING SYSTEM / LITERATURE REVIEW 4
2.2.1 EXISTING SYSTEM 6
2.2.2 LIMITATIONS OF EXISTING SYSTEMS 7
2.3 PROPOSED SYSTEM 7
2.3.1 PROPOSED APPROACH 8
2.3.2 ADVANTAGES OF PROPOSED SYSTEM 9
2.4 HARDWARE & SOFTWARE REQUIREMENTS 9
2.4.1 HARDWARE REQUIREMENTS 9
2.4.2 SOFTWARE REQUIREMENTS 10
3. ARCHITECTURE 11
3.1 PROJECT ARCHITECTURE 12
3.2 USE CASE DIAGRAM 15
3.3 CLASS DIAGRAM 17
3.4 SEQUENCE DIAGRAM 19
4. IMPLEMENTATION 22
4.1 SUPPORT VECTOR MACHINE 23
4.2 NAÏVE BAYES 24
4.3 DATASET DESCRIPTION 25
4.4 PERFORMANCE METRICS 26
4.5 SAMPLE CODE 28
4.6 RESULT ANALYSIS 34
5. SCREENSHOTS 36
6. TESTING 41
REFERENCES 48
GITHUB LINK 48
1. INTRODUCTION
Fake Profile Identification in Social Networks Using Machine Learning and NLP
CMRTC
2. SYSTEM ANALYSIS
SYSTEM ANALYSIS
System analysis is an important phase in the system development process. The
system is studied in minute detail and analyzed. The system analyst plays the important role
of an interrogator and delves deep into the working of the present system. In analysis, a detailed
study is made of the operations performed by the system and their relationships within and
outside the system. A key question considered here is, “What must be done to solve the problem?”
The system is viewed as a whole and the inputs to the system are identified. Once analysis is
completed, the analyst has a firm understanding of what is to be done.
Fake Profile Identification employs machine learning algorithms and Natural
Language Processing (NLP) to automatically distinguish genuine
from fraudulent accounts in Online Social Networks (OSNs). The system analyzes user profiles
and activities, identifying patterns and anomalies that may indicate the presence of false profiles.
By scrutinizing user behaviors, posting patterns, and engagement metrics, along with linguistic
aspects of user-generated content, the system aims to proactively mitigate risks associated with
social engineering and impersonation. This approach contributes to a safer online
environment, upholding the integrity of digital interactions and enhancing user trust in OSNs.
Existing systems grapple with the intricate task of striking a balance between user
preferences and efficient dialog management in natural language dialog systems. This challenge
is compounded in LinkedIn profile identification by privacy constraints, which curtail the
effectiveness of conventional methods. The limitations imposed by privacy considerations hinder
the seamless integration of user preferences and pose obstacles to optimal dialog management
strategies within these systems.
In previous systems for fake profile identification, the k-Nearest Neighbors (KNN)
algorithm has been employed by first representing profiles as vectors in a high-dimensional
feature space. Each dimension corresponds to a characteristic of the profile, such as posting
frequency, friend count, or profile completeness. During classification, KNN identifies the k
nearest neighbors of a given profile based on a distance metric (e.g., Euclidean distance) and
assigns a label to the profile based on the majority class among its neighbors. This approach
benefits from simplicity and interpretability, as it directly compares new profiles to existing ones
for classification, without assuming underlying distributions.
However, despite its simplicity, KNN may not be the most effective approach for
fake profile identification due to several reasons. Firstly, KNN tends to be computationally
expensive, especially as the dataset size increases, since it requires calculating distances to all
data points for each prediction. Additionally, KNN is sensitive to noise and irrelevant features in
the dataset, which can lead to inaccurate classifications. Furthermore, KNN's performance
heavily relies on the choice of the number of neighbors (k) and the distance metric, which may
not be optimal for all datasets. Lastly, KNN may struggle with high-dimensional data, as the
curse of dimensionality can cause the feature space to become sparse, impacting the
effectiveness of distance-based
computations. As a result, while KNN offers simplicity and interpretability, it may not always
provide the desired level of accuracy and scalability required for robust fake profile
identification systems.
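The KNN procedure described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the code of any earlier system: the feature vectors (posts per day, friend count in thousands, profile completeness), the labels, and the function name `knn_predict` are all hypothetical. It also makes the cost visible: every prediction computes a distance to every stored profile.

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, query, k=3):
    """Label `query` by majority vote among its k nearest training profiles."""
    # Euclidean distance from the query to every training profile
    distances = [
        (math.dist(features, query), label)
        for features, label in zip(train_X, train_y)
    ]
    distances.sort(key=lambda pair: pair[0])
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]

# Hypothetical profiles: (posts per day, friend count / 1000, profile completeness)
train_X = [(0.5, 0.30, 0.9), (0.7, 0.45, 0.8), (0.4, 0.25, 1.0),
           (9.0, 0.01, 0.2), (8.5, 0.02, 0.1), (7.0, 0.00, 0.3)]
train_y = ["genuine", "genuine", "genuine", "fake", "fake", "fake"]

prediction = knn_predict(train_X, train_y, query=(8.0, 0.01, 0.2), k=3)
print(prediction)
```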
Complementing the SVM classifier, our system integrates the Naive Bayes
algorithm to further enhance detection accuracy. Despite its assumption of feature independence,
Naive Bayes proves to be exceptionally effective in evaluating the likelihood of a profile being
false based on various features. This algorithm's ability to consider the joint probabilities of
multiple features adds a layer of sophistication to the analysis, providing a comprehensive
understanding of profile authenticity. Notably, our proposed system achieves an impressive
accuracy of 82 percent, surpassing the performance of existing systems. This accomplishment
underscores the effectiveness of the Support Vector Machine classifier and Naive Bayes
algorithm in tandem, forming a robust framework that not only elevates the accuracy of false
profile detection but also enhances the overall security of online social networks.
A proposed approach for fake profile identification using Support Vector Machines
(SVM) and Naive Bayes involves several steps. Firstly, the data preprocessing phase includes
collecting a dataset containing profiles' attributes such as posting frequency, friend count, profile
completeness, and textual information like profile descriptions and status updates. Then, the
dataset is cleaned by removing duplicates, irrelevant features, and handling missing values.
Textual data undergoes tokenization, stop-word removal, and possibly stemming or
lemmatization.
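A minimal sketch of this cleaning stage is given below, assuming profiles arrive as dictionaries. The field names (`id`, `friend_count`, `description`), the tiny stop-word list, and the helper names are hypothetical and chosen only for illustration; stemming or lemmatization is omitted.

```python
import re

# A very small illustrative stop-word list; a real system would use a fuller one.
STOP_WORDS = {"a", "an", "the", "is", "are", "am", "to", "and", "of", "in", "my", "i", "for"}

def tokenize(text):
    """Lowercase the text, split it into alphabetic tokens, and drop stop words."""
    return [tok for tok in re.findall(r"[a-z]+", text.lower()) if tok not in STOP_WORDS]

def clean_profiles(profiles):
    """Drop duplicate ids and replace missing values with neutral defaults."""
    seen_ids = set()
    cleaned = []
    for profile in profiles:
        if profile["id"] in seen_ids:
            continue  # duplicate record
        seen_ids.add(profile["id"])
        cleaned.append({
            "id": profile["id"],
            "friend_count": profile.get("friend_count") or 0,
            "tokens": tokenize(profile.get("description") or ""),
        })
    return cleaned

raw = [
    {"id": 1, "friend_count": 120, "description": "I am a student of the arts!"},
    {"id": 1, "friend_count": 120, "description": "I am a student of the arts!"},
    {"id": 2, "friend_count": None, "description": None},
]
cleaned = clean_profiles(raw)
print(cleaned)
```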
Following preprocessing, feature extraction is conducted to convert the textual and non-textual
attributes into a suitable format for classification. This may involve techniques like TF-IDF
(Term Frequency-Inverse Document Frequency) for text data and normalization or scaling for
numerical features. Once the features are extracted, they are used to train both SVM and Naive
Bayes classifiers separately.
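The two feature-extraction steps named above can be illustrated as follows. The sketch uses the basic TF-IDF weighting (term frequency times log of inverse document frequency) and min-max scaling; library implementations typically apply smoothed variants, and the sample documents and friend counts are hypothetical.

```python
import math
from collections import Counter

def tf_idf(documents):
    """Return one {term: weight} dict per tokenized document."""
    n_docs = len(documents)
    # Document frequency: number of documents that contain each term
    doc_freq = Counter(term for doc in documents for term in set(doc))
    weights = []
    for doc in documents:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights

def min_max_scale(values):
    """Rescale numeric values linearly into the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

docs = [["free", "followers", "click"], ["student", "music", "travel"], ["free", "music"]]
weights = tf_idf(docs)
scaled_friends = min_max_scale([10, 250, 130])
print(weights[0], scaled_friends)
```

A term that appears in fewer documents ("click") receives a higher weight than one that appears in several ("free"), which is the property that makes TF-IDF useful for separating distinctive vocabulary.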
In the training phase, the SVM classifier learns to separate genuine and fake profiles by finding
the hyperplane that maximizes the margin between the two classes in the feature space. On the
other hand, Naive Bayes calculates the probabilities of each feature given the class labels and
uses Bayes' theorem to compute the posterior probability of a profile being genuine or fake.
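As an illustration of the Naive Bayes computation described above, the following sketch implements a small multinomial variant with Laplace (add-one) smoothing over tokenized text. The class name, the training documents, and the smoothing choice are assumptions made for the example and are not taken from the project code.

```python
import math
from collections import Counter, defaultdict

class NaiveBayes:
    """Multinomial Naive Bayes over token lists with add-one smoothing."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)
        self.term_counts = defaultdict(Counter)
        self.vocab = set()
        for tokens, label in zip(docs, labels):
            self.term_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def log_scores(self, tokens):
        """Unnormalized log posterior per class: log prior + sum of log likelihoods."""
        total_docs = sum(self.class_counts.values())
        scores = {}
        for label, n_docs in self.class_counts.items():
            score = math.log(n_docs / total_docs)
            total_terms = sum(self.term_counts[label].values())
            for tok in tokens:
                count = self.term_counts[label][tok]
                score += math.log((count + 1) / (total_terms + len(self.vocab)))
            scores[label] = score
        return scores

    def predict(self, tokens):
        scores = self.log_scores(tokens)
        return max(scores, key=scores.get)

docs = [["free", "followers", "click"], ["win", "free", "prize"],
        ["student", "music", "travel"], ["family", "music", "photos"]]
labels = ["fake", "fake", "genuine", "genuine"]
model = NaiveBayes().fit(docs, labels)
print(model.predict(["free", "prize", "click"]))
```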
After training, the performance of both classifiers is evaluated using metrics such as accuracy,
precision, recall, and F1 score on a held-out validation set or through cross-validation. This
evaluation helps in selecting the best-performing model for deployment. It's worth noting that
SVM and Naive Bayes have different assumptions and properties, so comparing their
performance provides insights into which algorithm is more suitable for the specific
characteristics of the dataset.
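Cross-validation, mentioned above as one evaluation option, can be sketched as a function that partitions sample indices into k folds. The function name and the unshuffled, contiguous split are simplifying assumptions; in practice the data would usually be shuffled, and possibly stratified by class, first.

```python
def k_fold_indices(n_samples, k):
    """Return k (train_indices, test_indices) pairs covering every sample once."""
    # Spread the remainder over the first folds so sizes differ by at most one
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    indices = list(range(n_samples))
    folds, start = [], 0
    for size in fold_sizes:
        test_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        folds.append((train_idx, test_idx))
        start += size
    return folds

folds = k_fold_indices(10, 3)
for train_idx, test_idx in folds:
    print(len(train_idx), test_idx)
```

Each classifier would be trained on `train_idx` and scored on `test_idx` for every fold, and the per-fold scores averaged before the two models are compared.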
Once the best-performing model is chosen, it can be deployed for real-time classification of new
profiles. The deployed model takes the profile attributes as input and predicts whether the profile
is genuine or fake. This classification can be used to flag suspicious profiles for further
investigation or to automate actions such as suspending fake accounts.
Finally, the deployed model should be monitored regularly to ensure its performance remains
satisfactory over time. This may involve retraining the model periodically with updated data and
fine-tuning its hyperparameters to adapt to evolving patterns of fake profile creation and
detection. Additionally, continuous evaluation of the model's performance and incorporating
feedback from users or domain experts can help in further improving the effectiveness of the
fake profile identification system.
The proposed system is implemented using machine learning techniques and processes
profiles in the following way.
Improved accuracy in fake profile identification
Comprehensive analysis considering various factors and behaviors
Adaptive learning capability for staying effective over time
Increased efficiency through automation
Minimization of false positives
Effective handling of diverse textual data with NLP
PROCESSOR : i5 or above
RAM : 4GB (min)
HARD DISK : 20 GB
KEYBOARD : Standard Windows Keyboard
MOUSE : Two or Three Button Mouse
MONITOR : SVGA
BACK-END : Django-ORM
DESIGNING : HTML, CSS, JavaScript
DATABASE : MySQL
3. ARCHITECTURE
DESCRIPTION
Our methodological journey unfolds with a meticulous extraction process, where
we acquire a diverse and comprehensive dataset from various online social networks. This
dataset lays the groundwork for an in-depth analysis focused on unraveling the intricacies of
false profile identification. As we traverse the realms of user profiles across platforms, our
emphasis intensifies on capturing a rich and varied set of data, ensuring a representative
sample that mirrors the diversity inherent in online social interactions.
At the core of our false profile identification system lies the deployment of the
Support Vector Machine (SVM), a robust and versatile classification algorithm. SVM excels
in distinguishing between different classes based on the features extracted during the
preprocessing stage. By maximizing the separation between genuine and fake profiles in a
high-dimensional space, SVM contributes to a precise and reliable classification process,
ultimately enhancing the overall accuracy of our system.
Executing the SVM algorithm seamlessly transitions us into the evaluation phase,
a critical step in assessing the system's performance. Key parameters, particularly the True
Positive Rate (TPR) and False Positive Rate (FPR), assume pivotal roles in gauging the
accuracy of the classification. This nuanced evaluation mechanism ensures a reliable
distinction between authentic and deceptive profiles. A profile is deemed legitimate if the
TPR exceeds the FPR; otherwise, it is classified as a fake profile. This meticulous approach
enhances the overall efficacy of our proposed system, reinforcing its capability to identify and
mitigate the presence of false profiles within the dynamic landscape of online social networks.
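For reference, the True Positive Rate and False Positive Rate are computed from the counts of correct and incorrect predictions over an evaluation set. The sketch below uses hypothetical counts and a hypothetical helper name purely to show the arithmetic.

```python
def rates(tp, fp, tn, fn):
    """Return (TPR, FPR) from confusion-matrix counts."""
    tpr = tp / (tp + fn) if (tp + fn) else 0.0  # share of actual positives that were found
    fpr = fp / (fp + tn) if (fp + tn) else 0.0  # share of actual negatives wrongly flagged
    return tpr, fpr

# Hypothetical evaluation counts
tpr, fpr = rates(tp=80, fp=10, tn=90, fn=20)
print(tpr, fpr)
```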
In the use case diagram, we have basically one actor who is the user in the
trained model.
A use case diagram is a graphical depiction of a user's possible interactions
with a system. A use case diagram shows various use cases and different types of users
the system has. The use cases are represented by either circles or ellipses. The actors
are often shown as stick figures.
Figure 3.2: Use Case Diagram for Fake Profile Identification in Social Networks Using
Machine Learning and NLP
DESCRIPTION
The use case diagram encapsulates the functionalities available to the service
provider and remote user entities. For the service provider, a set of methods is defined to
facilitate access to train and test user profile datasets, view accuracy results in a bar chart,
examine detailed accuracy outcomes, register and log in to their account, predict profile
identification statuses, view their own profile, explore all predicted profile identification results,
find and view specific prediction ratio results, inspect their own profile identity ratio results, and
download predicted datasets. These methods collectively empower the service provider to
interact comprehensively with the system, from data access and analysis to user management.
On the other hand, the remote user class is equipped with methods for registering
and logging into their account, viewing their own profile, predicting profile identification
statuses, and obtaining an overview of all registered remote users. The simplicity of the remote
user's functionality aligns with the limited scope of operations expected from this entity.
Within the use case diagram, a clear relationship is established between the service
provider and the remote user. This association allows the service provider to view all registered
remote users, fostering a collaborative environment within the system. This structural depiction
offers a concise overview of the system's architecture, highlighting the essential functionalities
available to both service providers and remote users, and the relationships that facilitate a
seamless interaction between these entities.
A class diagram is a type of static structure diagram that describes the structure
of a system by showing the system’s classes, their attributes, operations (or methods),
and the relationships among objects.
Figure 3.3: Class Diagram for Fake Profile Identification in Social Networks Using
Machine Learning and NLP
DESCRIPTION
In the envisioned class diagram for fake profile identification, four distinct classes play
integral roles in orchestrating the system's functionality: Service Provider, Remote User, Register, and
Login. The Service Provider class embodies the core entity responsible for overseeing and managing the
various aspects of fake profile identification. Its methods include accessing and manipulating user profile
datasets, viewing accuracy results through bar charts, predicting profile identification statuses, managing
user accounts through registration and login processes, and analyzing detailed accuracy outcomes. This
class serves as the orchestrator, overseeing the intricate processes involved in the identification of
fraudulent profiles within the online social network.
Complementing the Service Provider, the Remote User class represents the end-user entity
interacting with the system. While its functionalities are more streamlined, focusing on user-specific
actions such as registration, login, profile viewing, and predicting profile identification statuses, the
Remote User class is essential for the system's user-centric interactions. It acts as the consumer of the
fake profile identification services, benefiting from the predictions made by the system to assess the
legitimacy of profiles encountered within the online social network.
The Register and Login classes in the diagram cater specifically to the user authentication
and account management processes. These classes facilitate the creation of new user accounts through
registration and the secure login of users into the system. By segregating these functionalities into
dedicated classes, the system achieves a modular and organized structure, enhancing maintainability and
scalability. Overall, this class diagram encapsulates the intricate relationships and responsibilities of the
Service Provider, Remote User, Register, and Login classes, providing a comprehensive framework for
effective fake profile identification within online social networks.
A sequence diagram shows object interactions arranged in time sequence. It depicts the
objects involved in the scenario and the sequence of messages exchanged between the objects needed to
carry out the functionality of the scenario. Sequence diagrams are typically associated with use case
realizations in the logical view of the system under development.
Figure 3.4.1: Sequence Diagram for building fake profile identification model Using
Machine Learning and NLP
Figure 3.4.2: Sequence Diagram for identifying fake profile Using Machine
Learning and NLP
DESCRIPTION
The sequence diagram delineates the interaction and flow of messages between
entities in the system, specifically focusing on the service provider, web server for building fake
profile identification, remote user, and the web server for identifying fake profiles. Let's
elaborate on this sequence in two paragraphs.
The sequence initiates with the Service Provider sending a request to the Web Server
for Building Fake Profile Identification, signalling the commencement of the fake profile
identification process. The web server, upon receiving the request, orchestrates the retrieval of
necessary datasets and begins the pre-processing of the data. This entails tokenization, stop-word
removal, stemming, and lemmatization, all of which are pivotal for refining the dataset for
subsequent analysis. Following this pre-processing step, the web server proceeds to apply
dimensionality reduction techniques and feature extraction methods to optimize the dataset,
making it conducive for analysis. Once the data is prepared, the web server employs a Support
Vector Machine (SVM) algorithm to train the system on the authentic profiles, enhancing its
ability to distinguish between genuine and fake profiles. This training process is encapsulated in
a sequence of messages exchanged between the Service Provider and the Web Server for
Building Fake Profile Identification, illustrating a collaborative effort in the system's
construction.
4. IMPLEMENTATION
During the training phase, the SVM model learns to discriminate between genuine
and fake profiles by optimizing a decision boundary in the feature space. The objective is to
find the hyperplane that maximizes the margin between the two classes while minimizing
classification errors. Through iterative optimization, the SVM identifies a decision boundary
that best separates the genuine and fake profiles based on their feature representations.
Additionally, SVMs are capable of handling non-linear decision boundaries through kernel
methods, allowing them to capture complex relationships within the data.
In the identification phase, the trained SVM classifier is utilized to classify new
profiles as either genuine or fake. When presented with the attributes of a profile, the SVM
model calculates its position relative to the decision boundary learned during training.
Profiles lying on one side of the boundary are classified as genuine, while those on the other
side are classified as fake. This decision-making process enables our system to automatically
identify and flag suspicious profiles in real-time, providing a proactive approach to
combating fraudulent activities on social media platforms and online communities. Overall,
SVMs serve as a powerful tool in our project for fake profile identification, offering robust
performance and accurate classification capabilities to help ensure the integrity and security
of online environments.
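The training and identification phases described above can be illustrated with a small linear SVM. This is a sketch under stated assumptions, not the project's implementation: it uses plain subgradient descent on the regularized hinge loss, a linear kernel only, labels encoded as +1 (genuine) and -1 (fake), and hypothetical two-dimensional features (scaled posting rate, profile completeness). The hyperparameters `lam`, `lr`, and `epochs` are illustrative choices.

```python
def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=200):
    """Fit w, b by subgradient descent on lam/2 * ||w||^2 + hinge loss."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1:
                # Inside the margin or misclassified: move the boundary away from xi
                w = [wj - lr * (lam * wj - yi * xj) for wj, xj in zip(w, xi)]
                b += lr * yi
            else:
                # Correct with sufficient margin: only the regularizer shrinks w
                w = [wj - lr * lam * wj for wj in w]
    return w, b

def decision_value(w, b, x):
    """Signed score of x relative to the learned hyperplane."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b

def classify(w, b, x):
    return "genuine" if decision_value(w, b, x) >= 0 else "fake"

# Hypothetical training data: (scaled posting rate, profile completeness)
X = [(0.1, 0.9), (0.2, 0.8), (0.15, 1.0), (0.9, 0.1), (0.8, 0.2), (1.0, 0.15)]
y = [1, 1, 1, -1, -1, -1]

w, b = train_linear_svm(X, y)
print(classify(w, b, (0.95, 0.05)), classify(w, b, (0.1, 0.95)))
```

The sign of `decision_value` corresponds to the side of the decision boundary on which a profile lies; a kernelized SVM replaces the dot product with a kernel function to obtain non-linear boundaries.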
Our dataset, sourced from the Kaggle website, encompasses a diverse range of
attributes aimed at discerning the authenticity of user profiles and comprises 2,676 instances
(https://www.kaggle.com/datasets/whoseaspects/genuinefake-user-profile-dataset). The
attributes include identifiers such as ID, name, and screen name, facilitating unique
identification and tracking of individual profiles. Additionally, categorical attributes like
default profile, which indicates whether a user has customized their profile layout, and
location, denoting the geographical information provided by the user, contribute to
understanding user behavior and characteristics. These attributes serve as valuable features
for discerning genuine profiles from fake ones, offering insights into user engagement and
interaction patterns.
online communities.
ACCURACY
Accuracy = (Number of correctly classified profiles, genuine or fake / Total number of profiles) × 100%
CLASSIFICATION REPORT
Precision: Precision measures the proportion of correctly identified fake profiles among all
profiles classified as fake. It is calculated as the ratio of true positives (correctly identified
fake profiles) to the sum of true positives and false positives (genuine profiles misclassified
as fake). High precision indicates that when the model predicts a profile as fake, it is likely to
be correct.
Precision = True Positives / (True Positives + False Positives)
Recall: Recall, also known as sensitivity, measures the proportion of correctly identified fake
profiles among all actual fake profiles in the dataset. It is calculated as the ratio of true
positives to the sum of true positives and false negatives (fake profiles misclassified as
genuine). High recall indicates that the model effectively captures most of the fake profiles in
the dataset.
Recall = True Positives / (True Positives + False Negatives)
F1 Score: The F1 score is the harmonic mean of precision and recall, providing a balance
between the two metrics. It takes into account both false positives and false negatives and is
calculated as 2 * (precision * recall) / (precision + recall). F1 score ranges from 0 to 1, where
higher values indicate better performance in terms of both precision and recall.
F1 = 2 × (Precision × Recall) / (Precision + Recall)
Support: Support represents the number of occurrences of each class in the dataset. It
provides insights into the distribution of classes and helps interpret the significance of
precision, recall, and F1 score. For instance, if one class has significantly higher support than
the other, it may influence the interpretation of the model's performance metrics.
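The four quantities in the classification report can be computed directly from the true and predicted labels. The sketch below treats "fake" as the positive class, as in the definitions above; the label lists and the function name are hypothetical.

```python
def classification_metrics(y_true, y_pred, positive="fake"):
    """Precision, recall, F1 and support for the `positive` class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    support = sum(1 for t in y_true if t == positive)  # actual occurrences of the class
    return {"precision": precision, "recall": recall, "f1": f1, "support": support}

# Hypothetical evaluation labels
y_true = ["fake", "fake", "fake", "fake", "genuine", "genuine", "genuine", "genuine"]
y_pred = ["fake", "fake", "fake", "genuine", "fake", "genuine", "genuine", "genuine"]
report = classification_metrics(y_true, y_pred)
print(report)
```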
CONFUSION MATRIX
True positives (TP) represent the number of fake profiles correctly identified by the
model. These are instances where the model correctly predicts a profile as fake when it is
indeed fake. False positives (FP) indicate the number of genuine profiles incorrectly
classified as fake by the model. These occur when the model mistakenly predicts a genuine
profile as fake. True negatives (TN) represent the number of genuine profiles correctly
identified as genuine by the model. These instances occur when the model correctly predicts
a genuine profile as genuine. Lastly, false negatives (FN) denote the number of fake profiles
incorrectly classified as genuine by the model. These occur when the model fails to identify a
fake profile accurately and mistakenly predicts it as genuine.
By analyzing the values in the confusion matrix, one can gain insights into the
strengths and weaknesses of the classification model. For instance, a high number of true
positives and true negatives relative to false positives and false negatives indicates that the
model performs well in accurately identifying both genuine and fake profiles. Conversely, a
higher number of false positives or false negatives may indicate areas where the model's
performance can be improved. The confusion matrix provides a clear and concise
visualization of the classification results, enabling stakeholders to understand the model's
performance and make informed decisions about its effectiveness in fake profile
identification tasks.
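The four cells of the confusion matrix, and the accuracy derived from them, can be tallied as follows. This is an illustrative sketch with hypothetical labels, again taking "fake" as the positive class.

```python
from collections import Counter

def confusion_counts(y_true, y_pred, positive="fake"):
    """Count TP, FP, TN and FN with `positive` as the positive class."""
    counts = Counter({"TP": 0, "FP": 0, "TN": 0, "FN": 0})
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive:
            counts["TP" if actual == positive else "FP"] += 1
        else:
            counts["FN" if actual == positive else "TN"] += 1
    return counts

def accuracy(counts):
    """Share of all profiles, genuine or fake, that were classified correctly."""
    total = sum(counts.values())
    return (counts["TP"] + counts["TN"]) / total if total else 0.0

# Hypothetical evaluation labels
y_true = ["fake", "fake", "fake", "fake", "genuine", "genuine", "genuine", "genuine"]
y_pred = ["fake", "fake", "fake", "genuine", "fake", "genuine", "genuine", "genuine"]
counts = confusion_counts(y_true, y_pred)
print(dict(counts), accuracy(counts))
```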
import datetime
import re
import string
import warnings

import matplotlib.pyplot as plt
import numpy as np
import openpyxl
import pandas as pd
import seaborn as sns
import xlwt
from django.db.models import Avg
from django.http import HttpResponse
from django.shortcuts import redirect, render
from sklearn.ensemble import VotingClassifier
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix, f1_score
from sklearn.tree import DecisionTreeClassifier

# The model classes used below (ClientRegister_Model, profile_identification_type,
# detection_ratio, detection_accuracy) come from the project's Django models module;
# that import line is not shown in this excerpt.
def serviceproviderlogin(request):
    # Log the service provider in using the fixed administrator credentials
    if request.method == "POST":
        admin = request.POST.get('username')
        password = request.POST.get('password')
        if admin == "Admin" and password == "Admin":
            return redirect('View_Remote_Users')
    return render(request, 'SProvider/serviceproviderlogin.html')
def View_Profile_Identity_Prediction(request):
    # List every stored prediction
    obj = profile_identification_type.objects.all()
    return render(request, 'SProvider/View_Profile_Identity_Prediction.html', {'objs': obj})
def View_Profile_Identity_Prediction_Ratio(request):
    # Recompute the percentage of predictions that fall into each class
    detection_ratio.objects.all().delete()
    total = profile_identification_type.objects.all().count()
    for kword in ('Genuine Profile', 'Fake Profile'):
        print(kword)
        count = profile_identification_type.objects.filter(Prediction=kword).count()
        # Guard against an empty table to avoid a division by zero
        ratio = (count / total) * 100 if total else 0
        if ratio != 0:
            detection_ratio.objects.create(names=kword, ratio=ratio)
    obj = detection_ratio.objects.all()
    return render(request, 'SProvider/View_Profile_Identity_Prediction_Ratio.html', {'objs': obj})
def View_Remote_Users(request):
    # List all registered remote users
    obj = ClientRegister_Model.objects.all()
    return render(request, 'SProvider/View_Remote_Users.html', {'objects': obj})
def charts(request, chart_type):
    # Average prediction ratio per class, for the ratio chart
    chart1 = detection_ratio.objects.values('names').annotate(dcount=Avg('ratio'))
    return render(request, "SProvider/charts.html", {'form': chart1, 'chart_type': chart_type})

def charts1(request, chart_type):
    # Average accuracy per model, for the accuracy chart
    chart1 = detection_accuracy.objects.values('names').annotate(dcount=Avg('ratio'))
    return render(request, "SProvider/charts1.html", {'form': chart1, 'chart_type': chart_type})
def likeschart(request,like_chart):
charts =detection_accuracy.objects.values('names').annotate(dcount=Avg('ratio'))
30
CMRTC
Fake Profile Identification in Social Networks Using Machine Learning and NLP
def likeschart1(request,like_chart):
    charts = detection_ratio.objects.values('names').annotate(dcount=Avg('ratio'))
    return render(request, "SProvider/likeschart1.html", {'form': charts, 'like_chart': like_chart})

def Download_Trained_DataSets(request):
    response = HttpResponse(content_type='application/ms-excel')
    # decide file name
    response['Content-Disposition'] = 'attachment; filename="Predicted_Datasets.xls"'
    # creating workbook
    wb = xlwt.Workbook(encoding='utf-8')
    # adding sheet
    ws = wb.add_sheet("sheet1")
    # Sheet header, first row
    row_num = 0
    font_style = xlwt.XFStyle()
    # headers are bold
    font_style.font.bold = True
    # writer = csv.writer(response)
    obj = profile_identification_type.objects.all()
    data = obj  # dummy method to fetch data.
    for my_row in data:
        row_num = row_num + 1
    wb.save(response)
    return response

def Train_Test_DataSets(request):
    detection_accuracy.objects.all().delete()
    df = pd.read_csv('Profile_Datasets.csv')

    def clean_text(text):
        '''Make text lowercase, remove text in square brackets, remove links, remove punctuation
        and remove words containing numbers.'''
        text = text.lower()
        # Raw strings keep the regex escapes from being read as string escapes.
        text = re.sub(r'\[.*?\]', '', text)
        text = re.sub(r'https?://\S+|www\.\S+', '', text)
        text = re.sub(r'<.*?>+', '', text)
        text = re.sub('[%s]' % re.escape(string.punctuation), '', text)
        text = re.sub(r'\n', '', text)
        text = re.sub(r'\w*\d\w*', '', text)
        text = re.sub('"@', '', text)
        text = re.sub('@', '', text)
        text = re.sub('https: //', '', text)
        text = re.sub('â€â€', '', text)
        text = re.sub(r'\n\n', '', text)
        return text

    def apply_results(label):
        if (label == 0):
            return 0  # Fake
        elif (label == 1):
            return 1  # Genuine

    df['results'] = df['Label'].apply(apply_results)
    cv = CountVectorizer(lowercase=False)
    y = df['results']
    X = df["id"].apply(str)
    print("X Values")
    print(X)
    print("Labels")
    print(y)
    X = cv.fit_transform(X)
    models = []
    from sklearn.model_selection import train_test_split
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42)
    X_train.shape, X_test.shape, y_train.shape
    print("X_test")
    print(X_test)
    print(X_train)

    # print("Naive Bayes")
    # from sklearn.naive_bayes import MultinomialNB
    # NB = MultinomialNB()
    # NB.fit(X_train, y_train)
    # predict_nb = NB.predict(X_test)
    # naivebayes = accuracy_score(y_test, predict_nb) * 100
    # print("ACCURACY")
    # print(naivebayes)
    # print("CLASSIFICATION REPORT")
    # print(classification_report(y_test, predict_nb))
    # print("CONFUSION MATRIX")
    # print(confusion_matrix(y_test, predict_nb))
    # detection_accuracy.objects.create(names="Naive Bayes", ratio=naivebayes)

    # SVM Model
    print("SVM")
    from sklearn import svm
    lin_clf = svm.LinearSVC()
    lin_clf.fit(X_train, y_train)
    predict_svm = lin_clf.predict(X_test)
    svm_acc = accuracy_score(y_test, predict_svm) * 100
    print(svm_acc)
    print("CLASSIFICATION REPORT")
    print(classification_report(y_test, predict_svm))
    print("CONFUSION MATRIX")
    print(confusion_matrix(y_test, predict_svm))
    models.append(('svm', lin_clf))
    detection_accuracy.objects.create(names="SVM", ratio=svm_acc)

    print("KNeighborsClassifier")
    from sklearn.neighbors import KNeighborsClassifier
    kn = KNeighborsClassifier()
    kn.fit(X_train, y_train)
    knpredict = kn.predict(X_test)
    print("ACCURACY")
    print(accuracy_score(y_test, knpredict) * 100)
    print("CLASSIFICATION REPORT")
    print(classification_report(y_test, knpredict))
    print("CONFUSION MATRIX")
    print(confusion_matrix(y_test, knpredict))
    models.append(('KNeighborsClassifier', kn))
    detection_accuracy.objects.create(names="KNeighborsClassifier",
                                      ratio=accuracy_score(y_test, knpredict) * 100)
    obj = detection_accuracy.objects.all()

From the classification reports and confusion matrices for the SVM and KNeighborsClassifier
models, we can observe distinct performance characteristics between the two classifiers in the
context of fake profile identification.
For the SVM model, the overall accuracy is approximately 82.90%, meaning that it assigns the
correct label to 82.90% of the test profiles. Following the label encoding used in the code
(0 = fake, 1 = genuine), the precision for class 0 (fake profiles) is 0.75: when the model
predicts a profile as fake, it is correct 75% of the time. The recall for class 1 (genuine
profiles) is lower at 0.65, so the model fails to capture 35% of the actual genuine profiles.
The F1-score, which balances precision and recall, is 0.79 for class 1. The confusion matrix
further shows that, out of 883 test profiles, the SVM model correctly classifies 732, with 151
false negatives and no false positives when class 1 is treated as the positive class.
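The reported SVM figures can be cross-checked with a short calculation. The sketch below is illustrative only: the report gives the totals (883 test profiles, 732 correct, 151 false negatives, no false positives) but not the individual cells of the confusion matrix, so the split into 281 true positives and 451 true negatives is an assumption chosen to agree with the reported recall of 0.65. The function name `metrics_from_confusion` is hypothetical.

```python
def metrics_from_confusion(tn, fp, fn, tp):
    """Derive accuracy, precision, recall and F1 from the four confusion-matrix cells."""
    total = tn + fp + fn + tp
    accuracy = (tp + tn) / total
    # Metrics for the positive class (class 1).
    precision_pos = tp / (tp + fp) if (tp + fp) else 0.0
    recall_pos = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision_pos + recall_pos
    f1_pos = 2 * precision_pos * recall_pos / denom if denom else 0.0
    # Precision for the negative class (class 0): correct class-0 predictions
    # divided by everything predicted as class 0.
    precision_neg = tn / (tn + fn) if (tn + fn) else 0.0
    return {
        "accuracy": accuracy,
        "precision_pos": precision_pos,
        "recall_pos": recall_pos,
        "f1_pos": f1_pos,
        "precision_neg": precision_neg,
    }


# Assumed cell counts: only the sums 451 + 281 = 732 and 732 + 151 = 883 come from the report.
m = metrics_from_confusion(tn=451, fp=0, fn=151, tp=281)
for name, value in m.items():
    print(f"{name}: {value:.4f}")
```

Under these assumed counts the derived accuracy, class-0 precision, class-1 recall and class-1 F1-score round to the values quoted in the text, which suggests the reported numbers are internally consistent.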
5. SCREENSHOTS
The project's homepage is the entry point for users and presents the login form. Users
enter their credentials in the designated fields to access the platform.
The service provider login page gives service providers access to the administrative
views. The provider enters a username and password in the designated fields and, on a
successful login, is redirected to the list of remote users.
The user registration page allows new users to sign up by entering the necessary details
in the designated fields. Invalid entries, such as a phone number with more than 10 digits,
are rejected with the message “Invalid Details”.
The Predict Profile Identification Status page lets users enter the details of a profile
in the designated fields and start the identification process, which reports whether the
profile is predicted to be genuine or fake.
The Profile Status Prediction Type Ratio page shows what percentage of the stored
predictions were classified as genuine and what percentage as fake, giving users an
overview of the overall distribution of predicted profile types.
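The percentages shown on this page can be illustrated with a small stand-alone calculation. This is a minimal sketch, not the project's view code: the function name `prediction_percentages` and the sample list are hypothetical, and only the two label strings are taken from the listing.

```python
from collections import Counter


def prediction_percentages(predictions):
    """Return {label: percentage of all predictions}; empty dict if there are none."""
    total = len(predictions)
    if total == 0:
        # Nothing has been predicted yet, so there is no ratio to report.
        return {}
    counts = Counter(predictions)
    return {label: (n / total) * 100 for label, n in counts.items()}


# Hypothetical sample: three genuine predictions and one fake prediction.
sample = ["Genuine Profile"] * 3 + ["Fake Profile"]
shares = prediction_percentages(sample)
print(shares)
```

Returning an empty result for an empty input avoids the division by zero that would otherwise occur before any prediction has been stored.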
The Profile Datasets Trained and Tested Results page reports the accuracy of the
algorithms used in our fake profile identification project (SVM and KNeighborsClassifier).
It presents the outcomes of the training and testing phases, allowing users to compare how
effectively each algorithm identifies fake profiles.
6. TESTING
Unit testing involves the design of test cases that validate that the internal
program logic is functioning properly and that program inputs produce valid
outputs. All decision branches and internal code flow should be validated. It is the
testing of individual software units of the application, and it is done after the
completion of an individual unit and before integration. This is structural testing:
it relies on knowledge of how the unit is constructed and is invasive. Unit tests
perform basic tests at the component level and test a specific business process,
application and/or system configuration. Unit tests ensure that each unique path of a
business process performs accurately to the documented specifications and contains
clearly defined inputs and expected results.
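As an illustration of a unit test with clearly defined inputs and expected results, the sketch below exercises a text-cleaning helper with Python's built-in `unittest` module. It is a hedged example rather than the project's actual test suite: `clean_text` here is a reduced copy of the helper from the source code listing, and the test class name and test inputs are assumptions.

```python
import re
import string
import unittest


def clean_text(text):
    """Lowercase, then strip bracketed text, links, tags, punctuation and words with digits."""
    text = text.lower()
    text = re.sub(r'\[.*?\]', '', text)
    text = re.sub(r'https?://\S+|www\.\S+', '', text)
    text = re.sub(r'<.*?>+', '', text)
    text = re.sub('[%s]' % re.escape(string.punctuation), '', text)
    text = re.sub(r'\n', '', text)
    text = re.sub(r'\w*\d\w*', '', text)
    return text


class CleanTextTests(unittest.TestCase):
    def test_removes_bracketed_text_and_punctuation(self):
        # The two spaces around the removed "[ad]" remain in the output.
        self.assertEqual(clean_text("Hello [ad] World!"), "hello  world")

    def test_removes_links(self):
        self.assertEqual(clean_text("Visit https://example.com now"), "visit  now")

    def test_removes_words_containing_digits(self):
        self.assertEqual(clean_text("user123 joined"), " joined")

    def test_empty_input(self):
        self.assertEqual(clean_text(""), "")


# Run the tests programmatically so the example does not exit the interpreter.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(CleanTextTests)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

Each test covers one branch of the cleaning logic, which matches the goal of validating every decision path of a unit in isolation.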
combination of components.
5 | Registration details | Invalid phone number (more than 10 numbers) | Gives the message “Invalid Details” | Gives the message “Invalid Details” | Pass
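The test case above can be expressed as an automated check. The following is a minimal sketch under stated assumptions: the function name `validate_phone`, the rule that a valid number has exactly 10 digits, and the `"OK"` return value are all hypothetical; only the message “Invalid Details” and the "more than 10 numbers" condition come from the test case.

```python
def validate_phone(phone):
    """Return "OK" for a 10-digit number, otherwise the message "Invalid Details"."""
    # Assumed rule: digits only, exactly 10 of them.
    if phone.isdigit() and len(phone) == 10:
        return "OK"
    return "Invalid Details"


# Test case 5: a phone number with more than 10 digits must be rejected.
too_long = validate_phone("12345678901")
valid = validate_phone("9876543210")
print(too_long, valid)
```

Keeping the validation in a small function of this kind makes the expected and actual outputs of the test case directly comparable.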
7. CONCLUSION
7.1 CONCLUSION