Social Media Fake Account Detection Report 20pages
1. Introduction
Fake accounts are a growing concern not only because they skew
analytics and user engagement statistics, but also because they can be
weaponized in coordinated campaigns to mislead the public or execute
financial fraud. The significance of this issue is underscored by numerous
case studies in which bot networks have influenced public opinion or
engagement metrics on major platforms like Twitter and Instagram.
Social media platforms have revolutionized the way people communicate and share
information. However, as these platforms have grown in popularity,
so has the number of fake accounts created for malicious purposes such as
spreading misinformation, phishing, spamming, and impersonation.
This project aims to develop a system for detecting such fake accounts using machine
learning techniques. The system leverages user profile features and
behavioral patterns to classify accounts as real or fake.
2. Objective
The primary objective of this project is to build a reliable fake account detection system that
can identify suspicious users on social media platforms.
The scope covers data preprocessing, feature selection, model building, evaluation, and
result visualization to support decision-making.
3. System Design
Security and reliability are built into the system through access control
mechanisms, encryption protocols for data in transit and at rest, and
regular model audits. The pipeline is also designed to be fault-tolerant
with retry mechanisms and logging for monitoring.
The admin interacts with the system by uploading the user data, initiating fake account
detection, viewing analytics, and exporting results.
The system performs preprocessing, prediction, and visualization based on the uploaded
dataset.
Data is stored as structured CSV files with fields such as User_ID, Username, Follower_Count,
and Following_Count. These features are used for model training
and prediction. The files are kept on cloud storage platforms (e.g., AWS S3, Firebase Storage) for
scalability and accessibility.
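As a minimal sketch of how such a CSV might be loaded and cleaned with Pandas, the example below uses only the four fields named above. The sample rows are made up for illustration, and the preprocess function, the zero-fill policy for missing counts, and the Follower_Following_Ratio feature are assumptions rather than the project's actual steps.

```python
import io

import pandas as pd

# Toy rows standing in for an uploaded CSV; the values are made up.
raw_csv = io.StringIO(
    "User_ID,Username,Follower_Count,Following_Count\n"
    "1,alice,120,80\n"
    "2,bot_4821,3,1500\n"
    "2,bot_4821,3,1500\n"
    "3,carol,,40\n"
)


def preprocess(csv_source):
    """Load the CSV, drop duplicate users, and fill missing counts."""
    df = pd.read_csv(csv_source)
    df = df.drop_duplicates(subset="User_ID").copy()
    for col in ["Follower_Count", "Following_Count"]:
        # Assumption: a missing count is treated as zero.
        df[col] = df[col].fillna(0).astype(int)
    # Assumed derived feature; +1 avoids division by zero.
    df["Follower_Following_Ratio"] = (
        df["Follower_Count"] / (df["Following_Count"] + 1)
    )
    return df.reset_index(drop=True)


features = preprocess(raw_csv)
print(features)
```

In a deployment, the io.StringIO object would be replaced by the path or file handle of the uploaded dataset.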
Sequence Flow:
1. Admin uploads user data via UI.
2. System performs preprocessing using Pandas.
3. Machine Learning model (Scikit-learn or ANN) is loaded.
4. Predictions are made.
5. Matplotlib is used to generate visual analytics.
6. Admin downloads or exports the reports.
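The prediction steps in this flow can be sketched as follows. This is a toy illustration, not the project's model: the training rows are invented, the Is_Fake label column and Predicted_Fake output column are hypothetical names, and a Scikit-learn decision tree is used as a stand-in classifier. The sketch also trains the model in place, whereas the flow above loads a model that was trained beforehand.

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier

# Toy labeled data (made up); Is_Fake is a hypothetical label column.
train = pd.DataFrame({
    "Follower_Count":  [150, 900, 2, 5, 300, 1, 45, 0],
    "Following_Count": [100, 350, 2000, 1800, 280, 2500, 60, 1500],
    "Is_Fake":         [0, 0, 1, 1, 0, 1, 0, 1],
})
feature_cols = ["Follower_Count", "Following_Count"]

# Stand-in for step 3: fit a small classifier instead of loading one.
model = DecisionTreeClassifier(random_state=0)
model.fit(train[feature_cols], train["Is_Fake"])

# Step 4: predict on the (toy) uploaded accounts.
uploaded = pd.DataFrame({
    "User_ID": [101, 102],
    "Follower_Count": [400, 3],
    "Following_Count": [220, 1900],
})
uploaded["Predicted_Fake"] = model.predict(uploaded[feature_cols])

# Counts per predicted class, ready to be charted or exported.
summary = uploaded["Predicted_Fake"].value_counts()
print(uploaded)
```

The summary counts are the kind of aggregate that step 5 would pass to Matplotlib, and the uploaded table with its prediction column is what step 6 would export.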
Components:
- Frontend: Flask/Django Web UI or Jupyter Notebook
- Backend: Python ML scripts (Pandas, Scikit-learn, ANN)
- Storage: Cloud storage (CSV format)
4. Technologies Used
Preliminary Results:
- Accuracy: 92%
- Precision: 90%
- Recall: 93%
- F1 Score: 91%
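For reference, the four metrics follow the standard definitions based on confusion-matrix counts. The sketch below shows those formulas and checks that the reported F1 score agrees with the reported precision and recall. The confusion counts in the example are made up, and both function names are hypothetical.

```python
def classification_metrics(tp, fp, fn, tn):
    """Standard metrics from confusion-matrix counts (fake = positive class)."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


def f1_from(precision, recall):
    """F1 is the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Made-up counts, only to exercise the formulas.
metrics = classification_metrics(tp=8, fp=2, fn=1, tn=9)

# Consistency check on the reported precision (90%) and recall (93%).
f1_check = f1_from(0.90, 0.93)
print(round(f1_check, 2))
```

With precision 0.90 and recall 0.93, the harmonic mean is about 0.915, which is consistent with the reported F1 score of 91%.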
6. Challenges and Limitations