115 DM
Microproject Report on:
Protect User Data on Social Network using Genetic Algorithm.
Submitted By
Prof. L. S. Korade
(Academic Year: 2024-25)
Government Polytechnic, Pune
(An Autonomous Institute of Government of Maharashtra)
CERTIFICATE
This is to certify that Mare Shriniwas Pralhad, with Enrollment No. 2306115,
has completed the microproject ‘Protect user data on social network using
genetic algorithm’ in the course Data Mining as part of his curriculum in
academic year 2024-25.
Abstract
This project implements a user data protection system that utilizes a genetic
algorithm to safeguard personal information on social networking platforms.
The system simulates privacy optimization by evaluating different data-sharing
configurations and selecting the most secure setup based on user preferences
and data sensitivity.
Users can input their desired privacy level, and the program intelligently
determines the optimal combination of privacy settings to reduce the risk of
unauthorized access or data leakage. The system categorizes various types of
user data, such as posts, profile information, and friend lists, and assigns them
sensitivity levels to guide the algorithm.
• Data Sensitivity Modeling: The system classifies and prioritizes user data
based on its confidentiality level.
• Privacy Optimization Using Genetic Algorithm: It evolves multiple privacy
configurations using genetic operations such as selection, crossover, and
mutation to identify the best solution.
• Visualization: The selected configuration is displayed through bar graphs,
helping users understand how their data is being protected across different
visibility levels.
• User Interaction: Users choose from predefined privacy levels (e.g., High,
Medium, Low), and the system outputs the optimal privacy configuration
accordingly.
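The selection, crossover, and mutation steps named above can be sketched as follows. This is a minimal illustration rather than the project's implementation: the data types, sensitivity weights, visibility scale, risk weights, and the fitness function itself are all assumptions chosen for the example.

```python
import random

# Assumed data types with sensitivity weights (1 = low, 3 = high).
SENSITIVITY = {"posts": 1, "profile_info": 2, "friend_list": 2, "contact_info": 3}
# Assumed visibility scale: 0 = only me, 1 = friends, 2 = public.
LEVELS = (0, 1, 2)
# Assumed penalty on exposure for each user-selected privacy level.
RISK_WEIGHT = {"Low": 0.3, "Medium": 0.6, "High": 1.5}


def fitness(config, privacy_level):
    """Reward sharing (visibility) and penalize exposure of sensitive data."""
    utility = sum(config)
    risk = sum(s * v for s, v in zip(SENSITIVITY.values(), config))
    return utility - RISK_WEIGHT[privacy_level] * risk


def evolve(privacy_level, pop_size=30, generations=40, mutation_rate=0.1, seed=0):
    """Return the best configuration found and the best fitness per generation."""
    rng = random.Random(seed)
    n = len(SENSITIVITY)
    score = lambda c: fitness(c, privacy_level)
    population = [[rng.choice(LEVELS) for _ in range(n)] for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        elite = max(population, key=score)
        history.append(score(elite))
        next_gen = [elite[:]]  # elitism: keep the fittest configuration unchanged
        while len(next_gen) < pop_size:
            # Selection: each parent is the fitter of two random candidates.
            p1 = max(rng.sample(population, 2), key=score)
            p2 = max(rng.sample(population, 2), key=score)
            # Crossover: take the head of one parent and the tail of the other.
            cut = rng.randint(1, n - 1)
            child = p1[:cut] + p2[cut:]
            # Mutation: occasionally replace a setting with a random level.
            child = [rng.choice(LEVELS) if rng.random() < mutation_rate else g
                     for g in child]
            next_gen.append(child)
        population = next_gen
    best = max(population, key=score)
    return dict(zip(SENSITIVITY, best)), history
```

Calling `evolve("High")` returns the best configuration found together with the best fitness seen in each generation. Because the fittest configuration is carried over unchanged, that history never decreases.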
This project integrates artificial intelligence, data privacy, and user
interaction to deliver a smart, adaptable system for protecting personal
information online. It demonstrates the practical use of genetic algorithms in
the field of cybersecurity and showcases how intelligent optimization can
enhance data protection strategies on social media platforms.
Index
5 Application
6 Actual Code and Output
7 Advantages and Challenges
8 Conclusion
Introduction
In the digital age, social networking platforms have become an integral part of
everyday life, but they also pose significant privacy risks. With increasing
concerns about data breaches, unauthorized access, and personal information
misuse, ensuring user data protection has become more critical than ever.
The results are presented using intuitive bar graphs for a clear comparison of
visibility levels across data types. Additionally, the system provides a risk
score to help users understand the potential exposure associated with their
current settings.
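One way such a risk score could be computed is sketched below. The report does not give the formula, so the sensitivity weights, the visibility scale, and the normalization to a percentage are assumptions made for illustration.

```python
# Assumed sensitivity weights (1 = low, 3 = high); not taken from the report.
SENSITIVITY = {"posts": 1, "profile_info": 2, "friend_list": 2}
# Assumed visibility scale: 0 = only me, 1 = friends, 2 = public.
MAX_VISIBILITY = 2


def risk_score(settings, sensitivity=SENSITIVITY):
    """Return exposure as a percentage of the worst case (everything public)."""
    exposed = sum(sensitivity[name] * settings[name] for name in sensitivity)
    worst = MAX_VISIBILITY * sum(sensitivity.values())
    return 100.0 * exposed / worst
```

Under these assumptions, a profile with every item set to "only me" scores 0, and one with every item public scores 100.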
This project simplifies the complex task of managing privacy on social
networks by combining AI-driven optimization with interactive
visualization, ultimately empowering users to make informed decisions about
protecting their digital identities.
OBJECTIVE
Data Categorization:
To classify different types of user data (e.g., personal info, posts, contact list) based
on their sensitivity and assign visibility levels accordingly.
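A minimal sketch of this categorization step is shown below. The category names follow the examples in the objective; the three-level sensitivity scale, the visibility scale, and the mapping between them are assumptions for illustration.

```python
from enum import IntEnum


class Sensitivity(IntEnum):
    LOW = 1
    MEDIUM = 2
    HIGH = 3


class Visibility(IntEnum):
    ONLY_ME = 0
    FRIENDS = 1
    PUBLIC = 2


# Assumed classification of the data types mentioned in the objective.
CATEGORIES = {
    "personal_info": Sensitivity.HIGH,
    "contact_list": Sensitivity.MEDIUM,
    "posts": Sensitivity.LOW,
}

# Assumed rule: the more sensitive the data, the narrower the audience.
DEFAULT_VISIBILITY = {
    Sensitivity.HIGH: Visibility.ONLY_ME,
    Sensitivity.MEDIUM: Visibility.FRIENDS,
    Sensitivity.LOW: Visibility.PUBLIC,
}


def categorize(categories=CATEGORIES):
    """Map each data type to a starting visibility level based on its sensitivity."""
    return {name: DEFAULT_VISIBILITY[level] for name, level in categories.items()}
```

A mapping like this could serve as the starting point that an optimizer then adjusts for the user's chosen privacy level.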
Risk Visualization:
To visually present the optimized privacy settings using bar graphs, helping users
understand the visibility of each data type and their associated exposure risks.
Educational Value:
To demonstrate the practical application of artificial intelligence—specifically
genetic algorithms—in the field of cybersecurity and data privacy.
System Design and Architecture
System Overview
The system consists of several key components:
• User Interface:
Provides an interface for users to select their desired privacy level (e.g.,
high, medium, low) and view the optimized privacy settings.
• Visualization Module:
Displays the optimized privacy settings using bar graphs to clearly
compare data visibility levels across different categories.
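The visualization module could be sketched roughly as below, using matplotlib, which ga_protect.py already imports. The function names `bar_data` and `plot_settings`, the visibility scale, and the axis labels are hypothetical and not taken from the project.

```python
def bar_data(settings):
    """Split a settings mapping into bar labels and bar heights, sorted by label."""
    labels = sorted(settings)
    heights = [settings[name] for name in labels]
    return labels, heights


def plot_settings(settings, path):
    """Save a bar graph of visibility level per data category to `path`."""
    # Imported here so bar_data stays usable without matplotlib installed.
    import matplotlib
    matplotlib.use("Agg")  # render to a file; no display is needed
    import matplotlib.pyplot as plt

    labels, heights = bar_data(settings)
    fig, ax = plt.subplots()
    ax.bar(labels, heights)
    # Assumed visibility scale: 0 = only me, 1 = friends, 2 = public.
    ax.set_yticks([0, 1, 2])
    ax.set_yticklabels(["Only me", "Friends", "Public"])
    ax.set_ylabel("Visibility level")
    ax.set_title("Optimized privacy settings")
    fig.savefig(path)
    plt.close(fig)
```

For example, `plot_settings({"posts": 2, "friend_list": 1, "profile_info": 0}, "settings.png")` would write one bar per category to `settings.png`.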
Flow Diagram
Uses of the Project
1. Enhanced Data Privacy: Helps protect users’ personal and sensitive
information on social networks by optimizing privacy settings using
genetic algorithms.
Actual Code and Output
app.py:
from flask import (Flask, render_template, request, redirect,
                   url_for, session, flash)
import os
from werkzeug.utils import secure_filename
import pandas as pd
from ga_protect import genetic_protect, plot_fitness

app = Flask(__name__)
app.config['UPLOAD_FOLDER'] = 'uploads'
app.config['SECRET_KEY'] = os.urandom(24)
app.config['ALLOWED_EXTENSIONS'] = {'csv'}

@app.route('/')
def home():
    return render_template('index.html')
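# The route header is missing from the listing. It is reconstructed here from
# url_for('upload') and the form handling below; the '/upload' path is an assumption.
@app.route('/upload', methods=['GET', 'POST'])
def upload():
    if request.method == 'POST':
        file = request.files['file']
        if file.filename == '':
            flash('No selected file', 'warning')
            return redirect(request.url)
        # Sanitize the name before writing the file into the upload folder
        filename = secure_filename(file.filename)
        file.save(os.path.join(app.config['UPLOAD_FOLDER'], filename))
        session['filename'] = filename
        flash('File uploaded successfully!', 'success')
        return redirect(url_for('protect'))
    return render_template('upload.html')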
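@app.route('/protect', methods=['GET'])
def protect():
    if 'filename' not in session:
        flash('No file uploaded!', 'warning')
        return redirect(url_for('upload'))
    filename = session['filename']
    filepath = os.path.join(app.config['UPLOAD_FOLDER'], filename)
    # ... (lines not shown in the listing: plot_folder, plot_filename,
    # protected_filename and fitness_progress are defined here)
    plot_path = os.path.join(plot_folder, plot_filename)
    plot_fitness(fitness_progress, plot_path)
    session['protected_filename'] = protected_filename
    session['plot_filename'] = plot_filename
    # A Flask view must return a response; hand off to the result page.
    return redirect(url_for('result'))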
@app.route('/result')
def result():
    if 'protected_filename' not in session:
        flash('No protected data found.', 'danger')
        return redirect(url_for('home'))
    return render_template('result.html',
                           filename=session['protected_filename'],
                           plot_filename=session['plot_filename'])

if __name__ == '__main__':
    app.run(debug=True)
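The listing sets ALLOWED_EXTENSIONS, but the excerpt does not show where it is checked. A minimal sketch of such a check is given below; the helper name `allowed_file` and its default argument are hypothetical, and how the project actually validates uploads is not known from the report.

```python
import os


def allowed_file(filename, allowed_extensions=frozenset({"csv"})):
    """Return True if the file extension is in allowed_extensions (case-insensitive)."""
    _, ext = os.path.splitext(filename)
    return ext.lower().lstrip(".") in allowed_extensions
```

In the upload handler, a call such as `allowed_file(file.filename, app.config['ALLOWED_EXTENSIONS'])` could be made before saving, with non-matching files rejected through a flash message.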
ga_protect.py:
import pandas as pd
import random
import string
import matplotlib.pyplot as plt
def mask_field(field_value):
    return random_string(len(str(field_value)))
    # ... (enclosing function header not shown in the listing)
    # Verify that the sensitive_columns exist in the dataset
    missing_columns = [col for col in sensitive_columns
                       if col not in df_original.columns]
    if missing_columns:
        raise ValueError("The following columns are missing "
                         f"in the dataset: {', '.join(missing_columns)}")
    # Initialize population: each individual is a copy of the data
    # with every sensitive column masked
    population = []
    for _ in range(population_size):
        individual = df_original.copy()
        for col in sensitive_columns:
            individual[col] = individual[col].apply(mask_field)
        population.append(individual)
    fitness_progress = []
    # ... (evolution loop body not shown in the listing)
            new_population.append(child)
        population = new_population
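In the listing above, mask_field calls random_string, whose definition is not shown. A minimal sketch of a compatible helper follows; the choice of ASCII letters and digits as the alphabet is an assumption, and the project's actual helper may differ.

```python
import random
import string


def random_string(length, alphabet=string.ascii_letters + string.digits):
    """Return `length` characters drawn at random from `alphabet` (assumed alphabet)."""
    return "".join(random.choices(alphabet, k=length))


def mask_field(field_value):
    # As in the listing: replace the value with random text of the same length.
    return random_string(len(str(field_value)))
```

Note that Python's `random` module is not designed for security purposes. Where masked values must be unpredictable, the standard-library `secrets` module is the usual alternative.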
Output:
Advantages
1. Improved Privacy Protection
Genetic algorithms help in finding the best privacy settings, minimizing
the chances of personal data leakage.
Challenges
1. Complexity of Genetic Algorithms
Designing and tuning genetic algorithms can be complex, requiring
careful selection of parameters like population size, crossover, and
mutation rates.
2. Data Sensitivity and Privacy
Accessing real user data for testing and validation can raise privacy
concerns and ethical issues.
3. Platform Dependency
Different social networks have varied privacy settings, making it
challenging to create a one-size-fits-all solution.
4. Dynamic Nature of Social Networks
Frequent updates and changes in social media platforms may affect the
algorithm’s effectiveness and require regular updates.
5. User Behavior Diversity
Users have different privacy preferences, which can make it hard to
create a universally optimal privacy model.
6. Computational Cost
Running genetic algorithms, especially on large datasets, can be resource-
intensive and time-consuming.
7. Scalability Issues
Ensuring the algorithm works efficiently across millions of users and data
points can be a significant technical challenge.
Conclusion: