
Government Polytechnic, Pune

(An Autonomous Institute of Government of Maharashtra)

Microproject report on:
Protect User Data on Social Network using Genetic Algorithm.

Submitted By

Sr. No.   Name                      Enrollment Number
1         Mane Sayali Santosh       2306113
2         Manval Shruti Sandip      2306114
3         Mare Shriniwas Pralhad    2306115

Under the guidance of

Prof. L. S. Korade
(Academic Year: 2024-25)

Government Polytechnic, Pune
(An Autonomous Institute of Government of Maharashtra)

Department of Computer Engineering

CERTIFICATE

This is to certify that Mare Shriniwas Pralhad, Enrollment No. 2306115, of the
Fourth Semester of the Diploma in Computer Engineering at Government
Polytechnic, Pune, has successfully completed the microproject titled ‘Protect
User Data on Social Network using Genetic Algorithm’ in the course Data
Mining as part of his curriculum in the academic year 2024-25.

Prof. L. S. Korade      Mrs. J. R. Hange        Dr. R. K. Patil
(Project Guide)         (Head of Department)    (Principal)

Abstract
This project implements a user data protection system that utilizes a genetic
algorithm to safeguard personal information on social networking platforms.
The system simulates privacy optimization by evaluating different data-sharing
configurations and selecting the most secure setup based on user preferences
and data sensitivity.

Users can input their desired privacy level, and the program intelligently
determines the optimal combination of privacy settings to reduce the risk of
unauthorized access or data leakage. The system categorizes various types of
user data, such as posts, profile information, and friend lists, and assigns them
sensitivity levels to guide the algorithm.

Key features of the system:

• Data Sensitivity Modeling: The system classifies and prioritizes user data
based on its confidentiality level.
• Privacy Optimization Using Genetic Algorithm: It evolves multiple privacy
configurations using genetic operations such as selection, crossover, and
mutation to identify the best solution.
• Visualization: The selected configuration is displayed through bar graphs,
helping users understand how their data is being protected across different
visibility levels.
• User Interaction: Users choose from predefined privacy levels (e.g., High,
Medium, Low), and the system outputs the optimal privacy configuration
accordingly.

This project integrates artificial intelligence, data privacy, and user
interaction to deliver a smart, adaptable system for protecting personal
information online. It demonstrates the practical use of genetic algorithms in
the field of cybersecurity and showcases how intelligent optimization can
enhance data protection strategies on social media platforms.

Index

1. Introduction
2. Objective
3. Architecture and Design
4. Flow Diagram
5. Application
6. Actual Code and Output
7. Advantages and Challenges
8. Conclusion

Introduction
In the digital age, social networking platforms have become an integral part of
everyday life, but they also pose significant privacy risks. With increasing
concerns about data breaches, unauthorized access, and personal information
misuse, ensuring user data protection has become more critical than ever.

This project aims to develop a privacy-enhancing system that leverages a
genetic algorithm to optimize the privacy settings for users on social media
platforms. By simulating different configurations of data-sharing permissions,
the system identifies the most secure setup tailored to individual user
preferences and data sensitivity levels.

The application analyzes various types of personal data, such as profile
information, posts, photos, and friend lists, and categorizes them based on
sensitivity. Users can choose a preferred privacy level, and the system uses a
genetic algorithm to generate the optimal combination of privacy settings.

The results are presented using intuitive bar graphs for a clear comparison of
visibility levels across data types. Additionally, the system provides a risk
score to help users understand the potential exposure associated with their
current settings.

This project simplifies the complex task of managing privacy on social
networks by combining AI-driven optimization with interactive
visualization, ultimately empowering users to make informed decisions about
protecting their digital identities.
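
The risk score mentioned above is not defined in the report. As a minimal sketch, one way to compute such a score is to weight each data type's visibility by its sensitivity and express the result as a percentage of the worst case. The weights, the visibility levels, and the name risk_score are assumptions made for illustration only.

```python
# Hypothetical sensitivity weights (1 = low, 3 = high); not taken from the report.
SENSITIVITY = {"profile_info": 3, "posts": 1, "photos": 2, "friend_list": 2}

# Hypothetical visibility levels and the exposure each one implies.
EXPOSURE = {"only_me": 0, "friends": 1, "public": 2}


def risk_score(config):
    """Return exposure as a percentage of the worst case (everything public)."""
    worst = sum(weight * max(EXPOSURE.values()) for weight in SENSITIVITY.values())
    actual = sum(SENSITIVITY[item] * EXPOSURE[level] for item, level in config.items())
    return 100 * actual / worst


# Example settings for one user (made-up values).
current = {"profile_info": "friends", "posts": "public",
           "photos": "friends", "friend_list": "only_me"}
print(f"Risk score: {risk_score(current):.1f}%")
```

A score of 0 means nothing is visible to others, and 100 means every data type is public.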

Objective

Enhance User Privacy:
To develop a system that protects sensitive user data on social networking platforms
by identifying and minimizing potential privacy risks.

Privacy Optimization Using Genetic Algorithm:
To implement a genetic algorithm that intelligently evolves and selects the most
secure configuration of privacy settings based on data sensitivity and user
preferences.

Data Categorization:
To classify different types of user data (e.g., personal info, posts, contact list) based
on their sensitivity and assign visibility levels accordingly.

Interactive User Input:
To allow users to select a desired level of privacy (e.g., high, medium, low) and
generate a customized privacy configuration through the genetic algorithm.

Risk Visualization:
To visually present the optimized privacy settings using bar graphs, helping users
understand the visibility of each data type and their associated exposure risks.

Promote Privacy Awareness:
To raise awareness about online privacy by simulating how different privacy
configurations impact data safety on social networks.

Educational Value:
To demonstrate the practical application of artificial intelligence, specifically
genetic algorithms, in the field of cybersecurity and data privacy.
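
The Risk Visualization objective could be sketched roughly as follows. This is an illustration only: the configuration, the three visibility levels, and the helper names bar_data and plot_visibility are hypothetical and do not come from the report's code. Matplotlib is imported inside the plotting function so that the data-preparation step can be run on its own.

```python
# Hypothetical visibility levels, ordered from most private to most exposed.
LEVELS = ["only_me", "friends", "public"]

# Hypothetical optimized configuration; in the full system this would be
# produced by the genetic algorithm.
config = {"profile_info": "only_me", "posts": "public",
          "photos": "friends", "friend_list": "only_me"}


def bar_data(config):
    """Convert a configuration into (labels, heights) for a bar chart."""
    labels = list(config)
    # Heights start at 1 so that the most private level still shows a bar.
    heights = [LEVELS.index(level) + 1 for level in config.values()]
    return labels, heights


def plot_visibility(config, save_path):
    """Save a bar chart of visibility per data type (requires Matplotlib)."""
    import matplotlib
    matplotlib.use("Agg")  # render to a file; no display is needed
    import matplotlib.pyplot as plt

    labels, heights = bar_data(config)
    fig, ax = plt.subplots(figsize=(8, 5))
    ax.bar(labels, heights)
    ax.set_yticks(range(1, len(LEVELS) + 1))
    ax.set_yticklabels(LEVELS)
    ax.set_xlabel("Data type")
    ax.set_ylabel("Visibility level")
    ax.set_title("Optimized privacy configuration")
    fig.savefig(save_path)
    plt.close(fig)


print(bar_data(config))
```

Calling plot_visibility(config, "privacy_config.png") would write the chart to a file; the file name here is arbitrary.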

System Design and Architecture
System Overview
The system consists of several key components:

• User Interface:
Provides an interface for users to select their desired privacy level (e.g.,
high, medium, low) and view the optimized privacy settings.

• Data Classification Module:
Categorizes different types of user data (e.g., posts, profile info, contact
lists) and assigns a sensitivity level to each type.

• Genetic Algorithm Engine:
Implements selection, crossover, and mutation operations to evolve
optimal privacy configurations that minimize data exposure based on user
preferences.

• Privacy Configuration Evaluator:
Calculates a risk score for each generated configuration, ensuring the best
one is selected based on minimal exposure and user-defined constraints.

• Visualization Module:
Displays the optimized privacy settings using bar graphs to clearly
compare data visibility levels across different categories.

• Feedback and Output Renderer:
Presents the final privacy configuration, risk score, and visual charts
using Matplotlib, helping users understand the trade-offs involved.
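
To make the interaction between the Genetic Algorithm Engine and the Privacy Configuration Evaluator concrete, here is a minimal sketch under stated assumptions. Each chromosome holds one visibility level per data type, the chosen privacy level sets a hypothetical risk budget, and the fitness rewards sharing while penalising configurations that exceed that budget. The weights, budgets, and function names are illustrative assumptions, not the report's implementation.

```python
import random

DATA_TYPES = ["profile_info", "posts", "photos", "friend_list"]
SENSITIVITY = [3, 1, 2, 2]                  # hypothetical weights, same order as DATA_TYPES
LEVELS = ["only_me", "friends", "public"]   # gene values 0, 1, 2
RISK_BUDGET = {"high": 0.2, "medium": 0.5, "low": 0.8}  # hypothetical limits


def risk(chromosome):
    """Sensitivity-weighted exposure, scaled to the range 0..1."""
    worst = sum(w * (len(LEVELS) - 1) for w in SENSITIVITY)
    return sum(w * g for w, g in zip(SENSITIVITY, chromosome)) / worst


def fitness(chromosome, privacy_level):
    """Reward sharing, but heavily penalise configurations over the risk budget."""
    over = max(0.0, risk(chromosome) - RISK_BUDGET[privacy_level])
    return sum(chromosome) - 100 * over


def evolve(privacy_level, generations=30, population_size=20,
           mutation_rate=0.1, seed=0):
    rng = random.Random(seed)
    population = [[rng.randrange(len(LEVELS)) for _ in DATA_TYPES]
                  for _ in range(population_size)]
    for _ in range(generations):
        # Selection: keep the fitter half of the population as parents.
        population.sort(key=lambda c: fitness(c, privacy_level), reverse=True)
        parents = population[:population_size // 2]
        children = []
        while len(parents) + len(children) < population_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, len(DATA_TYPES))   # single-point crossover
            child = a[:cut] + b[cut:]
            # Mutation: occasionally replace a gene with a random level.
            child = [rng.randrange(len(LEVELS)) if rng.random() < mutation_rate else g
                     for g in child]
            children.append(child)
        population = parents + children               # parents survive unchanged
    best = max(population, key=lambda c: fitness(c, privacy_level))
    return dict(zip(DATA_TYPES, (LEVELS[g] for g in best))), risk(best)


config, score = evolve("high")
print(config, f"risk={score:.2f}")
```

Keeping the parents in each new generation is a simple form of elitism, so the best configuration found so far is never lost to crossover or mutation.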

Flow Diagram

Uses of the Project
1. Enhanced Data Privacy: Helps protect users’ personal and sensitive
information on social networks by optimizing privacy settings using
genetic algorithms.

2. Automated Security Optimization: Automatically adjusts privacy
configurations to minimize data exposure, reducing the risk of
unauthorized access and misuse.

3. User-Friendly Protection: Simplifies the process of securing user
profiles without requiring technical expertise, making it accessible to
everyday users.

4. Educational Value: Demonstrates the application of genetic algorithms
in real-world scenarios, serving as a learning tool for students and
researchers in data mining and cybersecurity.

5. Scalability and Future Improvements: Can be extended to include
features such as real-time threat detection, dynamic policy adaptation,
and integration with multiple social platforms.

Actual Code and Output
app.py:
from flask import Flask, render_template, request, redirect, url_for, session, flash
import os
from werkzeug.utils import secure_filename
from ga_protect import genetic_protect, plot_fitness

app = Flask(__name__)
app.config['UPLOAD_FOLDER'] = 'uploads'
# A random key is regenerated on every restart, which invalidates existing sessions
app.config['SECRET_KEY'] = os.urandom(24)
app.config['ALLOWED_EXTENSIONS'] = {'csv'}


# Check if file extension is valid
def allowed_file(filename):
    return ('.' in filename and
            filename.rsplit('.', 1)[1].lower() in app.config['ALLOWED_EXTENSIONS'])


@app.route('/')
def home():
    return render_template('index.html')


@app.route('/upload', methods=['GET', 'POST'])
def upload():
    if request.method == 'POST':
        if 'file' not in request.files:
            flash('No file part', 'warning')
            return redirect(request.url)

        file = request.files['file']

        if file.filename == '':
            flash('No selected file', 'warning')
            return redirect(request.url)

        if file and allowed_file(file.filename):
            filename = secure_filename(file.filename)
            # Create the upload folder if it does not exist yet
            os.makedirs(app.config['UPLOAD_FOLDER'], exist_ok=True)
            file.save(os.path.join(app.config['UPLOAD_FOLDER'], filename))
            session['filename'] = filename
            flash('File uploaded successfully!', 'success')
            return redirect(url_for('protect'))

        # Reached only when the file extension is not allowed
        flash('Only CSV files are allowed', 'warning')

    return render_template('upload.html')


@app.route('/protect', methods=['GET'])
def protect():
    if 'filename' not in session:
        flash('No file uploaded!', 'warning')
        return redirect(url_for('upload'))

    filename = session['filename']
    filepath = os.path.join(app.config['UPLOAD_FOLDER'], filename)

    # Sensitive columns to protect
    sensitive_columns = ['email', 'phone']

    # Apply Genetic Algorithm to protect sensitive columns
    try:
        protected_df, fitness_progress = genetic_protect(filepath, sensitive_columns)
    except ValueError as error:
        # Raised by genetic_protect when a sensitive column is missing
        flash(str(error), 'danger')
        return redirect(url_for('upload'))

    # Save the protected file
    protected_folder = 'protected'
    os.makedirs(protected_folder, exist_ok=True)
    protected_filename = 'protected_' + filename
    protected_path = os.path.join(protected_folder, protected_filename)
    protected_df.to_csv(protected_path, index=False)

    # Save the plot of fitness progress
    plot_folder = 'static/plots'
    os.makedirs(plot_folder, exist_ok=True)
    # splitext also handles upper-case extensions such as '.CSV'
    plot_filename = 'fitness_' + os.path.splitext(filename)[0] + '.png'
    plot_path = os.path.join(plot_folder, plot_filename)
    plot_fitness(fitness_progress, plot_path)

    session['protected_filename'] = protected_filename
    session['plot_filename'] = plot_filename

    flash('Data protection and plotting done!', 'success')
    return redirect(url_for('result'))


@app.route('/result')
def result():
    if 'protected_filename' not in session:
        flash('No protected data found.', 'danger')
        return redirect(url_for('home'))

    return render_template('result.html',
                           filename=session['protected_filename'],
                           plot_filename=session['plot_filename'])


if __name__ == '__main__':
    # debug=True is meant for local development only
    app.run(debug=True)

ga_protect.py:
import pandas as pd
import random
import string
import matplotlib
matplotlib.use('Agg')  # non-interactive backend so plots can be saved from the Flask app
import matplotlib.pyplot as plt


# Preprocess the dataset
def preprocess_data(df):
    df.fillna('Unknown', inplace=True)
    return df


# Mask sensitive data with random strings
def random_string(length=5):
    return ''.join(random.choice(string.ascii_letters) for _ in range(length))


def mask_field(field_value):
    return random_string(len(str(field_value)))


# Fitness function (protecting sensitive data)
# Counts the sensitive values that consist only of letters. Masked values are
# made of letters only, so a fully masked individual already has the maximum score.
def fitness(individual, sensitive_columns):
    score = 0
    for col in sensitive_columns:
        score += individual[col].apply(
            lambda x: isinstance(x, str) and all(c.isalpha() for c in x)
        ).sum()
    return score


# Mutation function (randomly mutate sensitive fields)
def mutate(individual, sensitive_columns, mutation_rate=0.1):
    for col in sensitive_columns:
        individual[col] = individual[col].apply(
            lambda x: mask_field(x) if random.random() < mutation_rate else x
        )
    return individual


# Crossover function (combine two parents to produce a child)
def crossover(parent1, parent2, sensitive_columns):
    child = parent1.copy()
    for col in sensitive_columns:
        child[col] = [random.choice([a, b]) for a, b in zip(parent1[col], parent2[col])]
    return child


# Main function for genetic protection of user data
def genetic_protect(file_path, sensitive_columns, generations=10, population_size=5):
    # Load the dataset
    df_original = pd.read_csv(file_path)

    # Preprocess the data
    df_original = preprocess_data(df_original)

    # Verify that the sensitive_columns exist in the dataset
    missing_columns = [col for col in sensitive_columns if col not in df_original.columns]
    if missing_columns:
        raise ValueError(
            f"The following columns are missing in the dataset: {', '.join(missing_columns)}"
        )

    # Initialize population
    population = []
    for _ in range(population_size):
        individual = df_original.copy()
        for col in sensitive_columns:
            # Mask the sensitive fields
            individual[col] = individual[col].apply(mask_field)
        population.append(individual)

    fitness_progress = []

    # Run the genetic algorithm for the specified number of generations
    for gen in range(generations):
        fitness_scores = [fitness(ind, sensitive_columns) for ind in population]
        best_score = max(fitness_scores)
        fitness_progress.append(best_score)

        # Selection: sort population based on fitness scores
        sorted_pop = [x for _, x in sorted(zip(fitness_scores, population),
                                           key=lambda pair: pair[0], reverse=True)]
        parent1, parent2 = sorted_pop[0], sorted_pop[1]

        # New population creation
        new_population = []
        for _ in range(population_size):
            child = crossover(parent1, parent2, sensitive_columns)
            child = mutate(child, sensitive_columns)
            new_population.append(child)

        population = new_population

    # Return the fittest individual of the final population and the fitness progress
    best_individual = max(population, key=lambda ind: fitness(ind, sensitive_columns))
    return best_individual, fitness_progress


# Function to plot the fitness over generations
def plot_fitness(fitness_progress, save_path):
    plt.figure(figsize=(8, 5))
    plt.plot(range(1, len(fitness_progress) + 1), fitness_progress, marker='o')
    plt.title('Fitness Improvement Over Generations')
    plt.xlabel('Generation')
    plt.ylabel('Fitness Score')
    plt.grid()
    plt.savefig(save_path)
    plt.close()
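
Because the fitness function in the listing counts values made only of letters, and every masked value is made only of letters, a fully masked starting population already scores the maximum. One possible variation, shown here purely as an illustration and not as the report's method, is to start from partially masked candidates and score the share of sensitive values that no longer match the original. The helper names and the sample addresses below are made up for the example.

```python
import random
import string


def mask(value, rng):
    """Replace a value with random letters of the same length."""
    return "".join(rng.choice(string.ascii_letters) for _ in range(len(str(value))))


def masked_fraction(original, candidate):
    """Fitness: share of values that differ from the original (0.0 to 1.0)."""
    changed = sum(1 for o, c in zip(original, candidate) if str(o) != str(c))
    return changed / len(original)


rng = random.Random(42)

# Made-up sample values standing in for a sensitive column.
emails = ["asha@example.com", "ravi@example.com",
          "meera@example.com", "john@example.com"]

# Partially masked starting point: each value is masked with probability 0.5.
candidate = [mask(e, rng) if rng.random() < 0.5 else e for e in emails]
print(masked_fraction(emails, candidate))

# Masking every value gives the maximum score.
fully_masked = [mask(e, rng) for e in emails]
print(masked_fraction(emails, fully_masked))
```

With a score like this, selection, crossover, and mutation have room to improve the population over generations, so the fitness plot can show actual progress.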

Output:

[Output screenshots from the original report are not reproduced here.]

Advantages
1. Improved Privacy Protection
Genetic algorithms help in finding the best privacy settings, minimizing
the chances of personal data leakage.

2. Adaptive and Intelligent
The algorithm adapts to new threats and user behavior patterns, making
the system more robust and smart over time.

3. Reduces Manual Effort
Users don't have to manually adjust every privacy setting; the system
suggests or applies the most secure configurations automatically.

4. Efficient Data Handling
Handles large volumes of user data efficiently by optimizing privacy
configurations without compromising performance.

5. Customizable Privacy Levels
Offers flexibility to users to choose privacy levels according to their
preferences while still ensuring security.

6. Applicable to Multiple Platforms
The approach can be applied to various social networks, increasing its
usability and relevance.

7. Supports Data Mining and Research
Demonstrates how genetic algorithms can solve real-world problems,
making it valuable for academic and industry research.

Challenges
1. Complexity of Genetic Algorithms
Designing and tuning genetic algorithms can be complex, requiring
careful selection of parameters like population size, crossover, and
mutation rates.
2. Data Sensitivity and Privacy
Accessing real user data for testing and validation can raise privacy
concerns and ethical issues.
3. Platform Dependency
Different social networks have varied privacy settings, making it
challenging to create a one-size-fits-all solution.
4. Dynamic Nature of Social Networks
Frequent updates and changes in social media platforms may affect the
algorithm’s effectiveness and require regular updates.
5. User Behavior Diversity
Users have different privacy preferences, which can make it hard to
create a universally optimal privacy model.
6. Computational Cost
Running genetic algorithms, especially on large datasets, can be resource-
intensive and time-consuming.
7. Scalability Issues
Ensuring the algorithm works efficiently across millions of users and data
points can be a significant technical challenge.

Conclusion:

The project Protect User Data on Social Network using Genetic Algorithm
demonstrates how adaptive, optimization-based techniques can support privacy
in the digital age. By employing a genetic algorithm to evolve data protection
strategies, the system helps keep user information on social networks secure
as privacy threats change. Visualization with Matplotlib provides insight into
the effectiveness of the protection mechanisms and helps users understand how
their privacy is managed.
