Information Security Project
You will undertake a project to detect and classify malicious attacks using machine learning or
deep learning techniques. This project will allow you to explore practical applications of security
concepts and develop technical skills in machine learning, dataset handling, and model
implementation.
Project Guidelines
1. Topic Selection:
2. Group Formation:
3. Submission Requirements:
Final Report:
The report must include:
Conclusion and Future Work: Summarize findings and propose next steps.
Presentation:
Demonstration:
4. Collaborate with group members to prepare the report, presentation, and demonstration.
5. Submit your work by the deadline and be ready to present at the end of the class.
Project Topics
1. Intrusion Detection Systems (IDS)
Objective: Detect and classify network intrusions such as reconnaissance attacks, DoS, and
access attacks.
Datasets:
1. NSL-KDD: Improved version of the KDD’99 dataset; widely used for network intrusion
detection research.
o Link to NSL-KDD dataset
2. CIC-IDS2017: Includes modern network traffic and attack patterns like DDoS, brute
force, and infiltration.
o Link to CIC-IDS2017
3. UNSW-NB15: Focuses on network traffic anomalies.
o Link to UNSW-NB15 dataset
Guidelines:
1. Feature Engineering: Use packet- and flow-level data (e.g., protocol, flow duration, byte count).
2. Machine Learning Models: Random Forest, Gradient Boosting, SVM, or any combination of them.
3. Deep Learning Models: RNN for sequence analysis or CNN for packet image
representation.
4. Tools: Python (scikit-learn, TensorFlow, Keras), Wireshark (for traffic analysis).
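The feature-engineering guideline above can be sketched as follows. This is a minimal illustration, assuming flow records are available as dictionaries; the field names `protocol`, `duration`, and `bytes` and the helper `encode_flows` are hypothetical and are not the column names of any dataset listed above. It one-hot encodes the protocol and min-max scales the numeric fields so that the result can be passed to a classifier.

```python
def encode_flows(flows):
    """Turn flow dicts into numeric feature vectors.

    Assumed (hypothetical) keys: 'protocol', 'duration', 'bytes'.
    """
    protocols = sorted({f["protocol"] for f in flows})
    numeric_keys = ["duration", "bytes"]

    # Column-wise minimum and maximum for min-max scaling.
    lo = {k: min(f[k] for f in flows) for k in numeric_keys}
    hi = {k: max(f[k] for f in flows) for k in numeric_keys}

    vectors = []
    for f in flows:
        one_hot = [1.0 if f["protocol"] == p else 0.0 for p in protocols]
        scaled = [
            (f[k] - lo[k]) / (hi[k] - lo[k]) if hi[k] > lo[k] else 0.0
            for k in numeric_keys
        ]
        vectors.append(one_hot + scaled)
    return protocols, vectors


# Illustrative values only.
flows = [
    {"protocol": "tcp", "duration": 0.5, "bytes": 1200},
    {"protocol": "udp", "duration": 2.5, "bytes": 200},
    {"protocol": "icmp", "duration": 1.5, "bytes": 700},
]
protocols, X = encode_flows(flows)
```

Scaling matters mainly for SVM; tree-based models such as Random Forest are largely insensitive to it. In practice the minimum and maximum should be computed on the training split only and then reused for the test split.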
2. Malware Classification
Objective: Classify malware samples (e.g., viruses, worms, Trojans) based on their characteristics.
Datasets:
1. Malware Bazaar: Offers detailed malware samples in various categories.
o Malware Bazaar
2. Microsoft Malware Classification Challenge Dataset: Contains disassembly and binary
malware files.
o Microsoft Malware Dataset
3. VirusShare: Offers a vast repository of malicious samples (requires permission).
o VirusShare
Guidelines:
1. Data Preprocessing: Convert malware binaries into grayscale images (for CNNs) or
extract disassembly files.
2. Model:
o CNN for image classification (malware images).
o MLP (Multilayer Perceptron) for feature-based classification.
3. Tools: Python libraries (Pandas, NumPy), Malware analysis tools (IDA Pro, Ghidra).
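The binary-to-image preprocessing step above can be sketched as follows. This is a minimal illustration that treats each byte as one pixel intensity; the helper name `bytes_to_grayscale` and the fixed row width are assumptions, and a real pipeline would typically choose the width based on file size and convert the rows into an array for a CNN.

```python
def bytes_to_grayscale(data, width=16):
    """Reshape raw bytes into rows of pixel intensities (0-255).

    The final row is zero-padded so every row has `width` pixels.
    """
    rows = []
    for start in range(0, len(data), width):
        row = list(data[start:start + width])
        row += [0] * (width - len(row))
        rows.append(row)
    return rows


# Toy stand-in for a binary's contents. Real input would come from
# open(path, "rb").read() on a sample handled in an isolated environment.
sample = bytes(range(40))
image = bytes_to_grayscale(sample, width=16)
```

Here the 40 input bytes produce three rows of 16 pixels, with the last row padded by zeros.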
3. Denial-of-Service (DoS) and Distributed DoS (DDoS) Detection
Objective: Identify and mitigate network flooding attacks using packet analysis.
Datasets:
1. CIC-DDoS2019: Contains network traffic related to DDoS attacks.
o Link to CIC-DDoS2019
2. CAIDA Dataset: Includes data for DDoS attack traffic.
o CAIDA Traffic
3. UNSW-NB15: For broader attack types, including DoS.
o UNSW-NB15
Guidelines:
1. Feature Engineering: Focus on packet intervals, payload size, and anomalous traffic
patterns.
2. Model:
o ML: Decision Trees or Naive Bayes for anomaly detection.
o DL: LSTM for time-series analysis.
3. Tools: Wireshark for traffic visualization, Python for ML/DL.
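The packet-interval and payload-size features mentioned above can be sketched as follows. This is a minimal illustration, assuming packet timestamps (in seconds) and payload sizes (in bytes) have already been exported per flow, for example from a Wireshark capture; the helper name `interval_features` and the sample values are hypothetical.

```python
from statistics import mean, pstdev


def interval_features(timestamps, sizes):
    """Summarize one flow: packet inter-arrival times and payload sizes."""
    ordered = sorted(timestamps)
    gaps = [b - a for a, b in zip(ordered, ordered[1:])]
    return {
        "packet_count": len(ordered),
        "mean_gap": mean(gaps) if gaps else 0.0,
        "std_gap": pstdev(gaps) if gaps else 0.0,
        "mean_size": mean(sizes) if sizes else 0.0,
    }


# Illustrative values only: one regular, rapid flow and one irregular flow.
flood = interval_features([0.0, 0.25, 0.5, 0.75, 1.0], [64, 64, 64, 64, 64])
normal = interval_features([0.0, 1.0, 4.0], [500, 1500, 300])
```

Each dictionary can serve as one row of a feature table for a Decision Tree or Naive Bayes model, or the raw gap sequence can be fed to an LSTM.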
4. Ransomware Detection
Objective: Detect ransomware based on behavioral or signature analysis.
Datasets:
1. EFS Ransomware Dataset: Real ransomware behavior data.
o Ransomware Dataset
2. Kaggle's Malware Family Dataset: Includes ransomware families.
o Malware Families
Guidelines:
1. Behavioral Analysis: Focus on file access patterns and encryption detection.
2. Model: Decision Trees or Deep Belief Networks (DBN).
3. Tools: Sysinternals Suite for monitoring ransomware activities.
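One common heuristic for the encryption-detection part of the behavioral analysis above is byte entropy, since encrypted data tends to look close to uniformly random. The sketch below is an assumption about how this could be approached, not a method prescribed by this handout; the helper name `shannon_entropy` is hypothetical.

```python
import math
from collections import Counter


def shannon_entropy(data):
    """Entropy of a byte string in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# Two extremes: a single repeated byte and every byte value exactly once.
plain = b"A" * 256
uniform = bytes(range(256))
low = shannon_entropy(plain)
high = shannon_entropy(uniform)
```

Compressed files also have high entropy, so entropy alone will produce false positives. It is better used as one feature alongside file access patterns.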
5. Phishing Website Detection
Objective: Classify websites as phishing or legitimate based on URL, metadata, and content
features.
Datasets:
1. PhishTank: An active database of phishing websites.
o PhishTank Dataset
2. UCI Phishing Dataset: Includes features like URL length, domain age, and HTTPS
usage.
o UCI Phishing Dataset
Guidelines:
1. Features: URL-based (length, special characters) and domain-based (age, DNS record).
2. Model: SVM for feature-based classification, BERT for NLP-based analysis of content.
3. Tools: Python, Flask (for developing a tool).
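The URL-based features listed above can be sketched as follows. This is a minimal illustration using only the Python standard library; the helper name `url_features`, the chosen set of special characters, and the example URL are assumptions. Domain-based features such as age or DNS records require external lookups and are not covered here.

```python
from urllib.parse import urlparse


def url_features(url):
    """Extract simple lexical features from a URL string."""
    parsed = urlparse(url)
    host = parsed.hostname or ""
    return {
        "length": len(url),
        "uses_https": parsed.scheme == "https",
        "num_dots_in_host": host.count("."),
        "num_special": sum(url.count(ch) for ch in "@-_%=?&"),
        "has_at_symbol": "@" in url,
        # Crude check: a host made only of digits and dots (IPv4-style).
        "host_is_ip": host.replace(".", "").isdigit(),
    }


# Hypothetical example URL.
feats = url_features("http://192.168.0.1/login?user=a@b.com")
```

The resulting dictionary can be converted into a numeric vector and used to train an SVM.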
6. Social Engineering Detection (Phishing Emails and Social Media)
Objective: Detect phishing emails or malicious messages in social engineering attacks.
Datasets:
1. Enron Email Dataset: Includes labeled spam and legitimate emails.
o Enron Dataset
2. Phishing Email Dataset: Contains phishing emails for analysis.
o Phishing Emails
Guidelines:
1. Preprocessing: Tokenize emails, remove stop words, and use TF-IDF for vectorization.
2. Model:
o ML: Naive Bayes for text classification.
o DL: LSTM for sequence analysis.
3. Tools: Python (NLTK, scikit-learn), TensorFlow/Keras.
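The preprocessing guideline above (tokenize, remove stop words, vectorize with TF-IDF) can be sketched without external libraries as follows. This is a minimal illustration: the stop-word list is a tiny hypothetical sample, the helper names are assumptions, and the weighting uses the plain formula tf = count / document length and idf = log(N / df). Libraries such as scikit-learn apply smoothed and normalized variants, so their values will differ.

```python
import math
import re
from collections import Counter

# Tiny illustrative stop-word list; NLTK ships a fuller one.
STOP_WORDS = {"the", "a", "an", "to", "is", "your", "and", "of", "for"}


def tokenize(text):
    """Lowercase, keep runs of letters, and drop stop words."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS]


def tf_idf(docs):
    """Return one {term: weight} dict per document."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents containing each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        weights.append({
            t: (c / len(tokens)) * math.log(n_docs / df[t])
            for t, c in counts.items()
        })
    return weights


# Hypothetical email texts.
emails = [
    "Verify your account to avoid suspension",
    "Meeting agenda for the project account review",
]
vectors = tf_idf(emails)
```

A term that appears in every document, such as "account" here, receives a weight of zero, which is why TF-IDF highlights words that distinguish one message from the others.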
7. Spyware Detection
Objective: Identify spyware based on system behavior (e.g., unauthorized file access).
Datasets:
1. Spyware Data from Kaggle: Behavioral analysis datasets.
o Kaggle Spyware
2. Android Malware Dataset (Drebin): Includes spyware targeting mobile devices.
o Drebin Dataset
Guidelines:
1. Feature Extraction: Focus on unauthorized access patterns or resource usage spikes.
2. Model:
o Random Forest for feature-based classification.
o RNN for behavior sequence prediction.
3. Tools: Android Studio (for app analysis), Python.
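The resource-usage-spike feature mentioned above can be sketched as a trailing-window z-score check. This is one possible approach under stated assumptions, not a method prescribed by this handout; the helper name `find_spikes`, the window size, the threshold, and the sample readings are all hypothetical.

```python
from statistics import mean, pstdev


def find_spikes(samples, window=5, threshold=3.0):
    """Return indices whose value exceeds the trailing-window mean
    by more than `threshold` standard deviations."""
    spikes = []
    for i in range(window, len(samples)):
        history = samples[i - window:i]
        mu = mean(history)
        sigma = pstdev(history)
        # Guard against a flat history, where sigma is zero.
        if sigma > 0 and (samples[i] - mu) / sigma > threshold:
            spikes.append(i)
    return spikes


# Hypothetical CPU-usage percentages sampled at fixed intervals.
cpu = [10, 12, 11, 13, 12, 11, 95, 12, 11]
spikes = find_spikes(cpu)
```

The count of spikes per time window can then be added as a feature for a Random Forest, or the raw series can be passed to an RNN.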
Tools and Techniques for All Projects:
1. Python Libraries: scikit-learn, Pandas, NumPy, TensorFlow, Keras, Matplotlib, Seaborn.
2. Traffic Analysis: Wireshark, Snort, Zeek.
3. Platforms: Google Colab, Jupyter Notebooks, or AWS EC2 (for computational needs).
5. Model Training:
o Implement baseline and advanced models.
6. Evaluation:
o Use appropriate metrics like precision, recall, and F1-score.
7. Deployment (Optional):
o Deploy the model using Flask/Django or on cloud services.
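The metrics named in the evaluation step can be computed directly from true and predicted labels. The sketch below is a minimal illustration for a single positive class; the helper name `precision_recall_f1` and the sample labels are hypothetical, and scikit-learn provides equivalent functions for real projects.

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Compute precision, recall, and F1 for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)

    # Fall back to 0.0 when a denominator would be zero.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


# Hypothetical labels: 1 = attack, 0 = benign.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 0, 0, 1]
precision, recall, f1 = precision_recall_f1(y_true, y_pred)
```

Security datasets are often imbalanced, with far more benign samples than attacks, so these metrics are usually more informative than accuracy alone.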