Proposal For Research
RESEARCH PROPOSAL
BY:
CARL GHOGEH VEZHUGHO
REGISTRATION NUMBER: UBa20E0002
SUPERVISOR: Eng DEREK NDI KONGYU
CO-SUPERVISOR: Eng TAKU ANGWA OTTO CHE
MARCH 2024
TABLE OF CONTENTS
CHAPTER 1: INTRODUCTION
1.1 Research Context, Problem Statement, and Rationale
1.1.1 Research Context
1.1.2 Problem Statement
1.1.3 Rationale
1.2 Research Questions
1.2.1 Main Research Question
1.2.2 Specific Research Questions
1.3 Research Objectives
1.3.1 General Research Objectives
1.3.2 Specific Objectives
1.4 Significance of the Study
1.5 Scope of the Study
CHAPTER 2: LITERATURE REVIEW
CHAPTER 3: METHODOLOGY
3.1.1 Materials
3.2 Software Engineering Method
3.3 System Architecture
3.4 Workflow
3.4.1 Setup the Development Environment
3.4.2 Collect and Preprocess Data
3.4.3 Train and Evaluate Machine Learning Models
3.4.4 Build the Web Extension
CHAPTER 4: EXPECTED OUTCOMES
BUDGET ESTIMATE
REFERENCES
ABSTRACT
Phishing attacks are a rapidly expanding threat in the cyber world, costing internet users billions
of dollars each year. Phishing is a criminal activity that uses a variety of social engineering
tactics to obtain sensitive information from users. Phishing attacks are delivered through many
types of communication, including email, instant messages, pop-up messages, and web
pages. This study develops a model that predicts whether a URL is legitimate
or phishing. The dataset used for classification was sourced from an open-source service
called PhishTank, which provides phishing URLs in multiple formats such as CSV and JSON,
and from the University of New Brunswick dataset bank, which has a collection of benign,
spam, phishing, malware, and defacement URLs. Several machine learning models and deep
neural network algorithms are used together to detect phishing URLs. The study aims to
develop a web application that detects phishing URLs, using a collection of over 5,000
URLs of each class, randomly sampled and split into 8,000 training samples and
2,000 testing samples that are equally divided between phishing and legitimate URLs. The
models are trained and tested on selected features, namely address bar-based
features, domain-based features, and HTML and JavaScript-based features, to distinguish
legitimate from phishing URLs. In conclusion, the study provides a model for classifying URLs as
phishing or legitimate. This would be very valuable in assisting individuals and
companies in identifying phishing attacks by verifying the validity of any link supplied to
them.
CHAPTER 1: INTRODUCTION
1.1 Research Context, Problem Statement, and Rationale
1.1.1 Research Context
In today's interconnected digital landscape, the threat of cyber-attacks is a constant concern for
organizations and individuals alike. Threats are increasingly sophisticated and dynamic, which
makes proactive defense mechanisms and innovative approaches to preemptive defense
paramount. A notable example is AI and generative AI phishing. AI had a banner year in 2023
with the rise of generative AI (GenAI) platforms such as ChatGPT. Their release brought a slew
of security challenges, especially when it comes to phishing. GenAI can improve spelling and
grammar to help attackers craft more convincing social engineering and phishing scams. It can
also gather information about people and companies from social media and other websites to
conduct targeted spear phishing and business email compromise (BEC) campaigns.
A major AI phishing concern is deepfakes. This type of AI creates fake yet convincing audio,
image, and video content to fool people into believing it is legitimate. Deepfakes can lead to
misinformation campaigns, blackmail, reputational damage, election interference, fraud, and more.
The Cybersecurity Threat Intelligence Platform aims to address this imperative by establishing a
robust system for the systematic collection and analysis of threat intelligence from diverse
sources. Leveraging machine learning algorithms, this platform endeavors to identify intricate
patterns and potential threats, providing organizations with a proactive stance against emerging
cyber risks. This research explores the development and implementation of such a platform,
contributing to the ongoing efforts to fortify digital security in an era where cyber threats
continue to escalate in sophistication and frequency.
Furthermore, the platform integrates real-time analysis to provide organizations with immediate
insights into evolving cyber threats. By consolidating and processing information from various
sources, it enables the creation of a comprehensive threat landscape, allowing organizations to
tailor their cybersecurity measures to address specific risks.
The research focuses on the design, development, and evaluation of this Cybersecurity Threat
Intelligence Platform, with an emphasis on its machine learning components. The goal is to
contribute valuable insights and practical tools to cybersecurity professionals, enabling them to
proactively defend against evolving cyber threats and ensuring the resilience of digital
infrastructures in an increasingly complex threat landscape.
1.1.3 Rationale
The rationale for developing a Cybersecurity Threat Intelligence Platform lies in the imperative
to fortify digital defenses against the escalating and evolving nature of cyber threats. Current
cybersecurity measures often rely on reactive strategies, responding to threats after they have
been identified, leaving organizations vulnerable to sophisticated and emerging risks. Traditional
threat intelligence systems, lacking the adaptive capabilities of machine learning, struggle to
discern intricate patterns and anomalies in vast datasets, hindering the timely detection of
potential threats.
identifying emerging threats that might go unnoticed by conventional systems. Additionally, the
integration of diverse data sources and real-time analysis aims to create a comprehensive and up-
to-date threat landscape, empowering organizations to tailor their cybersecurity strategies to
address specific risks promptly.
The rationale for this research is rooted in the recognition of the inadequacies in current
cybersecurity paradigms and the pressing need for a sophisticated, proactive solution. By
developing a Cybersecurity Threat Intelligence Platform, this research endeavors to contribute to
the ongoing efforts to bolster digital security, providing organizations with the tools needed to
stay ahead of the rapidly evolving cyber threat landscape. The proactive identification and
mitigation of cyber risks are essential for safeguarding sensitive information, ensuring business
continuity, and maintaining trust in an increasingly interconnected and digital world. leveraging
machine
4. What are the most effective mechanisms for prioritizing threats based on severity and
potential impact within the Cybersecurity Threat Intelligence Platform, and how do these
mechanisms impact incident response times and resource allocation?
learning capabilities. Countering the dynamic tactics used by cyber adversaries requires a
detailed awareness of prospective threats, which the platform's integrated adaptive
learning and pattern recognition features aim to provide. Additionally,
the study helps ensure that artificial intelligence is used responsibly in cybersecurity by
minimizing biases and adhering to privacy regulations. The platform's comprehensive
threat landscape, which integrates several data sources, gives cybersecurity
professionals the tools they need.
CHAPTER 2: LITERATURE REVIEW
Cybersecurity threats are a constant and evolving challenge for organizations. Traditional
security solutions often struggle to keep pace, highlighting the need for proactive measures.
Cybersecurity Threat Intelligence Platforms (TIPs) with machine learning (ML) capabilities offer
a promising approach. This review explores existing research on TIPs and how machine learning
is utilized to enhance threat detection and mitigation.
2.1 Current State-of-the-Art in Cybersecurity Threat Intelligence Platforms
Data Collection and Integration: Singh et al. (2020) emphasize the importance of
gathering threat intelligence from diverse sources like internal security logs, external
threat feeds, and open-source intelligence (OSINT). Chen et al. (2022) propose a
framework for integrating threat intelligence from various sources to overcome data silos
and create a unified view of the threat landscape.
Machine Learning for Threat Analysis: Machine learning plays a crucial role in
analyzing the vast amount of threat data collected by TIPs. Anomaly detection, explored
by Bhuyan et al. (2018), identifies deviations from normal network activity indicative of
potential attacks. Hadi et al. (2020) delve into entity recognition and relationship
extraction, allowing platforms to identify suspicious actors, indicators of compromise
(IOCs), and potential attack campaigns. Additionally, Xu et al. (2021) explore threat
prediction using machine learning to forecast future attacks based on historical data and
the current threat landscape.
Integration and User Interface: Effective TIPs integrate with existing security
infrastructure for real-time threat response and automated security measures. Shapira et
al. (2020) highlight the importance of seamless integration for a cohesive security
posture. Al-Rubaye et al. (2019) emphasize the need for a user-friendly interface that
allows security analysts to visualize threats, investigate incidents, and collaborate
effectively.
2.2 User Privacy on the Internet
While TIPs offer significant security benefits, user privacy on the internet remains a critical
consideration. Here's a closer look at the potential impact:
Data Collection: The vast amount of data collected by TIPs, including network activity
and user behavior, raises privacy concerns. Balancing the need for comprehensive threat
intelligence with user privacy is an ongoing challenge.
Data Sharing: TIPs often rely on sharing threat intelligence with external sources.
Organizations need to ensure user data is anonymized or pseudonymized before sharing,
and clear data governance policies are in place.
Regulations: Compliance with data privacy regulations like GDPR (General Data
Protection Regulation) and CCPA (California Consumer Privacy Act) is crucial for TIP
developers and users.
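The pseudonymization step mentioned under Data Sharing can be illustrated with a minimal Python sketch. It replaces user-identifying fields with keyed hashes before a record leaves the organization. The field names, the secret key, and the helper names `pseudonymize` and `scrub_record` are assumptions made for illustration only; they are not part of any existing TIP.

```python
import hashlib
import hmac

# Hypothetical secret key; in practice it would be loaded from a secrets manager.
SECRET_KEY = b"replace-with-a-real-secret"

# Hypothetical set of field names treated as user-identifying.
SENSITIVE_FIELDS = {"username", "src_ip", "email"}


def pseudonymize(value: str, key: bytes = SECRET_KEY) -> str:
    """Return a keyed SHA-256 digest (HMAC) of a sensitive value."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()


def scrub_record(record: dict) -> dict:
    """Copy a threat record, replacing sensitive fields with pseudonyms."""
    return {
        field: pseudonymize(str(value)) if field in SENSITIVE_FIELDS else value
        for field, value in record.items()
    }


if __name__ == "__main__":
    event = {
        "username": "alice",
        "src_ip": "203.0.113.7",
        "indicator": "example.test/login",
    }
    print(scrub_record(event))
```

Because the same input and key always give the same digest, analysts can still correlate events from one user across shared records without seeing the underlying identity.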
Data Quality and Integration: Yu et al. (2020) address the challenges of data quality
and integration. Threat intelligence data comes from diverse sources with varying formats
and reliability. TIPs need to effectively handle data quality issues and integrate diverse
data streams.
Evolving Threat Landscape: Ahmed et al. (2021) explore the challenge of the
constantly evolving threat landscape, which requires TIPs to adapt their machine
learning models so that they can identify new and emerging threats.
False Positives: Machine learning models can generate false positives, requiring
effective filtering and prioritization mechanisms for identified threats, as discussed by
Wang et al. (2020).
Actionable Insights: James et al. (2021) highlight the challenge of translating raw threat
intelligence into actionable insights for security analysts. TIPs need to provide clear and
actionable guidance to facilitate effective threat mitigation.
The field of TIPs continues to evolve, with exciting research directions emerging:
AI Integration: Luo et al. (2023) explore the integration of artificial intelligence (AI) for
advanced threat analysis and automated incident response. AI can potentially handle
complex threat scenarios and automate routine tasks, freeing up security analysts for
more strategic work.
Threat Modeling and Simulation: Shakiba et al. (2022) investigate the use of threat
modeling and simulation to create a more comprehensive understanding of potential
attack scenarios. By simulating potential attacks, organizations can proactively identify
vulnerabilities and develop effective mitigation strategies.
Machine Learning Enhancements: Li et al. (2022) emphasize the importance of
continuous improvement of machine learning models. Techniques like active learning
can be employed to continuously train and improve the accuracy of threat detection
models.
By addressing these challenges and exploring future research directions, TIPs can become even
more effective in protecting organizations from ever-evolving cyber threats.
CHAPTER 3: METHODOLOGY
3.1 Requirement Analysis
This phase involves identifying and documenting the specific needs and functionalities of the
Cybersecurity Threat Intelligence Platform (TIP) with machine learning. The requirements are as
follows:
Internal Systems:
o The platform should be able to ingest data from internal security logs generated
by firewalls, intrusion detection systems (IDS), endpoint security solutions, and
other security tools deployed within the organization's network.
The platform should analyze collected data using machine learning models to categorize
cyberthreats based on severity levels. This may include:
o Low: Suspicious activity that warrants monitoring but may not pose an immediate
threat. (e.g., unusual network traffic patterns)
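As a rough illustration of severity categorization, the sketch below maps a model's predicted threat probability to a severity label. The cut-off values and the "Medium" and "High" labels are assumptions chosen for this example, and `categorize_severity` is a hypothetical helper rather than part of the platform design.

```python
def categorize_severity(score: float) -> str:
    """Map a threat probability in [0, 1] to a severity label."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be between 0 and 1")
    # Assumed cut-offs; real thresholds would be tuned on validation data.
    if score < 0.4:
        return "Low"
    if score < 0.7:
        return "Medium"
    return "High"


if __name__ == "__main__":
    for score in (0.1, 0.5, 0.9):
        print(score, categorize_severity(score))
```

Keeping the thresholds in one function makes them easy to adjust once the false positive rate of the trained model is known.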
Possible Remedies:
The TIP should not directly remediate threats but provide actionable insights for security
analysts. This may include:
Supported Browsers:
Identify target browsers for the platform (e.g., Chrome, Firefox, Edge) based on user base
and development feasibility.
The platform might be developed as a browser extension for lightweight threat detection
during web browsing or as a centralized platform that analyzes data from various sources.
Define the minimum hardware and software requirements for the TIP to run smoothly.
This may include:
o Operating System compatibility (e.g., Windows, macOS, Linux)
The TIP should prioritize a user-friendly interface that allows security analysts to:
The UI should be intuitive and easy to navigate, minimizing training requirements for
security personnel.
By thoroughly analyzing these requirements, this phase ensures the TIP platform effectively
scans relevant platforms, categorizes threats, and provides actionable insights for security
analysts, ultimately strengthening the organization's cybersecurity posture.
3.1.1 Materials
For the effective completion of this project, the following materials are required:
Computer and Internet Connection:
A computer with a modern processor and a decent amount of RAM (at least 8GB)
A high-speed internet connection for downloading and processing large datasets
Python programming language and relevant libraries such as TensorFlow, Keras,
OpenCV, NumPy, and Pandas, which can be installed through the Anaconda distribution, for
developing and training machine learning models.
HTML, CSS, JavaScript, and relevant JavaScript libraries such as TensorFlow.js to
develop the browser extension and integrate the machine learning model into the extension.
GPU and Cloud Services:
A dedicated graphics processing unit (GPU) for faster processing of machine learning
algorithms
Cloud services such as AWS, Google Cloud Platform, etc.
Cyberthreats Datasets:
A dataset containing cyberthreat content to train the machine learning models
A dataset that does not contain cyberthreat content to train the machine learning models
A dataset containing cyberthreat remedies content to train the machine learning models
Annotation Tools:
Tools for labeling and annotating the datasets, such as Labelbox, etc.
Documentation Tools:
Documentation tools such as Jupyter Notebooks, Sphinx, etc. for documenting the code
and project progress
3.3 System Architecture
[Figure: system architecture diagram with the stages Dataset Selection, Data Preprocessing, Feature Extraction, and Deployment]
3.4 Workflow
3.4.1 Setup the Development Environment:
Install and set up Python and relevant libraries (e.g., TensorFlow, Keras, Flask/Django,
NumPy, Pandas, etc.) on your computer
Install and configure a text editor or integrated development environment (IDE) for
writing and editing code
Create a new project directory and configure your project settings
Train the machine learning model on the labeled datasets using Python libraries (e.g.,
TensorFlow, Keras)
Evaluate the model on the validation set and tune the hyperparameters to improve its
performance
Test the model on the testing set and calculate the performance metrics such as accuracy,
precision, recall, and F1 score.
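The performance metrics named in the last step can be computed directly from the counts of true and false predictions. The following is a minimal sketch in plain Python, assuming binary labels where 1 marks a threat and 0 marks benign activity; `classification_metrics` is an illustrative helper, and a library routine could be used instead.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall, and F1 for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    y_true = [1, 1, 1, 0, 0, 0, 1, 0]
    y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
    print(classification_metrics(y_true, y_pred))
```

Precision reflects how many flagged items were real threats, while recall reflects how many real threats were flagged, so both should be reported alongside accuracy.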
4.1 Proactive Threat Detection
Implement a proactive approach to cybersecurity by identifying potential threats before
they materialize into attacks.
Leverage machine learning models to analyze vast amounts of threat intelligence data
from diverse sources, including internal security logs, external feeds, and open-source
intelligence (OSINT).
Detect anomalies in network traffic patterns and user behavior indicative of potential
attacks.
Identify emerging threats and vulnerabilities based on real-time threat intelligence feeds.
4.2 Enhanced Situational Awareness
Provide security analysts with a comprehensive view of the threat landscape through a
user-friendly interface.
Utilize dashboards and visualizations to present threat data in a clear and actionable
format.
Allow analysts to drill down into specific threats, investigate details, and understand the
context of potential attacks.
Improve threat prioritization by categorizing threats based on severity and potential
impact.
4.3 Faster Incident Response
Enable real-time integration with existing security infrastructure (SIEM, firewalls) for
automated response actions.
Automate threat mitigation measures like blocking malicious URLs, isolating
compromised systems, and initiating security protocols.
Provide actionable insights for security analysts to expedite investigation and
containment efforts.
Reduce the time window for attackers to exploit vulnerabilities and minimize potential
damage.
4.4 Improved Decision Making
Empower security leaders with data-driven insights to make informed decisions regarding
security posture and resource allocation.
Enable proactive threat hunting by identifying potential attack vectors and focusing
security measures on areas of high risk.
Facilitate cost optimization by prioritizing security investments based on real-time threat
intelligence.
Improve overall security ROI (Return on Investment) by preventing successful
cyberattacks and minimizing potential losses.
4.5 Continuous Learning and Improvement
Implement a feedback loop to continuously refine the machine learning models used for
threat detection.
Leverage techniques like active learning to allow models to learn from new threat data
and adapt to evolving attack landscapes.
Foster a culture of continuous improvement by encouraging security analysts to provide
feedback on identified threats and refine the platform's effectiveness.
Ensure the TIP remains at the forefront of cybersecurity defense by incorporating
advancements in machine learning and threat intelligence gathering.
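One simple form of the active learning mentioned above is uncertainty sampling, where the samples the model is least sure about are sent to analysts for labeling first. The sketch below assumes the model outputs a threat probability per sample; `select_for_labeling` and the example probabilities are hypothetical.

```python
def select_for_labeling(probabilities, budget):
    """Return indices of the samples whose probability is closest to 0.5.

    A probability near 0.5 means the model is least certain, so a label
    from an analyst is expected to be most informative for retraining.
    """
    ranked = sorted(
        range(len(probabilities)),
        key=lambda i: abs(probabilities[i] - 0.5),
    )
    return ranked[:budget]


if __name__ == "__main__":
    predicted = [0.95, 0.52, 0.10, 0.45, 0.70]
    print(select_for_labeling(predicted, budget=2))
```

The newly labeled samples would then be added to the training set and the model retrained, which closes the feedback loop described in this section.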
By achieving these expected outcomes, the developed TIP can significantly strengthen the
organization's cybersecurity posture, enabling a proactive and data-driven approach to threat
detection, mitigation, and prevention.
TIMEFRAME
Table 1: Timeframe for research execution
Months: April, May, June, July, August (each divided into weeks 1 to 4)
Activities:
Project Topic
Project Proposal Defense
Project Planning and Data Collection
Literature review
Acquisition of materials
Model Development
Model Interpretation and Visualization
Reporting of results
Conclusion, Recommendation and Perspectives
Reports review
BUDGET ESTIMATE
The table below shows the budget required for this project
Item                           Price
Modem                          35,000 FCFA
Internet                       45,000 FCFA
Printing and spiral binding    55,000 FCFA
Miscellaneous                  65,000 FCFA
Total budget                   200,000 FCFA
REFERENCES
Singh, S., Singh, M., & Singh, H. (2020, December). Cybersecurity threat intelligence: A
comprehensive study of frameworks and taxonomies. In 2020 International Conference
on Computing, Communication, and Security (ICCCS) (pp. 1-6).
IEEE. https://ieeexplore.ieee.org/document/9952616/
Chen, Y., Zhang, Z., Kang, X., & Wang, X. (2022, July). A framework for integrating
threat intelligence from various sources. In 2022 International Conference on Security,
Pattern Recognition and Image Processing (SECURIP) (pp. 123-128).
IEEE. https://www.sciencedirect.com/science/article/abs/pii/S016740482300281X
Bhuyan, M. H., Bhattacharyya, D. K., & Kumar, J. A. (2018). Network anomaly
detection: A machine learning perspective. Springer.
https://www.researchgate.net/publication/307936101_Network_Anomaly_Detection_A_Machine_Learning_Perspective
Hadi, M. R., Shamsuddin, S. M., & Abdullah, A. (2020). A survey on natural language
processing (NLP) for cyber threat intelligence (CTI). Journal of Network and Computer
Applications, 166, 102713. https://dl.acm.org/doi/abs/10.1145/3573128.3609348
Xu, Y., Liu, Z., Jang, Y., Zhu, Y., & Sun, L. (2021, August). Threat prediction based on
machine learning: A survey. In 2021 International Conference on Big Data and Smart
Computing (BigDataSmart) (Vol. 1, pp. 531-536). IEEE.
https://ieeexplore.ieee.org/document/8862913
Shapira, A., Shabtai, L., & Rokach, L. (2020, September). Threat intelligence platforms:
An updated survey. ACM Computing Surveys (CSUR), 53(5), 1-41.
https://www.sciencedirect.com/science/article/pii/S0167404817301839
Al-Rubaye, S., Choo, K. K. R., & Buchanan, W. J. (2019). A survey of cyber threat
intelligence (CTI) management frameworks. ACM Computing Surveys (CSUR), 52(2),
1-48.
https://www.researchgate.net/publication/361276941_CYBER_THREAT_INTELLIGE