Project Report 02
MINING USING AI
A PROJECT REPORT
Submitted by
of
BACHELOR OF ENGINEERING
IN
MAY – 2024
ANNA UNIVERSITY: CHENNAI 600 025
BONAFIDE CERTIFICATE
Certified that this project report “Cyber Threat Intelligence Mining using AI” is the bonafide
work of the following students, ANANDHI PRIYA T [723720104004], NITHYA M [723720104038],
SOWNDARYA M [723720104056], who carried out the project work under my supervision.
SIGNATURE SIGNATURE
Mrs. V. RADHA, M.E., Mrs. M. RAMADEVI, M.E.,
We record our indebtedness to our principal, Dr. V. Velmurugan, Ph.D., for his
guidance and sustained encouragement towards the successful completion of the project.
We extend our heartfelt gratitude to our beloved parents and friends, who have
always been an integral part in helping us through tough times, and to all teaching and
non-teaching staff for their moral support, which helped make our project a success.
ABSTRACT
REFERENCE
APPENDIX 1
APPENDIX 2
LIST OF FIGURES
FIG.NO TITLE PAGE NO
3.3.2 RAM 9
3.3.3 Processor 9
3.4.1 Windows 10 10
3.4.2 Python 11
3.4.3 Anaconda 12
AI Artificial Intelligence
CTI Cyber Threat Intelligence
ML Machine Learning
NLP Natural Language Processing
CHAPTER 1
INTRODUCTION
The ever-expanding digital landscape has unfortunately become a breeding ground for
cyber threats. The complexity and volume of cyberattacks are constantly evolving,
making it increasingly challenging for security professionals to keep pace. Traditional
methods of threat detection are often overwhelmed by the sheer amount of data
generated on networks. This project explores the power of Artificial Intelligence (AI)
in Cyber Threat Intelligence (CTI) mining.
CHAPTER 2
LITERATURE SURVEY
Nan Sun, Ming Ding (Senior Member, IEEE), Jiaojiao Jiang, Weikang Xu,
Xiaoxing Mo, Yonghang Tai, and Jun Zhang (Senior Member, IEEE), "Cyber
Threat Intelligence Mining for Proactive Cybersecurity Defense: A Survey and
New Perspectives," IEEE Communications Surveys & Tutorials, Vol. 25, No. 3,
Third Quarter 2023
Today’s cyber-attacks have become more severe and frequent, which calls for a new
line of security defenses to protect against them. The dynamic nature of new-
generation threats, which are evasive, resilient, and complex, makes traditional
security systems based on heuristics and signatures struggle to match. Organizations
aim to gather and share real-time cyber threat information and then turn it into threat
intelligence for preventing attacks or, at the very least, responding quickly in a
proactive manner. Cyber Threat Intelligence (CTI) mining, which uncovers,
processes, and analyzes valuable information about cyber threats, is booming.
However, most organizations today mainly focus on basic use cases, such as
integrating threat data feeds with existing network and firewall systems, intrusion
prevention systems, and Security Information and Event Management systems
(SIEMs), without taking advantage of the insights that such new intelligence can
deliver. In order to make the most of CTI so as to significantly strengthen security
postures, we present a comprehensive review of recent research efforts on CTI
mining from multiple data sources in this article. Specifically, we provide and
devise a taxonomy to summarize the studies on CTI mining based on the intended
purposes (i.e., cybersecurity-related entities and events, cyber-attack tactics,
techniques and procedures, profiles of hackers, indicators of compromise,
vulnerability exploits and malware implementation, and threat hunting), along with
a comprehensive review of the current state-of-the-art. Lastly, we discuss research
challenges and possible future research directions for CTI mining.
Shivangi Gupta, A. Sai Sabitha, Ritu Punhani, "Cyber Security Threat
Intelligence using Data Mining Techniques and Artificial Intelligence,"
International Journal of Recent Technology and Engineering (IJRTE), ISSN:
2277-3878, Volume-8, Issue-3, September 2019
Md Sahrom Abu (1), Siti Rahayu Selamat (2), Aswami Ariffin (3), Robiah Yusof (4);
(1, 3) Malaysian Computer Emergency Response Team, CyberSecurity Malaysia;
(2, 4) Faculty of Information Technology and Communication, Universiti Teknikal
Malaysia Melaka, Malaysia, "Cyber Threat Intelligence – Issue and Challenges,"
Indonesian Journal of Electrical Engineering and Computer Science, Vol. 10, No.
1, April 2018
Today's threat landscape is evolving at a rapid rate, with many organizations continuously
facing complex and malicious cyber threats. Cybercriminals are better skilled, better
organized, and better funded than before. Cyber Threat Intelligence (CTI) has become a
hot topic and is under consideration by many organizations to counter the rise of
cyber-attacks. The aim of this paper is to review the existing research related to CTI.
Through the literature review process, the most basic question of what CTI is gets
examined by comparing existing definitions to find common ground or disagreements. It is
found that both organizations and vendors lack a complete understanding of what
information is considered to be CTI; hence more research is needed in order to define CTI.
This paper also identifies current CTI products and services, which include threat
intelligence data feeds, threat intelligence standards, and tools that are used in CTI.
There is an effort by specific industries to share only relevant threat intelligence data
feeds, such as the Financial Services Information Sharing and Analysis Center (FS-ISAC),
which collaborates on critical security threats facing the global financial services sector
only. Meanwhile, research and development centers such as MITRE are working on
developing standard formats (e.g., STIX, TAXII, CybOX) for threat intelligence sharing to
solve interoperability issues between threat sharing peers. Based on the review of CTI
definitions, standards, and tools, this paper identifies four research challenges in cyber
threat intelligence and analyses contemporary work carried out in each. With
organizations flooded with voluminous threat data, the requirement for qualified
threat data analysts to fully utilize CTI and turn the data into actionable intelligence
becomes more important than ever. Data quality is not a new issue, but with the
growing adoption of CTI, further research in this area is needed.
Syed Rameem Zahra, Mohammad Ahsan Chishti, Asif Iqbal Baba, Fan Wu,
"Detecting Covid-19 chaos driven phishing/malicious URL attacks by a fuzzy logic
and data mining-based intelligence system," Egyptian Informatics Journal 23 (2),
197-214, 2022.
With confusion and uncertainty ruling the world, 2020 created near-perfect conditions
for cybercriminals. As businesses virtually eliminated in-person experiences, the
COVID-19 pandemic changed the way we live and caused a mass migration to digital
platforms. However, this shift also made people more vulnerable to cyber-crime.
Victims are being targeted by attackers for their credentials or financial rewards, or
both. This is because the Internet itself is inherently difficult to secure, and the
attackers can code in a way that exploits its flaws. Once the attackers gain root access
to the devices, they have complete control and can do whatever they want.
Consequently, taking advantage of highly unprecedented circumstances created by the
Covid-19 event, cybercriminals launched massive phishing, malware, identity theft,
and ransomware attacks. Therefore, if we wish to save people from these frauds in
times when millions have already been tipped into poverty and the rest are trying hard
to sustain, it is imperative to curb these attacks and attackers. This paper analyses the
impact of Covid-19 on various cyber-security related aspects and sketches out the
timeline of Covid-19 themed cyber-attacks launched globally to identify the modus
operandi of the attackers and the impact of attacks. It also offers a thoroughly
researched set of mitigation strategies which can be employed to prevent the attacks in
the first place. Moreover, this manuscript proposes a fuzzy logic and data mining-based
intelligence system for detecting Covid-19 themed malicious URL/phishing attacks.
The performance of the system has been evaluated against various malicious/phishing
URLs, and it was observed that the proposed system is a viable solution to this
problem.
CHAPTER 3
SYSTEM SPECIFICATION
The hard disk needs to be compatible with the interface of the computer system
it will be installed in. Common interfaces include SATA (Serial ATA), SAS (Serial
Attached SCSI), and PCIe (Peripheral Component Interconnect Express).
The physical size of the hard disk must match the form factor supported by the
computer chassis or storage enclosure. Common form factors for desktop computers
include 3.5-inch and 2.5-inch drives, while laptops typically use 2.5-inch or smaller
form factors.
The storage capacity of the hard disk should meet the requirements of the intended use.
Hard disks are available in a wide range of capacities, from several hundred gigabytes
to multiple terabytes.
3.3.2 RAM
Machine learning models utilized in CTI mining, such as neural networks, often
require significant memory resources during both training and inference stages. During
model training, RAM is used to store training data batches, model parameters, and
intermediate computations. In the inference stage, RAM is utilized to load trained
models and process incoming data streams for real-time threat detection and analysis.
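As a rough illustration of how these memory needs add up, the sketch below estimates the RAM taken by the parameters of a small fully connected network and by one batch of input data. The layer sizes, the batch size, and the 4-byte float32 storage are hypothetical values chosen for the example, not measurements from this project.

```python
# Back-of-envelope RAM estimate for a small dense network (illustrative numbers only).

BYTES_PER_FLOAT32 = 4  # assumption: weights and inputs are stored as 32-bit floats


def dense_param_count(layer_sizes):
    """Count weights plus biases for a fully connected network."""
    return sum(n_in * n_out + n_out for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))


def estimate_bytes(layer_sizes, batch_size):
    """Return (parameter_bytes, batch_bytes) for one input batch."""
    param_bytes = dense_param_count(layer_sizes) * BYTES_PER_FLOAT32
    batch_bytes = batch_size * layer_sizes[0] * BYTES_PER_FLOAT32
    return param_bytes, batch_bytes


# Hypothetical model: 100 input features, two hidden layers, 2 output classes.
layers = [100, 64, 32, 2]
params, batch = estimate_bytes(layers, batch_size=256)
print(f"parameters: {params / 1024:.1f} KiB, one batch: {batch / 1024:.1f} KiB")
```

Training usually needs several times the parameter memory, because gradients and optimizer state are held alongside the weights, so an estimate like this is a lower bound.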
3.3.3 PROCESSOR
Processors handle the cleaning and normalization of large datasets containing
threat intelligence. This may involve handling inconsistencies, removing duplicates,
and converting data into a format suitable for AI algorithms.
Fig 3.3.3 Processor
Processors perform calculations to extract relevant features from raw data. These
features could be network traffic patterns, malware code characteristics, or attacker
behavior indicators. Processors also power the training of machine learning and deep
learning models. This involves running complex algorithms that learn from vast
amounts of threat intelligence data to identify patterns and anomalies indicative of
cyber threats.
3.4 SOFTWARE REQUIREMENT
Fig 3.4.1 Windows 10
Many SIEM systems, which aggregate and analyze security data for threat detection,
are compatible with Windows 10. These SIEM systems can integrate with AI-powered
threat intelligence tools, providing a centralized platform for security professionals.
3.4.2 PYTHON
Python offers a large collection of libraries such as TensorFlow, PyTorch, scikit-learn,
and Keras. These libraries provide pre-built functions and tools for building, training,
and deploying the machine learning and deep learning models used in CTI mining tasks
such as anomaly detection and threat classification.
Libraries such as Scapy, python-nmap, and PyMISP, the Python client for MISP
(Malware Information Sharing Platform), provide functionality for network traffic
analysis, vulnerability scanning, and threat data integration. These tools can be
combined with Python's AI libraries to form a comprehensive environment for CTI mining.
3.4.3 ANACONDA
Anaconda reduces the need for manual installation and configuration of the various
Python libraries required for AI and data science in CTI mining. It ships with
essential libraries such as scikit-learn, Pandas, and NumPy, and additional packages
such as TensorFlow can be installed through its conda package manager, saving
security professionals valuable time and effort.
CHAPTER 4
SYSTEM ANALYSIS
4.1.1 Disadvantage
Lack of Comprehensive
4.2 PROPOSED SYSTEM
4.1.2 Advantage
4.1.4 System Architecture
CHAPTER 5
MODULE DESCRIPTION
1. Data Collection
2. Pre-Processing
3. Feature Extraction
4. Module prediction
5. User-Driven Threat Prediction
The process of gathering diverse and relevant cyber threat data from multiple
sources is fundamental to effective threat detection and analysis. This module involves
systematically retrieving raw data, including logs, network traffic, threat feeds, and
social media content. Key steps include identifying data sources, employing retrieval
mechanisms such as APIs and scraping tools, normalizing and aggregating data,
ensuring quality assurance, and ensuring scalability and resilience. By executing this
module successfully, organizations can build a comprehensive dataset to drive
informed decision-making and strengthen their cybersecurity posture against evolving
threats.
Data collection thus serves as the cornerstone of effective cyber threat intelligence.
By executing this module meticulously, organizations can construct a robust dataset
vital for proactive threat detection and analysis. This data-driven approach enables
informed decision-making, empowers timely response to emerging threats, and fortifies
the organization's cybersecurity posture against evolving adversaries. Thus, data
collection lays the foundation for a comprehensive and dynamic cyber defense
strategy.
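The normalizing and aggregating step described above can be sketched as follows. This is a minimal illustration under stated assumptions: the firewall log layout, the threat-feed field names (`seen`, `ioc`, `tag`), and the common schema are all hypothetical and are not taken from the project's data sources.

```python
# Normalise records from heterogeneous sources into one schema, then aggregate.
from datetime import datetime, timezone


def from_firewall_log(line):
    """Parse a hypothetical 'epoch_seconds src_ip action' log line."""
    epoch, src_ip, action = line.split()
    return {
        "source": "firewall",
        "timestamp": datetime.fromtimestamp(int(epoch), tz=timezone.utc).isoformat(),
        "indicator": src_ip,
        "detail": action,
    }


def from_threat_feed(entry):
    """Map a hypothetical threat-feed dictionary onto the common schema."""
    return {
        "source": "feed",
        "timestamp": entry["seen"],
        "indicator": entry["ioc"],
        "detail": entry.get("tag", "unknown"),
    }


def aggregate(records):
    """Group normalised records by indicator."""
    grouped = {}
    for record in records:
        grouped.setdefault(record["indicator"], []).append(record)
    return grouped


records = [
    from_firewall_log("1714550400 203.0.113.7 DENY"),
    from_threat_feed({"seen": "2024-05-01T09:30:00+00:00", "ioc": "203.0.113.7", "tag": "scanner"}),
    from_threat_feed({"seen": "2024-05-01T10:00:00+00:00", "ioc": "198.51.100.9"}),
]
by_indicator = aggregate(records)
print({ioc: len(items) for ioc, items in by_indicator.items()})
```

Mapping every source onto one schema before aggregation means later modules only need to understand a single record layout, whatever the original source was.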
5.3 Pre-Processing
At the forefront of data preparation lies the preprocessing module, a pivotal stage
where raw data undergoes meticulous cleaning, transformation, and organization. This
essential process targets the elimination of noise, the resolution of missing values, the
standardization of formats, and the assurance of data consistency before advancing to
subsequent analysis. Through methods such as data cleaning, missing value handling,
format standardization, and consistency checks, organizations lay a robust foundation
for insightful analysis and informed decision-making in the realm of cybersecurity.
This module serves as the cornerstone for deriving accurate insights and proactively
addressing cyber threats, ensuring that organizations navigate the data landscape with
clarity and precision.
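A minimal sketch of the four operations named above (cleaning, missing value handling, format standardization, and consistency checks) is given below. The record layout with `url` and `label` fields and the `preprocess` function are assumptions made for illustration; the project's actual preprocessing code is not shown in this report.

```python
# Cleaning, missing-value handling, format standardisation and de-duplication.

def preprocess(rows, default_label="unknown"):
    """Return cleaned rows; rows without a URL are dropped as unusable."""
    cleaned, seen = [], set()
    for row in rows:
        url = (row.get("url") or "").strip().lower()  # standardise the format
        if not url:  # noise: a record with no URL cannot be analysed
            continue
        label = (row.get("label") or default_label).strip().lower()  # fill missing value
        if url in seen:  # consistency check: keep one row per URL
            continue
        seen.add(url)
        cleaned.append({"url": url, "label": label})
    return cleaned


raw = [
    {"url": " HTTP://Example.com/Login ", "label": "Phishing"},
    {"url": "http://example.com/login", "label": "phishing"},  # duplicate after cleaning
    {"url": "http://example.org/", "label": None},             # missing label
    {"url": "", "label": "benign"},                            # missing URL
]
print(preprocess(raw))
```

Lower-casing the whole URL is a simplification, since URL paths can be case-sensitive; a production pipeline might lower-case only the scheme and host.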
5.4 FEATURE EXTRACTION
Feature extraction serves as a vital bridge between raw data and actionable insights
in cyber threat intelligence mining employing AI. It involves transforming diverse data
sources into meaningful features that AI algorithms can analyse effectively. Key
aspects include data representation, dimensionality reduction, feature engineering, and
adaptation to both supervised and unsupervised learning approaches. Feature extraction
empowers organizations to identify patterns, anomalies, and indicators of malicious
activity, enhancing the efficiency and accuracy of threat detection systems. It's an
essential preprocessing step that enables AI to effectively combat evolving cyber
threats by extracting relevant information and providing actionable intelligence.
It involves the transformation of raw data into actionable insights by identifying and
extracting relevant features. Through techniques such as data representation,
dimensionality reduction, and feature engineering, organizations can effectively
analyse vast datasets to detect patterns and anomalies indicative of malicious activity.
Feature extraction enables AI algorithms to process and interpret information
efficiently, enhancing the accuracy and efficacy of threat detection systems. By
leveraging advanced AI techniques, organizations can stay ahead of evolving cyber
threats, bolstering their cybersecurity defences and safeguarding their digital assets
effectively.
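To make the idea of turning raw data into features concrete, the sketch below converts a URL into a small numeric feature vector of the kind a phishing classifier could consume. The chosen features and the keyword list are hypothetical examples, not the feature set used in this project.

```python
# Turn a raw URL into a fixed-length numeric feature vector.
import re
from urllib.parse import urlparse

SUSPICIOUS_WORDS = ("login", "verify", "update", "secure")  # hypothetical keyword list


def url_features(url):
    """Return a dict of simple lexical features for one URL."""
    parsed = urlparse(url)
    host = parsed.hostname or ""
    return {
        "length": len(url),
        "num_dots": host.count("."),
        "num_digits": sum(ch.isdigit() for ch in url),
        "has_ip_host": int(bool(re.fullmatch(r"\d{1,3}(\.\d{1,3}){3}", host))),
        "uses_https": int(parsed.scheme == "https"),
        "suspicious_words": sum(word in url.lower() for word in SUSPICIOUS_WORDS),
    }


features = url_features("http://192.0.2.10/secure-login/verify.php")
print(features)
vector = list(features.values())  # ordered vector that a classifier could consume
```

Because every URL maps to the same six numbers in the same order, the output can be stacked into a matrix for either supervised or unsupervised learning.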
systems empower organizations to adapt swiftly to evolving threats, fortify their
defences, and maintain a proactive stance against cyber adversaries.
The prescription module in AI-driven cyber threat intelligence mining offers proactive
defense strategies by identifying vulnerabilities, assessing risks, and recommending
tailored responses. Integrated seamlessly into existing cybersecurity frameworks, it
prioritizes threats, allocates resources efficiently, and orchestrates timely responses.
Through automation, it enables swift adaptation to evolving threats, strengthening
defenses, and maintaining a proactive stance against cyber adversaries.
In this paradigm, users become active participants in the threat prediction process,
contributing real-time observations, anecdotal evidence, and contextual information
that may not be captured by automated systems alone. This user-generated data
supplements machine learning algorithms, enriching the predictive models with
qualitative insights and enhancing their ability to identify emerging threats.
CHAPTER 6
RESULT
The Home Page serves as the central hub for the Cyber Threat Intelligence (CTI)
mining system, providing users with an overview of the system's capabilities,
features, and latest updates. The design is user-friendly, incorporating intuitive
navigation menus, quick access buttons, and informative widgets to guide users
through the platform's functionalities.
Fig 6.2 Create User Account
The Create User Account page offers a streamlined and secure registration process
for new users to access the CTI mining system. Users are prompted to provide
essential information, such as their name, email address, and password, which is
encrypted and stored securely.
Fig 6.3 Sign in page
The Sign In Page provides existing users with a secure and seamless authentication
process to access their personalized dashboards and threat intelligence reports.
Users are required to enter their registered email address and password, which
undergo encrypted verification to authenticate their identity.
Fig 6.4 Phishing prediction page
Fig 6.5 SQL injection prediction page
The SQL Injection Prediction Page utilizes predictive analytics to identify and
forecast potential SQL injection attacks targeting organizational databases and web
applications. The page offers users a comprehensive view of SQL injection
vulnerabilities, attack vectors, and associated risk scores across various assets and
infrastructure components. Users can drill down into specific incidents, analyze
attack patterns, and access remediation guidelines to secure their systems proactively.
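The report does not state how the risk scores on this page are computed. As one possible illustration, the sketch below assigns a score to a request parameter by counting how many known SQL injection patterns it matches. The patterns, the scoring rule, and the `sqli_risk` function are assumptions for this example and are not the project's trained model.

```python
# Rule-based risk score for SQL injection indicators in a request parameter.
import re

# Hypothetical patterns; a real detector would need a far larger, tested set.
PATTERNS = {
    "tautology": re.compile(r"'\s*or\s*'?\d+'?\s*=\s*'?\d+", re.IGNORECASE),
    "union_select": re.compile(r"\bunion\b.+\bselect\b", re.IGNORECASE),
    "comment": re.compile(r"--|#|/\*"),
    "stacked_query": re.compile(r";\s*(drop|insert|update|delete)\b", re.IGNORECASE),
}


def sqli_risk(value):
    """Return (score between 0 and 1, list of matched pattern names)."""
    hits = [name for name, pattern in PATTERNS.items() if pattern.search(value)]
    return len(hits) / len(PATTERNS), hits


for sample in ("alice", "' OR '1'='1' --", "1; DROP TABLE users"):
    print(sample, sqli_risk(sample))
```

A rule-based score like this is easy to explain to users, but it misses obfuscated payloads; the matched pattern names could also serve as input features for a learned classifier.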
Fig 6.6 DoS attack prediction page
The DoS (Denial of Service) Attack Prediction Page employs machine learning and
anomaly detection techniques to detect and predict potential DoS attacks aimed at
disrupting organizational networks and services.
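One simple form of anomaly detection that fits this description is a z-score test on request rates, sketched below. The traffic samples, the threshold of 3 standard deviations, and the `find_spikes` function are hypothetical; the report does not specify which anomaly detection technique the system uses.

```python
# Flag traffic spikes with a z-score against a baseline window.
from statistics import mean, stdev


def find_spikes(baseline, observed, threshold=3.0):
    """Return indices in `observed` whose rate lies more than `threshold`
    standard deviations above the baseline mean."""
    mu, sigma = mean(baseline), stdev(baseline)
    if sigma == 0:  # flat baseline: anything above it counts as a spike
        return [i for i, rate in enumerate(observed) if rate > mu]
    return [i for i, rate in enumerate(observed) if (rate - mu) / sigma > threshold]


# Hypothetical requests-per-second samples.
baseline_rps = [100, 104, 98, 101, 97, 103, 99, 102]
observed_rps = [101, 105, 950, 990, 100]
print(find_spikes(baseline_rps, observed_rps))
```

Only upward deviations are flagged, since a DoS attack shows up as a surge in traffic rather than a drop.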
Fig 6.7 Ransomware prediction page
CHAPTER 7
CONCLUSION
CHAPTER 8
FUTURE ENHANCEMENT
AI algorithms will better grasp the context of cyber threats, enabling more accurate
threat prioritization and tailored mitigation strategies. AI-driven models will predict
emerging threats, empowering organizations to preemptively address vulnerabilities
and defend against evolving attacks. AI will automate incident response processes,
swiftly containing and mitigating cyber-attacks with minimal human intervention.
AI-powered platforms will facilitate greater collaboration and information sharing among
security teams, industry peers, and threat intelligence providers. There will be a focus
on developing AI models that provide transparent and interpretable results, fostering
trust in automated threat intelligence systems. Defense mechanisms will evolve to
detect and counter adversarial AI tactics employed by cyber attackers. AI algorithms
will continuously learn from new data and adapt to evolving threats in real time,
ensuring agility and effectiveness in cyber defense strategies.
REFERENCES
[4] M. K. Hussein, N. Bin Zainal and A. N. Jaber, "Data security analysis for
DDoS defense of cloud-based networks," 2015 IEEE Student Conference on
Research and Development (SCOReD), Kuala Lumpur, 2015, pp. 305-310.
[8] Y. Shen, E. Mariconti, P. Vervier, and G. Stringhini, "Tiresias:
Predicting Security Events Through Deep Learning," In Proc. ACM CCS 18,
Toronto, Canada, 2018, pp. 592-605.
[9] Kyle Soska and Nicolas Christin, "Automatically detecting vulnerable websites
before they turn malicious," In Proc. USENIX Security Symposium, San Diego,
CA, USA, 2014, pp. 625-640.
[11] Mahbod Tavallaee, Ebrahim Bagheri, Wei Lu and Ali A. Ghorbani, "A
detailed analysis of the KDD CUP 99 data set," In Proc. of the Second IEEE Int. Conf.
Comp. Int. for Sec. and Def. App., pp. 53-58, 2009.
[13] N. Shone, T. N. Ngoc, V. D. Phai and Q. Shi, "A deep learning approach to
network intrusion detection," IEEE Trans. Emerg. Topics Comput. Intell., vol. 2,
pp. 41-50, Feb. 2018.
[15] W. Hu, W. Hu, S. Maybank, "AdaBoost-based algorithm for network
intrusion detection," IEEE Trans. Syst., Man, Cybern. B, vol. 38, no. 2, pp. 577-
583, Feb. 2008.
[16] T.-F. Yen et al., "Beehive: Large-scale log analysis for detecting suspicious
activity in enterprise networks," Proc. 29th Annu. Comput. Security Appl.
Conf., New York, NY, USA, 2013, pp. 199-208.
[25] Z. Li, Z. Qin, K. Huang, X. Yang, and S. Ye, "Intrusion detection using
convolutional neural networks for representation learning," In Proc. Int. Conf.
Neural Information Processing, Springer, 2017, pp. 858-866.
[29] Kehe Wu, Zuge Chen, Wei Li, "A Novel Intrusion Detection Model for a
Massive Network Using Convolutional Neural Networks," IEEE Access, vol.
6, pp. 50850-50859, 2018.
[30] Taejoon Kim, Sang C. Suh, Hyunjoo Kim, Jonghyun Kim and Jinoh Kim,
"An Encoding Technique for CNN-based Network Anomaly Detection," In
Proc. IEEE International Conference on Big Data (IEEE BigData), Seattle,
WA, USA, Jan. 2019, pp. 2960-2965.
[32] Y. LeCun, Y. Bengio, and G. Hinton, "Deep learning," Nature, vol. 521,
no. 7553, pp. 436-444, 2015.
WEB REFERENCE
1. https://attack.mitre.org/
2. https://bigid.com/blog/ai-threat-intelligence/
3. https://www.ibm.com/topics/threat-intelligence
4. https://www.crowdstrike.com/cybersecurity-101/threat-intelligence/
5. https://cloud.google.com/blog/topics/threat-intelligence/ai-five-phases-intelligence-lifecycle
6. https://www.forbes.com/sites/forbestechcouncil/2023/07/21/how-ai-enabled-threat-intelligence-is-becoming-our-future/?sh=3056bc49727e
7. https://www.vmware.com/topics/glossary/content/threat-intelligence.html
8. https://ieeexplore.ieee.org/document/8601235
9. https://www.silobreaker.com/glossary/ai-in-threat-intelligence/
10. https://www.checkpoint.com/cyber-hub/cyber-security/what-is-threat-intelligence/
11. https://www.fortinet.com/resources/cyberglossary/artificial-intelligence-in-cybersecurity
APPENDIX – 1
SOURCE CODE
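import os
import tkinter

import numpy as np
import pandas as pd

# Main application window
main = tkinter.Tk()

# Shared state: dataset, documents, label names and train/test splits
global X, Y
global doc
global label_names
global X_train, X_test, y_train, y_test
# Accuracy, precision, recall and F-measure for each model
global lstm_acc, cnn_acc, svm_acc, knn_acc, dt_acc, random_acc, nb_acc
global lstm_precision, cnn_precision, svm_precision, knn_precision, dt_precision, random_precision, nb_precision
global lstm_recall, cnn_recall, svm_recall, knn_recall, dt_recall, random_recall, nb_recall
global lstm_fm, cnn_fm, svm_fm, knn_fm, dt_fm, random_fm, nb_fm

# Cast labels to integers; assumes Y was already loaded as a NumPy array
# (that part of the source is not included in this excerpt)
Y = Y.astype('int')
doc = []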
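The variables declared above hold accuracy, precision, recall, and F-measure values for each model. For reference, these four metrics can be computed from true and predicted labels as in the sketch below. The `binary_metrics` function and the label lists are made-up illustrations and are not taken from the project source.

```python
# Accuracy, precision, recall and F-measure for a binary classifier.

def binary_metrics(y_true, y_pred, positive=1):
    """Compute the four metrics, treating `positive` as the attack class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_measure = (2 * precision * recall / (precision + recall)) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f_measure": f_measure}


y_true = [1, 1, 1, 0, 0, 0, 1, 0]  # made-up ground truth
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]  # made-up predictions
print(binary_metrics(y_true, y_pred))
```

Reporting precision and recall alongside accuracy matters for threat data, where attack samples are often much rarer than benign ones and accuracy alone can look misleadingly high.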
SCREENSHOTS