Cyber Security
BY
is a bonafide student of this institute and the work has been carried out by him/her
under the supervision of Prof. A. B. C and it is approved for the partial fulfillment
of the requirement of Savitribai Phule Pune University, for the award of the degree
of Bachelor of Engineering (Computer Engineering).
Principal Name
Principal
Trinity College of Engineering and Research, Pune – 48
Place: Pune
Date:
Acknowledgments
I would like to take this opportunity to thank my internal guide Prof. Guide Name
for all the help and guidance I needed. I am really grateful for their kind support;
their valuable suggestions were very helpful.
Finally, our special thanks to Other Person Name for providing various resources
for our project, such as a laboratory with all the needed software platforms and a
continuous Internet connection.
Student Name1
Student Name2
Student Name3
Student Name4
(B.E. Computer Engg.)
Abstract
Cyberspace is one of the most complicated systems ever created by
humanity; many people use cyber-technology resources on a daily
basis, yet most of them have little understanding of it. The use
of social media cannot replace the need for security experts
to conduct in-depth analyses of specific sorts of attacks, such as
detecting anomalies in network traffic, worms, and port scans.
Analysing social media data, on the other hand, can
help discover new patterns of cyber threats and security threats in-
cluding data theft, carding, and hijacking. In the proposed system
we use machine learning to predict cyber threats. The best model
is selected by training on a dataset of Twitter cyber-threat posts using
the SVM, NB, DT, RF and ANN algorithms; this best model is then
used to predict cyber threats and their categories.
INDEX
1 Introduction 1
1.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.2 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.3 Problem Statement . . . . . . . . . . . . . . . . . . . . . . . . . . 2
2 Literature Survey 3
2.1 Study Of Research Paper . . . . . . . . . . . . . . . . . . . . . . . 4
3.5.1 Database Requirements . . . . . . . . . . . . . . . . . . . 18
3.5.2 Software Requirements . . . . . . . . . . . . . . . . . . . 18
3.5.3 Hardware Requirements . . . . . . . . . . . . . . . . . . . 18
3.6 Analysis Models: SDLC Model to be applied . . . . . . . . . . . . 19
3.7 System Implementation Plan . . . . . . . . . . . . . . . . . . . . . 20
4 System Design 21
4.1 System Architecture . . . . . . . . . . . . . . . . . . . . . . . . 22
4.2 Data Flow Diagram . . . . . . . . . . . . . . . . . . . . . . . . . . 23
4.3 UML Diagrams . . . . . . . . . . . . . . . . . . . . . . . . . . 26
5 Other Specification 31
5.1 Advantages . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
5.2 Limitations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
5.3 Applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
Annexure A 35
7 References 39
List of Figures
A.1 P Problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
A.2 NP Problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
A.3 NP Complete Problem . . . . . . . . . . . . . . . . . . . . . . . . 38
CHAPTER 1
INTRODUCTION
1.1 OVERVIEW
Cyberspace is one of the most complicated systems ever created by humanity; many
people use cyber-technology resources on a daily basis, yet most of them have
little understanding of it. The use of social media cannot replace the need
for security experts to conduct in-depth analyses of specific sorts of attacks, such as
detecting anomalies in network traffic, worms, and port scans.
Analysing social media data, on the other hand, can help discover new patterns of
cyber threats and security threats including data theft, carding, and hijacking. In the
proposed system we use machine learning to predict cyber threats. The best model is
selected by training on a dataset of Twitter cyber-threat posts using the SVM, NB, DT,
RF and ANN algorithms; this best model is then used to predict cyber threats and
their categories.
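The train-and-select step described above can be sketched roughly as follows. This is an illustrative sketch using scikit-learn, not the project's actual code: the example tweets, labels and parameters are hypothetical placeholders, and the ANN candidate is omitted for brevity.

```python
# Sketch (not the project's actual code): compare candidate classifiers
# on a labeled tweet dataset via cross-validation and keep the best one.
# The tweets and labels below are hypothetical placeholders.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC
from sklearn.naive_bayes import MultinomialNB
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

tweets = [
    "new ransomware strain spreading via phishing emails",
    "database dump for sale, millions of credentials leaked",
    "port scan detected from multiple botnet hosts",
    "ddos attack takes down banking portal",
    "great weather today, going for a run",
    "just watched a fantastic movie with friends",
    "coffee and a good book, perfect morning",
    "excited about the football match tonight",
] * 5  # repeated so cross-validation has enough samples
labels = [1, 1, 1, 1, 0, 0, 0, 0] * 5  # 1 = cyber threat, 0 = benign

X = TfidfVectorizer().fit_transform(tweets)
y = np.array(labels)

candidates = {
    "SVM": LinearSVC(),
    "NB": MultinomialNB(),
    "DT": DecisionTreeClassifier(random_state=0),
    "RF": RandomForestClassifier(n_estimators=50, random_state=0),
}

# Mean cross-validated accuracy per candidate; pick the best.
scores = {name: cross_val_score(clf, X, y, cv=5).mean()
          for name, clf in candidates.items()}
best_name = max(scores, key=scores.get)
print(best_name, round(scores[best_name], 3))
```

The selected model would then be retrained on the full dataset and used to label new tweets and their threat categories.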
1.2 MOTIVATION
The use of social media cannot replace the need for security experts to conduct
in-depth analyses of specific sorts of attacks, such as detecting anomalies in network
traffic, worms, and port scans. Analysing social media data, on
the other hand, can help discover new patterns of cyber threats and security threats
including data theft, carding, and hijacking.
CHAPTER 2
LITERATURE SURVEY
2.1 STUDY OF RESEARCH PAPER
A research paper is a document of a scientific article that contains relevant exper-
tise, including substantive observations, as well as references to a specific subject of
philosophy and technique. A literature survey reviews secondary references; no
current or original experimental work is published in it.
1. Paper Name: Exploring Open Source Information for Cyber Threat Intelligence
Author: Victor Adewopo, Bilal Gonen, Festus Adewopo
Abstract: The cyberspace is one of the most complex systems ever built by humans;
cyber-technology resources are used ubiquitously by many, but sparsely understood
by the majority of the users. In the past, cyber attacks were
usually orchestrated in a random pattern of attack to lure unsuspecting targets. More
evidence has demonstrated that cyber attack knowledge is shared among individuals
and hacker forums in the virtual ecosystem. This paper proposes using open source
intelligence from the surface web (Twitter) and deep web hacker forums to identify
texts related to cyber threats. Our model can provide cybersecurity experts and law
enforcement agencies reliable information that can be adopted in developing control
and containment strategies for cyberattacks, with 82% accuracy, using information
extracted from the deep web and
technical indicators of threats from the surface web. In this paper, we analyzed more
than 10 billion records breached in over 8,000 reported cases between 2005 - 2019 in
the United States that were obtained from the Privacy Rights Clearinghouse (PRC)
Chronology of Data Breaches. Finally, we propose a future research direction on risk
profiling for cyberattacks using geo-spatial techniques. Index Terms—Cyberattack,
Deepweb, Cybersecurity, Cyberthreat
2. Paper Name: Using Deep Neural Networks to Translate Multi-lingual Threat
Intelligence
Author: Priyanka Ranade, Sudip Mittal, Anupam Joshi and Karuna Joshi
Abstract: The multilingual nature of the Internet increases complications in the
cybersecurity community’s ongoing efforts to strategically mine threat intelligence
from OSINT data on the web. OSINT sources such as social media, blogs, and dark
web vulnerability markets exist in diverse languages and hinder security analysts,
who are unable to draw conclusions from intelligence in languages they don’t un-
derstand. Although third party translation engines are growing stronger, they are
unsuited for private security environments. First, sensitive intelligence is not a per-
mitted input to third party engines due to privacy and confidentiality policies. In ad-
dition, third party engines produce generalized translations that tend to lack exclusive
cybersecurity terminology. In this paper, we address these issues and describe our
system that enables threat intelligence understanding across unfamiliar languages.
We create a neural network based system that takes in cybersecurity data in a differ-
ent language and outputs the respective English translation. The English translation
can then be understood by an analyst, and can also serve as input to an AI based
cyber-defense system that can take mitigative action. As a proof of concept, we have
created a pipeline which takes Russian threats and generates its corresponding En-
glish, RDF, and vectorized representations. Our network optimizes translations
specifically on cybersecurity data.
3. Paper Name: Towards the Detection of Inconsistencies in Public Security Vulnerability
Reports
Author: Ying Dong, Wenbo Guo, Yueqi Chen, Xinyu Xing,
Yuqing Zhang, and Gang Wang
Abstract: Public vulnerability databases such as the Common Vulnerabilities and
Exposures (CVE) and the National Vulnerability Database (NVD) have achieved
great success in promoting vulnerability disclosure and mitigation. While these
databases have accumulated massive data, there is a growing concern for their in-
formation quality and consistency. In this paper, we propose an automated sys-
tem VIEM to detect inconsistent information between the fully standardized NVD
database and the unstructured CVE descriptions and their referenced vulnerability
reports. VIEM allows us, for the first time, to quantify the information consistency
at a massive scale, and provides the needed tool for the community to keep the
CVE/NVD databases up-to-date. VIEM is developed to extract vulnerable software
names and vulnerable versions from unstructured text. We introduce customized
designs to deep-learning-based named entity recognition (NER) and relation extrac-
tion (RE) so that VIEM can recognize previous unseen software names and versions
based on sentence structure and contexts. Ground-truth evaluation shows the system
is highly accurate (0.941 precision and 0.993 recall). Using VIEM, we examine the
information consistency using a large dataset of 78,296 CVE IDs and 70,569 vulner-
ability reports in the past 20 years. Our result suggests that inconsistent vulnerable
software versions are highly prevalent. Only 59.82% of the vulnerability reports/CVE
summaries strictly match the standardized NVD entries, and the inconsistency level
increases over time. Case studies confirm the erroneous information of NVD that
either overclaims or underclaims the vulnerable software versions.
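For reference, precision and recall figures like the 0.941/0.993 quoted above are derived from true-positive, false-positive and false-negative counts. The counts in this toy sketch are made up for illustration, not VIEM's evaluation data.

```python
# Toy illustration of the precision/recall metrics quoted above.
# The counts (tp, fp, fn) below are hypothetical, not from the paper.
def precision_recall(tp, fp, fn):
    precision = tp / (tp + fp)  # fraction of extracted entities that are correct
    recall = tp / (tp + fn)     # fraction of true entities that were extracted
    return precision, recall

p, r = precision_recall(tp=940, fp=60, fn=7)
print(round(p, 3), round(r, 3))  # -> 0.94 0.993
```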
4. Paper Name: Social Media Data Mining for Proactive Cyber Defense
CHAPTER 3
SOFTWARE REQUIREMENTS SPECIFICATION
3.1 INTRODUCTION
Discussion threads in darknet marketplaces such as hacker forums constantly
exchange knowledge, with participants learning from each other. These threads in
online forums contain data that can assist in the discovery of cyber threat intelligence.
Prior work proposed a lightweight framework to predict cyber threats from darknet
data. A Scrapy crawler was used with Polipo and Vidalia proxies to parse the contents
of darknet marketplaces and filter out newly discovered terms related to cybersecurity.
Their model was able to predict 145 existing threats, 35 new threats, and newly
developed hacking tools in darknet marketplaces.
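A minimal sketch of the filtering step described above, not the authors' implementation: the seed term list and forum posts are hypothetical, and the crawling itself (Scrapy with proxies) is assumed to have already produced the raw post texts.

```python
# Sketch of filtering crawled forum posts for cybersecurity terms and
# surfacing co-occurring words as candidate newly discovered terms.
# SEED_TERMS and the posts are hypothetical placeholders.
import re
from collections import Counter

SEED_TERMS = {"ransomware", "botnet", "exploit", "carding", "phishing"}

posts = [
    "selling fresh carding tutorials and dumps",
    "new exploit kit bypasses endpoint detection",
    "cheap hosting, no questions asked",
    "zero-day exploit for sale, contact via pm",
]

def tokenize(text):
    # Lowercase word tokens; keep hyphenated terms like "zero-day" intact.
    return re.findall(r"[a-z0-9-]+", text.lower())

# Keep only posts that mention at least one known cybersecurity term.
flagged = [p for p in posts if SEED_TERMS & set(tokenize(p))]

# Words frequently co-occurring with seed terms are candidate new terms.
cooccur = Counter(w for p in flagged for w in tokenize(p)
                  if w not in SEED_TERMS)
print(len(flagged), cooccur.most_common(3))
```

A real pipeline would rank candidates over time and promote recurring ones into the seed list.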
In this system, the user must first register and log in. After successful registration,
the user can use the SVM algorithm to determine whether a post is a cyber threat or
not. The system belongs to the machine learning and artificial intelligence domains.
In order to find a solution which can be used as a part of the RegSOC system, it is
necessary to allow its integration with other modules. Research on anomaly-based
intrusion detection systems is most often carried out on pre-existing data sets
or in laboratory environments in which simplifications concerning infrastructure, data
collection or services have been applied. Due to legal and technical limitations, our
solution will detect threats through the analysis of NetFlow data and headers from
network protocols. In addition, in a real environment it is not possible to obtain
labeled training and validation datasets, which forces the introduction of adaptation
mechanisms already at the deployment stage. In our approach we will try to prepare
a suitably scaled model on the basis of the available datasets and then adjust it
in the following steps to the existing network. The models prepared and tuned this
way will later become reference models during the implementation of the anomaly
detection module in subsequent networks.
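The adaptation idea above, fitting a model on an available dataset and then adjusting its decision threshold to the target network's own traffic, can be sketched as follows. This is only an illustration: the NetFlow-like features are synthetic, and an Isolation Forest stands in for whichever anomaly model would actually be used.

```python
# Sketch: fit an anomaly model on a reference flow dataset, then adapt
# the alert threshold to the target network (no labels available there).
# Flow features (bytes, packets, duration) are synthetic placeholders.
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)

# Pre-deployment: fit on an available dataset of normal-looking flows.
reference_flows = rng.normal(loc=[500, 10, 1.0], scale=[50, 2, 0.2],
                             size=(500, 3))
model = IsolationForest(random_state=0).fit(reference_flows)

# Deployment: calibrate the threshold on the target network's traffic,
# flagging only its lowest-scoring 1% of flows as suspicious.
target_flows = rng.normal(loc=[520, 11, 1.1], scale=[60, 2, 0.2],
                          size=(300, 3))
scores = model.score_samples(target_flows)  # lower score = more anomalous
threshold = np.percentile(scores, 1)

suspicious = target_flows[scores <= threshold]
print(len(suspicious))
```

Re-running the calibration step periodically is one simple way to realize the "adjust it in the following steps to the existing network" idea.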
3.3 EXTERNAL INTERFACE REQUIREMENTS
When interacting with user interfaces, do users always get what they expect? For
each user interface element in thousands of desktop apps, we extracted the action
it invokes as well as the text shown on the screen. This association
allows us to detect outliers: user interface elements whose text, context or icon
suggests one action, but which are actually tied to other actions.
Anomaly detection is the process of finding outliers in a given dataset. Outliers are
data objects that stand out amongst other objects in the dataset and do not conform
to the normal behavior of the dataset. Anomaly detection is a data science application
that combines multiple data science tasks like classification, regression, and cluster-
ing. The target variable to be predicted is whether a transaction is an outlier or not.
Since clustering tasks identify outliers as a cluster, distance-based and density-based
clustering techniques can be used in anomaly detection tasks.
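As a small illustration of the density-based clustering approach mentioned above: DBSCAN labels low-density points as noise (cluster label -1), and those noise points can be treated as outliers. The data here is synthetic, and the `eps`/`min_samples` values are illustrative choices, not tuned parameters.

```python
# Sketch of density-based anomaly detection with DBSCAN: points that
# fall in no dense region get the noise label -1 and are flagged.
# The 2-D "transaction" data below is synthetic.
import numpy as np
from sklearn.cluster import DBSCAN

rng = np.random.default_rng(1)
normal = rng.normal(0.0, 0.5, size=(100, 2))     # dense cluster of normal points
outliers = np.array([[5.0, 5.0], [-6.0, 4.0]])   # two far-away anomalies
X = np.vstack([normal, outliers])

labels = DBSCAN(eps=0.8, min_samples=5).fit_predict(X)
anomalies = X[labels == -1]
print(len(anomalies))
```

Distance-based alternatives (e.g. Local Outlier Factor) follow the same pattern: fit on the data, then inspect which points the model leaves outside the dense regions.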
• Maintainability: After the deployment of the project if any error occurs then
it can be easily maintained by the software developer.
• User Friendliness: Since the software is a GUI application, the output gener-
ated is user friendly in its behavior.
The system involves the use of a lot of information, some of which will be needed
several times; the most appropriate form of storage for this data is a database. This
will allow data to be saved from input to the database and retrieved to be used by
the system. In this section several databases are reviewed for their suitability to
this project.
RAM : 8 GB
Processor : Intel i5 Processor
IDE : Spyder
Coding Language : Python Version 3.8
Operating System : Windows 10
SDLC stands for Software Development Life Cycle. One of the basic notions of
the software development process is the SDLC model. The SDLC is a continuous
process which starts from the moment a decision is made to launch the project, and
ends at the moment of its full removal from operation. Each software development
life cycle model starts with analysis, in which the requirements are gathered and
the technologies to be used in the project and the team load are defined. There is
no single SDLC model; models are divided into main groups, each with its own
features and weaknesses.
The System Implementation Plan table shows the overall schedule of tasks and the
time duration required for each task.
CHAPTER 4
SYSTEM DESIGN
4.1 SYSTEM ARCHITECTURE
The Data Flow Diagram shows the flow of data in our system. In DFD 0 we show
the base DFD, in which rectangles represent the input and output and a circle
represents our system. In DFD 1 we show the actual input and output of the system:
the input is text or an image and the output is the detected cyber threat. Likewise,
in DFD 2 we present the operations of the user as well as the admin.
Figure 4.2: Data Flow diagram
• Class Diagram
• Activity Diagram
• Sequence Diagram
Figure 4.5: Use Case Diagram
Figure 4.6: Activity Diagram
Figure 4.7: Class Diagram
Figure 4.8: Sequence Diagram
CHAPTER 5
OTHER SPECIFICATION
5.1 ADVANTAGES
The system is easy to handle and use.
5.2 LIMITATIONS
If the training is not successful or is interrupted for any reason, the system cannot
work properly.
If the training accuracy is low, the system cannot work properly.
5.3 APPLICATIONS
• In companies
• In offices
• In banking
• For all users
ANNEXURE A
What is P?
P : Identifying cyber threats manually from large volumes of data requires more
manpower, time and money. To resolve these problems we need an effective system.
What is NP?
NP means we can solve the problem in polynomial time if we can break the normal
rules of step-by-step computing, i.e., on a nondeterministic machine.
What is NP Hard?
A problem is NP-hard if an algorithm for solving it can be translated into one
for solving any NP (nondeterministic polynomial time) problem. NP-hard
therefore means "at least as hard as any NP problem," although it might, in fact, be
harder.
NP-Hard:
The proposed system analyzes social media data and identifies cyber threats,
notifying the user of the detected threat category. To improve the system's results
we use machine learning algorithms such as the decision tree. The proposed system
has a self-managing database which collects tweet data; this data is updated
periodically, and the application utilizes it to inform users about cyber threats.
So here, in this case, the 'P' problem is NP-hard,
i.e. P = NP-Hard.
What is NP-Complete?
• Since this amazing "N" computer can also do anything a normal computer can,
we know that "P" problems are also in "NP".
• So, the easy problems are in "P" (and "NP"), but the really hard ones are
*only* in "NP", and they are called "NP-complete".
• It is like saying there are things that People can do ("P"), there are things that
Super People can do ("SP"), and there are things *only* Super People can do
("SP-complete").
NP-Complete:
CHAPTER 7
REFERENCES
[1] Wang, S. (2010). Crawling Deep Web using a GA-based set covering algorithm.
[2] Zhou, S., Long, Z., Tan, L., Guo, H. (2018). Automatic identification of
indicators of compromise using neural-based sequence labelling. arXiv preprint
arXiv:1810.10156.
[4] Ninth Annual Cost of Cybercrime Study: Unlocking the Value of Improved
Cybersecurity Protection. The Cost of Cybercrime.
[5] Ranade, P., Mittal, S., Joshi, A., Joshi, K. (2018, November). Using deep neural
networks to translate multi-lingual threat intelligence. In 2018 IEEE International
Conference on Intelligence and Security Informatics (ISI) (pp. 238-243). IEEE.
[6] Dong, Y., Guo, W., Chen, Y., Xing, X., Zhang, Y., Wang, G. (2019). Towards the
detection of inconsistencies in public security vulnerability reports. In 28th USENIX
Security Symposium (USENIX Security 19) (pp. 869-885).
[7] Rodriguez, A., Okamura, K. (2020). Social Media Data Mining for Proactive
Cyber Defense. Journal of Information Processing, 28, 230- 238.