Cyber Security

The document is a preliminary project report on 'Exploring Open Source Information for Cyber Threat Intelligence' submitted for a Bachelor of Engineering degree. It discusses the use of machine learning algorithms to predict cyber threats by analyzing social media data, specifically Twitter, to identify patterns of cyber attacks. The report includes sections on literature review, system design, software requirements, and future work in the field of cybersecurity.
A PRELIMINARY PROJECT REPORT ON

Exploring Open Source Information for Cyber Threat Intelligence

SUBMITTED TO THE SAVITRIBAI PHULE PUNE UNIVERSITY, PUNE


IN THE PARTIAL FULFILLMENT OF THE REQUIREMENTS
FOR THE AWARD OF THE DEGREE

BACHELOR OF ENGINEERING (Computer Engineering)

BY

Student Name Exam No:


Student Name Exam No:
Student Name Exam No:
Student Name Exam No:

Under The Guidance of


Prof. Guide Name

DEPARTMENT OF COMPUTER ENGINEERING


JSPM’s RAJARSHI SHAHU COLLEGE OF ENGINEERING
TATHAWADE, PUNE 411033
SAVITRIBAI PHULE PUNE UNIVERSITY
2021 -2022
JSPM’s RAJARSHI SHAHU COLLEGE OF ENGINEERING
DEPARTMENT OF COMPUTER ENGINEERING
CERTIFICATE
This is to certify that the Project Entitled

Exploring Open Source Information for Cyber Threat Intelligence


Submitted by

Student Name Exam No:


Student Name Exam No:

is a bonafide student of this institute and the work has been carried out by him/her
under the supervision of Prof. A. B. C and it is approved for the partial fulfillment
of the requirement of Savitribai Phule Pune University, for the award of the degree
of Bachelor of Engineering (Computer Engineering).

Prof. Name H.O.D. Name


Guide H.O.D
Dept. of Computer Engg. Dept. of Computer Engg.

Principal Name
Principal
Trinity College of Engineering and Research, Pune – 48

Place: Pune
Date:
Acknowledgments



It gives us great pleasure in presenting the preliminary project report on ‘Exploring
Open Source Information for Cyber Threat Intelligence’.

I would like to take this opportunity to thank my internal guide Prof. Guide Name
for giving me all the help and guidance I needed. I am really grateful to them for
their kind support. Their valuable suggestions were very helpful.

I am also grateful to Prof. HOD Name, Head of Computer Engineering Department,
CollegeName, for his indispensable support and suggestions.

In the end, our special thanks to Other Person Name for providing various resources
for our project, such as a laboratory with all needed software platforms and a
continuous Internet connection.

Student Name1
Student Name2
Student Name3
Student Name4
(B.E. Computer Engg.)
Abstract
Cyberspace is one of the most complicated systems ever created by
humanity; many people use cyber-technology resources on a daily
basis, yet the bulk of them have little understanding of it. The use
of social media cannot replace the need for security experts
to conduct in-depth analyses of specific sorts of attacks, such as
detecting anomalies in network traffic, worms, and port scans.
Analysing social media data, on the other hand, can
help discover new patterns of cyber threats and security threats,
including data theft, carding, and hijacking. In the proposed system,
we use machine learning to predict cyber threats. Models are trained
on a Twitter cyber-threat dataset using the SVM, NB, DT, RF and ANN
algorithms, and the best-performing model is then used to predict
cyber threats and their categories.
INDEX

1 Introduction
  1.1 Overview
  1.2 Motivation
  1.3 Problem Statement

2 Literature Survey
  2.1 Study of Research Paper

3 Software Requirements Specification
  3.1 Introduction
    3.1.1 Project Scope
    3.1.2 User Classes and Characteristics
    3.1.3 Assumptions and Dependencies
  3.2 Functional Requirements
    3.2.1 System Feature (Functional Requirement)
  3.3 External Interface Requirements
    3.3.1 User Interfaces
    3.3.2 Hardware Interfaces and Software Interfaces
  3.4 Nonfunctional Requirements
    3.4.1 Performance Requirements
    3.4.2 Safety Requirements
    3.4.3 Security Requirements
    3.4.4 Software Quality Attributes
  3.5 System Requirements
    3.5.1 Database Requirements
    3.5.2 Software Requirements
    3.5.3 Hardware Requirements
  3.6 Analysis Models: SDLC Model to be Applied
  3.7 System Implementation Plan

4 System Design
  4.1 System Architecture
  4.2 Data Flow Diagram
  4.3 UML Diagrams

5 Other Specification
  5.1 Advantages
  5.2 Limitations
  5.3 Applications

6 Conclusion and Future Work
  6.1 Conclusion
  6.2 Future Scope

Annexure A

7 References
List of Figures

3.1 Waterfall Model

4.1 System Architecture
4.2 Data Flow Diagram
4.3 Data Flow Diagram
4.4 Data Flow Diagram
4.5 Use Case Diagram
4.6 Activity Diagram
4.7 Class Diagram
4.8 Sequence Diagram

A.1 P Problem
A.2 NP Problem
A.3 NP Complete Problem
CHAPTER 1

INTRODUCTION
1.1 OVERVIEW

Cyberspace is one of the most complicated systems ever created by humanity; many
people use cyber-technology resources on a daily basis, yet the bulk of them have
little understanding of it. The use of social media cannot replace the need
for security experts to conduct in-depth analyses of specific sorts of attacks, such as
detecting anomalies in network traffic, worms, and port scans.
Analysing social media data, on the other hand, can help discover new patterns of
cyber threats and security threats, including data theft, carding, and hijacking. In the
proposed system, we use machine learning to predict cyber threats. Models are trained
on a Twitter cyber-threat dataset using the support vector machine (SVM), naive Bayes
(NB), decision tree (DT), random forest (RF) and artificial neural network (ANN)
algorithms, and the best-performing model is then used to predict cyber threats and
their categories.
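
The selection of the best model can be sketched as follows. This is only a minimal illustration, assuming that each trained model is exposed as a function from tweet text to a label and that "best" means highest accuracy on a held-out validation set; the report does not state the selection metric. The two candidate models, the keyword list and the validation tweets are hypothetical stand-ins, not the trained SVM, NB, DT, RF or ANN models.

```python
def accuracy(predict, samples):
    """Fraction of (text, label) pairs that the model labels correctly."""
    correct = sum(1 for text, label in samples if predict(text) == label)
    return correct / len(samples)


def select_best(candidates, validation):
    """Score every candidate on the validation set and return the best name."""
    scores = {name: accuracy(fn, validation) for name, fn in candidates.items()}
    best = max(scores, key=scores.get)
    return best, scores


# Hypothetical stand-ins for trained models (assumptions for illustration).
THREAT_WORDS = {"breach", "malware", "exploit", "ransomware", "phishing"}


def keyword_model(text):
    """Label a tweet as a threat if it contains any listed keyword."""
    return "threat" if THREAT_WORDS & set(text.lower().split()) else "benign"


def always_benign(text):
    """Trivial baseline that never reports a threat."""
    return "benign"


# Hypothetical validation tweets.
validation = [
    ("new ransomware campaign hits hospitals", "threat"),
    ("data breach exposes customer records", "threat"),
    ("great coffee this morning", "benign"),
    ("zero-day exploit published for router firmware", "threat"),
]

best, scores = select_best(
    {"keyword": keyword_model, "baseline": always_benign}, validation
)
print(best, scores)
```

In a real run the dictionary of candidates would hold the five trained classifiers, and a metric such as F1-score may be preferable to accuracy when threat tweets are rare.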

1.2 MOTIVATION

1. To detect cyber threats using machine learning techniques.

2. To train and classify the dataset using different machine learning algorithms.

3. To analyse social media data, which can provide meaningful insights for detecting
new patterns of cyber attacks and security threats such as data breaches, carding,
and hijacking.

1.3 PROBLEM STATEMENT

The use of social media cannot replace the need for security experts to conduct
in-depth analyses of specific sorts of attacks, such as detecting anomalies in network
traffic, worms, and port scans. Analysing social media data, on
the other hand, can help discover new patterns of cyber threats and security threats,
including data theft, carding, and hijacking.
CHAPTER 2

LITERATURE SURVEY
2.1 STUDY OF RESEARCH PAPER

A research paper is a scientific article that presents relevant expertise, including
substantive observations, and references on a specific subject and technique.
A literature survey reviews such secondary sources; it does not report new or
original experimental work.

1. Paper Name: Exploring Open Source Information for Cyber Threat Intelligence
Author: Victor Adewopo, Bilal Gonen, Festus Adewopo
Abstract: The cyberspace is one of the most complex systems ever built by humans,
the utilization of cyber-technology resources are used ubiquitously by many,
but sparsely understood by the majority of the users. In the past, cyber attacks were
usually orchestrated in a random pattern of attack to lure unsuspecting targets. More
evidence has demonstrated that cyber attack knowledge is shared among individuals
and hacker forums in the virtual ecosystem. This paper proposes using open source
intelligence from the surface web (Twitter) and deep web hacker forums to identify
texts related to cyber threats. Our model can provide cybersecurity experts and law
enforcement agencies reliable information that can be adopted in developing control
and containment strategies for cyberattacks with 82% ... extracted from the deep web and
technical indicators of threats from the surface web. In this paper, we analyzed more
than 10 billion records breached in over 8,000 reported cases between 2005 - 2019 in
the United States that were obtained from the Privacy Rights Clearinghouse (PRC)
Chronology of Data Breaches. Finally, we propose a future research direction on risk
profiling for cyberattacks using geo-spatial techniques.
Index Terms: Cyberattack, Deepweb, Cybersecurity, Cyberthreat
2. Paper Name: Using Deep Neural Networks to Translate Multi-lingual Threat
Intelligence
Author: Priyanka Ranade, Sudip Mittal, Anupam Joshi and Karuna Joshi
Abstract: The multilingual nature of the Internet increases complications in the
cybersecurity community’s ongoing efforts to strategically mine threat intelligence
from OSINT data on the web. OSINT sources such as social media, blogs, and dark
web vulnerability markets exist in diverse languages and hinder security analysts,
who are unable to draw conclusions from intelligence in languages they don’t
understand. Although third party translation engines are growing stronger, they are
unsuited for private security environments. First, sensitive intelligence is not a
permitted input to third party engines due to privacy and confidentiality policies. In
addition, third party engines produce generalized translations that tend to lack exclusive
cybersecurity terminology. In this paper, we address these issues and describe our
system that enables threat intelligence understanding across unfamiliar languages.
We create a neural network based system that takes in cybersecurity data in a
different language and outputs the respective English translation. The English translation
can then be understood by an analyst, and can also serve as input to an AI based
cyber-defense system that can take mitigative action. As a proof of concept, we have
created a pipeline which takes Russian threats and generates its corresponding
English, RDF, and vectorized representations. Our network optimizes translations on
specifically, cybersecurity data.
3. Paper Name: Towards the Detection of Inconsistencies in Public Security
Vulnerability Reports
Author: Ying Dong, Wenbo Guo, Yueqi Chen, Xinyu Xing, Yuqing Zhang, and Gang Wang
Abstract: Public vulnerability databases such as the Common Vulnerabilities and
Exposures (CVE) and the National Vulnerability Database (NVD) have achieved
great success in promoting vulnerability disclosure and mitigation. While these
databases have accumulated massive data, there is a growing concern for their
information quality and consistency. In this paper, we propose an automated system
VIEM to detect inconsistent information between the fully standardized NVD
database and the unstructured CVE descriptions and their referenced vulnerability
reports. VIEM allows us, for the first time, to quantify the information consistency
at a massive scale, and provides the needed tool for the community to keep the
CVE/NVD databases up-to-date. VIEM is developed to extract vulnerable software
names and vulnerable versions from unstructured text. We introduce customized
designs to deep-learning-based named entity recognition (NER) and relation
extraction (RE) so that VIEM can recognize previous unseen software names and versions
based on sentence structure and contexts. Ground-truth evaluation shows the system
is highly accurate (0.941 precision and 0.993 recall). Using VIEM, we examine the
information consistency using a large dataset of 78,296 CVE IDs and 70,569
vulnerability reports in the past 20 years. Our result suggests that inconsistent vulnerable
software versions are highly prevalent. Only 59.82% of the vulnerability reports/CVE
summaries strictly match the standardized NVD entries, and the inconsistency level
increases over time. Case studies confirm the erroneous information of NVD that
either overclaims or underclaims the vulnerable software versions.
4. Paper Name: Social Media Data Mining for Proactive Cyber Defense
Author: Ariel Rodriguez, Koji Okamura
Abstract: The Internet is constantly evolving, producing many new data sources that
can be used to help us gain insights into the cyber threat landscape and in turn, allow
us to better prepare for cyberattacks. With this in mind, we present an end-to-end
real-time cyber situational awareness system which aims to retrieve security-relevant
information from the social networking site Twitter.com. This system classifies and
aggregates the data extracted and provides real-time cyber situational awareness
information based on sentiment analysis and data analytics techniques. This research
will assist security analysts in rapidly and efficiently evaluating the level of cyber risk
in their organization and allow them to proactively take actions to plan and prepare
for potential attacks before they happen.
5. Paper Name: Exploring the Dark Web for Cyber Threat Intelligence using Machine
Leaning
Author: Masashi KADOGUCHI, Shota HAYASHI, Masaki HASHIMOTO, Akira OTSUKA
Abstract: In recent years, cyber attack techniques are increasingly
sophisticated, and blocking the attack is more and more difficult, even if a kind
of counter measure or another is taken. In order for a successful handling of this
situation, it is crucial to have a prediction of cyber attacks, appropriate precautions,
and effective utilization of cyber intelligence that enables these actions. Malicious
hackers share various kinds of information through particular communities such as
the dark web, indicating that a great deal of intelligence exists in cyberspace. This
paper focuses on forums on the dark web and proposes an approach to extract forums
which include important information or intelligence from huge amounts of forums
and identify traits of each forum using methodologies such as machine learning,
natural language processing and so on. This approach will allow us to grasp the
emerging threats in cyberspace and take appropriate measures against malicious
activities.
6. Paper Name: Gathering Cyber Threat Intelligence from Twitter Using Novelty
Classification
Author: Ba-Dung Le, Guanhua Wang, Mehwish Nasim, M. Ali Babar
Abstract: Preventing organizations from Cyber exploits needs timely intelligence
about Cyber vulnerabilities and attacks, referred to as threats. Cyber threat
intelligence can be extracted from various sources including social media platforms where
users publish the threat information in real-time. Gathering Cyber threat
intelligence from social media sites is a time-consuming task for security analysts that
can delay timely response to emerging Cyber threats. We propose a framework for
automatically gathering Cyber threat intelligence from Twitter by using a novelty
detection model. Our model learns the features of Cyber threat intelligence from the
threat descriptions published in public repositories such as Common Vulnerabilities
and Exposures (CVE) and classifies a new unseen tweet as either normal or
anomalous to Cyber threat intelligence. We evaluate our framework using a purpose-built
data set of tweets from 50 influential Cyber security-related accounts over twelve
months (in 2018). Our classifier achieves the F1-score of 0.643 for classifying Cyber
threat tweets and outperforms several baselines including binary classification
models. Analysis of the classification results suggests that Cyber threat-relevant tweets
on Twitter do not often include the CVE identifier of the related threats. Hence, it
would be valuable to collect these tweets and associate them with the related CVE
identifier for Cyber security applications.
7. Paper Name: Social Media Data Mining for Proactive Cyber Defense
Author: Ariel Rodriguez, Koji Okamura
Abstract: The Internet is constantly evolving, producing many new data sources
that can be used to help us gain insights into the cyber threat landscape and in turn,
allow us to better prepare for cyberattacks. With this in mind, we present an
end-to-end real-time cyber situational awareness system which aims to retrieve
security-relevant information from the social networking site Twitter.com. This system
classifies and aggregates the data extracted and provides real-time cyber situational
awareness information based on sentiment analysis and data analytics techniques.
This research will assist security analysts in rapidly and efficiently evaluating the
level of cyber risk in their organization and allow them to proactively take actions to
plan and prepare for potential attacks before they happen.
8. Paper Name: Design of an Ontology based Adaptive Crawler for Hidden Web
Author: Manvi, Ashutosh Dixit, Komal Kumar Bhatia
Abstract: Deep Web is content hidden behind HTML forms. Since it represents
a large portion of the structured, unstructured and dynamic data on the Web,
accessing Deep-Web content has been a long challenge for the database community.
This paper describes a crawler for accessing Deep-Web using Ontologies.
Performance evaluation of the proposed work showed that this new approach has
promising results.
CHAPTER 3

SOFTWARE REQUIREMENTS
SPECIFICATION
3.1 INTRODUCTION

3.1.1 project scope

Participants in discussion threads on darknet marketplaces and hacker forums
constantly exchange knowledge and learn from each other. These threads in online
forums contain data that can assist in the discovery of cyber threat intelligence.
Earlier work proposed a lightweight framework to predict cyber threats from darknet
data. A Scrapy crawler was used with Polipo and Vidalia proxies to parse the contents
of darknet marketplaces and filter out newly discovered terms related to cybersecurity.
That model was able to predict 145 existing threats, 35 new threats, and newly
developed hacking tools in darknet marketplaces.

3.1.2 User Classes and characteristics

In this system, the user must first register and log in. After successful registration,
the user can use the SVM algorithm to determine whether a given text is a cyber threat
or not. The intended users work in the machine learning and artificial intelligence
domains.
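
The SVM step can be sketched as below. This is a deliberately simplified linear SVM without a bias term, trained by subgradient descent on the regularised hinge loss and written with the standard library only; it is not the project's implementation, which would more likely rely on a machine learning library. The whitespace tokenizer, the hyperparameters and the four training sentences are assumptions made for illustration.

```python
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenizer (an illustrative simplification)."""
    return text.lower().split()


def build_vocab(texts):
    """Sorted list of every token seen in the training texts."""
    return sorted({tok for t in texts for tok in tokenize(t)})


def vectorize(text, vocab):
    """Bag-of-words count vector over the training vocabulary."""
    counts = Counter(tokenize(text))
    return [float(counts[w]) for w in vocab]


def score(w, x):
    """Linear decision value w . x (no bias term in this sketch)."""
    return sum(wi * xi for wi, xi in zip(w, x))


def train_linear_svm(X, y, epochs=20, lr=0.1, lam=0.01):
    """Subgradient descent on the L2-regularised hinge loss; labels are +1/-1."""
    w = [0.0] * len(X[0])
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            violated = yi * score(w, xi) < 1.0
            # Regularisation shrinks every weight slightly on each step.
            w = [wj * (1.0 - lr * lam) for wj in w]
            if violated:
                # Margin violated: move the weights towards the correct side.
                w = [wj + lr * yi * xj for wj, xj in zip(w, xi)]
    return w


def predict(w, vocab, text):
    """Positive decision value means 'threat'; zero or negative means not."""
    return "threat" if score(w, vectorize(text, vocab)) > 0 else "not threat"


# Hypothetical labelled examples: +1 = cyber threat, -1 = not a threat.
train = [
    ("ransomware attack encrypts hospital files", 1),
    ("phishing campaign steals bank credentials", 1),
    ("lovely sunny weather today", -1),
    ("watching football with friends tonight", -1),
]
texts = [t for t, _ in train]
vocab = build_vocab(texts)
X = [vectorize(t, vocab) for t in texts]
y = [label for _, label in train]
w = train_linear_svm(X, y)

print(predict(w, vocab, "new phishing attack reported"))
print(predict(w, vocab, "sunny weather with friends"))
```

Words that never appeared in training contribute nothing to the score, so a text made up only of unseen words is reported as "not threat" in this sketch.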

3.1.3 Assumptions and Dependencies

Assumptions:
The system is implemented in the Python language.
The input is textual data.
Dependencies: Python is a general-purpose, high-level programming language. It is
commonly used for developing websites, web applications and desktop GUI applications,
as well as for task automation, data analysis, data visualization, AI, machine learning,
operating systems, mobile application development, and video games. Since it is
relatively easy to learn and follows an organized structure, Python has been adopted
by many non-programmers, such as accountants and scientists, for a variety of everyday
tasks like organizing finances. Its simple syntax rules also make it easier to keep the
code base readable and the application maintainable.

3.2 FUNCTIONAL REQUIREMENTS

3.2.1 System Feature (Functional Requirement)

For the solution to be used as part of the Reg SOC system, it must allow
integration with other modules. Research on anomaly-based
intrusion detection systems is most often carried out on pre-existing data sets
or in laboratory environments in which simplifications concerning infrastructure, data
collection or services have been applied. Due to legal and technical limitations, our
solution will detect threats through the analysis of NetFlow data and headers from
network protocols. In addition, in a real environment it is not possible to obtain
labeled training and validation datasets, which forces the introduction of adaptation
mechanisms already at the deployment stage. In our approach we will try to prepare
a suitably scaled model on the basis of the available datasets and then adjust it
in the following steps to the existing network. The models prepared and tuned this
way will later become reference models during the implementation of the anomaly
detection module in subsequent networks.
3.3 EXTERNAL INTERFACE REQUIREMENTS

3.3.1 User Interfaces

When interacting with user interfaces, do users always get what they expect? For
each user interface element in thousands of desktop applications, we extracted the
desktop application it invokes as well as the text shown on screen. This association
allows us to detect outliers: user interface elements whose text, context or icon
suggests one action, but which are actually tied to other actions.

3.3.2 Hardware Interfaces and software interfaces

Malware is a serious threat to network-connected embedded systems, as evidenced
by the continued and rapid growth of such devices, commonly referred to as the
Internet of Things. Their ubiquitous use in critical applications requires robust
protection to ensure user safety and privacy. That protection must be applied to all system
aspects, extending beyond protecting the network and external interfaces. Anomaly
detection is one of the last lines of defence against malware, in which data-driven
approaches that require the least domain knowledge are popular. However, embedded
systems, particularly edge devices, face several challenges in applying data-driven
anomaly detection, including unpredictability of malware, limited tolerance to long
data collection windows, and limited computing/energy resources. In this article, we
utilize subcomponent timing information of software execution, including intrinsic
software execution, instruction cache misses, and data cache misses as features, to
detect anomalies based on ranges, multi-dimensional Euclidean distance, and
classification at run time. Detection methods based on lumped timing range are also
evaluated and compared.
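
The range-based and Euclidean-distance checks named above can be sketched as follows. The three features (execution time, instruction cache misses, data cache misses) follow the paragraph, but the sample values, the helper names and the choice of threshold (the largest distance seen among the normal samples) are assumptions made for illustration only.

```python
import math


def fit_ranges(samples):
    """Per-feature (min, max) observed on normal executions."""
    return [(min(col), max(col)) for col in zip(*samples)]


def out_of_range(x, ranges):
    """Range check: anomalous if any feature leaves its observed range."""
    return any(v < lo or v > hi for v, (lo, hi) in zip(x, ranges))


def centroid(samples):
    """Mean feature vector of the normal executions."""
    n = len(samples)
    return [sum(col) / n for col in zip(*samples)]


def is_far(x, center, threshold):
    """Distance check: anomalous if the Euclidean distance exceeds the threshold."""
    return math.dist(x, center) > threshold


# Hypothetical normal timing samples:
# (execution time, instruction cache misses, data cache misses)
normal = [(10.0, 2.0, 3.0), (11.0, 3.0, 2.0), (9.0, 2.0, 4.0), (10.0, 1.0, 3.0)]

ranges = fit_ranges(normal)
center = centroid(normal)
# Assumed threshold: the largest distance observed among the normal samples.
threshold = max(math.dist(s, center) for s in normal)

typical = (10.5, 2.5, 3.0)
suspicious = (25.0, 9.0, 3.0)
print(out_of_range(typical, ranges), is_far(typical, center, threshold))
print(out_of_range(suspicious, ranges), is_far(suspicious, center, threshold))
```

The range check is cheaper but treats every feature independently, while the distance check reacts to combined deviations across features.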
3.4 NONFUNCTIONAL REQUIREMENTS

3.4.1 Performance Requirements

In order to meet stringent performance requirements, system administrators must
effectively detect undesirable performance behaviours, identify potential root causes,
and take adequate corrective measures. The problem of uncovering and
understanding performance anomalies and their causes (bottlenecks) in different system and
application domains is well studied. In order to assess progress, research trends,
and identify open challenges, we have reviewed major contributions in the area and
present our findings in this survey. Our approach provides an overview of anomaly
detection and bottleneck identification research as it relates to the performance of
computing systems. By identifying fundamental elements of the problem, we are
able to categorize existing solutions based on multiple factors such as the detection
goals, nature of applications and systems, system observability, and detection
methods.

3.4.2 Safety Requirements

a cumbersome task and practically infeasible in many applications. Therefore, an
automated monitoring system is of both fundamental and practical interest. One such
system is an intelligent solution that uses live camera images to detect workers who
breach safety rules by not wearing high-visibility vests. The solution is formulated
as an anomaly detection algorithm developed in the random finite set (RFS)
framework.

3.4.3 Security Requirements

Anomaly detection is the process of finding outliers in a given dataset. Outliers are the
data objects that stand out amongst other objects in the dataset and do not conform
to the normal behavior in a dataset. Anomaly detection is a data science application
that combines multiple data science tasks like classification, regression, and
clustering. The target variable to be predicted is whether a transaction is an outlier or not.
Since clustering tasks identify outliers as a cluster, distance-based and density-based
clustering techniques can be used in anomaly detection tasks.
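
A density-based outlier check can be sketched as follows. It uses the noise criterion familiar from density-based clustering: a point with too few neighbours inside a fixed radius is reported as an outlier. This is only an illustration; the function name, the radius `eps`, the neighbour count and the sample points are all assumptions.

```python
import math


def density_outliers(points, eps, min_neighbors):
    """Return the points that have fewer than min_neighbors other points
    within distance eps (a simple density-based noise criterion)."""
    outliers = []
    for i, p in enumerate(points):
        neighbors = sum(
            1
            for j, q in enumerate(points)
            if i != j and math.dist(p, q) <= eps
        )
        if neighbors < min_neighbors:
            outliers.append(p)
    return outliers


# Hypothetical 2-D feature vectors: four clustered points and one far away.
points = [(1.0, 1.0), (1.2, 0.9), (0.9, 1.1), (1.1, 1.2), (8.0, 8.0)]
found = density_outliers(points, eps=0.5, min_neighbors=2)
print(found)  # -> [(8.0, 8.0)]
```

This pairwise version needs a number of distance computations that grows quadratically with the number of points, so it is suitable only for small datasets.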

3.4.4 Software Quality Attributes

The software has many quality attributes, which are given below:

• Adaptability: This software is adaptable by all users.

• Availability: This software is freely and easily available to all users.

• Maintainability: If any error occurs after the deployment of the project,
it can be easily fixed by the software developer.

• Reliability: The improved performance of the software increases its reliability.

• User Friendliness: Since the software is a GUI application, the output
generated is very user friendly.

• Integrity: Integrity refers to the extent to which access to software or data by
unauthorized persons can be controlled.

• Security: Users are authenticated in several security phases, so reliable
security is provided.

• Testability: The software will be tested considering all aspects.



3.5 SYSTEM REQUIREMENTS

3.5.1 Database Requirements

The system uses a lot of information, some of which will be needed several times,
and the most appropriate form of storage for this data is a database. This allows
data to be saved from input to the system and retrieved later for use by the system.
An important aspect of this project is the use of a Time Control System. In this section
several databases are reviewed for their suitability to this project.
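
Saving input and retrieving it later can be sketched with Python's built-in sqlite3 module. The report does not name a database engine or a schema, so SQLite, the `predictions` table and its columns are hypothetical choices made only for illustration.

```python
import sqlite3


def open_store(path=":memory:"):
    """Open (or create) the database and make sure the table exists."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS predictions (
               id INTEGER PRIMARY KEY AUTOINCREMENT,
               tweet_text TEXT NOT NULL,
               label TEXT NOT NULL,
               created_at TEXT DEFAULT CURRENT_TIMESTAMP
           )"""
    )
    return conn


def save_prediction(conn, tweet_text, label):
    """Store one classified tweet and return the id of the new row."""
    with conn:  # commits on success, rolls back on error
        cur = conn.execute(
            "INSERT INTO predictions (tweet_text, label) VALUES (?, ?)",
            (tweet_text, label),
        )
    return cur.lastrowid


def fetch_by_label(conn, label):
    """Return the stored tweet texts that carry the given label, oldest first."""
    rows = conn.execute(
        "SELECT tweet_text FROM predictions WHERE label = ? ORDER BY id",
        (label,),
    )
    return [row[0] for row in rows]


conn = open_store()
save_prediction(conn, "new ransomware campaign hits hospitals", "threat")
save_prediction(conn, "great coffee this morning", "benign")
print(fetch_by_label(conn, "threat"))
```

The parameterised queries (`?` placeholders) keep tweet text from being interpreted as SQL, which matters because the input comes from untrusted social media content.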

3.5.2 Software Requirements

IDE : Spyder
Coding Language : Python Version 3.8
Operating System : Windows 10

3.5.3 Hardware Requirements

Processor : Intel i5 Processor
Speed : 1.1 GHz
RAM : 8 GB
Hard Disk : 40 GB
Key Board : Standard Windows Keyboard
Mouse : Two or Three Button Mouse
Monitor : LCD/LED
3.6 ANALYSIS MODELS: SDLC MODEL TO BE APPLIED

SDLC stands for Software Development Life Cycle. In this article,
we explore the most widely used SDLC methodologies such as Agile ... Each
software development life cycle model starts with the analysis ... The technologies
used in the project and the team load are also defined here.
One of the basic notions of the software development process is the SDLC model.
The SDLC is a continuous process, which starts from the moment a decision is made
to launch the project and ends at the moment of its complete removal from service.
There is no single SDLC model. The models are divided into main groups, each with
its own features and weaknesses.

Figure 3.1: waterfall model


3.7 SYSTEM IMPLEMENTATION PLAN

The System Implementation Plan table shows the overall schedule of task
completion and the time duration required for each task.

Sr. No.  Name/Title                               Start Date  End Date
1        Preliminary Survey
2        Introduction and Problem Statement
3        Literature Survey
4        Project Statement
5        Software Requirement and Specification
6        System Design
7        Partial Report Submission
8        Architecture Design
9        Implementation
10       Deployment
11       Testing
12       Paper Publish
13       Report Submission
CHAPTER 4

SYSTEM DESIGN
4.1 SYSTEM ARCHITECTURE

Figure 4.1: system Architecture


4.2 DATA FLOW DIAGRAM

The Data Flow Diagrams show the flow of data in our system. DFD 0 is the base
diagram, in which rectangles represent the input and output and the circle represents
our system. DFD 1 shows the actual input and output of the system: the input is
text or image and the output is the detected cyber threat. Likewise, DFD 2 presents
the operations of the user as well as the admin.
Figure 4.2: Data Flow diagram

Figure 4.3: Data Flow diagram


Figure 4.4: Data Flow diagram
4.3 UML DIAGRAMS

Unified Modeling Language (UML) is a standard language for writing software
blueprints. The UML may be used to visualize, specify, construct and document the
artifacts of a software-intensive system. UML is process independent, although
optimally it should be used in a process that is use case driven, architecture-centric,
iterative, and incremental. A number of UML diagrams are available:

Use case diagram

Class diagram

Activity diagram

Sequence diagram
Figure 4.5: Use Case Diagram
Figure 4.6: Activity Diagram
Figure 4.7: Class Diagram
Figure 4.8: sequence Diagram
CHAPTER 5

OTHER SPECIFICATION
5.1 ADVANTAGES

Improved security from cyberspace to the real world.

Provides more security.

Easy to handle.

5.2 LIMITATIONS

If the training does not succeed or is interrupted for any reason, the system
cannot work properly.

If the accuracy of the training is low, the system cannot work properly.

5.3 APPLICATIONS

In Company

In Office

In Banking

All User
CHAPTER 6

CONCLUSION AND FUTURE WORK


6.1 CONCLUSION

In this proposed system, we present an approach that collects data from the Kaggle
website to analyse information about cyber threats and provide an early
warning/detection system. Only Twitter data is used for predicting cyber threats.
Sentiment analysis on hacker forums can also be used to predict cyber threats.
Machine learning algorithms are used to detect cyber threats.

6.2 FUTURE SCOPE

Our future research will focus on:

1) Training the model on a larger data set from the deep web to perform a
comparative analysis of the information embedded in the surface web (Twitter) and
darknet forums, using long short-term memory (LSTM), an artificial recurrent neural
network (RNN) architecture.
2) Collecting and annotating more data from the surface web and deep web, which
will be made publicly available.
3) Using geospatial techniques to profile risk factors of cyberthreats in various
regions of the United States, in order to establish correlations between the dark-web
and surface-web virtual ecosystems in the orchestration of criminal activities by
nefarious netizens.
ANNEXURE A
NP-Hard NP-Complete:

What is P?

• P is the set of all decision problems which can be solved in polynomial time by a
deterministic algorithm.

• Since such a problem can be solved in polynomial time, its solution can also be
verified in polynomial time.

• Therefore P is a subset of NP.
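
Verification in polynomial time can be illustrated with subset sum, a standard problem in NP. It is not part of this project and is used here only as an assumed example: finding a subset of numbers that adds up to a target may require trying many subsets, but checking a proposed subset (the certificate) takes a single pass.

```python
def verify_subset_sum(numbers, target, certificate):
    """Check a proposed solution in time linear in its length.

    certificate is a list of distinct indices into numbers that is
    claimed to select values adding up to target.
    """
    if len(set(certificate)) != len(certificate):
        return False  # an index was used more than once
    if any(i < 0 or i >= len(numbers) for i in certificate):
        return False  # an index points outside the list
    return sum(numbers[i] for i in certificate) == target


numbers = [3, 34, 4, 12, 5, 2]
print(verify_subset_sum(numbers, 9, [2, 4]))  # 4 + 5 = 9 -> True
print(verify_subset_sum(numbers, 9, [0, 1]))  # 3 + 34 = 37 -> False
```

The verifier never searches for a solution; it only confirms or rejects the one it is given, which is the property that defines membership in NP.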

P: Identifying road conditions or carrying out a road survey requires more manpower,
time and money. To resolve these problems we need an effective system.

Figure A.1: P Problem

What is NP?

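NP is the set of decision problems that we can solve in polynomial time if we can
break the normal rules of step-by-step computing, that is, on a nondeterministic
machine. Equivalently, a proposed solution can be verified in polynomial time.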

What is NP Hard?
A problem is NP-hard if an algorithm for solving it can be translated into one
for solving any NP (nondeterministic polynomial time) problem. NP-hard
therefore means "at least as hard as any NP problem", although it might, in fact, be
harder.

NP-Hard:
The proposed system analyzes the road condition and road surface. It identifies bad
road patches and sends notifications to the navigation system. For that we use the
inbuilt accelerometer and gyroscope sensors. To improve the system's results we use
a decision tree algorithm. The proposed system has a self-managing database which
collects data from vehicle drivers' Android smartphones. This data is updated
periodically in real time. The application uses this data to inform other application
users about road conditions.
So in this case the problem 'P' is NP-hard,
i.e. P=NP-Hard

Figure A.2: NP Problem

What is NP-Complete?

• Since a nondeterministic ("N") computer can also do anything a normal computer
can, we know that "P" problems are also in "NP".

• So, the easy problems are in "P" (and "NP"), while the hardest problems in "NP"
are called "NP-complete": they are in "NP", and every other problem in "NP" can be
reduced to them in polynomial time.

• It is like saying there are things that People can do (”P”), there are things that
Super People can do (”SP”), and there are things *only* Super People can do
(”SP-complete”).

Figure A.3: NP Complete Problem

NP-Complete:

We have used inbuilt mobile sensor to identify road conditions.


Hence the ‘P’ is NP-Complete in this case.
CHAPTER 7

REFERENCES
[1] Wang, S. (2010). Crawling Deep Web using a GA-based set covering algorithm.

[2] Zhou, S., Long, Z., Tan, L., Guo, H. (2018). Automatic identification of
indicators of compromise using neural-based sequence labelling. arXiv preprint
arXiv:1810.10156.

[3] Guo, M., Wang, J. A. (2009, April). An ontology-based approach to model
common vulnerabilities and exposures in information security. In ASEE Southeast
Section Conference.

[4] Ninth Annual Cost of Cybercrime Study: Unlocking the Value of Improved
Cybersecurity Protection. The Cost of Cybercrime.

[5] Ranade, P., Mittal, S., Joshi, A., Joshi, K. (2018, November). Using deep neural
networks to translate multi-lingual threat intelligence. In 2018 IEEE International
Conference on Intelligence and Security Informatics (ISI) (pp. 238-243). IEEE.

[6] Dong, Y., Guo, W., Chen, Y., Xing, X., Zhang, Y., Wang, G. (2019). Towards the
detection of inconsistencies in public security vulnerability reports. In 28th USENIX
Security Symposium (USENIX Security 19) (pp. 869-885).

[7] Rodriguez, A., Okamura, K. (2020). Social Media Data Mining for Proactive
Cyber Defense. Journal of Information Processing, 28, 230-238.
