Sat - 48.Pdf - Malicious Attacks Detection Using Machine Learning
Bot detection using machine learning (ML), with network flow-level features, has
been extensively studied in the literature. However, existing flow-based
approaches typically incur a high computational overhead and do not completely
capture the network communication patterns, which can expose additional aspects
of malicious hosts. Recently, bot detection systems that leverage communication
graph analysis using ML have gained attention to overcome these limitations.
TABLE OF CONTENTS
CHAPTER NO TITLE PAGE NO
ABSTRACT v
LIST OF FIGURES ix
LIST OF ABBREVIATIONS x
1 INTRODUCTION 1
1.1. OVERVIEW 2
1.2. OBJECTIVE 2
1.3. SCOPE 2
2 LITERATURE SURVEY 3
3 METHODOLOGY 10
3.4.1 K-MEANS 11
3.5 ADVANTAGES 14
3.6 SOFTWARE AND HARDWARE 14
3.7 SYSTEM STUDY 15
3.7.1 ECONOMICAL FEASIBILITY 15
3.7.2 TECHNICAL FEASIBILITY 15
3.9.7 COLLABORATION DIAGRAM 24
3.9.8 COMPONENT DIAGRAM 25
3.9.9 DEPLOYMENT DIAGRAM 26
3.10 MODULES 26
3.10.1 ALGORITHM 28
REFERENCE APPENDICES 31
A. SCREENSHOTS 33
B. SOURCE CODE 37
C. PLAGIARISM REPORT 39
LIST OF FIGURES
FIGURE NO NAME OF THE FIGURE PAGE NO
1 SYSTEM ARCHITECTURE 16
4 CLASS DIAGRAM 19
5 OBJECT DIAGRAM 20
6 STATE DIAGRAM 21
7 ACTIVITY DIAGRAM 23
8 SEQUENCE DIAGRAM 24
9 COLLABORATION DIAGRAM 25
10 COMPONENT DIAGRAM 25
11 DEPLOYMENT DIAGRAM 26
LIST OF ABBREVIATIONS
PD Pandas
NX NetworkX
TTK Tkinter
CHAPTER 1
INTRODUCTION
Nowadays, almost everyone stores personal information on their own systems, which
makes securing those systems a real problem. At the same time, cyber-attacks are
increasing rapidly and can expose personal data such as photos, social media
accounts, and chats. Bot attacks have increased worldwide. Servers that hold the data
of several lakh people are also being hacked, and hacking one such server is
equivalent to hacking the data of all those people.
A botnet is a collection of internet-connected devices, each of which is called a bot,
that is used to carry out cyber-attacks. With these bots, an attacker can compromise
even large servers. Taken together, the bots are called a bot army, and the size of
this army makes time-consuming tasks easier. Although botnets can perform helpful
tasks, people mostly use them for malicious work, and they are a source of many
malicious activities. Botnet models include centralized, client-server,
decentralized, and peer-to-peer models, and botnet attacks include DDoS, phishing,
cryptojacking, snooping, bricking, brute force, and spambots. Common botnet actions
are email spam, financial breaches, and targeted intrusions. A bot herder controls a
collection of hijacked devices using remote commands. Once your machine is infected,
it becomes a bot, and you may not even know it. Botnets lead to financial theft,
information theft, sabotage of services, and the selling of access to other
criminals. The bots are one of the three main components of a botnet.
Botnet attacks have increased in recent years, and at the same time the number of
botnet detection frameworks has also grown.
A hacker can access a device only when the hacker's application is on that device.
Once the application starts running there, the hacker can steal, change, or destroy
information. The hacker can also steal money, usernames, and passwords, alter
confidential data, and install and run any application on the system. Any device
connected to the internet can be hacked. The most targeted devices are desktops and
laptops running Windows or macOS. Mobile phones are the next most targeted devices,
as more people use them while connected to the internet. In recent years the number
of internet-connected devices has grown rapidly, and botnets built from such
connected devices have become more prominent.
The hacker starts by infecting your device with malware, typically by sending
download links to the target device. One example is a Trojan horse disguised as a
greeting ("Happy New Year! Click here to see magic"). If the owner of the device does
not realize that the download link comes from an attacker and clicks on it, the
hacker's application is downloaded to the device and waits for commands from the main
system (the hacker's system). The hacker can now access everything on the device. To
avoid being attacked, the owner has to recognize malicious links, and the device
should be able to detect such links, prevent the initial infection, or identify an
existing infection. Botnet attacks are hard to detect, and preventing them is even
more difficult. Yet we can still take certain measures to prevent botnet attacks.
1.1 OVERVIEW
Cyberattacks are on the rise these days, and many systems are being infected. In the
past, signature-based detection was used to counter these attacks. However, as
technology developed, attacks became more sophisticated, so we use k-means and
decision trees to determine how many hosts are bots and how many are not. If there is
an attack, we report the number of bots that were detected.
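As a rough illustration of how k-means could be used to count suspected bots, the sketch below clusters hosts into two groups and reports the size of the smaller one. It is not the project's implementation: the host addresses, the two features per host (outgoing and incoming connection counts), the choice of initial centroids, and the rule that the smaller cluster holds the suspected bots are all assumptions made for this example.

```python
# Pure-Python k-means sketch. The host addresses, feature values, and the
# "smaller cluster = suspected bots" rule are hypothetical assumptions.


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, initial_centroids, iterations=20):
    """Cluster points around the given centroids; return (labels, centroids)."""
    centroids = list(initial_centroids)
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(len(centroids)),
                key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(
                    sum(axis) / len(members) for axis in zip(*members)
                )
    return labels, centroids


# Hypothetical per-host features: (outgoing connections, incoming connections).
hosts = {
    "10.0.0.1": (2, 3),
    "10.0.0.2": (3, 2),
    "10.0.0.3": (1, 2),
    "10.0.0.4": (2, 1),
    "10.0.0.5": (40, 1),
    "10.0.0.6": (45, 2),
}

points = list(hosts.values())
# Deterministic start: use the first and last host as initial centroids.
labels, centroids = kmeans(points, [points[0], points[-1]])

# Assumption: bots are rare, so the smaller cluster is treated as suspicious.
bot_cluster = min(set(labels), key=labels.count)
suspected_bots = [host for host, lab in zip(hosts, labels) if lab == bot_cluster]

print(f"{len(suspected_bots)} suspected bots out of {len(hosts)} hosts")
```

On real traffic the features would have to be normalized first, and the cluster labelled as suspicious would need to be confirmed by a supervised step such as a decision tree rather than by cluster size alone.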
1.2 OBJECTIVE
A botnet is a collection of bots, agents in compromised hosts, controlled by
botmasters via command and control (C2) channels. A malevolent adversary controls
the bots through the botmaster, which could be distributed across several agents that
reside within or outside the network. Hence, bots can be used for tasks ranging from
distributed denial-of-service (DDoS) attacks to massive-scale spamming, fraud, and
identity theft. While bots thrive for different sinister purposes, they exhibit a
similar behavioral pattern when studied up close. The intrusion kill-chain dictates
the general phases a malicious agent goes through in order to reach and infest its
target.
1.3 SCOPE
For this phase in BotChase, we evaluate four supervised learning (SL) techniques,
namely decision tree (DT), logistic regression (LR), support vector machine (SVM),
and feed-forward neural network (FNN). We use DT with Gini instance split rule
algorithm, LR without regularization, and SVM with the Gaussian kernel and a soft
margin penalty of 1. Moreover, the FNN is configured to use cross entropy as an error
function and 10 hidden layers of 1000 units each. The DT classifier shows the best
performance with the small dataset, as depicted in Table IV. It successfully detects
all bots in the test dataset, with only a single false positive (FP) out of the
366871 benign hosts. In contrast, all other classifiers are lackluster and unable to
recall even a single bot from the dataset. We believe this is because all
classifiers, except DT, rely on gradient descent for error correction. This implies
that every single node in the dataset will affect the end-hypothesis function. Thus,
with a dataset that is unbalanced, the hypothesis will be biased towards the benign
hosts, which is the case for LR, SVM and FNN.
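To make the Gini split rule and the effect of class imbalance more concrete, the sketch below computes the Gini impurity of a node and of a threshold split on a toy dataset, and then scores a baseline that labels every host as benign. The counts (995 benign hosts, 5 bots), the single feature, and the threshold are hypothetical values chosen for illustration; they are not taken from the evaluation described above, and the code is not the BotChase implementation.

```python
# Gini impurity of a labelled node and of a candidate threshold split.
# All counts and feature values below are hypothetical toy data.
from collections import Counter


def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def split_gini(values, labels, threshold):
    """Size-weighted Gini impurity after splitting on value <= threshold."""
    left = [lab for v, lab in zip(values, labels) if v <= threshold]
    right = [lab for v, lab in zip(values, labels) if v > threshold]
    n = len(labels)
    return len(left) / n * gini(left) + len(right) / n * gini(right)


# 995 benign hosts with a low feature value, 5 bots with a high one.
values = [1.0] * 995 + [9.0] * 5
labels = ["benign"] * 995 + ["bot"] * 5

# The root is already nearly pure because bots are so rare.
root_impurity = gini(labels)

# A threshold between the two groups yields two pure children.
best_split = split_gini(values, labels, threshold=5.0)

# Baseline that predicts "benign" for every host.
predictions = ["benign"] * len(labels)
accuracy = sum(p == y for p, y in zip(predictions, labels)) / len(labels)
bot_recall = (
    sum(p == y == "bot" for p, y in zip(predictions, labels))
    / labels.count("bot")
)

print(f"root impurity: {root_impurity:.5f}, split impurity: {best_split:.5f}")
print(f"all-benign baseline: accuracy={accuracy:.3f}, bot recall={bot_recall:.1f}")
```

In this toy setting a single threshold separates the bots cleanly, while the all-benign baseline still reaches very high accuracy with zero bot recall. This is why accuracy alone is a misleading measure on an unbalanced dataset and why recall on the bot class matters.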
CHAPTER 2
LITERATURE SURVEY