Mini Project Report
Acknowledgement
Abstract
1 Introduction
2 Literature Survey
3 Mathematical Modelling
4 Overview of Project
Problem Statement
Existing Statement
Proposed System
System Architecture
Proposed Architecture
5 System Design
UML Diagrams
Flow Diagram
Use-Case Diagram
6 System Implementation
Intrusion Detection System
Classes of IDS
Machine Learning
Types of Intrusion Detection
Denial of Service (DoS)
Data Type Probing (D.P)
Implementation & Result Analysis
Experimental Setup
Result Analysis
Best Algorithm
7 Coding
8 Future Scope
9 Conclusion
List of Figures
4.1 System Architecture
5.1 Flow Diagram
6.5 Accuracy Result (a)
6.6 Accuracy Result (b)
6.7 Evaluation Matrix
6.9 ROC Curves for LR, SVM, DT, RF and ANN
Abstract
With the continual development of network technology, security problems within networks are emerging one after another and are becoming increasingly hard to ignore. For today's network administrators, successfully preventing malicious hackers from invading, so that network systems and computers remain safe and operate normally, is an urgent task. This paper proposes a network intrusion detection method based on deep learning. The method uses big data with a deep belief network (DBN) to extract features from network monitoring data, and uses a back-propagation (BP) neural network as the top-level classifier to classify intrusion types with the help of machine learning algorithms. The main objective of this work is NIDS deployment for IoT using machine learning and deep learning algorithms, which have shown a good probability of success in security and privacy. This survey provides a comprehensive review of NIDS deployment across different machine learning techniques for the Internet of Things, as other leading surveys have done for traditional systems. The results show that the proposed method significantly improves on the accuracy of conventional machine learning.
Keywords: Intrusion, privacy, security, machine learning, networking, deep learning
Chapter 1
Introduction
Chapter 2
Literature Survey
The papers discussed on NIDS for IoT systems based on learning techniques were selected according to the following criteria:
• The papers deal with intrusion detection in IoT.
• The NIDSs target IoT systems in general (e.g., not just WSN networks), with their heterogeneity, mobility and all the IoT-specific challenges.
• The authors present their NIDS architecture in detail.
• The state-of-the-art articles are mainly from top indexed IEEE, ACM, Elsevier and Springer journals and top conference venues, published between 2013 and October 2018.
private information. Detecting attacks against computer networks through intrusion detection has become a major problem in the area of network security. Intrusion detection systems face several challenges, and there are different approaches to dealing with them; neural networks and machine learning are among the most effective. In this paper we briefly illustrate different approaches to intrusion detection using neural networks, along with their advantages and disadvantages.
There have been several similar works in the IoT field, and researchers are still working in this area. Pahl et al. [1] mainly developed a detector and firewall for anomalies of IoT microservices in an IoT site. Clustering methods such as K-Means and BIRCH were implemented [2] for the different microservices in this work; two clusters were merged into one if their centres lay within three standard deviations of each other. The clustering model was updated using an online learning technique. With these algorithms, the overall accuracy obtained by the system was 96.3%. A detailed description of a smart home system in which security breaches were detected by the deep learning method Dense Random Neural Network (DRNN) [3] was given in [4]. The authors mainly described the Denial of Service attack and the Denial of Sleep attack in a simple IoT site. Liu et al. [5] proposed a detector for On-Off attacks by a malicious network node in an industrial IoT site. In an On-Off attack, the IoT network is attacked by a malicious node while that node is in its active ("on") state, whereas the network behaves normally while the malicious node is in its inactive ("off") state. The system was developed using a light probe routing mechanism that computes a trust estimate for each neighbour node in order to detect anomalies. Diro et al. discussed the detection of attacks using a fog-to-things architecture.
Software and Hardware Requirements
Hardware requirements:
• OS: Windows 7 or above
• RAM: 2 GB
• HDD: 500 GB
Software requirements:
1. Python (Python 3):
Python is an interpreted, object-oriented, high-level programming language with dynamic semantics. Python's simple, easy-to-learn syntax emphasizes readability and therefore reduces the cost of program maintenance. Python supports modules and packages, which encourages program modularity and code reuse. Python is often used as a support language for software developers, for build control and management, testing, and in many other ways.
2. Jupyter IDE:
Chapter 3
Mathematical Modelling
The following metrics were calculated to evaluate the performance of the developed system. Using these metrics, one can decide which technique is best suited for this work.
Step 1: First, we compute the confusion matrix. A confusion matrix is a summary of the prediction results on a classification problem; it records the True Positive (TP), False Positive (FP), False Negative (FN) and True Negative (TN) counts.
Step 4: Next, we find the recall. Recall, also known as the true positive rate, is the number of positives the model correctly identifies compared with the actual number of positives in the data. The recall value for a single class is given by the following equation:
$$\text{Recall} = \frac{TP}{TP + FN}$$
Step 5: Now we plot the Receiver Operating Characteristic (ROC) curve. It is a commonly used graph that summarizes the performance of a classifier over all possible thresholds. It is generated by plotting the True Positive Rate (TPR) against the False Positive Rate (FPR) as the threshold for assigning observations to a given class is varied. TPR and FPR are calculated as follows:
$$\text{TPR} = \frac{TP}{TP + FN}, \qquad \text{FPR} = \frac{FP}{FP + TN}$$
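The confusion-matrix counts and the threshold sweep behind the ROC curve can be sketched in plain Python (standard library only). This is a minimal illustration, not the project's code: the labels (1 = attack, 0 = normal), the classifier scores, and the rule "predict attack when score >= threshold" are assumptions made for the example. The area under the curve is computed with the trapezoidal rule.

```python
def confusion_counts(y_true, y_pred):
    """Return (TP, FP, FN, TN) for binary labels where 1 = attack and 0 = normal."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp = fp = fn = tn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == 1 and actual == 1:
            tp += 1
        elif predicted == 1:
            fp += 1
        elif actual == 1:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def roc_points(y_true, scores):
    """Vary the threshold over every distinct score (highest first) and return the
    (FPR, TPR) point for each, starting from (0, 0)."""
    if 0 not in y_true or 1 not in y_true:
        raise ValueError("ROC needs both attack (1) and normal (0) samples")
    points = [(0.0, 0.0)]
    for threshold in sorted(set(scores), reverse=True):
        # Predict "attack" whenever the score reaches the threshold
        y_pred = [1 if s >= threshold else 0 for s in scores]
        tp, fp, fn, tn = confusion_counts(y_true, y_pred)
        points.append((fp / (fp + tn), tp / (tp + fn)))  # (FPR, TPR)
    return points


def auc(points):
    """Area under the ROC curve using the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Hypothetical ground truth (1 = attack) and classifier scores
y_true = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
print(confusion_counts(y_true, [1 if s >= 0.5 else 0 for s in scores]))  # (3, 1, 0, 2)
print(round(auc(roc_points(y_true, scores)), 3))  # 0.889
```

Each threshold yields one confusion matrix and therefore one (FPR, TPR) point; lowering the threshold moves the point towards (1, 1).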
Chapter 4
Overview of Project
Problem Statement
• While denial of service (DoS) attacks and DDoS attack mitigation have been studied extensively, such attacks remain challenging to mitigate. For example, DDoS attacks are known to be difficult to detect, particularly in an IoT network.
• Hence, creating a flexible, modular architecture that allows DDoS attacks to be identified and mitigated in IoT settings is crucial for strengthening cyber security.
Existing Statement
• Many systems have been proposed for detecting and analysing DoS and DDoS attacks, but for DDoS attack detection only a few methodologies have been implemented that provide efficient results.
• Most of the systems proposed so far do not provide a flexible and structured approach in terms of IoT architecture, and their modules cannot function independently.
Proposed System
• Once the NIDS returns its result, if the flow is determined to be under attack, the NIDS module running on the controller processes the flow according to the mitigation strategy of the architecture.
• A NIDS adds an early-warning capability to your defenses, alerting you to any sort of suspicious activity that typically occurs before and during an attack on an IoT network. We created a NIDS that detects scans, such as port scans, using an SVM. It collects packets from the network every 4 seconds. Using the change in frequency between normal packets and attack packets, we train the SVM with normal and attack packets, so that when an unknown packet arrives, the SVM can classify whether it is a normal or an attack packet. Using this method, we could correctly detect 95% of attack packets and warn the administrator about them.
Depending on the administrator's decision, a log file is created and stored for future reference. We capture all the packets within the network for analysis. The packets are captured with the help of two packages, namely WINPCAP and JPCAP. WINPCAP interacts with the OS and the NIC to capture the packets, while JPCAP is a Java package that passes the captured packets from WINPCAP to the Java program.
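The SVM step described above can be sketched as a linear SVM trained by subgradient descent on the regularised hinge loss, written in plain Python. This is a hedged sketch, not the project's implementation (which the report does not show): the two per-window features (a standardised packet count and a standardised change in count from the previous 4-second window), the sample values, and the hyperparameters `lam`, `epochs` and `lr` are all hypothetical.

```python
def train_linear_svm(X, y, lam=0.01, epochs=200, lr=0.1):
    """Linear SVM via full-batch subgradient descent on
    lam/2 * ||w||^2 + mean(max(0, 1 - y_i * (w . x_i + b))).
    X: list of feature vectors; y: labels in {-1, +1} (-1 = normal, +1 = attack)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for epoch in range(1, epochs + 1):
        grad_w = [lam * wj for wj in w]  # gradient of the L2 regulariser
        grad_b = 0.0
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1:  # hinge loss is active for this sample
                for j in range(d):
                    grad_w[j] -= yi * xi[j] / n
                grad_b -= yi / n
        step = lr / epoch ** 0.5  # decaying step size
        w = [wj - step * gj for wj, gj in zip(w, grad_w)]
        b -= step * grad_b
    return w, b


def predict(w, b, x):
    """Return +1 (attack) or -1 (normal) from the sign of the decision function."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1


# Hypothetical, already standardised 4-second windows:
# [packet_count_z, count_change_z]
X = [[-1.0, -0.8], [-0.8, -1.1], [-1.2, -0.9], [-0.9, -1.0],  # normal
     [1.0, 0.9], [0.8, 1.2], [1.1, 0.7], [0.9, 1.0]]          # attack
y = [-1, -1, -1, -1, 1, 1, 1, 1]
w, b = train_linear_svm(X, y)
print(predict(w, b, [1.0, 1.0]), predict(w, b, [-1.0, -1.0]))  # 1 -1
```

Subgradient descent is used here because the report states that its SVM was trained with conventional gradient descent; in practice a library implementation such as scikit-learn's `LinearSVC` or `SVC` would normally be used instead.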
System Architecture
Figure 4.1: System Architecture
Proposed Architecture
The overall framework is a combination of several independent processes. Figure 4.2 depicts the overall framework of the system. The first process of this framework is dataset collection and observation: the dataset was collected and observed meticulously to find out the types of data. Data pre-processing was then applied to the dataset.
Data pre-processing consists of data cleaning, data visualization, feature engineering and vectorization. These steps convert the data into feature vectors, which were then split in an 80–20 ratio into training and testing sets. The training set was fed to the learning algorithm, and a final model was developed using an optimization technique. The different classifiers used in this work employed different optimization techniques: Logistic Regression (LR) used coordinate descent [17], while SVM and ANN used the conventional gradient descent technique. No optimizer is used for DT and RF because these are non-parametric models.
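The 80/20 split described above can be sketched as follows. The helper name `split_80_20`, the fixed random seed and the toy rows are illustrative assumptions; in practice a library routine such as scikit-learn's `train_test_split` with `test_size=0.2` does the same job.

```python
import random


def split_80_20(features, labels, test_ratio=0.2, seed=42):
    """Shuffle sample indices with a fixed seed, then put `test_ratio` of them in the test set."""
    if len(features) != len(labels):
        raise ValueError("features and labels must have the same length")
    indices = list(range(len(features)))
    random.Random(seed).shuffle(indices)  # seeded so the split is reproducible
    n_test = round(len(indices) * test_ratio)
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    X_train = [features[i] for i in train_idx]
    X_test = [features[i] for i in test_idx]
    y_train = [labels[i] for i in train_idx]
    y_test = [labels[i] for i in test_idx]
    return X_train, X_test, y_train, y_test


# Ten toy samples -> 8 for training, 2 for testing
X = [[i, i * 2] for i in range(10)]
y = [i % 2 for i in range(10)]
X_train, X_test, y_train, y_test = split_80_20(X, y)
print(len(X_train), len(X_test))  # 8 2
```

Because attack classes can be very unbalanced (for example, 342 Data Type Probing samples), a stratified split that keeps class proportions equal in both sets, such as `train_test_split(..., stratify=y)`, is often the safer choice.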
Figure 4.2: Overall framework for attack and anomaly detection in IoT.
Chapter 5
System Design
System design is the process of designing the elements of a system, such as the architecture, modules and components, the interfaces between those components, and the data that flows through the system. The purpose of the system design process is to provide sufficiently detailed data and information about the system and its elements to enable an implementation consistent with the architectural entities defined in the models and views of the system architecture.
System design is the phase that bridges the gap between the problem domain and the existing system in a manageable way. This phase focuses on the solution domain, i.e., "how to implement?" It is the phase where the SRS document is converted into a format that can be implemented, and it decides how the system will operate. In this phase, the complex activity of system development is divided into several smaller sub-activities, which coordinate with each other to achieve the main objective of system development.
UML Diagrams
Flow Diagram
Figure 5.1: Flow Diagram
Use-Case Diagram
Figure 5.2: Use-Case Diagram
Sequence Diagram
Figure 5.3: Sequence Diagram
Chapter 6
System Implementation
Intrusion Detection System
Generally, an IDS includes both software and hardware mechanisms and is responsible for identifying malicious activities by monitoring network environments and systems. In other words, an IDS is used for detecting cyber-attacks and providing immediate alerts. Overall, an IDS acts as a safeguard for networks and systems. An IDS is normally deployed after the firewall and is used together with an intrusion prevention system. IDS is not a new term in IoT research on security and privacy; a significant number of publications have appeared in recent years. Cyber security experts have been concerned about the security and privacy of IoT environments for some time, which has led to the concept of embedding IDSs into IoT architectures and devices to deal with cyber-attacks [10][11]. Researchers are mostly interested in inventing new mechanisms and models to counter intruders in conventional network protocols. However, traditional IDS mechanisms are incompatible with IoT devices connected through IPv6 and other complex network structures. More comprehensive research on the use of machine learning methods is essential for IDSs to secure IoT and protect privacy.
Figure 6.1: Block diagram of IDS
Classes of IDS
IDSs are classified into two main categories:
1. Host-based IDS
2. Network-based IDS
Figure 6.2: Block diagram of NIDS
Machine Learning
1. Supervised learning:
2. Unsupervised learning:
Unsupervised learning is training a model without using labels.
3. Semi-supervised learning:
Along with the two categories mentioned above, there is another field called semi-supervised learning, which uses datasets containing a few labelled data points in addition to predominantly unlabelled data. Machine learning is used for network intrusion detection to make the process dynamic, as opposed to the static detection techniques currently in use.
Figure 6.3: Block diagram of Machine Learning
Methods of Machine Learning:
1. Deep Learning
In recent years, the term "deep learning" has gained popularity among researchers who work with artificial neural network (ANN) based machine learning techniques. Like ANNs, deep learning.
2. Multi-Agent Reinforcement Learning
Reinforcement learning (RL) methods [14] enable the computer to complete a task through continuous training and learning from scratch. In recent years, RL systems such as AlphaGo, developed by DeepMind, have excelled in complex tasks,
Types of Intrusion Detection
In the following, we discuss scenarios in which the system is under attack. A good detection system identifies the compromised situation and minimizes the loss by quickly identifying the attack(s). There are a variety of IDSs, and the types of intrusion they must detect are classified as follows.
Denial of Service (DoS):
A DoS attack is caused by too much unwanted traffic at a single source or receiver. The attacker sends so many ambiguous packets that the target is flooded and its services become unavailable to other users. In the dataset, 5780 samples contain a DoS attack.
In computing, a denial-of-service attack (DoS attack) is a cyber-attack in which the perpetrator seeks to make a machine or network resource unavailable to its intended users by temporarily or indefinitely disrupting the services of a host connected to the Internet. Denial of service is typically accomplished by flooding the targeted machine or resource with superfluous requests in an attempt to overload systems and prevent some or all legitimate requests from being fulfilled.
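A very simple way to flag the flooding behaviour described above is to count packets per source within a time window and compare the count against a threshold. This is a hedged sketch of a rule-based check, not the learning-based method of this report; the packet tuple layout, the 4-second window and the threshold of 100 packets are hypothetical.

```python
from collections import Counter


def flooding_sources(packets, window_start, window_len=4.0, threshold=100):
    """packets: iterable of (timestamp_seconds, source_address) tuples.
    Return the sources that sent more than `threshold` packets in the window
    [window_start, window_start + window_len)."""
    window_end = window_start + window_len
    counts = Counter(src for ts, src in packets if window_start <= ts < window_end)
    return sorted(src for src, n in counts.items() if n > threshold)


# Hypothetical traffic: one host sends 500 packets in 4 s, another sends 3
packets = [(i * 0.008, "10.0.0.5") for i in range(500)]
packets += [(0.5, "10.0.0.7"), (1.5, "10.0.0.7"), (2.5, "10.0.0.7")]
print(flooding_sources(packets, window_start=0.0))  # ['10.0.0.5']
```

A distributed attack (DDoS) spreads its traffic over many sources, so each source may stay under the threshold; counting per destination instead of per source would be needed to catch that case.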
Data Type Probing (D.P):
In this case, a malicious node writes a different data type than the intended data type [1]. In the dataset, there are 342 samples of Data Type Probing.
In a Denial of Service (DoS) attack, the attacker makes some computing or memory resource too busy, or too full, to handle legitimate users' requests. But before an attacker launches an attack on a given site, the attacker typically probes the victim's network or host by searching these networks and hosts for open ports. This is done using a sweeping process across the different hosts on a network.
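The probing behaviour described above, where one source sweeps many hosts and ports, can be flagged by counting the distinct (host, port) targets each source contacts in a time window. This is a hedged, rule-based sketch rather than the report's ML method; the connection tuple format and the `max_targets` threshold are hypothetical.

```python
from collections import defaultdict


def sweeping_sources(connections, max_targets=10):
    """connections: iterable of (source, dest_host, dest_port) attempts seen in one window.
    Return sources that contacted more than `max_targets` distinct (host, port) pairs."""
    targets = defaultdict(set)
    for src, host, port in connections:
        targets[src].add((host, port))
    return sorted(src for src, seen in targets.items() if len(seen) > max_targets)


# Hypothetical window: 10.0.0.9 probes ports 20-39 on two hosts; 10.0.0.4 browses normally
conns = [("10.0.0.9", host, port) for host in ("10.0.0.2", "10.0.0.3") for port in range(20, 40)]
conns += [("10.0.0.4", "10.0.0.2", 80), ("10.0.0.4", "10.0.0.2", 443), ("10.0.0.4", "10.0.0.2", 80)]
print(sweeping_sources(conns))  # ['10.0.0.9']
```

Counting distinct targets rather than raw connections keeps a busy but legitimate client (many requests to the same port) from being flagged as a scanner.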
Implementation
• Collect the traffic data from the network.
• Analyse the collected data.
• Identify relevant security events.
• Detect and report malicious events.
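The collection and analysis steps above can be illustrated by turning raw packet timestamps into per-window features, the kind of "change in packet frequency" input the SVM-based NIDS relies on. This is a hedged sketch: the 4-second window comes from the Proposed System description, while the function name and the choice of features (packet count and change from the previous window) are assumptions.

```python
from collections import Counter


def window_features(timestamps, window_len=4.0):
    """timestamps: packet arrival times in seconds since capture start (non-negative).
    Return one (packet_count, change_from_previous_window) tuple per window,
    including empty windows; the first window's change is defined as 0."""
    if not timestamps:
        return []
    counts = Counter(int(t // window_len) for t in timestamps)
    features = []
    previous = None
    for w in range(max(counts) + 1):
        current = counts.get(w, 0)
        change = 0 if previous is None else current - previous
        features.append((current, change))
        previous = current
    return features


print(window_features([0.5, 1.0, 2.5, 5.0, 6.0, 9.0, 9.5, 10.0, 10.5, 11.0]))
# [(3, 0), (2, -1), (5, 3)]
```

Empty windows are kept as zero counts so that a sudden burst after a quiet period shows up as a large positive change.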
Figure 6.4: Data flow diagram of NIDS
Future Scope
Future work includes adopting newer ML and deep learning techniques, with the aim of improving performance, for example against other attacks. As the ML algorithm learns and adapts using data from previous attacks, it will become increasingly robust and accurate in predicting upcoming attacks. With the explosion of IoT, two new paradigms have emerged: edge computing and fog computing. Both push intelligence and processing logic down near the data sources (i.e., as close as possible to sensors and actuators) to reduce the network bandwidth needed to communicate data from the perception layer to the data centres where analytics are usually processed. The main difference between edge and fog architectures lies in where the intelligent processing and computing power are located. Edge computing pushes them to the extremes of the network, such as edge gateways and devices (e.g., Programmable Automation Controllers, PACs), whereas fog computing places them at the local area network level of the architecture, i.e., in hubs, routers or gateways (fog nodes). These two concepts should be explored and exploited in depth for future IoT IDS architectures. They allow the intrusion detection process to be distributed, so this strategy should enable intrusion detection with lower resource requirements, which suits IoT. Big Data offers a remedy for problems related to the large volume of network traffic generated by IoT networks.
Conclusion
Bibliography
M.B.E. Society's College of Engineering, Ambajogai
Presented By: Sakshi Birajdar, Durga Naybal, Pornima Maske, Sujata Dhanasure
Under the Guidance of: Prof. Panchal Mam
Network Intrusion Detection for IoT Security based on Learning Techniques
18/12/2021
DOMAIN INTRODUCTION
Data mining is the computing process of discovering patterns in large data sets, involving methods at the intersection of machine learning, statistics and database systems.
The overall goal of the data mining process is to extract information from a data set and transform it into an understandable structure for further use.
Data mining is the analysis step of the "knowledge discovery in databases" process, or KDD.
Data mining is about finding new information in large amounts of data.
The information obtained from data mining is ideally both new and useful.
SYNOPSIS
Domain Introduction
Abstract
Introduction
Objectives
Existing System & Disadvantages
Proposed System & Advantages
System Requirements
System Architecture
Flow Diagram
Modules
Module Description
ABSTRACT
IDSs are mainly of two types: host-based and network-based.
A Network-based Intrusion Detection System (NIDS) is usually placed at network points such as gateways and routers to check for intru
Here, the C4.5 decision tree algorithm is used. It is a machine learning algorithm which can be used for both classification and regression c
This C4.5 decision tree classification and prediction algorithm will increase the performance of the overall classification and prediction results.
INTRODUCTION
A detailed investigation and analysis of various machine learning techniques has been carried out to find the cause of problems associated with v
Attack classification and mapping of the attack features are provided for each attack.
Issues related to detecting low-frequency attacks using network attack datasets are also discussed, and viable methods are suggested for improvement.
Machine learning techniques have been analysed and compared in terms of their capability to detect the various categories of attacks.
OBJECTIVE
To effectively classify and predict the data.
To decrease the sparsity problem.
To enhance the performance of the overall prediction results.
EXISTING SYSTEM
Hacking incidents are increasing day by day as technology rolls out. A large number of hacking incidents are reported by
The existing system doesn't effectively classify and predict the attacks present in the network.
DISADVANTAGES
Not efficient for handling large volumes of data.
Theoretical limits.
Incorrect classification results.
Lower prediction accuracy.
ADVANTAGES
High performance.
Provides accurate prediction results.
Avoids sparsity problems.
Reduces the information loss and the bias of the inference due to multiple estimates.
SYSTEM REQUIREMENTS
Software Requirements
Operating System: Windows 8.1
Language: Python
IDE: Anaconda-Spyder
Hardware Requirements
Hard Disk: 1000 GB
Monitor: 15 VGA color
Mouse: Microsoft
Keyboard: 110 keys enhanced
RAM: 4 GB
SYSTEM ARCHITECTURE
Figure: System architecture (blocks: Dataset, Formatted Dataset, Train, Test, Classification, Prediction)
FLOW DIAGRAM
START
SELECT DATASET
CLEANING DATASET
SPLIT TRAIN AND TEST
CLASSIFICATION
PREDICTION
MODULES
Data Selection and Loading
Data Preprocessing
Splitting Dataset into Train and Test Data
Feature Extraction
Classification
Prediction
Result Generation
MODULE DESCRIPTION
DATA SELECTION AND LOADING
Data selection is the process of selecting the data for detecting the attacks.
In this project, the KDD CUP dataset is used for detecting attacks.
The dataset contains information about the duration, flag, service, src_bytes, dst_bytes and class labels.
DATA PREPROCESSING
Data preprocessing is the process of removing unwanted data from the dataset. It involves:
Missing data removal
Encoding categorical data
Missing data removal: In this process, null (missing) values are handled using an imputer library, which replaces them with a substitute value such as the column mean.
Encoding categorical data: Categorical data are variables with a finite set of label values. Most machine learning a
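The two preprocessing steps above can be sketched with plain Python dictionaries. The field names follow the KDD Cup attributes listed under data selection; the sample rows and helper names are hypothetical. This sketch drops incomplete rows, which is one of the two common options; an imputer (for example scikit-learn's `SimpleImputer`) would instead fill the gaps with a statistic such as the column mean.

```python
def drop_incomplete(rows):
    """Missing data removal: keep only rows with no None (missing) values."""
    return [row for row in rows if all(value is not None for value in row.values())]


def label_encode(rows, column):
    """Encode a categorical column as integers 0..k-1 (categories sorted alphabetically)."""
    categories = sorted({row[column] for row in rows})
    mapping = {category: code for code, category in enumerate(categories)}
    encoded = [{**row, column: mapping[row[column]]} for row in rows]
    return encoded, mapping


rows = [
    {"duration": 0, "service": "http", "flag": "SF", "src_bytes": 181, "dst_bytes": 5450},
    {"duration": 0, "service": "private", "flag": "REJ", "src_bytes": 0, "dst_bytes": 0},
    {"duration": None, "service": "http", "flag": "SF", "src_bytes": 239, "dst_bytes": 486},
]
clean = drop_incomplete(rows)
clean, service_codes = label_encode(clean, "service")
clean, flag_codes = label_encode(clean, "flag")
print(len(clean), service_codes, flag_codes)
# 2 {'http': 0, 'private': 1} {'REJ': 0, 'SF': 1}
```

Label encoding imposes an artificial order on the categories; one-hot encoding avoids this and is often preferred for linear models such as logistic regression or a linear SVM.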
SPLITTING DATASET INTO TRAIN AND TEST DATA
Data splitting is the act of partitioning the available data into two portions, usually for cross-validation purposes.
One portion of the data is used to develop a predictive model and the other to evaluate the model's performance.
Typically, when you separate a dataset into a training set and a testing set, most of the data is used for training and a smaller portion is used for testing.
FEATURE EXTRACTION
• Feature scaling: Feature scaling is a method used to standardize the range of independent variables or features of data. In data processing, it is also known as data normalization and is generally performed during the data pre-processing step.
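Feature scaling, as described above, is commonly done by min-max normalisation or by standardisation (z-scores). The sketch below shows both in plain Python; the report does not say which one was used, so treat the choice, the helper names and the sample values as assumptions. The min/max (or mean/standard deviation) should be learned from the training split only and then reused for the test split, which is why min-max is split into a "fit" and an "apply" step.

```python
def fit_min_max(column):
    """Return (min, max) learned from the training values of one feature."""
    return min(column), max(column)


def apply_min_max(column, lo, hi):
    """Map values to [0, 1] using training min/max; a constant feature maps to 0."""
    if hi == lo:
        return [0.0 for _ in column]
    return [(v - lo) / (hi - lo) for v in column]


def standardize(column):
    """z-score: subtract the mean and divide by the (population) standard deviation."""
    n = len(column)
    mean = sum(column) / n
    std = (sum((v - mean) ** 2 for v in column) / n) ** 0.5
    if std == 0:
        return [0.0 for _ in column]
    return [(v - mean) / std for v in column]


lo, hi = fit_min_max([0, 50, 100])            # hypothetical training src_bytes values
print(apply_min_max([0, 25, 100], lo, hi))    # [0.0, 0.25, 1.0]
print(standardize([2, 4, 4, 4, 5, 5, 7, 9]))  # [-1.5, -0.5, -0.5, -0.5, 0.0, 0.0, 1.0, 2.0]
```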
CLASSIFICATION
The C4.5 algorithm is used in data mining as a decision tree classifier which can be employed to gene
data.
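C4.5 chooses each split by gain ratio: the information gain of an attribute divided by its split information, which penalises attributes with many distinct values. The sketch below computes the gain ratio for a categorical attribute in plain Python. It covers only the splitting criterion; full C4.5 also handles continuous attributes through thresholds, missing values and pruning. The attribute values and labels are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def gain_ratio(values, labels):
    """C4.5 gain ratio of splitting `labels` on a categorical attribute `values`."""
    n = len(labels)
    groups = {}
    for v, label in zip(values, labels):
        groups.setdefault(v, []).append(label)
    # Information gain: entropy before the split minus weighted entropy after it
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    info_gain = entropy(labels) - remainder
    # Split information: entropy of the partition sizes themselves
    split_info = -sum(len(g) / n * log2(len(g) / n) for g in groups.values())
    return info_gain / split_info if split_info > 0 else 0.0


labels = ["normal", "normal", "attack", "attack"]
print(gain_ratio(["SF", "SF", "REJ", "REJ"], labels))    # 1.0 (perfect split)
print(gain_ratio(["tcp", "udp", "tcp", "udp"], labels))  # 0.0 (uninformative)
```

If a library is used instead, note that scikit-learn's `DecisionTreeClassifier` implements an optimised version of CART rather than C4.5; with `criterion="entropy"` it uses information gain, not gain ratio.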
PREDICTION
RESULT GENERATION
The final result will be generated based on the overall classification and prediction. The performance of this proposed approach is eval
True Positive
True Negative
False Positive
False Negative
Accuracy
Precision
Recall
F1-Score
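The measures listed above follow directly from the four confusion-matrix counts. A minimal sketch in plain Python; the counts in the example are hypothetical and are not results from this project.

```python
def classification_metrics(tp, tn, fp, fn):
    """Accuracy, precision, recall and F1-score from confusion-matrix counts."""
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical counts for illustration only (not results from this project)
metrics = classification_metrics(tp=90, tn=880, fp=20, fn=10)
print({name: round(value, 3) for name, value in metrics.items()})
# {'accuracy': 0.97, 'precision': 0.818, 'recall': 0.9, 'f1': 0.857}
```

The example shows why accuracy alone can mislead on intrusion data: with many more normal than attack samples, accuracy is 0.97 while precision on the attack class is only about 0.82.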
Conclusion
We reviewed several influential algorithms for intrusion detection based on various machine learning techniques. Characteristics of ML
building efficient intrusion detection systems.
Future work
In future, it is possible to provide extensions or modifications to the proposed clustering and classification algorithms using intelligent agents to a
can be extended as an intrusion prevention system to enhance the performance of the system.
References
R. Abdulhammed, M. Faezipour, A. Abuzneid and A. AbuMallouh, "Deep and machine learning approaches for anomaly-based intrusion detection o
G. Kaur, "A Novel Distributed Machine Learning Framework for Semi-Supervised Detection of Botnet Attacks", 2018 Eleventh International Con
X. Fan, C.-H. Lung and S. Ajila, "An Adaptive Diversity-Based Ensemble Method for Binary Classification", Proc. of the 41st IEEE International C
S. Vijayarani, Maria Sylviaa and S, "Intrusion Detection System – a study", International Journal Of Security Privacy and Trust Management, vol.
Md Nasimuzzaman Chowdhury, Ken Ferens and Mike Ferens, "Network Intrusion Detection Using Machine Learning", International Conference on Security and Management, 2016.