
Machine Learning and Cyber Security

Rishabh Das
Department of Electrical and Computer Engineering
The University of Alabama in Huntsville
Huntsville, USA

Thomas H. Morris, Ph.D.
Department of Electrical and Computer Engineering
The University of Alabama in Huntsville
Huntsville, USA

ABSTRACT
The application of machine learning (ML) techniques in cyber security is increasing more than ever before. From IP traffic classification to filtering malicious traffic for intrusion detection, ML is one of the promising answers that can be effective against zero-day threats. New research combines statistical traffic characteristics with ML techniques. This paper is a focused literature survey of machine learning and its application to cyber analytics for intrusion detection, traffic classification and applications such as email filtering. Methods were identified and summarized based on their relevance and number of citations. Because datasets are an important part of ML approaches, some well-known datasets are also described, and recommendations are provided on when to use a given algorithm. Finally, four ML algorithms are evaluated on MODBUS data collected from a gas pipeline: various attacks are classified with each algorithm and the performance of each algorithm is assessed.

Keywords
Machine learning; Data mining; Cyber security

1. INTRODUCTION
This paper is a focused literature survey of machine learning and data mining methods for cyber security applications. A few ML methods are described along with their applications in cyber security. A set of comparison criteria for ML methods is provided, and recommendations are made on the best method to use depending on the properties of the cyber security problem. Secondly, a MODBUS dataset [1] is used to compare the effectiveness of four different algorithms when applied to ICS networks. The receiver operating characteristic (ROC) is often used to choose optimal models and to discard sub-optimal ones independently of the cost context or the class distribution. Hence, a ROC curve has been plotted to assess the performance of the binary classifier used with the dataset under study.

This paper is intended for researchers who want to start working in the field of ML and cyber security. Along with descriptions of the machine learning methods, references to prominent works are cited and examples are given of how cyber security problems are commonly tackled with ML. Since the early 2000s, several prominent surveys of ML research have been published. Nguyen and Armitage [2] present a comprehensive study of IP traffic classification techniques that do not rely on well-known port numbers or on packet payloads, reviewing techniques that combine ML with statistical traffic characteristics. They reviewed 18 papers in this domain, and their survey is one of the most valuable resources for any researcher starting out in cyber security and ML. Almomani et al. [4] present an extensive survey of the major e-mail filtering and ML techniques that can be used to distinguish phishing emails from normal ones; the state-of-the-art research on such attacks is enumerated and a comparative study of all those techniques is performed. Garcia-Teodoro et al. [5] present statistical, machine learning and knowledge-based approaches to network intrusion detection, focusing mainly on anomaly detection rather than signature-based detection. Filtering or classifying traffic on the fly is an important aspect that has been researched by many cyber security practitioners. Sperotto et al. [3] used NetFlow (network flow) data and showed that packet processing may not be possible at streaming speed if the amount of network traffic exceeds a certain threshold. These significant works outline current research on ML and its application to cyber security and will be helpful to all researchers new to the field.

The contribution of this paper is to identify cyber security datasets that can be used by researchers and to point out the algorithms that can be applied to cyber-specific problems. In the later part of the paper, a set of machine learning algorithms is evaluated on a collected ICS dataset to identify various attacks while analyzing a remote terminal unit (RTU) in a gas pipeline. The dataset used contains 35 different types of simulated attacks against an ICS, and the accuracy of each ML algorithm in separating malicious traffic is analyzed.

2. IMPORTANT CYBER SECURITY DATASETS FOR MACHINE LEARNING
Data is of utmost importance in ML approaches, and an ML researcher has to understand the dataset thoroughly before doing any kind of analysis. Moreover, raw data such as packet captures (pcap), NetFlow and other network data is not directly usable in ML analysis; it has to be preprocessed to make it usable in popular ML tools such as WEKA [6], R [7] and RapidMiner [8]. Hence researchers applying ML to a custom system have to understand the data collection methodology and the methods used to preprocess the data. This section lists a few low-level details about the datasets and some popular tools used to capture data from the network.

2.1 Network Packet Data
There are many internet protocols (144 according to the Internet Engineering Task Force) used by programs running at the user level. These protocols use data packets as the main mode of communication. Network traffic in the form of packets received and transmitted at the (physical and wireless) interfaces can be captured and stored as packet captures (pcap). libpcap and WinPcap, for UNIX and Windows respectively, are very popular packet capture libraries, and tools such as Wireshark and tcpdump can be used as protocol analyzers, packet sniffers and network monitors.
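To make the pcap format mentioned above concrete, the sketch below walks the packet records of a capture file using only Python's standard library. It is an illustrative sketch, not a tool used in this paper: the names `read_pcap_records`, `PcapRecord` and `build_demo_pcap` are hypothetical, and it assumes the classic libpcap layout (a 24-byte global header followed by a 16-byte header per packet). pcapng files, which Wireshark writes by default, are not handled.

```python
import io
import struct
from typing import BinaryIO, Iterator, NamedTuple

# Magic number (raw bytes) -> (struct byte-order prefix, timestamp ticks per second).
# Classic pcap stores 0xA1B2C3D4 (microsecond timestamps) or 0xA1B23C4D
# (nanosecond timestamps) in the byte order of the machine that wrote the file.
_MAGICS = {
    b"\xd4\xc3\xb2\xa1": ("<", 1_000_000),
    b"\xa1\xb2\xc3\xd4": (">", 1_000_000),
    b"\x4d\x3c\xb2\xa1": ("<", 1_000_000_000),
    b"\xa1\xb2\x3c\x4d": (">", 1_000_000_000),
}


class PcapRecord(NamedTuple):
    timestamp: float  # capture time, seconds since the Unix epoch
    orig_len: int     # length of the packet on the wire
    data: bytes       # captured bytes (possibly truncated to the snapshot length)


def read_pcap_records(stream: BinaryIO) -> Iterator[PcapRecord]:
    """Yield the packet records of a classic (not pcapng) pcap stream."""
    magic = stream.read(4)
    if magic not in _MAGICS:
        raise ValueError("not a classic pcap stream (pcapng is not handled)")
    endian, ticks = _MAGICS[magic]
    # Rest of the global header: version major/minor, thiszone, sigfigs,
    # snaplen and link-layer type (1 = Ethernet). Parsed only to validate it.
    struct.unpack(endian + "HHiIII", stream.read(20))
    while True:
        header = stream.read(16)
        if len(header) < 16:
            return  # end of file (a truncated trailing header is ignored)
        ts_sec, ts_frac, incl_len, orig_len = struct.unpack(endian + "IIII", header)
        yield PcapRecord(ts_sec + ts_frac / ticks, orig_len, stream.read(incl_len))


def build_demo_pcap() -> bytes:
    """Hypothetical helper: a little-endian, two-packet pcap held in memory."""
    buf = io.BytesIO()
    buf.write(struct.pack("<IHHiIII", 0xA1B2C3D4, 2, 4, 0, 0, 65535, 1))
    for i, payload in enumerate([b"\x00" * 60, b"\xff" * 42]):
        buf.write(struct.pack("<IIII", 1_600_000_000 + i, 500_000,
                              len(payload), len(payload)))
        buf.write(payload)
    return buf.getvalue()


if __name__ == "__main__":
    for rec in read_pcap_records(io.BytesIO(build_demo_pcap())):
        print(f"{rec.timestamp:.6f}  {rec.orig_len} bytes")
```

Decoding the link-layer, IP and TCP headers inside each record's `data` would be the next step towards extracting the attributes an ML tool needs.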

Authorized licensed use limited to: Middlesex University. Downloaded on August 31,2020 at 20:10:07 UTC from IEEE Xplore. Restrictions apply.
The data used for machine learning has distinct features and attributes, which define the prime characteristics of each record in the dataset. Hence, when a bulk of raw data is captured as pcap, the researcher has to write some kind of script to extract the required attributes from the pcap into a format the ML tool can use. Fowler and Hammel [9] studied the Attribute-Relation File Format (ARFF) of Weka and developed a tool that converts any Packet Details Markup Language (PDML) file into Weka-minable ARFF. To convert a pcap file to PDML, tools such as tshark can be used:

tshark -T pdml -r <input file> > <output file>

where the input file is the .pcap file and the output file is the name of the PDML file. Fowler's tool "pdml2arff.py" (available on GitHub) can then perform the final conversion:

pdml2arff.py <input file>

where the input file is the name of the PDML file. This generates an ARFF file called <input file>.arff.

Fowler's results show that for normal TCP traffic the tool performs well and successfully converts the raw data into a Weka-usable format. For the final analysis of this paper, the tool was used with the MODBUS protocol to convert the pcap files captured from the gas pipeline. It converted all the attributes into string (nominal) attributes, which Weka could read but which were unusable for any kind of analysis. No numeric data was found in the ARFF file; hence Fowler's tool is not suitable for the MODBUS protocol.

A comprehensive list of the packet headers of a few important protocols used in cyber security datasets has been enumerated by Buczak and Guven [10].

2.2 Data from NetFlow
Cisco provides a feature called NetFlow to monitor a network interface and collect IP network traffic as it enters and exits the interface. By analyzing the data provided, a network administrator can determine things such as the source and destination of traffic and its class of service. A typical NetFlow architecture has three main components: the flow exporter, which aggregates the network traffic into flows and exports them to flow collectors; the flow collector, which receives, preprocesses and stores the data; and the analysis application, which analyzes the flow data and profiles it as needed. NetFlow data therefore contains a compressed, preprocessed version of the actual network packets.

2.3 Other Data Sets
DARPA (Defense Advanced Research Projects Agency) has two datasets that are invaluable for cyber security researchers. The DARPA 1998 and 1999 datasets were developed by the Cyber Systems and Technology Group of the Massachusetts Institute of Technology Lincoln Laboratory (MIT/LL). KDD 1999 is another well-known dataset used predominantly by cyber security researchers. Another prominent dataset, involving a SCADA protocol, was generated by Mississippi State University's Critical Infrastructure Protection Center [1]. This dataset is analyzed in later sections to evaluate the accuracy of ML algorithms on SCADA protocols. It records data from a simulated gas pipeline and documents 35 distinct attacks on the SCADA system.

The DARPA 1998 dataset was built from simulated TCP/IP network data; nine weeks of data were collected, of which seven weeks were used to train the system and the remaining two weeks to validate it [11]. Four different types of attacks were defined in this dataset: denial of service, user to root, remote to local and scanning. A similar dataset with more simulated attacks, DARPA 1999, was prepared later [12]. The KDD 1999 dataset was developed for the KDD Cup challenge; it has 41 detailed attributes and is very similar to NetFlow data. Tavallaee et al. [13] studied the KDD 1999 dataset comprehensively, and their statistical analysis revealed a huge number of redundant entries: 78% of the training set and 75% of the testing set were found to be duplicates. These inherent problems render KDD 1999 unsuitable, and a new dataset called NSL-KDD was therefore proposed.

3. MACHINE LEARNING TECHNIQUES FOR CYBER-SECURITY
A few popular ML techniques are described in this section. For each method, its application to cyber security is identified.

3.1 Bayesian Network
A Bayesian network represents a set of random variables and their conditional dependencies via a directed acyclic graph. Child nodes depend on their parent nodes, and each node stores the conditional probability table of its random variable given its parents. Fig 1 [10] shows attack signature detection using a Bayesian network: each state is an input to the states below it with varying state values, and the calculated conditional probability tables are shown in the figure. Bayesian networks can also be used to infer unobserved variables.

Fig 1. Attack signature detection using Bayesian network [10]

Bayesian networks can be used for anomaly detection, and known attack signatures and patterns can also be compared with streaming data to detect known attacks. Jemili et al. [14] developed an intrusion detection system using a Bayesian network, modeling the system with 9 attributes of KDD 1999. A performance of 88% and 89% was achieved in normal and attack scenarios respectively. The model provided detection rates of 99%, 21%, 89% and 7% for Probe, scan, DoS and R2L. Because the number of R2L training instances was very low, the model's accuracy for that class suffered substantially.
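Section 3.1 notes that Bayesian networks can infer unobserved variables. The minimal sketch below illustrates this with exact inference by enumeration on a toy network. It is not the network of Fig 1 or of Jemili et al.: the three boolean nodes (`attack`, `syn_burst`, `odd_ports`) and every probability in them are invented for illustration. Enumeration is exponential in the number of hidden variables, so it only suits small networks.

```python
from itertools import product
from typing import Dict, List, Tuple

# node -> (parents, CPT); the CPT maps a tuple of parent values to P(node = True).
Network = Dict[str, Tuple[List[str], Dict[Tuple[bool, ...], float]]]

# Hypothetical toy network: an unobserved "attack" node with two observable alerts.
# All probabilities are invented for illustration only.
NETWORK: Network = {
    "attack": ([], {(): 0.01}),
    "syn_burst": (["attack"], {(True,): 0.90, (False,): 0.05}),
    "odd_ports": (["attack"], {(True,): 0.70, (False,): 0.10}),
}


def joint_probability(net: Network, assignment: Dict[str, bool]) -> float:
    """Chain rule for Bayesian networks: product of P(node | parents)."""
    p = 1.0
    for node, (parents, cpt) in net.items():
        p_true = cpt[tuple(assignment[parent] for parent in parents)]
        p *= p_true if assignment[node] else 1.0 - p_true
    return p


def posterior(net: Network, query: str, evidence: Dict[str, bool]) -> float:
    """P(query = True | evidence), summing out hidden nodes by enumeration."""
    hidden = [n for n in net if n != query and n not in evidence]
    totals = {True: 0.0, False: 0.0}
    for q_value in (True, False):
        for values in product((True, False), repeat=len(hidden)):
            assignment = {**evidence, **dict(zip(hidden, values)), query: q_value}
            totals[q_value] += joint_probability(net, assignment)
    return totals[True] / (totals[True] + totals[False])


if __name__ == "__main__":
    # Both alerts observed: P(attack) rises from the prior 0.01 to about 0.56.
    print(posterior(NETWORK, "attack", {"syn_burst": True, "odd_ports": True}))
```

With only `syn_burst` observed, `odd_ports` becomes a hidden variable that is summed out, and the posterior drops to roughly 0.15.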
3.2 Decision trees
A decision tree is analogous to a tree: its leaves represent the various classifications, and its branches represent the features (tests) that in turn lead to those classifications. ID3 and C4.5 are popular algorithms for generating decision trees automatically.

Comparing Snort rules with incoming traffic is slow because of the large number of signatures. Kruegel and Toth [15] replaced 150 Snort rules using a variant of the ID3 algorithm; their aim was to replace rule-by-rule matching with a decision tree model in order to increase processing speed. Rule clustering was used to replace the Snort rules, which minimizes the number of necessary comparisons and also allows parallel evaluation, speeding up the comparison procedure. The clustered rules were applied to the DARPA 1999 dataset, and the processing speed and efficiency of the resulting model were compared with Snort's analysis. The model reached a maximum speed-up of 105% and a minimum speed-up of 5%. In further experiments the number of rules replaced was increased from 150 to 1581. Although Toth does not provide quantitative figures for this experiment, the study reports a pronounced speed-up with the decision tree method and a drastic reduction in processing time.

3.3 Clustering
Clustering is an unsupervised learning method in which a similarity measure is used to group data. Clustering algorithms can learn from audit data without the system administrator having to provide explicit descriptions of the different attack classes.

Hendry and Yang [16] demonstrate real-time signature detection using a clustering algorithm. Normal and anomalous network traffic was grouped by a density-based clustering scheme known as the Simple Logfile Clustering Tool (SLCT). Two clustering schemes are used: the first detects normal and attack scenarios, and the second determines normal traffic in a supervised manner. In this model the parameter M defines the features contained in a cluster; setting M to 97% detects 98% of the attack data with a 15% FAR. Signatures are created from samples of the model's high-density clusters. The KDD dataset was used to validate the generated model, with cluster integrity used as the performance metric for improving accuracy. An accuracy of 70% to 80% was achieved for unknown attacks, which is impressive given that these attacks were new (zero-day) to the model.

3.4 Artificial Neural Networks (ANN)
An ANN behaves loosely like the human brain, with its neurons arranged in layers. Input data activate the neurons that feed the second layer of the network, which in turn outputs to the next layer of the hierarchy, and so on until the last layer produces the output. The internal layers, which play an important part in the network but are hidden from the environment, are known as hidden layers. One major drawback of neural networks is their long training time, caused in part by local minima. The approach was prevalent in the mid-nineties, but with the advent of support vector machines (SVMs) ANNs started to fade away; with the introduction of convolutional neural networks their popularity is rising again.

Cannady [17] describes an ANN model that uses a multi-category classifier to detect anomalies. The RealSecure network monitor was used to generate the data, with the attack signatures built into the system. Out of 10000 recorded events, around 3000 were attacks simulated by programs such as SATAN and Internet Scanner. Data preprocessing used nine selected features: ICMP code, ICMP type, source address, destination address, protocol identifier, source port, destination port, raw data length and raw data type. The study then used the normal and attack data to train the ANN. Cannady reports RMS errors of 0.058 and 0.070 during training and testing respectively; an RMS error of 0.070 translates to an accuracy of about 93% in the testing phase. Here the data is categorized either as normal traffic or as malicious traffic.

3.5 Genetic algorithm and genetic programming
Genetic algorithms (GA) and genetic programming (GP) are two of the most popular computational methods based on the principle of survival of the fittest. These algorithms operate on a population of chromosomes that evolves under certain operators; the three basic operators are selection, crossover and mutation. The algorithm starts with a randomly generated population, and a fitness value is computed for each individual, signifying its ability to solve the current problem; individuals with higher fitness have a higher chance of being chosen for the mating pool. Two selected individuals perform crossover, and each offspring then undergoes mutation. The fitter of the two mutated individuals is carried over to the next generation.

Abraham et al. [18] used GP models to develop a classifier for attacks. Three popular GP variants were used in this analysis: Linear Genetic Programming (LGP), Gene Expression Programming (GEP) and Multi Expression Programming (MEP), with different mathematical operators used as function sets. The DARPA 1998 dataset was used as the prime dataset to validate the generated models; it contains 4 main types of attacks (U2R, R2L, DoS and probing) with a total of 24 different attack scenarios. The false alarm rate (FAR) of the models was as low as 0% to 5%, depending on the type of attack investigated.

3.6 Hidden Markov Models (HMM)
An HMM is a statistical Markov model with a set of states interconnected by transition probabilities, which determine the topology of the model. The system is assumed to be a Markov process with unobserved (hidden) parameters. The forward-backward procedure can be used to determine the hidden parameters from the observable ones. Because the probability distribution differs in each state, the system can change states over time and is capable of representing non-stationary sequences.

Joshi and Phoha [19] used an HMM to develop an intrusion detection system. Five states are used, each with six observation symbols, and the states are interconnected so that any state can transition into any other state. The Baum-Welch method can be used to estimate the HMM parameters. The KDD 1999 dataset was used to validate the model, with 5 of its 41 features chosen for the analysis. The model's detection rate was 79% with a false positive rate of 21%. Using more than 5 features might improve the model's accuracy, but the authors performed no quantitative analysis to support this claim.

3.7 Inductive Learning
Inferring certain information from a dataset is known as deduction; the opposite approach, moving from specific observations to general theories and patterns, is known as inductive learning. These are the two primary methods used to infer information from data. Inductive analysis develops general patterns, which are then used to form hypothetical conclusions.

Fan et al. [20] developed an artificial anomaly generator to produce random events and anomalous traffic. Two approaches, distribution-based artificial anomaly generation and filtered artificial anomalies, were used to generate these random anomalies. The generated data was randomly merged with the DARPA 1998 dataset and used by Fan et al. to study the performance of the inductive learning model they developed. The model achieved a detection rate of 94% with a low FAR of 2%. The study sets out a sound methodology for building a dataset that can be used for anomaly detection and shows the application of an inductive learning model to that dataset.

4. ML RECOMMENDATIONS FOR ANOMALY DETECTION
Machine learning is used in cyber security in three main areas: intrusion detection systems (IDS), anomaly detection and misuse detection. Anomaly detection specifically aims to separate abnormal traffic from normal traffic, while misuse detection classifies attacks by comparing them with known signatures.

Density-based clustering algorithms such as DBSCAN work best for anomaly detection: apart from their high processing speed, clustering algorithms are easy to implement and have few parameters to configure. SVMs also perform considerably well for anomaly detection. For misuse detection, the classifier must be able to generate signatures. The branches of a decision tree or the chromosomes of a genetic algorithm generate signatures that are apt for this task, whereas algorithms such as ANNs and SVMs, whose hidden internal representations do not yield explicit signatures, are not well suited.
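The recommendation above singles out density-based clustering such as DBSCAN for anomaly detection. A minimal, unoptimized DBSCAN sketch in pure Python (quadratic neighbour search) is shown below. It assumes numeric feature vectors that have already been extracted and scaled; `eps` (neighbourhood radius) and `min_pts` (minimum neighbourhood size, counting the point itself) are the two parameters to configure, and points labeled as noise (-1) are treated here as candidate anomalies. The function names and toy data are illustrative, not taken from any study cited above.

```python
import math
from typing import List, Optional, Sequence

NOISE = -1  # label for points that belong to no cluster (candidate anomalies)


def region_query(points: Sequence[Sequence[float]], i: int, eps: float) -> List[int]:
    """Indices of all points within Euclidean distance eps of points[i], itself included."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]


def dbscan(points: Sequence[Sequence[float]], eps: float, min_pts: int) -> List[int]:
    """Label each point with a cluster id (0, 1, ...) or NOISE."""
    labels: List[Optional[int]] = [None] * len(points)  # None = not visited yet
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbours = region_query(points, i, eps)
        if len(neighbours) < min_pts:
            labels[i] = NOISE  # may be relabeled later as a border point
            continue
        cluster += 1  # points[i] is a core point: start a new cluster
        labels[i] = cluster
        seeds = [j for j in neighbours if j != i]
        k = 0
        while k < len(seeds):
            j = seeds[k]
            k += 1
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins but does not expand
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbours = region_query(points, j, eps)
            if len(j_neighbours) >= min_pts:  # j is also a core point: keep expanding
                seeds.extend(j_neighbours)
    return [NOISE if label is None else label for label in labels]


if __name__ == "__main__":
    # Two dense groups of hypothetical 2-D feature vectors plus one outlier.
    traffic = [(0, 0), (0, 1), (1, 0), (1, 1),
               (10, 10), (10, 11), (11, 10), (11, 11), (50, 50)]
    print(dbscan(traffic, eps=1.5, min_pts=3))  # -> [0, 0, 0, 0, 1, 1, 1, 1, -1]
```

Only two parameters need tuning, which reflects the "few parameters to configure" point above; on real traffic features, `eps` is sensitive to feature scaling.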

5. EVALUATION OF ML ALGORITHMS ON MODBUS DATA
The main aim of this evaluation is to test the applicability of certain ML algorithms for detecting cyber-attacks on MODBUS data. Tenfold cross-validation was used to develop the ML models, and the analysis was performed in Weka [6]. In 10-fold cross-validation, Weka produces 10 different models for the provided dataset, and the weighted average of these models' results is reported as the final result. The dataset used was labeled telemetry data from a gas pipeline, developed by the Critical Infrastructure Protection Center of Mississippi State University [1].

A few standard classifiers were considered for the evaluation. The methods used were:
1. Naïve Bayes: a probabilistic classifier based on Bayes' theorem.
2. Random Forest: an ensemble ML method based on decision tree algorithms.
3. OneR: a rule is evaluated for each feature and the optimal (best) one is chosen.
4. J48: an implementation of the C4.5 decision tree algorithm.

5.1 Information about the Dataset
The dataset used was in Weka-minable ARFF format and had 20 attributes in total. Table I below lists all the features present in the dataset.

Table I
Features
address                  reset rate
control scheme           command response
function                 deadband
pump                     time
length                   cycle time
solenoid                 binary result
setpoint                 rate
pressure measurement     categorized result
gain                     system mode
crc rate                 specific result

Morris et al. [1] give a comprehensive overview of each feature of the dataset and why each is important from the perspective of cyber security and intrusion detection. A total of 35 attacks were performed in this dataset. These attacks can be broadly classified into 7 categories: Naïve Malicious Response Injection (NMRI), Complex Malicious Response Injection (CMRI), Malicious State Command Injection (MSCI), Malicious Parameter Command Injection (MPCI), Malicious Function Code Injection (MFCI), Denial of Service (DoS) and Reconnaissance. The final class attribute in the ARFF dataset contains these 7 attack categories along with normal traffic. In total, 97019 instances were recorded in the dataset; the distribution of the final class is listed in Table II.

Table II
Class Label                                      Count
Normal                                           61156
Naïve Malicious Response Injection (NMRI)         2763
Complex Malicious Response Injection (CMRI)      15466
Malicious State Command Injection (MSCI)           782
Malicious Parameter Command Injection (MPCI)      7637
Malicious Function Code Injection (MFCI)           573
Denial of Service (DoS)                           1837
Reconnaissance                                    6805

5.2 Accuracy and ROC curve for ML Algorithm Evaluation
Beaver et al. [21] have already used this ICS dataset for ML analysis, but they did not plot ROC curves for any algorithm, which makes the overall performance of the algorithms hard to judge. The receiver operating characteristic (ROC) curve plots the false positive rate (FAR) on the x-axis against the test sensitivity on the y-axis. The area under the ROC curve (AUC) is an important parameter because it reflects both sensitivity and specificity: sensitivity is the proportion of actual positives that are correctly classified (the true positive rate), and specificity is the proportion of actual negatives that are correctly classified (the true negative rate). The AUC is therefore a combined measure of sensitivity and specificity. Because the AUC measures the overall performance of a test, it can be used to assess the overall performance of the ML algorithms used to classify the MODBUS data, and an AUC analysis of the ROC curves for the different ML algorithms reveals their classification performance.
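The ROC and AUC definitions above can be made concrete for the binary case (attack vs normal). The sketch below is not Weka's implementation: it assumes each instance has a classifier score (for example a predicted probability of attack) and a 0/1 label, sweeps the decision threshold from the highest score to the lowest, and integrates the resulting (FPR, TPR) points with the trapezoidal rule. The function names and toy data are hypothetical.

```python
from typing import List, Sequence, Tuple


def roc_points(scores: Sequence[float], labels: Sequence[int]) -> List[Tuple[float, float]]:
    """(FPR, TPR) points obtained by lowering the threshold through every distinct score.

    labels: 1 = attack (positive), 0 = normal (negative). Instances with tied
    scores are added together, so the curve does not depend on input order.
    """
    pos = sum(1 for y in labels if y == 1)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        raise ValueError("need at least one positive and one negative instance")
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        # x = false positive rate (1 - specificity), y = sensitivity (TPR)
        points.append((fp / neg, tp / pos))
    return points


def auc(points: Sequence[Tuple[float, float]]) -> float:
    """Area under a piecewise-linear ROC curve via the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


if __name__ == "__main__":
    scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4]  # hypothetical classifier outputs
    labels = [1, 1, 0, 1, 0, 0]               # 1 = attack, 0 = normal
    print(auc(roc_points(scores, labels)))    # 8/9, about 0.889
```

For this toy data the area is 8/9, which equals the fraction of (attack, normal) pairs in which the attack instance receives the higher score; this rank interpretation is one reason an AUC close to 1 indicates good overall separation.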
The Weka Knowledge Flow model developed for the current analysis is shown in Fig 2. The ROC curves generated for the four ML algorithms are shown in Figure 5. The ROC curves show that the J48 algorithm produces the best overall classification results for the gas pipeline dataset. The AUC, precision and recall of the four algorithms are shown in Table III.

Table III
Algorithm        AUC      Precision   Recall
OneR             0.887    0.862       0.894
Naïve Bayes      0.967    0.947       0.936
Random Forest    0.989    0.988       0.988
J48              0.995    0.992       0.992

Fig. 3 Area under the curve for all 4 ML algorithms

Fig 2. Weka Knowledge Flow model for generating the ROC curves

This analysis was done as a binary classification problem; a multiclass classification approach could also be used. Secondly, the AUC, precision and recall shown in Table III are weighted averages over all 8 classes. Analyzing each class separately would show the model's ability to distinguish each type of attack from the others and from normal traffic, whereas the results in Table III show the model's ability to classify the traffic as a whole.
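The weighted averaging over classes described above can be illustrated with a small confusion matrix. The sketch computes per-class precision and recall and averages them, weighting each class by its number of actual instances (its support). We assume this mirrors the "Weighted Avg." row of Weka's per-class accuracy output, but the function names and the 3-class matrix are hypothetical, not data from this study.

```python
from typing import List, Sequence, Tuple


def per_class_metrics(confusion: Sequence[Sequence[int]]) -> List[Tuple[float, float, int]]:
    """(precision, recall, support) per class; confusion[a][p] counts actual a predicted as p."""
    n = len(confusion)
    metrics = []
    for c in range(n):
        tp = confusion[c][c]
        predicted = sum(confusion[a][c] for a in range(n))  # column total
        actual = sum(confusion[c])                          # row total = support
        precision = tp / predicted if predicted else 0.0
        recall = tp / actual if actual else 0.0
        metrics.append((precision, recall, actual))
    return metrics


def weighted_average(confusion: Sequence[Sequence[int]]) -> Tuple[float, float]:
    """Precision and recall averaged over classes, each weighted by its support."""
    metrics = per_class_metrics(confusion)
    total = sum(support for _, _, support in metrics)
    w_precision = sum(p * s for p, _, s in metrics) / total
    w_recall = sum(r * s for _, r, s in metrics) / total
    return w_precision, w_recall


if __name__ == "__main__":
    # Hypothetical 3-class confusion matrix (rows = actual, columns = predicted).
    cm = [[8, 2, 0],
          [1, 5, 1],
          [0, 1, 2]]
    print(weighted_average(cm))  # about (0.763, 0.75)
```

The support-weighted recall always equals overall accuracy (15 of 20 instances here), so on an imbalanced class distribution such as Table II a high weighted figure can hide weak recall on small attack classes; this is why per-class analysis is informative.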
From Fig. 3 it is evident that J48 performs best in overall classification, as its area under the ROC curve is closest to 1. The ROC graph generated by Weka is shown in the appendix as Figure 5. In industrial control systems, the execution efficiency of the machine learning intrusion detection system is of utmost importance. Training must therefore be efficient, so that the model can be retrained on newly streamed data within a reasonable amount of time while the algorithm maintains real-time monitoring. Hence, when an algorithm is chosen, a pair of accuracy and training time that is optimal for the particular domain of the industrial control system is often preferred. A Python script was used to measure the training time of each algorithm during K-fold validation. The measured training time is the time taken to validate all folds; hence, in a practical scenario the training time can be estimated as 1/K of the plotted time (in milliseconds) shown in Fig. 4. The machine used for training has an Intel i7-6700HQ primary processor with a dedicated Nvidia GTX 1060 graphics card, whose 1256 CUDA cores greatly parallelize training.

Fig. 4 Training time taken by each algorithm

Hence, for this gas pipeline dataset, even though J48 may slightly outperform Random Forest in accuracy, a small compromise on accuracy can yield better real-time performance when the algorithm is implemented as the core of the intrusion detection system.

6. CONCLUSION
In this paper, a survey was performed to list a few popular datasets, and several ML algorithms were discussed along with their applications in cyber security. Recommendations were then made regarding the choice of ML algorithm. In the later part of the paper, a brief analysis was performed on an ICS dataset and the performance of a few ML algorithms was evaluated. Although the J48 algorithm performs better than the other algorithms within the scope of this analysis, more analysis is needed to ascertain the algorithms' performance, because the performance of an algorithm tends to be skewed by the dataset to which it is applied. Secondly, Random Forest might be more suitable as a core IDS algorithm because of its better real-time performance in the scenario considered here.
7. ACKNOWLEDGEMENT
I would like to thank Dr. Thomas Morris and his students at Mississippi State University's Critical Infrastructure Protection Center for developing the dataset that made this analysis possible.

8. REFERENCES
[1] T. H. Morris, Z. Thornton, and I. Turnipseed, "Industrial control system simulation and data logging for intrusion detection system research," n.d.
[2] T. T. T. Nguyen and G. Armitage, "A survey of techniques for internet traffic classification using machine learning," IEEE Communications Surveys & Tutorials, 10(4), 2008, pp. 56–76. http://doi.org/10.1109/SURV.2008.080406
[3] A. Sperotto, G. Schaffrath, R. Sadre, C. Morariu, A. Pras, and B. Stiller, "An overview of IP flow-based intrusion detection," IEEE Communications Surveys & Tutorials, 12(3), 2010, pp. 343–356
[4] A. Almomani, B. B. Gupta, S. Atawneh, A. Meulenberg, and E. Almomani, "A survey of phishing email filtering techniques," IEEE Communications Surveys & Tutorials, 15(4), 2013, pp. 2070–2090. http://doi.org/10.1109/SURV.2013.030713.00020
[5] P. Garcia-Teodoro, J. Diaz-Verdejo, G. Maciá-Fernández, and E. Vázquez, "Anomaly-based network intrusion detection: Techniques, systems and challenges," Computers & Security, 28(1), 2009, pp. 18–28
[6] M. Hall, E. Frank, G. Holmes, B. Pfahringer, P. Reutemann, and I. Witten, "The WEKA data mining software: an update," ACM SIGKDD Explorations Newsletter, 11(1), 2009, pp. 10–18
[7] R Core Team, "R Language Definition," 2000
[8] M. Graczyk, T. Lasota, and B. Trawinski, "Comparative analysis of premises valuation models using KEEL, RapidMiner, and WEKA," Computational Collective Intelligence. Semantic Web, Social Networks and Multiagent Systems, Springer Berlin Heidelberg, 2009, pp. 800–812
[9] C. A. Fowler and R. J. Hammel, "Converting PCAPs into Weka mineable data," 2014 IEEE/ACIS 15th International Conference on Software Engineering, Artificial Intelligence, Networking and Parallel/Distributed Computing (SNPD 2014), Proceedings, 2014. http://doi.org/10.1109/SNPD.2014.6888681
[10] A. Buczak and E. Guven, "A survey of data mining and machine learning methods for cyber security intrusion detection," IEEE Communications Surveys & Tutorials, (1), 2015, pp. 1–1. http://doi.org/10.1109/COMST.2015.2494502
[11] R. Lippmann, J. Haines, D. Fried, J. Korba, and K. Das, "The 1999 DARPA offline intrusion detection evaluation," Computer Networks, 34, 2000, pp. 579–595
[12] R. Lippmann, D. Fried, I. Graf, J. Haines, K. Kendall, D. McClung, D. Weber, S. Webster, D. Wyschogrod, R. Cunningham, and M. Zissman, "Evaluating intrusion detection systems: the 1998 DARPA offline intrusion detection evaluation," Proceedings of the DARPA Information Survivability Conference and Exposition, IEEE Computer Society Press, Los Alamitos, CA, 2000, pp. 12–26
[13] M. Tavallaee, E. Bagheri, W. Lu, and A. Ghorbani, "A detailed analysis of the KDD Cup 1999 data set," Proceedings of the Second IEEE Symposium on Computational Intelligence for Security and Defence Applications, 2009
[14] F. Jemili, M. Zaghdoud, and A. Ben, "A framework for an adaptive intrusion detection system using Bayesian network," Intelligence and Security Informatics, IEEE, 2007
[15] C. Kruegel and T. Toth, "Using decision trees to improve signature-based intrusion detection," Proceedings of the 6th International Workshop on the Recent Advances in Intrusion Detection, West Lafayette, IN, 2003, pp. 173–191
[16] R. Hendry and S. J. Yang, "Intrusion signature creation via clustering anomalies," SPIE Defense and Security Symposium, International Society for Optics and Photonics, 2008
[17] J. Cannady, "Artificial neural networks for misuse detection," Proceedings of the 1998 National Information Systems Security Conference, Arlington, VA, 1998, pp. 443–456
[18] A. Abraham, C. Grosan, and C. Martin-Vide, "Evolutionary design of intrusion detection programs," International Journal of Network Security, 4(3), 2007, pp. 328–339
[19] S. S. Joshi and V. V. Phoha, "Investigating hidden Markov models capabilities in anomaly detection," Proceedings of the 43rd Annual Southeast Regional Conference, Vol. 1, ACM, 2005, pp. 98–103
[20] W. Fan, M. Miller, S. Stolfo, W. Lee, and P. Chan, "Using artificial anomalies to detect unknown and known network intrusions," Knowledge and Information Systems, 6(5), 2004, pp. 507–527
[21] J. M. Beaver, R. C. Borges-Hink, and M. A. Buckner, "An evaluation of machine learning methods to detect malicious SCADA communications," 2013 12th International Conference on Machine Learning and Applications, 2, 2013, pp. 54–59. http://doi.org/10.1109/ICMLA.2013.105

9. APPENDIX

Figure 5. ROC curves generated from the dataset using the four ML algorithms
