Most IDSs proposed in previous studies are based on outdated datasets [14] and therefore suffer from inconsistent attack distributions and a lack of attack scenarios when used today [15]. For example, the UNSW-NB15 dataset (a modern dataset) contains several types of attacks that are not present in the NSL-KDD dataset (an older dataset). Meanwhile, current datasets still contain many redundant and irrelevant attributes [17] and high-dimensional data [16]. For example, the CSE-IDS-2018 dataset exhibits very significant data imbalance [18]. Thus, data reduction and feature selection are essential in machine learning.
Based on the intrusion detection method used, several studies related to IDS have applied a machine learning approach [14], [19], [20], [21]. C. R. Wang et al. [22] found that applying machine learning techniques to IDS has a number of benefits over more conventional techniques. However, machine learning based on a single classifier still struggles to achieve good intrusion detection performance. The issue with a single classifier is that it may not perform well at detecting every type of attack [23]. For this reason, our motivation is to find the best-performing technique and to demonstrate the superiority of the ensemble classifier.
The main contributions of this research are summarized as follows:
• Improve the performance of single classifiers applied to intrusion detection systems.
• Introduce an ensemble approach (Bagging-SDN) using the bagging technique and three base learner classifiers (SVM, DT, and NB).
• Show experimentally that the proposed method surpasses the baseline methods (single classifiers) in terms of Accuracy (ACC), Precision (PC), Recall (RC), F1-Score (F1), and Kappa Score (KS) metrics.
This article is divided into five sections. Section I gives the introduction, Section II the related work, Section III the methodology, Section IV the results and discussion, and Section V the conclusion and future studies.
II. RELATED WORK
Many previous studies have addressed intrusion detection in IDS. These studies can be grouped by the classification techniques and the evaluation metrics they use.
Most of the methods applied in previous studies use a classification approach [24] or clustering [25]. The classification technique is widely used in several machine learning
prediction, and statistical analysis) [26]. Classification techniques are well suited to IDS when the approach is supervised learning, whereas clustering is more appropriate for grouping intrusions when the system is unsupervised. In addition, several previous studies still build models using a single classifier, for example, Support Vector Machine (SVM) [27], Neural Network (NN) [28], Decision Tree (DT) [29], Naïve Bayes (NB) [30], or Random Forest (RF) [31]. However, some of these studies still report classification performance that is not optimal when applied to several different datasets.
Model evaluation can be conducted with various types of evaluation metrics. In several previous studies [14], [22], [23], the metrics often used to evaluate models include accuracy, precision, detection rate, false alarm rate, F-measure, and ROC curve. However, most studies use accuracy, precision, and recall to evaluate the built model. Several previous studies [24], [25], [32] explained that accuracy, detection rate, and false alarm rate were calculated only from classification results obtained on the NSL-KDD dataset. Meanwhile, the intrusions currently emerging are more complex than the types listed in the NSL-KDD dataset. As a result, detection accuracy cannot be guaranteed if the resulting model is applied to detect new types of intrusion.
Thaseen et al. [19] developed an intrusion classification model for IDS with a chi-square feature selection technique and a multi-class support vector machine (SVM) to overcome this problem. However, that study used only one feature selection method and a single classifier (SVM). It was not extended to other algorithms, so it provides no information on whether other feature selection methods would perform better.
Neha et al. [17] built a classification model for IDS that combines the intelligent water drops (IWD) feature selection technique with support vector machine (SVM) classification. The experimental results show that the proposed model has a higher detection rate, lower false alarm rate, and better accuracy than the previous approach.
Mohammadi et al. [33] built a feature selection model based on the feature grouping linear correlation coefficient (FGLCC) algorithm and the cuttlefish algorithm (CFA). The performance of the proposed method was validated on the KDD Cup’99 dataset. The results show an accuracy of 95.03% and a detection rate of 95.23%, with a lower false positive rate (1.65%) than previous methods in the literature.
Boahen et al. [34] developed a new model called ‘PSOGSARFC’ and validated it on two datasets, NSL-KDD and UNSW-NB15. The evaluation showed an increase in accuracy in the classification of NSL-KDD dataset by 98.56% and in the UNSW-NB15