A Few-Shot Deep Learning Approach For Improved
Intrusion Detection

Md Moin Uddin Chowdhury, Frederick Hammond, Glenn Konowicz, Chunsheng Xin, Hongyi Wu and Jiang Li
Department of Electrical and Computer Engineering, Old Dominion University
Email: [email protected], [email protected], [email protected], [email protected], [email protected], [email protected]
Abstract—Our generation has seen the boom and ubiquitous advent of Internet connectivity, and adversaries have been exploiting this omnipresent connectivity to launch cyber attacks. As a consequence, researchers around the globe have devoted considerable attention to data mining and machine learning, with an emphasis on improving the accuracy of intrusion detection systems (IDS). In this paper, we present a few-shot deep learning approach for improved intrusion detection. We first trained a deep convolutional neural network (CNN) for intrusion detection. We then extracted outputs from different layers of the deep CNN and implemented a linear support vector machine (SVM) and a 1-nearest neighbor (1-NN) classifier for few-shot intrusion detection. Few-shot learning is a recently developed strategy to handle situations where training samples for a certain class are limited. We applied the proposed method to two well-known datasets simulating intrusion in a military network: KDD 99 and NSL-KDD. These datasets are imbalanced, and some classes have far fewer training samples than others. Experimental results show that the proposed method achieved better performance than the state-of-the-art on both datasets.

Index Terms—Intrusion Detection System (IDS); few-shot learning; CNN; SVM.

I. INTRODUCTION

It is estimated that roughly 50 billion devices will connect to the Internet by the year 2020. Keeping pace with this exponential Internet growth, cyber attacks will exploit new flaws in Internet protocols, operating systems and application software. Several protective measures exist, such as firewalls, which are placed at the gateway to check the activities of intruders. To meet the dynamic characteristics of attacks, intrusion detection systems (IDSs) [1] are used as a second line of defense. IDSs dynamically monitor network logs, file systems, and real-time events occurring in a computer system or network and analyze them for signs of adversaries or attacks [2]. IDSs are classified as host-based or network-based: host-based IDSs operate on information collected from within an individual computer system, while network-based IDSs collect raw network packets from the network as the data source and analyze them for signs of intrusion [2]. Two different detection techniques, misuse detection and anomaly detection, are employed in IDSs to search for attack patterns. Misuse detection systems find known attack signatures in the monitored resources, whereas anomaly detection systems identify attacks by detecting changes in the utilization pattern or behavior of the system. However, at present, anomaly detection IDSs have not been widely adopted. On the other hand, despite having limited potential against unfamiliar attacks, misuse detection systems find greater usage in the commercial arena [3].

With the increasing processing power of modern CPUs, data mining and machine learning techniques have become an alternative to manual human input. This approach was first introduced in mining audit data for dynamic and automatic models for intrusion detection (MADAMID) using association rules [4]. However, the majority of IDSs currently in use are prone to generating false positive alarms. To this end, only a few datasets reflecting actual network connections are publicly available for distinguishing normal from abnormal connections. Among them, KDD 99 and NSL-KDD [3] are well-known public datasets used to promote anomaly detection techniques based on machine learning.

The KDD 99 dataset [3] is the pioneer dataset for machine learning based IDS. It was harvested from data gathered during the 1998 DARPA Intrusion Detection Evaluation Program, where a LAN was set up to simulate an actual military LAN, collecting TCPdump data over a duration of several weeks, with multiple attacks interspersed within normal connection data. The training data consists of five million connection records, and two weeks of testing data yielded around two million records. The training data contains 22 different attacks out of the 39 present in the test data. The known attack types are those present in the training dataset, while the novel attacks are the additional attacks in the test dataset not available in the training dataset. The attack types are grouped into four categories [3]:

• DOS (Denial of Service): The attacker prevents users from using resources by pre-occupying them so that the service provider can no longer handle new user requests.
• Probing: An attacker gathers information to bypass existing security measures, e.g., by port scanning.
• U2R (User to Root): Attackers attempt to gain unauthorized access to local super user (root) privileges.
• R2L (Remote to Local): Unauthorized access from a remote machine outside of the system to a valid user account.

Despite its potential, the KDD 99 dataset is considered to have several drawbacks, such as duplicate evidences and
TABLE I
CNN ARCHITECTURE FOR FEATURE EXTRACTION
Fig. 2. Test Accuracy performance of undersampled datasets.

Fig. 3. Classwise Accuracy performance of undersampled KDD 99.

Fig. 4. (a) Test Accuracy performance & (b) Mean Class-wise test performance of features from different layers for original KDD and NSL-KDD datasets.
IV. METHODS

A. Pre-processing

Neural network based classification uses only numerical values for training and testing, hence a pre-processing stage is needed to convert the non-numerical values to numerical ones. The two main tasks in our pre-processing are:

• Converting the non-numerical features in the dataset to numerical values. Features 2, 3 and 4, namely protocol type, service and flag, are non-numerical. These features in the train and test datasets were converted to numerical types by assigning a specific value to each variable (e.g., TCP = 1, UDP = 2 and ICMP = 3).
• Converting the attack type at the end of each record into its numeric category. Category 1 is assigned to normal data, and 2, 3, 4 and 5 are assigned to the DoS, Probe, R2L and U2R attack types, respectively.

B. Normalization

Since the features of both the KDD 99 and NSL-KDD datasets take either discrete or continuous values, their ranges differ, which makes them incomparable. In this study, the features were normalized by subtracting the mean of each feature and dividing by its standard deviation. After that, we normalized the test features using the mean and standard deviation of each feature from the train datasets.

C. CNN Architecture for Feature Extraction

We trained a CNN architecture to extract features for both datasets. Pre-processed data were fed through the input layer. We used varying numbers of filters (64, 128 and 256) with a filter size of 1x3. After the convolution layers, there was a fully connected dense neural network with 3 hidden layers of 100, 20 and 5 hidden units, respectively. We trained the model, fed the train data and test data through it separately, and extracted outputs from intermediate CNN layers to create new representations with different numbers of features. We considered mainly four layers in this study. The highest layer was a fully connected layer with 20 outputs, i.e., the output from this layer has 20 features. We also considered a fully connected layer with 100 outputs, a maxpooling layer with 4x256 outputs and the last CNN layer, which had 8x256 outputs. We also tried to extract features from lower-level CNN layers, but the testing accuracy was only around 40% to 43% for both datasets, and hence these were omitted from the comparison.
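The two pre-processing tasks and the train-statistics normalization above can be sketched as follows. This is a minimal NumPy illustration with hypothetical toy records; the real datasets have 41 features, and the `service` and `flag` mappings are analogous to the protocol mapping shown here.

```python
import numpy as np

# Hypothetical toy records: a categorical protocol feature, two numeric
# features, and a class label at the end of each record.
PROTOCOL = {"tcp": 1, "udp": 2, "icmp": 3}   # e.g. TCP = 1, UDP = 2, ICMP = 3
CATEGORY = {"normal": 1, "dos": 2, "probe": 3, "r2l": 4, "u2r": 5}

def encode(rows):
    """Map categorical fields and attack labels to numeric values."""
    X = np.array([[PROTOCOL[p], dur, sb] for p, dur, sb, _ in rows], dtype=float)
    y = np.array([CATEGORY[label] for *_, label in rows])
    return X, y

def normalize(X_train, X_test):
    """Z-score both sets using the training mean and standard deviation only."""
    mu, sigma = X_train.mean(axis=0), X_train.std(axis=0)
    sigma[sigma == 0] = 1.0   # guard against features that are constant in training
    return (X_train - mu) / sigma, (X_test - mu) / sigma

train = [("tcp", 0, 181, "normal"), ("udp", 2, 239, "dos"), ("icmp", 0, 520, "probe")]
test = [("tcp", 1, 300, "r2l")]
X_train, y_train = encode(train)
X_test, y_test = encode(test)
X_train_n, X_test_n = normalize(X_train, X_test)
```

Using the training statistics for the test split, as described in Section IV-B, avoids leaking test-set information into the normalization.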
A brief overview of the CNN architecture is shown in Table I.

After obtaining the intermediate features, we used them as input to SVM and k-NN classifiers, with k = 1 for the k-NN classifier (1-NN). Fig. 1 shows our methodology and workflow. As performance metrics, we considered mean class-wise accuracy along with overall test accuracy; that is, we first calculated the test accuracy for each class and then took the mean of these per-class accuracies.

V. NUMERICAL RESULTS

A. Experiment Setup

For model development and evaluation, we used a workstation with an Intel Core i7-7700 3.60 GHz CPU and 32 GB of RAM.

TABLE II
ORIGINAL DATASET ACCURACIES

  Dataset    Classifier    Test Accuracy
  KDD 99     SVM           95.27%
  KDD 99     k-NN          96.19%
  NSL-KDD    SVM           77.68%
  NSL-KDD    k-NN          80.74%

TABLE III
ORIGINAL DATASET CLASS-WISE MEAN ACCURACIES

  Dataset    Classifier    Mean of Class-wise Test Accuracy
  KDD 99     SVM           65.833%
  KDD 99     k-NN          64.048%
  NSL-KDD    SVM           56.609%
  NSL-KDD    k-NN          53.84%

The accuracies for NSL-KDD are also close to those reported in previous literature. Table III summarizes the class-wise performance of the classifiers on both datasets. KDD 99 outperforms NSL-KDD in terms of detecting each class individually, scoring 65.83% and 64.05% for SVM and 1-NN, respectively. For NSL-KDD, the classifiers were not able to detect the minor classes properly, which resulted in class-wise performance degradation (53.84% and 56.609% for 1-NN and SVM, respectively).

[Fig. 5 appeared here: (a) test accuracy and (b) mean class-wise accuracy of features from Layers 10, 11, 13 and 15 for SVM and 1-NN on the 2- and 9-fold U2R-oversampled KDD 99 dataset.]
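The mean class-wise accuracy metric used throughout the experiments can be computed as sketched below (a minimal NumPy illustration with hypothetical labels). Because every class is weighted equally, rare classes such as U2R count as much as the majority classes.

```python
import numpy as np

def mean_classwise_accuracy(y_true, y_pred):
    """Compute accuracy separately for each class present in y_true,
    then average the per-class accuracies, so rare classes count as
    much as common ones."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    per_class = [np.mean(y_pred[y_true == c] == c) for c in np.unique(y_true)]
    return float(np.mean(per_class))

# A majority-class predictor looks good on overall accuracy but poor on
# mean class-wise accuracy when the labels are imbalanced:
y_true = [1, 1, 1, 1, 1, 1, 1, 1, 2, 2]
y_pred = [1, 1, 1, 1, 1, 1, 1, 1, 1, 2]
overall = float(np.mean(np.array(y_true) == np.array(y_pred)))   # 9/10 = 0.9
classwise = mean_classwise_accuracy(y_true, y_pred)              # (1.0 + 0.5) / 2 = 0.75
```

The gap between the two numbers is exactly the effect discussed in the results: a classifier can score high test accuracy while missing most of a minority class.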
as depicted in Fig. 2. To mitigate the effect of randomness, we conducted the experiment 10 times and report the mean results. The accuracies for SVM and 1-NN were 91.66% and 87.3%, respectively, on the KDD 99 dataset. However, this undersampling method performed poorly on the NSL-KDD dataset, where the highest test accuracy reached was merely 13%. We were also curious about the class-wise performance on the undersampled KDD 99. From Fig. 3, we can see the class-wise test performance of the classifiers. In this case, the mean of the class-wise test accuracies was better than on the original datasets, with SVM and 1-NN scoring 71.94% and 75.44%, respectively.

D. Performance of few-shot Deep Learning

We trained a CNN model using the KDD 99 and NSL-KDD datasets and created 4 new datasets by extracting outputs from the 4 layers described in the previous section. The results are shown in Fig. 4. From Fig. 4(a), we can see that the SVM test accuracies of KDD 99 for Layer 15 and Layer 13 are 97.26% and 98.71%, respectively, which are higher than the original SVM using the 41 raw features as shown in Table II. The 1-NN accuracies are also slightly better than the results in Table II. For NSL-KDD, the accuracies reached close to 90% for both classifiers. For instance, Layer 15 provided 91.82% for SVM and 89.27% for 1-NN, while features extracted from Layer 13 provided 94.62% for SVM and 88.93% for 1-NN. The accuracies for NSL-KDD drop as we move to lower layers. We can observe a similar pattern for the mean class-wise test performance of different CNN layers in Fig. 4(b): the lower the layer, the lower the mean test accuracy over all classes. The best performance is provided by Layer 13, where the class-wise test accuracies of all classifiers and datasets were above 70%. The SVM classifier on the KDD 99 dataset provided better results than the other classifiers.

E. Effect of Sampling on few-shot Deep Learning

To increase the class-wise performance on the original datasets, we created 2-fold and 9-fold duplicate samples of the U2R class and studied the performance on both datasets. The results for 2- and 9-fold duplicate oversampling of the U2R class on the KDD 99 dataset are depicted in Fig. 5. From Fig. 5(a), we can see that the test accuracies of the 2-fold oversampled KDD 99 dataset for the SVM classifier were 97.29%, 98.19%, 84.96% and 95.51% for Layers 15, 13, 11 and 10, respectively. The 1-NN accuracies on the same oversampled dataset were 95.84%, 86.62%, 85.8% and 92.62%, respectively. With 9-fold oversampling of the U2R class, KDD 99 provided 97.06%, 97.3%, 95.19% and 93.152% test accuracies for the SVM classifier, while 1-NN scored 95.62%, 94.25%, 88.65% and 53.5% on the same oversampled dataset. We then studied the mean class-wise accuracies for 2- and 9-fold duplication of the U2R class on KDD 99, depicted in Fig. 5(b). The best results are provided by the features from Layer 13. Overall, the mean class-wise performance is better than on the original dataset. We also observed that 9-fold oversampling outperformed its 2-fold counterpart for both Layers 13 and 15. The SVM classifier scored 83.152% mean class-wise accuracy on features extracted from Layer 13 for the 9-fold U2R-oversampled KDD 99.

The test performances for 2- and 9-fold duplication of the U2R class on the NSL-KDD dataset are depicted in Fig. 6. The test accuracies of the 2-fold oversampled NSL-KDD dataset for the SVM classifier were 90.043%, 89.9%, 22.87% and 88.36% for Layers 15, 13, 11 and 10, respectively, as shown in Fig. 6(a). The 1-NN accuracies on the same oversampled dataset were 88.19%, 87.42%, 10.13% and 22.60%, respectively. The 9-fold oversampled dataset performed slightly better than the 2-fold oversampled NSL-KDD: with 9-fold oversampling of the U2R class, NSL-KDD provided 90.4%, 92.92%, 49.67% and 43.79% test accuracies for the SVM classifier, while 1-NN scored 88.09%, 88.82%, 50.88% and 23.71% for features from Layer 15 down to Layer 10.

Fig. 6. (a) Test Accuracy performance & (b) Mean Class-wise accuracy performance of features from different layers for the 2- & 9-fold oversampled NSL-KDD dataset.

The mean classification accuracies of NSL-KDD are shown in Fig. 6(b). We found a decreasing pattern in mean class-wise test accuracy as we move down the CNN architecture.
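The duplicate oversampling used in this section can be sketched as follows. This is a minimal NumPy illustration, assuming "k-fold" means each minority-class sample appears k times in total after oversampling; the paper does not spell out the exact convention.

```python
import numpy as np

def oversample_duplicates(X, y, minority_class, fold):
    """Append (fold - 1) extra copies of every minority-class sample, so the
    class ends up with fold times its original count. No synthetic samples
    are generated; this is plain duplication of existing records."""
    idx = np.where(y == minority_class)[0]
    X_extra = np.tile(X[idx], (fold - 1, 1))
    y_extra = np.repeat(y[idx], fold - 1)
    return np.vstack([X, X_extra]), np.concatenate([y, y_extra])

# Toy example: class 5 (U2R) holds only 2 of 6 samples before oversampling.
X = np.arange(12, dtype=float).reshape(6, 2)
y = np.array([1, 1, 1, 1, 5, 5])
X2, y2 = oversample_duplicates(X, y, minority_class=5, fold=2)
```

Because the duplicated rows are exact copies, this only rebalances the class priors seen by the classifier; techniques such as SMOTE would instead interpolate new minority samples.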
TABLE IV
TEST ACCURACY COMPARISON TO LITERATURE

  Algorithms           Test Accuracy
  Niyaz et al. [7]     79.10%
  Mahbod et al. [3]    82.02%
  Tang et al. [8]      75.75%
  Our work             94.62%

The highest mean class-wise accuracy (70.46%) was provided by the SVM classifier with the 9-fold oversampled U2R attack class. As we move down the architecture, the mean class-wise accuracy decreases. In essence, we observed an increasing trend in mean class-wise performance but a slight degradation in overall test accuracy for both oversampled datasets. This is because the classifiers detect the minor class more accurately at the expense of test accuracy on the majority classes.

F. Comparison to Literature

For ease of comparison with previous literature considering the 4 attack types, we also provide Table IV, which shows that our method outperforms other results in terms of test accuracy. The SVM classifier on features extracted from Layer 13 provided the best result on the NSL-KDD dataset. Our method worked well on both datasets in terms of overall test accuracy and class-wise test accuracy. Layer 13, the first fully connected layer with 100 hidden units, provided the best results. Of the two classifiers, SVM outperformed 1-NN in almost all experiments.

VI. CONCLUSION

In this research, we implemented a few-shot deep learning method for intrusion detection. Among the different attack types, some rare ones are difficult for machine learning based detection systems to identify. Inspired by the few-shot image recognition work in [6], we trained a deep CNN structure and used it as a general feature extractor. We then trained an SVM or a 1-NN classifier for intrusion detection on the new feature representations. In addition, we incorporated a traditional imbalance learning technique that oversamples minority classes before training. Our method obtained state-of-the-art performance on the KDD 99 and NSL-KDD datasets, achieving over 94% accuracy on both. We were also able to achieve better class-wise accuracy using traditional imbalance learning techniques. The proposed method is therefore a good candidate for imbalance learning and intrusion detection. In the future, we plan to apply our method to other imbalanced datasets to enhance minority class detection rates.

REFERENCES

[1] S. Potluri and C. Diedrich, "Accelerated deep neural networks for enhanced intrusion detection system," in 2016 IEEE 21st International Conference on Emerging Technologies and Factory Automation (ETFA), Sept. 2016, pp. 1–8.
[2] H. G. Kayacik, A. N. Zincir-Heywood, and M. I. Heywood, "Selecting features for intrusion detection: A feature relevance analysis on KDD 99 intrusion detection datasets," in Proceedings of the Third Annual Conference on Privacy, Security and Trust, 2005.
[3] M. Tavallaee, E. Bagheri, W. Lu, and A. A. Ghorbani, "A detailed analysis of the KDD Cup 99 data set," in 2009 IEEE Symposium on Computational Intelligence for Security and Defense Applications, July 2009, pp. 1–6.
[4] S. Chebrolu, A. Abraham, and J. P. Thomas, "Feature deduction and ensemble design of intrusion detection systems," Computers & Security, vol. 24, no. 4, pp. 295–307, June 2005.
[5] H. He and E. A. Garcia, "Learning from imbalanced data," IEEE Transactions on Knowledge and Data Engineering, vol. 21, no. 9, pp. 1263–1284, Sept. 2009.
[6] B. Hariharan and R. Girshick, "Low-shot visual recognition by shrinking and hallucinating features," arXiv preprint arXiv:1606.02819, 2016.
[7] Q. Niyaz, W. Sun, A. Y. Javaid, and M. Alam, "A deep learning approach for network intrusion detection system," in Proceedings of the 9th EAI International Conference on Bio-inspired Information and Communications Technologies (Formerly BIONETICS), BICT-15, vol. 15, 2015, pp. 21–26.
[8] T. A. Tang, L. Mhamdi, D. McLernon, S. A. R. Zaidi, and M. Ghogho, "Deep learning approach for network intrusion detection in software defined networking," in 2016 International Conference on Wireless Networks and Mobile Communications (WINCOM), Oct. 2016, pp. 258–263.
[9] J. Zhang and M. Zulkernine, "A hybrid network intrusion detection technique using random forests," in The First International Conference on Availability, Reliability and Security (ARES 2006). IEEE, 2006.
[10] N. B. Amor, S. Benferhat, and Z. Elouedi, "Naive Bayes vs decision trees in intrusion detection systems," in Proceedings of the 2004 ACM Symposium on Applied Computing. ACM, 2004, pp. 420–424.
[11] V. Kumar, H. Chauhan, and D. Panwar, "K-means clustering approach to analyze NSL-KDD intrusion detection dataset," International Journal of Soft Computing and Engineering (IJSCE), 2013.
[12] M. Panda, A. Abraham, and M. R. Patra, "A hybrid intelligent approach for network intrusion detection," Procedia Engineering, vol. 30, pp. 1–9, 2012.
[13] Y. Bouzida, F. Cuppens, N. Cuppens-Boulahia, and S. Gombault, "Efficient intrusion detection using principal component analysis," in 3ème Conférence sur la Sécurité et Architectures Réseaux (SAR), La Londe, France, 2004, pp. 381–395.
[14] S. Mukkamala, G. Janoski, and A. Sung, "Intrusion detection using neural networks and support vector machines," in Proceedings of the 2002 International Joint Conference on Neural Networks (IJCNN'02), vol. 2. IEEE, 2002, pp. 1702–1707.
[15] W.-H. Chen, S.-H. Hsu, and H.-P. Shen, "Application of SVM and ANN for intrusion detection," Computers & Operations Research, vol. 32, no. 10, pp. 2617–2634, 2005.
[16] J. Snell, K. Swersky, and R. S. Zemel, "Prototypical networks for few-shot learning," CoRR, vol. abs/1703.05175, 2017. [Online]. Available: http://arxiv.org/abs/1703.05175
[17] S. Ravi and H. Larochelle, "Optimization as a model for few-shot learning," 2016.
[18] D. S. Kim, H.-N. Nguyen, and J. S. Park, "Genetic algorithm to improve SVM based network intrusion detection system," in 19th International Conference on Advanced Information Networking and Applications (AINA'05), vol. 2, March 2005, pp. 155–158.
[19] H.-V. Nguyen and Y. Choi, "Proactive detection of DDoS attacks utilizing k-NN classifier in an anti-DDoS framework," International Journal of Electrical, Computer, and Systems Engineering, vol. 4, no. 4, pp. 247–252, 2010.
[20] A. Krizhevsky, I. Sutskever, and G. E. Hinton, "ImageNet classification with deep convolutional neural networks," in Advances in Neural Information Processing Systems 25. Curran Associates, Inc., 2012, pp. 1097–1105.
[21] R.-E. Fan, K.-W. Chang, C.-J. Hsieh, X.-R. Wang, and C.-J. Lin, "LIBLINEAR: A library for large linear classification," J. Mach. Learn. Res., vol. 9, pp. 1871–1874, June 2008.
[22] F. Chollet et al., "Keras," https://github.com/fchollet/keras, 2015.