0% found this document useful (0 votes)
39 views14 pages

Ref 8

This paper proposes a clustering-enhanced transfer learning approach (CeHTL) to detect unknown network attacks. Transfer learning aims to address the problem of detecting new attacks when labeled training data is limited, by transferring knowledge from related known attacks. The previous HeTL approach relied on manually setting hyperparameters. CeHTL improves on this by automatically clustering source and target domains to compute relevance between them. The authors evaluate CeHTL on a benchmark intrusion detection dataset, generating training and test data from different attack types/subtypes. Results show CeHTL outperforms HeTL and other baselines at detecting new attacks.

Uploaded by

Max Riddle
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
39 views14 pages

Ref 8

This paper proposes a clustering-enhanced transfer learning approach (CeHTL) to detect unknown network attacks. Transfer learning aims to address the problem of detecting new attacks when labeled training data is limited, by transferring knowledge from related known attacks. The previous HeTL approach relied on manually setting hyperparameters. CeHTL improves on this by automatically clustering source and target domains to compute relevance between them. The authors evaluate CeHTL on a benchmark intrusion detection dataset, generating training and test data from different attack types/subtypes. Results show CeHTL outperforms HeTL and other baselines at detecting new attacks.

Uploaded by

Max Riddle
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 14

Zhao et al.

EURASIP Journal on Information Security


https://doi.org/10.1186/s13635-019-0084-4
(2019) 2019:1
EURASIP Journal on
Information Security

R ES EA R CH Open Access

Transfer learning for detecting unknown


network attacks
Juan Zhao1 , Sachin Shetty2* , Jan Wei Pan3 , Charles Kamhoua4 and Kevin Kwiat5

Abstract
Network attacks are serious concerns in today’s increasingly interconnected society. Recent studies have applied
conventional machine learning to network attack detection by learning the patterns of the network behaviors and
training a classification model. These models usually require large labeled datasets; however, the rapid pace and
unpredictability of cyber attacks make this labeling impossible in real time. To address these problems, we proposed
utilizing transfer learning for detecting new and unseen attacks by transferring the knowledge of the known attacks.
In our previous work, we have proposed a transfer learning-enabled framework and approach, called HeTL, which can
find the common latent subspace of two different attacks and learn an optimized representation, which was invariant
to attack behaviors’ changes. However, HeTL relied on manual pre-settings of hyper-parameters such as relativeness
between the source and target attacks. In this paper, we extended this study by proposing a clustering-enhanced
transfer learning approach, called CeHTL, which can automatically find the relation between the new attack and
known attack. We evaluated these approaches by stimulating scenarios where the testing dataset contains different
attack types or subtypes from the training set. We chose several conventional classification models such as decision
trees, random forests, KNN, and other novel transfer learning approaches as strong baselines. Results showed that
proposed HeTL and CeHTL improved the performance remarkably. CeHTL performed best, demonstrating the
effectiveness of transfer learning in detecting new network attacks.
Keywords: Network attacks detection, Machine learning, Transfer learning

1 Introduction varying their behaviors, the distributions of feature may


In recent years, cyber attack is a growing serious concern change, making the trained models work poorly [6] and
due to its increased sophistication and variations, such unable to detect the new attacks. This is a domain-shift
as denial-of-service (DoS) tactics and the zero-day attack, problem, which usually needs recollecting new training
posing a great threat to government, military, and indus- data and retraining the model to adapt to the changes in
trial networks. Conventional signature-based detection the target domain. However, collecting sufficient labeled
approaches may fail to address the increased variability data for such continuously rising attack variants is infea-
of today’s cyber attacks. Developing novel anomaly detec- sible. Further, detecting evolving attacks usually needs
tion techniques to better learn, adapt, and detect threats incorporating new features from various network layers
in diverse network environments becomes essential. [7]. This also needs to retrain the model because of the
Machine learning/data mining approaches have been different feature dimensions.
applied to the attack detection in networked environ- To address the above problems, we proposed using
ments to improve the detection rate [1–4]. Data-driven transductive transfer learning to enhance the detection of
supervised models achieved better accuracy than unsu- new threats [6]. Transductive transfer learning, a novel
pervised approaches but relied on a large number of machine learning technique, can adapt features in a target
labeled malicious samples [5]. As attacks evolved by domain with deficient labeled data by transferring learned
knowledge from a related source domain [8]. The intu-
*Correspondence: [email protected] ition behind is the human’s transitive inference ability to
2
Virginia Modeling Analysis and Simulation Center, Old Dominion University,
23529 Norfolk, USA extending what has been learned in one domain to a new
Full list of author information is available at the end of the article similar domain [9]. Our study is motivated by the fact that

© The Author(s). 2019 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0
International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and
reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the
Creative Commons license, and indicate if changes were made.

Content courtesy of Springer Nature, terms of use apply. Rights reserved.


Zhao et al. EURASIP Journal on Information Security (2019) 2019:1 Page 2 of 13

most network attacks belong to variants of known net- [12] proposed a methodology to craft traffic with dif-
work attack families and share common traits in features ferent characteristics. Other studies [13, 14] focused on
[6, 10], which suggested a good fit for applying transfer how to find effective signatures. However, one major lim-
learning. itation of the signature-based technique is its failure to
In this study, source and target domain data refers to detect new attacks, as their signatures are unknown to the
the same network environment at a different time. We system. In addition, building new signatures needs man-
assumed that attacks in a source domain are already ual inspection by human experts, which is very expensive
known and labeled, and attacks in a target domain are and time-consuming, and also introduces an important
new and different than the source. We formularized the latency between the discovery of a new attack and the
problem by using source domain data to differentiate new construction of its signature.
attacks in the target domain. Previously, we developed a Another type of technique for network attack detec-
transfer learning-enabled detection framework and pro- tion is the supervised learning-based technique, which
posed a feature-based heterogeneous transfer learning, uses instances of known attacks to build a classifica-
called HeTL [6], to detect unseen variants of attacks. tion model that distinguishes attacks from good programs
HeTL can find new feature representations for source [1, 3]. Nari and Ghorbani [15] present a network behav-
and target domain by transforming them on a common ioral modeling approach for malware detection and
latent space. Nevertheless, we observed that the per- malware family classification. Rafique et al. [16] eval-
formance of HeTL depended on manual pre-settings of uated the evolutionary algorithms for classification of
a hyper-parameter: relevance between the source and malware families through different network behaviors.
target domain [6]. In this paper, we proposed another Iglesias and Zseby [17] focused on the feature selection
approach—a hierarchical transfer learning algorithm with approach to improve the performance of network-based
clustering enhancement, called CeHTL, which can clus- anomaly detection. However, these learning-based tech-
ter source and target domain and compute the relevance niques share the same limitation as the signature-based
between them. detection in that they both perform poorly on new attacks.
We utilized a benchmark network intrusion dataset Since different attacks usually have different distributions
NSL-KDD [11]. To stimulate the domain shift, we gen- of network behaviors, the learned patterns are unable to
erated training and testing datasets by sampling attacks work accurately. A significant advantage of our approach
from different types of attacks, from big category of is its ability to identify an unknown attack that has not
attacks (e.g., DoS, R2L), and also the subcategory of been previously investigated.
attacks (i.e., 22 subtypes). We compared the proposed
CeHTL with HeTL [6], as well as any other baselines, 2.2 Transfer learning
including traditional classification without transfer learn- Transfer learning was designed to use knowledge from
ing and several novel transfer learning approaches. We the source domain, which has sufficient labeled data, to
also evaluated the approaches on imbalanced datasets, help build more precise models in a related, but differ-
which is common in real-world cyber attack practice. We ent, domain with only a few or no labeled data. Transfer
performed sensitivity analysis by tuning parameters and learning approaches can be mainly categorized into three
using different sizes of training set. The results showed classes [18]. The first class is instance-based [19, 20],
that CeHTL demonstrated the most stable results, which which assumes that certain parts in the source data can
means that it does not rely on the pre-setting of param- be reused for the target domain by re-weighting related
eters and thus is more effective in detecting unknown samples. Dai et al. [20] introduced a boosting algorithm,
attacks. TrAdaBoost, which iteratively re-weighted the source
The rest of this paper is organized as follows: Section 2 domain data and the target domain data to reduce the
reviews the related work. Section 3 outlines the trans- effect of “bad” source data while encouraging the “good”
fer learning framework. Section 4 describes the proposed source data to contribute more to the target domains.
approaches. Section 6 presents the experiments, evalua- However, these approaches require a lot of labeled sam-
tions, and discussions. Finally, we conclude the work in ples from the target domain. The second class can be
Section 7. viewed as model-based approaches [21, 22], which assume
both source and target tasks share some parameters or
2 Related work priors of their models. The third class of transfer learning
2.1 Network attack detection approaches is feature-based [23–25], where a new fea-
One of the well-known techniques for network attack ture representation is learned from the source and the
detection is signature-based detection, which is based on target domain and is used to transfer knowledge across
an extensive knowledge of the particular characteristics domains. Shi et al. [26] proposed a heterogeneous transfer
of each attack, referred to as its “signature.” One study learning method, called HeMap, to project the source and

Content courtesy of Springer Nature, terms of use apply. Rights reserved.


Zhao et al. EURASIP Journal on Information Security (2019) 2019:1 Page 3 of 13

target domain onto latent subspace via linear transforma- source domain is known and labeled appropriately, and
tions. They assumed the subspace is orthogonal. Pan et al. attacks in the target domain are new and not labeled.
[24] have performed transfer component analysis (TCA) Unlike prior studies [29, 30] assuming that the source and
to reduce the distance between domains by projecting target domains should have the same feature sets, our
the features onto a shared subspace. Nam et al. [27] then framework supports introducing new features into the tar-
applied TCA to the software defect detection problem. get domain. This is relevant to evolving network attacks
Sun et al. [23] proposed an approach, called Correlation where the adversary may change their behaviors, resulting
Alignment (CORAL), to project source data onto target in a need to incorporating new features in the network or
data by aligning the second-order statistics of the source system layers. Thus, in this scenario, the source and tar-
and target distributions, which do not need any labeled get domains have different attack distributions or feature
data from the target domain. The work has been applied sets. The goal of the transfer learning framework is to use
to the object detection problem and achieves good results. source domain data to differentiate new attacks from the
Shi et al. first proposed a state-of-the-art approach called target domain.
HeMap [26], which uses spectral embedding to unify the The framework consists of a machine learning pipeline,
different feature spaces of the target and source datasets, which includes the following stages: (i) extracting features
and applies this approach to image classification. from raw network traffic data, (ii) learning representations
with feature-based transfer learning, and (iii) classifica-
2.3 Transfer learning for network attack detection tion. In the first stage, features are extracted from the
Even though transfer learning has many great applications raw network trace data with a statistic calculation of
in natural language processing and visual recognition the network flow. Second, we used feature-based transfer
[25, 28], not many studies have applied it to the network learning algorithms to learn a good new feature represen-
attack detection problem. Bekerman et al. [4] mentioned tation from both source and target domains. Then, we fed
that transfer learning can improve robustness in detecting the new representation to a common base classifier. The
unknown malware between non-similar environments. choice of a common base classifier can be decision trees,
However, they did not present much detailed and formal SVM, and KNN.
work on this idea. The study in [29] applied an instance-
based transfer learning approach in network intrusion 4 Transfer learning approach via spectral
detection. However, they require plenty of labeled data transformation
from target domain. Gao et al. [30] proposed a model- We model the network attack detection as a binary clas-
based transfer learning approach and apply it to the sification problem, which is to classify each network con-
KDD99 cup network dataset. Both of these instance and nection as a malicious or as normal connection. Suppose
model-based transfer learning approaches depend heav- we are provided
 with source domain training examples
ily on the assumption of homogeneous features. This is S = xi , x ∈ Rm that have labels LS = {yi }, and tar-
often not the case for network attack detection, which typ- get domain data T = {ui }, u  ∈ Rn . Suppose x and u 
ically exhibits heterogeneous features. Another advantage are drawn from different distributions, PS (X)  = PT (X),
of feature-based approaches is its flexibility to adopt dif- where PT (X) is unknown, and the dimensions of x and u 
ferent base classifiers according to different cases, which are different, Rm  = Rn . Our goal is to accurately predict
motivated us to derive a feature-based transfer learning the labels on T.
approach for our network attack detection study. To our Since network attacks share similar traits, our approach
best knowledge, this paper is the first effort in applying is to find the common latent subspace and transform
a feature-based transfer learning approach for improving the source and target data onto it to get new feature
the robustness of network attack detection. representations, which can then be used in classifcation.
We demonstrated the approach in our previous paper
3 Framework of using transfer learning for [6]. Given source domain data and target domain data
detecting new network attacks with different attacks, the model explores the common
We have present a transfer learning-enabled network latent space, in which the original structure of the data is
attack detection framework to enhance detecting new preserved while the discriminative examples are still far
network attacks in a target domain in [6]. From a practi- apart.
cal standpoint, source and target domains can represent
different or the same network environments with differ- 4.1 Optimization
ent attacks captured at different times and at separate Given source data S and target data T, we compute an
instances. In this study, we primarily consider the latter optimal projection of S and T onto an optimal sub-
scenario, wherein the source and target domains com- space VS and VT according to the following optimization
prise different attacks. We assume that the attack in the objective:

Content courtesy of Springer Nature, terms of use apply. Rights reserved.


Zhao et al. EURASIP Journal on Information Security (2019) 2019:1 Page 4 of 13

min (VS , S) + (VT , T) + βD(VS , VT ), (1) target domains (β). Inappropriate choice of parameters
VS ,VT
might lead to suboptimal efficacy results. The row order
where (∗, ∗) is a distortion function that evaluates the dif- of the class type for S and T could also affect the results of
ference between the original data and the projected data. D(VS , VT ). Practically, we may know little about the new
D(VS , VT ) denotes the difference between the projected attack in T, so the transformation process in (4) could be
data of the source and target domains. β is a trade-off misleading.
parameter that controls the similarity between the two To address this problem, we proposed a hierarchi-
datasets. cal transfer learning with clustering enhancement, called
Thus, the first two elements of (1) ensure that the pro- CeHTL, through automatically finding the relevance
jected data preserve the structures of the original data as between the source and target domain before we per-
much as possible. form the projection. CeHTL first clustered the instances
We defined D(VS , VT ) in terms of l(∗, ∗) as: for the target domains, as the source domain already has
two natural clusters (classes). By computing the similarity
D(VS , VT ) = VT − VS 2 (2)
of each cluster and choosing the mapping for two simi-
which is the difference between the projected target lar clusters in the source and target domains, we can get
data and the projected source data. Hence, the projected the correspondence (mapping) of each cluster in the tar-
source and target data are constrained to be similar by get domain to the source domain. We sorted the instances
minimizing the difference function (2). by order of their cluster labels, so that the rows in matri-
We applied linear transformation to finding the pro- ces T and S will have the same class order. Then, we solved
jected space. We define (∗, ∗) as follows: objective (4) for the ordered T and S. We illustrated the
comparison between CeHTL with HeTL in Fig. 1. The
(VS , S) = S−VS PS 2 , (VT , T) = T−VT PT 2 , (3) algorithm for CeHTL is listed in Algorithm 1. We chose
where VS and VT are achieved by a linear transformations K-means++ [31] for clustering and used the Euclidean
with linear mapping matrices, denoted as PS ∈ Rk×m distance to compute the similarity.
and PT ∈ Rk×n to the source and target, respectively.
X2 is the Frobenius norm that can also be expressed Algorithm 1: Clustering Enhanced Hierarchical
as a matrix trace norm. In a different view, PS T ∈ Rm×k Transfer Learning (CeHTL)
and PT T ∈ Rn×k project the original data S and T into a Input: T, S
k-dimensional latent subspace, where the projected data Output: Tnew , Snew

are comparable (VS , S) = SPS T − VS 2 . This will lead 1 Initialize: c clusters for each domain, c = 2
to a trivial solution PS = 0, VS = 0. We thus apply 2 CT = kmeans(T, c); %CT is the cluster label for each
(3). It can be viewed as a matrix factorization problem, instance.
which is widely known as an effective tool to extract latent 3 CS = YS ;%CS is the cluster label for each instance. YS is
subspaces while preserving the original data structures. the class label for source domain
4 If the dimensions of T and S is not equal,
4.2 Optimization objective 1 5 T = pca(T); S = pca(S);
Substituting (3) and (2) into (1), we obtain the follow- 6 Compute the Euclidean distance between centroid of
ing optimization objective to minimize with regard to each cluster in T and S.
VS , VT , PS and PT as follows: 7 For each cluster in T, choose the similar cluster from
CS , which has the smallest Euclidean distance value, to
min G(VS , VT , PS , PT ) = min S − VS PS 2 form a similar cluster pair, and assign the same label to
+ T−VT PT 2 (4) each similar pair of clusters.
2 8 Sort the matrices [ T, CT ] and[ S, CS ] in the order of
+ β · VT − VS  )
CT and CS , to get the Tnew , Snew for the new input for
In our previous work [6], we used a gradient method to the HeTL algorithm.
get the global minimums by iteratively fixing three of the
matrices to solve the remaining one until convergence.
The detailed HeTL algorithm was presented in [6]. In case that the source and target domains have hetero-
geneous feature sets, where T and S may have different
5 Clustering-enhanced hierarchical transfer dimensions, the Euclidean distance cannot be applied.
learning To overcome this problem, we use principal component
In previous study, we have observed that the perfor- analysis (PCA) [32] for each source and target domain
mance of HeTL depends on the manual presetting of to perform feature reduction. By choosing the same size
a hyper-parameter—relevance between the source and of components for source and target domains, they will

Content courtesy of Springer Nature, terms of use apply. Rights reserved.


Zhao et al. EURASIP Journal on Information Security (2019) 2019:1 Page 5 of 13

(a) (b)
Fig. 1 Comparison between the HeTL and CeHTL

have the same dimensions. The notation description are to other traditional machine learning algorithms as well
presented in Table 1. as other several novel transfer learning methods (in
Section 6.3). We also performed the parameter sensitiv-
6 Experimental evaluation ity analysis and showed the impact of imbalanced datasets
In this section, we evaluated the performance the of pro- and training data sizes (Section 6.4).
posed transfer learning HeTL and CeHTL for detecting
“unknown” network attacks. We addressed the following 6.1 Network datasets
questions: Does transfer learning approach provide any NSL-KDD contains network features extracted from a
advantage compared with a single classifier without using series of TCP connection records captured from a local
transfer learning approach? and Which technique is the area network. Each record in the dataset corresponds to
most appropriate transfer learning approach? We utilized a connection labeled as either an normal or attack type.
a benchmark network intrusion dataset—the NSL-KDD The dataset has 22 different types of attack, which can
benchmark dataset [11] (in Section 6.1). We carried out be grouped into 4 main categories: DoS, R2L, Probe, and
two experiments to stimulate the “unknown” network User to root (U2R). Tables 2 and 3 provide the details of
attacks and different feature spaces (in Section 6.2). We the attacks and their distribution in the training dataset.
demonstrated the benefits of HeTL and CeHTL compared Since the portion of U2R is very small, we only focus on
DoS, R2L, and Probe.
Table 1 Notation descriptions NSL-KDD contains 41 network features that can be split
Notations Descriptions into 3 groups: (1) basic features deduced from TCP/IP
connection packet headers; (2) traffic features, usually
S Source data
extracted by flow analysis tools; and (3) content features,
VS Projected source data
PS Projection function to the source space
T Target data
Table 2 Category of the attack in NSL-KDD
VT Projected target data
Main categories Attack
PT Projection function to the target space
DoS Neptune, back, land, smurf, teardrop,pod
β Weights of the relevance between the source and target data
R2L buffer_overflow, ftp_write, guess_passwd, imap,
k Dimensions of the projected space multihop, phf, spy, warezclient, warezmaster
α Learning rates Probe ipsweep, nmap, portsweep, satan
Step Learning step U2R loadmodule, perl, rootkit

Content courtesy of Springer Nature, terms of use apply. Rights reserved.


Zhao et al. EURASIP Journal on Information Security (2019) 2019:1 Page 6 of 13

Table 3 Number of instances in NSL-KDD setting. In network security, there are circumstances that
Class Instances Percentage we need to incorporate new features to better detect
Normal 67343 53.46 the attacks. For example, traffic feature is more distin-
DoS 45927 36.46 guishable for DoS attack. However, for the R2L attack,
the content feature is more distinguishable. This usually
R2L 995 0.79
need to retrain the model. To stimulate this scenario, we
Probe 11656 9.25
selected the most relative features for the source and tar-
U2R 52 0.04 get domains using information gain, resulting in unequal
feature dimensions. The final selected features were listed
requiring the processing of the packet content. Some in Appendix Tables 5 and 6. Of note, using information
example of features are listed in Table 4. gain here is only for generating different feature sets, not
for improving the performance. In real practice, features
6.2 Experimental setting can be changed due to the manual feature engineering
6.2.1 Detection of unknown network attacks as we have less information about the target dataset. The
This experiment is to evaluate the proposed transfer baseline approach manually mapping the target data into
learning approaches for detecting new variants of attacks. the source feature space and applied the traditional clas-
Stimulating new attacks is challenging. We can assume sifiers. We compared our transfer learning approach with
attacks in the target data has no labels and differ from the baselines.
attacks in the source domain. We randomly selected mali-
cious examples from one main attack category (e.g., DoS, 6.3 Evaluation
R2L, Probe) and normal examples as the source domain. We chose the accuracy, F1 score (F − Measure) and
Then, we chose a different attack type combined with receiver operating characteristic curve (ROC curve) as the
normal samples for the target domain. We finally gener- performance metrics. F1 score combines precision and
ated three groups: DoS→Probe (DoS is the source domain recall to measure the per-class performance of classifica-
for training and Probe is target domain for testing), tion or detection algorithms.
DoS→R2L and Probe→R2L). To evaluate the general- We firstly chose C4.5 decision tree (CART), linear
ization, we also chose attacks from 22 sub-attack types SVM, and KNN as the baselines, which were also served
for each source and target set and generated 11 tasks. for base classifiers for HeTL and CeHTL. We com-
We repeated the processes ten times and reported the pared HeTL and CeHTL with baselines on three main
averages and standard deviations. We make the attack transfer learning tasks (i.e., DoS→Probe, DoS→R2l, and
data, and the normal data in each domain are balanced Probe→R2L). Figures 2 and 3 show the box plots of accu-
unless stated otherwise. We further studied the effects of racy and F1 score on ten iterations on three main tasks.
imbalanced data in Section 6.4. We observed that the baseline models performed poorly,
with accuracy of 0.47–0.74 and F1 score of 0.1–0.65.
6.2.2 Network attacks with different feature spaces Our HeTL and CeHTL significantly outperformed the
To evaluate the performance in detecting attacks using baselines, obtained over 0.70 accuracy and 0.75 F1 score.
different feature spaces, we used different feature sets for CeHTL outperformed HeTL with all three base classi-
source and target domains, based on the first experiment fiers in DoS→Probe and in decision tree and KNN in

Table 4 Some selected features in NSL-KDD


Feature name Description Feature category
Duration Duration of the connection Basic features
Src_bytes Data bytes from source to destination Basic feature
Dst_bytes Data bytes from destination to source Basic feature
Num_failed_logins Number of incorrect login in a connection Content feature
Srv_count Sum of connections to the same destination port number Traffic feature
Serror_rate Percentage of connections that have “SYN” errors among the connections to the same Traffic feature
host in the past 2 s
Srv_serror_rate Percentage of connections that have “SYN” errors among the connections to the same Traffic feature
destination port in the past 2 s
Dst_host_count Sum of connections to the same destination IP address Traffic feature
Dst_host_same_srv_rate The percentage of connections that were to the same service, among the connections Traffic feature
aggregated in dst_host_count

Content courtesy of Springer Nature, terms of use apply. Rights reserved.


Zhao et al. EURASIP Journal on Information Security (2019) 2019:1 Page 7 of 13

Fig. 2 Box plot of accuracy of transfer learning approaches and baselines on three main tasks

Probe→R2L. CeHTL achieved the best result with an different feature spaces. We compare the transfer learn-
average accuracy and F1 score of 0.88. ing approach with the manual mapping approach on
Then, we applied HeTL, CeHTL, and two baseline DoS→R2L. From the results shown in Fig. 8, we can see
methods—SVM and HeMap [26], a novel transfer learn- that the transfer learning approaches outperformed the
ing approach—to the 11 transfer learning tasks generated baselines.
by the subtypes of attacks, along with the 3 main tasks. We
run the experiment for 10 iterations with different random 6.4 Discussion
seeds and reported the average and standard deviations The study proposed two transfer learning methods, HeTL
of accuracy and F1 scores in Figs. 4 and 5. We observed and CeHTL, on network attack detection methods to
(1) transfer learning approaches outperformed the tradi- address the issues of lacking sufficient labels for new
tional classifiers without using transfer learning in all 14 attacks. The results showed that HeTL and CeHTL signif-
tasks, (2) HeTL and CeHTL can improve the accuracy to icantly improved the accuracy compared to the traditional
0.8–0.9 in 5 tasks, (3) HeTL and CeHTL outperformed classifiers and other transfer learning methods. Especially,
HeMap, and (4) CeHTL outperformed all other methods CeHTL performed the best in most of the tasks, espe-
in 10 cases. Figure 6 shows the ROC curves on 3 main cially in DoS→Probe tasks. One of the reason is DoS had
transfer learning tasks using KNN as the base classi- more similarities with Probe than R2L, according to the
fier. CeHTL achieved the best area under ROC curves top selected features in Appendix Table 5 and 6. This can
(AUC) in 2 DoS→Probe and Probe→R2L (CeHTL 0.93 improve the accuracy of computing the cluster correspon-
and 0.91 AUC vs. HeTL 0.82 and 0.65 AUC). Besides dence, which thus resulted in a better performance.
HeMap, we compared our approaches with more base-
lines, TCA [24] and CORAL [23]. Figure 7 showed the 6.4.1 Parameter sensitivity
results of approaches on 5 classifiers in DoS→R2L. HeTL Two hyper-parameters, the similarity confidence param-
and CeHTL outperformed all baselines. eter β and the dimensions of the new feature space k,
Finally, we carried out the second experimental set- need to be set for optimization (4). There are several
ting, where the source domain and target domain have ways to determine the optimum hyper-parameters: (a) the

Fig. 3 Box plot of F1 score of transfer learning approaches and baselines on three main tasks

Content courtesy of Springer Nature, terms of use apply. Rights reserved.


Zhao et al. EURASIP Journal on Information Security (2019) 2019:1 Page 8 of 13

Fig. 4 Performance comparison of accuracy on unknown network attacks detection, sample size = 1000

Fig. 5 Performance comparison of F1 score on unknown network attacks detection, sample size = 1000

(a) (b) (c)


Fig. 6 Performance comparison of ROC curves on the three transfer learning datasets. a ROC curve on DoS→Probe. b ROC curve on DoS→R2L
c. ROC curve on Probe→R2L

Content courtesy of Springer Nature, terms of use apply. Rights reserved.


Zhao et al. EURASIP Journal on Information Security (2019) 2019:1 Page 9 of 13

Fig. 7 Performance comparison of feature-based transfer learning approaches on DoS→ R2L

similarity confidence β can be determined by computing the similarity confidence parameter β, as shown in Fig. 10,
the similarity or distance between the source and target CeHTL shows a significant improvement and stays stable
data, (b) the optimal number of both parameters can be from β ≥ 0, because the correspondence has been auto-
found by enumerating the number of parameters, or (c) matically computed and involved in the transfer learning,
the parameters can be set empirically. However, the first so β should be set larger than 0. For the parameter k,
and second approaches need a few labeled data from the in general, CeHTL shows an outstanding and stable per-
target domain, which is not a truly “unknown” situation. formance than other approaches. The results show that
We studied the impact of different parameter settings on CeHTL is more suitable for unknown network detection
the performance of detecting attacks. Figure 9 demon- since we can empirically set the parameters and do not
strates the effect on accuracy by using different parameter reply heavily on information about the labeled data in the
combinations of β and k (where β ∈[ 0, 1] and k ranges target domain.
from 1 to 6). Figures 10 and 11 demonstrate the average
accuracy achieved on parameters β and k. 6.4.2 The imbalanced data effects
Compared with HeMap, both HeTL and CeHTL In many real cases, the size of normal and attack data
improve the highest accuracy achieved with different would be not equal. Thus, we investigated the perfor-
parameter settings, shown in Fig. 9. However, HeTL is mance of the HeTL and CeHTL on imbalanced data.
sensitive to parameter tuning, showing lower accuracy in Figure 12 shows the F1 score of the transfer learn-
some specific parameter combinations. CeHTL performs ing approaches and baselines in different percentage
more stably. For example, in DoS→Probe, after several of the attack data. We observed the baseline method
fluctuation, CeHTL can maintain around 0.8 accuracy. For performed poorly on the imbalanced data, especially

Fig. 8 Performance comparison on heterogeneous spaces on DoS→R2L

Content courtesy of Springer Nature, terms of use apply. Rights reserved.


Zhao et al. EURASIP Journal on Information Security (2019) 2019:1 Page 10 of 13

(a) (b) (c)


Fig. 9 Accuracy comparison with different combinations of k and β, sample = 1000. a DoS→Probe. b DoS→R2L. c Probe→R2L

(a) (b) (c)


Fig. 10 Study of parameter β sensitivity on three main detection tasks, sample = 1000. a DoS→Probe. b DoS→R2L. c Probe→R2L

(a) (b) (c)


Fig. 11 Study of parameter k sensitivity on the three main detection tasks, sample = 1000. a DoS→Probe. b DoS→R2L. c Probe→R2L

Content courtesy of Springer Nature, terms of use apply. Rights reserved.


Zhao et al. EURASIP Journal on Information Security (2019) 2019:1 Page 11 of 13

(a) (b) (c)


Fig. 12 The performance on imbalanced data by varying the portion of attack data, sample = 1000. a DoS→Probe. b DoS→R2L. c Probe→R2L

in DoS→R2L and Probe→R2L. The transfer learning the same distribution. However, in most real cases, contin-
approaches improved F1 scores in most cases. Although uously evolving attacks and the lack of sufficient labeled
all the methods had a lower F1 score in 10% attack data, datasets hinder the ability of supervised learning tech-
HeTL and CeHTL boosted the F1 by 50% when adding niques to detect new attacks. In this paper, we introduced
another 10% of attack data, and the metric kept rising with a feature-based transfer learning framework and transfer
increasing the attack data. learning approaches. We presented a feature-based trans-
fer learning approach using a linear transformation, called
6.4.3 The training size HeTL. We also proposed a cluster enhanced transfer
We studied how much training data was needed for learning approach, called CeHTL, to make it more robust
unknown attack detection. We plot the learning curves in in detecting unknown attacks. We evaluated the transfer
Fig. 13. From the results, we observed that CeHTL gained learning approaches on common classifiers. The results
the best accuracy at a 500 sample size in DoS→Probe and showed the transfer learning approaches improve the per-
DoS→R2L, and the second best accuracy in Probe→R2L. formance of detecting unknown network attacks com-
CeHTL needs the smallest training sample size, which pared to baselines. Spectacularly, CeHTL exhibited higher
makes it the best option given a limited amount of training performance and the ability to be more robust in detect-
data. ing unknown attacks with no labeled data. The results also
demonstrated that the proposed transfer learning tech-
7 Conclusion niques can support different feature spaces. In the future,
Machine learning have been employed in detecting the we aim to apply the model to various attack domains, such
occurrence of malicious attacks. Most machine learning as malware detection. We also plan to combine transfer
techniques for attack detection are effective only given the learning with deep learning to pre-train the models for
assumptions that the training and testing data are from practical use.

(a) (b) (c)


Fig. 13 Learning curves on different training size. a DoS→Probe. b DoS→R2L. c Probe→R2L

Content courtesy of Springer Nature, terms of use apply. Rights reserved.


Zhao et al. EURASIP Journal on Information Security (2019) 2019:1 Page 12 of 13

Appendix Acknowledgements
Not applicable.
Table 5 Top features for detecting DoS, used in the second
Funding
experiment
The research presented in this paper was supported by Office of the Assistant
Rank index Features Score Secretary of Defense for Research and Engineering (OASD (R&E)) agreement
FA8750-15-2-0120 and Boeing Data Analytics agreement BRT-L1015-0006.
1 srv_serror_rate 0.504
2 serror_rate 0.500 Availability of data and materials
Not applicable.
3 flag 0.475
4 dst_host_srv_serror_rate 0.441 Authors’ contributions
5 src_bytes 0.426 JZ carried out the data processing, design and implementation of the
proposed algorithms, experiment setup, and results evaluation and drafted
6 logged_in 0.417 the manuscript. SS contributed to the conception, experiment design, and
7 dst_host_serror_rate 0.392 evaluation of the proposed approach and results and helped draft the
manuscript. JWP provided oversight for data and experimentation and
8 diff_srv_rate 0.383 participated in the manuscript editing. CK and KK helped revised the
9 dst_bytes 0.334 manuscript. All authors read and approved the final manuscript.

10 same_srv_rate 0.279
Authors’ information
11 service 0.181 Juan Zhao is working as a postdoc fellowship in Department of Biomedical
12 dst_host_diff_srv_rate 0.173 Informatics at Vanderbilt University. Before that, she was a research scientist
and Adjunct Graduate Faculty in Tennessee State University. She received her
13 dst_host_same_srv_rate 0.162 PhD from University of Chinese Academy of Sciences in 2012 and B.E. from
Shandong University in 2006, both degrees in Computer Science. She worked
14 wrong_fragment 0.161
as an Associate Research Professor from 2012 to 2015 in Chinese Network
15 dst_host_srv_diff_host_rate 0.150 Information Center, Chinese Academy of Sciences. Her research interests
include machine learning in cyber security, bioinformatics, transfer learning,
16 dst_host_srv_count 0.150 anomaly detection, feature engineering, natural language processing, and
17 count 0.138 social network analysis.
Sachin Shetty is an associate professor in the Virginia Modeling, Analysis and
18 dst_host_count 0.136 Simulation Center at Old Dominion University. He holds a joint appointment
19 srv_diff_host_rate 0.135 with the Department of Modeling, Simulation and Visualization Engineering
and the Center for Cybersecurity Education and Research. He received his PhD
20 duration 0.115 in Modeling and Simulation from the Old Dominion University in 2007 under
the supervision of Prof. Min Song. Prior to joining Old Dominion University, he
was an associate professor in the Electrical and Computer Engineering
Department at Tennessee State University. His research interests lie at the
intersection of computer networking, network security, and machine learning.
Jan Wei Pan has received his PhD from Virginia Polytechnic Institute and State
University and B.E. from University of New South Wales, both degrees in
Mechanical Engineering. He was an advanced technologist at The Boeing
Table 6 Top features for detecting R2L, used in the second Company and the company’s principal investigator on projects that specialize
experiment in machine learning, computer vision, and robotics. He was also a
Rank index Features Score board-certified Professional Engineer in Virginia and an Adjunct Graduate
Faculty in Tennessee State University. Currently, he is R&D Director in AutoX
1 srv_count 0.399 Inc. His research activities are related to various aspects of machine learning,
such as deep learning, change detection, predictive analytics, data mining,
2 count 0.326
and pattern recognition.
3 dst_host_srv_count 0.307 Charles Kamhoua received the BS in electronic from the University of Douala
(ENSET), Cameroon, in 1999, and the MS in Telecommunication and
4 service 0.283
Networking and the PhD in Electrical Engineering from Florida International
5 dst_bytes 0.243 University, in 2008 and 2011, respectively. In 2017, he joined the Network
Security Branch of the US Army Research Laboratory, Adelphi, MD. From 2011
6 src_bytes 0.231 to 2017, he worked at the Cyber Assurance Branch of the US Air Force
7 hot 0.225 Research Laboratory (AFRL), Rome, NY, as a National Academies Post-doctoral
Fellow and became a Research Electronics Engineer in 2012. His current
8 is_guest_login 0.215 research interests include the application of game theory to cyber security,
9 protocol_type 0.208 survivability, cloud computing, hardware Trojan, online social network,
wireless communication, and cyber threat information sharing.
10 srv_diff_host_rate 0.176 Kevin Kwiat has been with the US Air Force Research Laboratory (AFRL) in
11 dst_host_srv_diff_host_rate 0.175 Rome, NY, for over 32 years. Currently, he is assigned to the Cyber Assurance
Branch. He received the BS in Computer Science and the BA in Mathematics
12 dst_host_same_src_port_rate 0.162 from Utica College of Syracuse University and the MS in Computer
13 num_failed_logins 0.154 Engineering and PhD in Computer Engineering from Syracuse University. He is
also an adjunct professor of Computer Science at the State University of New
14 dst_host_count 0.127 York at Utica/Rome, an adjunct instructor of Computer Engineering at
Syracuse University, and a research associate professor at the University at
15 flag 0.104
Buffalo. His main research interest is dependable computer design.

Content courtesy of Springer Nature, terms of use apply. Rights reserved.


Zhao et al. EURASIP Journal on Information Security (2019) 2019:1 Page 13 of 13

Competing interests 17. F. Iglesias, T. Zseby, Analysis of network traffic features for anomaly
The authors declare that they have no competing interests. detection. Mach. Learn. 101(1-3), 59–84 (2014)
18. S. J. Pan, Q. Yang, A survey on transfer learning. IEEE Trans. Knowl. Data
Publisher’s Note Eng. 22(10), 1345–1359 (2010)
Springer Nature remains neutral with regard to jurisdictional claims in 19. S. Bickel, M. Brückner, T. Scheffer, in Prof. of the 24th International Conf. on
published maps and institutional affiliations. Machine Learning. ICML ’07. Discriminative learning for differing training
and test distributions (ACM, New York, 2007), pp. 81–88
Author details
1 Vanderbilt University Medical Center, 37203 Nashville, USA. 2 Virginia 20. W. Dai, Q. Yang, G.-R. Xue, Y. Yu, in Prof. of the 24th International Conf. on
Machine Learning. ICML ’07. Boosting for transfer learning (ACM, New York,
Modeling Analysis and Simulation Center, Old Dominion University, 23529
2007), pp. 193–200
Norfolk, USA. 3 AutoX Inc, San Jose, California, USA. 4 US Army Research
21. T. Evgeniou, M. Pontil, in Prof. of the Tenth ACM SIGKDD International Conf.
Laboratory’s Network Security Branch, 20783 Adelphi, USA. 5 Haloed Sun TEK,
on Knowledge Discovery and Data Mining. KDD ’04. Regularized multi–task
LLC, in affiliation with the CAESAR Group, Sarasota, Florida, USA.
learning (ACM, New York, 2004), pp. 109–117
22. E. Bonilla, K. M. Chai, C. Williams, Multi-task Gaussian process prediction.
Received: 3 July 2018 Accepted: 18 January 2019
Adv. Neural Inf. Process. Syst. 20(October), 153–160 (2008)
23. B. Sun, J. Feng, K. Saenko, in Proceedings of the Thirtieth AAAI Conference on
Artificial Intelligence. AAAI 16. Return of frustratingly easy domain
References adaptation (AAAI Press, 2016), pp. 2058–2065. http://dl.acm.org/citation.
1. R. Perdisci, W. Lee, N. Feamster, in NSDI, vol. 10. Behavioral clustering of cfm?id=3016100.3016186
http-based malware and signature generation using malicious network 24. S. J. Pan, I. W. Tsang, J. T. Kwok, Q. Yang, Domain adaptation via transfer
traces (USENIX Association, Berkeley, 2010), p. 14 component analysis. IEEE Trans. Neural Netw. 22(2), 199–210 (2011)
2. C. Rossow, C. Dietrich, H. Bos, L. Cavallaro, M. V. Steen, F. C. Freiling, 25. B. Kulis, K. Saenko, T. Darrell, in Computer Vision and Pattern Recognition
N. Pohlmann, in BADGERS ’11 Prof. of the First Workshop on Building Analysis (CVPR), 2011 IEEE Conf. On. What you saw is not what you get: domain
Datasets and Gathering Experience Returns for Security. Sandnet: Network adaptation using asymmetric kernel transforms (IEEE Computer Society,
traffic analysis of malicious software, (2011), pp. 78–88 Los Alamitos, 2011), pp. 1785–1792
3. N. Stakhanova, M. Couture, A. A. Ghorbani, in Prof. of the 2011 6th 26. X. Shi, Q. Liu, W. Fan, P. S. Yu, R. Zhu, in Prof. - IEEE International Conf. on
International Conf. on Malicious and Unwanted Software, Malware 2011. Data Mining, ICDM. Transfer learning on heterogenous feature spaces via
Exploring network-based malware classification (IEEE Computer Society, spectral transformation (IEEE, Los Alamitos, 2010), pp. 1049–1054
Washington, DC, 2011), pp. 14–19 27. J. Nam, S. J. Pan, S. Kim, in Prof. of the 2013 International Conf. on Software
4. D. Bekerman, B. Shapira, L. Rokach, A. Bar, in Communications and Network Engineering. ICSE ’13. Transfer defect learning (IEEE Press, Piscataway,
Security (CNS), 2015 IEEE Conference On. Unknown malware detection 2013), pp. 382–391
using network traffic classification (IEEE, Los Alamitos, 2015), pp. 134–142 28. B. Long, Y. Chang, A. Dong, J. He, in WSDM. Pairwise cross-domain factor
5. K. Bartos, M. Sofka, V. Franc, in USENIX Security 2016. Optimized invariant model for heterogeneous transfer ranking (ACM, New York, 2012), p. 113
representation of network traffic for detecting unseen malware variants 29. S. Gou, Y. Wang, L. Jiao, et al., in 2009 IEEE International Symposium on
(USENIX Association, Austin, 2016), pp. 807–822 Parallel and Distributed Processing with Applications. Distributed transfer
6. J. Zhao, S. Shetty, J. W. Pan, in Military Communications Conference, network learning based intrusion detection (IEEE, Los Alamitos, 2009),
(MILCOM). Feature-based transfer learning for network security (IEEE, Los pp. 511–515
Alamitos, 2017) 30. J. Gao, W. Fan, J. Jiang, J. Han, in Prof. of the 14th ACM SIGKDD International
7. A. Javaid, Q. Niyaz, W. Sun, M. Alam, in Proceedings of the 9th EAI Conference on Knowledge Discovery and Data Mining. Knowledge transfer
International Conf. on Bio-inspired Information and Communications via multiple model local structure mapping (ACM, New York, 2008),
Technologies (Formerly BIONETICS), BICT’15. A deep learning approach for pp. 283–291
network intrusion detection system (ICST, ICST, 2016), pp. 21–26 31. D. Arthur, S. Vassilvitskii, in Proceedings of the Eighteenth Annual ACM-SIAM
8. F. Zhuang, X. Cheng, P. Luo, S. J. Pan, Q. He, in Proceedings of the 24th Symposium on Discrete Algorithms. SODA ’07. K-means++: the advantages
International Conference on Artificial Intelligence, IJCAI’15. Supervised of careful seeding (Society for Industrial and Applied Mathematics,
representation learning: transfer learning with deep autoencoders (AAAI Philadelphia, 2007), pp. 1027–1035. http://dl.acm.org/citation.cfm?id=
Press, 2015), pp. 4119–4125 1283383.1283494
9. K. D. Feuz, D. J. Cook, Transfer learning across feature-rich heterogeneous 32. H. Abdi, L. J. Williams, Principal component analysis. Wiley Interdiscip. Rev.
feature spaces via feature-space remapping (FSR). ACM Trans. Intell. Syst. Comput. Stat. 2(4), 433–459 (2010)
Technol. 6(1), 3–1327 (2015)
10. D. Lin, Network intrusion detection and mitigation against denial of service
attack. Technical Report MS-CIS-13-04. (Department of Computer and
Information Science Technical, University of Pennsylvania, 2013)
11. NSL-KDD, UNB IUNB ISCX NSL-KDD DataSet (2016). http://www.unb.ca/
research/iscx/dataset/iscx-NSL-KDD-dataset.html. Accessed 01 May 2016
12. A. Valdes, K. Skinner, Adaptive, Model-Based Monitoring for Cyber Attack
Detection. (H. Debar, L. Mé, S. F. Wu, eds.) (Springer, Berlin, Heidelberg,
2000), pp. 80–93
13. M. Hilker, C. Schommer, in Conf.s in Research and Practice in Information
Technology Series, vol. 54. Description of bad-signatures for network
intrusion detection (ACSW, 2006), pp. 175–182
14. H. Han, X.-L. Lu, L.-Y. Ren, in Machine Learning and Cybernetics, 2002.
Proceedings. 2002 International Conference On, vol. 1. Using data mining to
discover signatures in network-based intrusion detection (IEEE, Los
Alamitos, 2002), pp. 13–17
15. S. Nari, A. A. Ghorbani, in Prof. of the 2013 International Conf. on Computing,
Networking and Communications (ICNC). ICNC ’13. Automated malware
classification based on network behavior (IEEE Computer Society,
Washington, 2013), pp. 642–647
16. M. Z. Rafique, P. Chen, C. Huygens, W. Joosen, in Prof. of the 2014
conference on Genetic and evolutionary computation - GECCO ’14.
Evolutionary algorithms for classification of malware families through
different network behaviors (ACM, New York, 2014), pp. 1167–1174

Content courtesy of Springer Nature, terms of use apply. Rights reserved.


Terms and Conditions
Springer Nature journal content, brought to you courtesy of Springer Nature Customer Service Center GmbH (“Springer Nature”).
Springer Nature supports a reasonable amount of sharing of research papers by authors, subscribers and authorised users (“Users”), for small-
scale personal, non-commercial use provided that all copyright, trade and service marks and other proprietary notices are maintained. By
accessing, sharing, receiving or otherwise using the Springer Nature journal content you agree to these terms of use (“Terms”). For these
purposes, Springer Nature considers academic use (by researchers and students) to be non-commercial.
These Terms are supplementary and will apply in addition to any applicable website terms and conditions, a relevant site licence or a personal
subscription. These Terms will prevail over any conflict or ambiguity with regards to the relevant terms, a site licence or a personal subscription
(to the extent of the conflict or ambiguity only). For Creative Commons-licensed articles, the terms of the Creative Commons license used will
apply.
We collect and use personal data to provide access to the Springer Nature journal content. We may also use these personal data internally within
ResearchGate and Springer Nature and as agreed share it, in an anonymised way, for purposes of tracking, analysis and reporting. We will not
otherwise disclose your personal data outside the ResearchGate or the Springer Nature group of companies unless we have your permission as
detailed in the Privacy Policy.
While Users may use the Springer Nature journal content for small scale, personal non-commercial use, it is important to note that Users may
not:

1. use such content for the purpose of providing other users with access on a regular or large scale basis or as a means to circumvent access
control;
2. use such content where to do so would be considered a criminal or statutory offence in any jurisdiction, or gives rise to civil liability, or is
otherwise unlawful;
3. falsely or misleadingly imply or suggest endorsement, approval , sponsorship, or association unless explicitly agreed to by Springer Nature in
writing;
4. use bots or other automated methods to access the content or redirect messages
5. override any security feature or exclusionary protocol; or
6. share the content in order to create substitute for Springer Nature products or services or a systematic database of Springer Nature journal
content.
In line with the restriction against commercial use, Springer Nature does not permit the creation of a product or service that creates revenue,
royalties, rent or income from our content or its inclusion as part of a paid for service or for other commercial gain. Springer Nature journal
content cannot be used for inter-library loans and librarians may not upload Springer Nature journal content on a large scale into their, or any
other, institutional repository.
These terms of use are reviewed regularly and may be amended at any time. Springer Nature is not obligated to publish any information or
content on this website and may remove it or features or functionality at our sole discretion, at any time with or without notice. Springer Nature
may revoke this licence to you at any time and remove access to any copies of the Springer Nature journal content which have been saved.
To the fullest extent permitted by law, Springer Nature makes no warranties, representations or guarantees to Users, either express or implied
with respect to the Springer nature journal content and all parties disclaim and waive any implied warranties or warranties imposed by law,
including merchantability or fitness for any particular purpose.
Please note that these rights do not automatically extend to content, data or other material published by Springer Nature that may be licensed
from third parties.
If you would like to use or distribute our Springer Nature journal content to a wider audience or on a regular basis or in any other manner not
expressly permitted by these Terms, please contact Springer Nature at

[email protected]

You might also like