Cloud Intrusion Detection Method Based On Stacked Contractive Auto-Encoder and Support Vector Machine
Abstract—Security issues have resulted in severe damage to the cloud computing environment, adversely affecting the healthy and sustainable development of cloud computing. Intrusion detection is one of the technologies for protecting the cloud computing environment from malicious attacks. However, network traffic in the cloud computing environment is characterized by large scale, high dimensionality, and high redundancy; these characteristics pose serious challenges to the development of cloud intrusion detection systems. Deep learning technology has shown considerable potential for intrusion detection. Therefore, this study aims to use deep learning to extract essential feature representations automatically and realize high detection performance efficiently. An effective stacked contractive autoencoder (SCAE) method is presented for unsupervised feature extraction. By using the SCAE method, better and more robust low-dimensional features can be automatically learned from raw network traffic. A novel cloud intrusion detection system is designed on the basis of the SCAE and the support vector machine (SVM) classification algorithm. The SCAE+SVM approach combines deep and shallow learning techniques and fully exploits their advantages to significantly reduce the analytical overhead. Experiments show that the proposed SCAE+SVM method achieves higher detection performance than three other state-of-the-art methods on two well-known intrusion detection evaluation datasets, namely KDD Cup 99 and NSL-KDD.
Index Terms—Cloud computing, intrusion detection system (IDS), feature extraction, deep learning, contractive auto-encoder, support vector machine
1 INTRODUCTION

CLOUD computing [1] is an emerging Internet-based computing model that provides tenants with seemingly "unlimited" IT services, thereby freeing them from complex underlying hardware, software, and protocol stacks. Although service "open for all" is the essence of cloud computing, this openness is a double-edged sword: tenants can use cloud services for efficient computing, but they can also abuse the cloud environment and attack the network. For example, a malicious tenant may reside in a virtual machine (VM), successfully intrude into other VMs in the cloud, and use these puppet machines to spread malicious software or launch distributed denial-of-service (DDoS) attacks. In fact, tenant behavior generates massive network traffic in the cloud environment, mainly comprising "north-south" and "east-west" traffic. "North-south" traffic mainly refers to the traffic of tenants accessing cloud services from the external network, and "east-west" traffic refers to the traffic between VMs in the virtual network [2]. Cisco's cloud industry research report predicts that global cloud network traffic will account for 95% of total network traffic by 2021, and that the "east-west" traffic between VMs in the cloud environment will account for 85% [2]. Network traffic will continue to increase dramatically and will inevitably encounter malicious attacks. Network attacks not only result in severe damage to the cloud environment but also cause tenants to lose confidence in cloud computing itself, which will adversely affect the healthy and sustainable development of cloud computing. Intrusion detection is one of the technologies for protecting cloud computing from malicious attacks. Therefore, we study cloud intrusion detection systems (CIDSs) that detect and analyze network traffic, particularly "east-west" traffic, identify malicious attack behaviors, and prevent damage, thus guaranteeing the safety and reliability of cloud computing.

However, cloud infrastructure is constructed with virtualization technology, which renders the virtual network flow between VMs invisible and uncontrollable by the traditional intrusion detection system (IDS). A software-defined network (SDN) has characteristics such as programmability, centralized control, and a global view, so it is widely used in cloud computing. Previous studies [3], [4], [5], [6] proposed the use of SDN technology to redirect network traffic to the Snort IDS for detecting malicious attacks. Snort, a signature-based detection system, cannot detect unknown attacks or adapt to large-scale traffic.
On the contrary, anomaly detection, developed as classifiers that differentiate anomalous traffic from normal traffic, is well suited for the detection of unknown attacks, but it has a high false alarm rate.

Many types of shallow discriminative machine learning techniques have been extensively applied to IDS, such as the neural network (NN), random forest (RF), decision tree (DT), and support vector machine (SVM).
However, these approaches provide unsatisfactory classification or detection accuracy. The intrusion detection result depends not only on the performance of the classifier but also on the quality of the input data. Network traffic data usually involves high dimensionality and redundancy of features, which can easily cause a feature dimensionality disaster. Therefore, feature dimensionality reduction is particularly important for effectively improving the performance of the above-mentioned supervised classifiers [7]. It includes two types of techniques: feature subset selection and feature extraction [8]. Feature subset selection works by removing irrelevant or redundant features; the subset of features selected gives the best performance according to some objective function. Many studies [9], [10], [11], [12], [13] have demonstrated that feature selection methods can overcome the "dimensionality curse" and achieve high detection performance in CIDS. Feature extraction maps the original high-dimensional features into low-dimensional features and generates new linear or nonlinear combinations of the original features [8]. Recently, various researchers have demonstrated that deep learning technology has considerable potential for IDS, especially in feature extraction. This study aims to use deep learning to automatically extract essential features from raw network data and input them into a shallow classifier for effective identification of attacks.

The remainder of this paper is organized as follows. Section 2 reviews existing studies. Section 3 presents relevant background information. Section 4 describes the design and training process of the proposed stacked contractive autoencoder (SCAE)-SVM model in detail. Section 5 discusses the design of the CIDS framework. Section 6 presents and analyzes the experimental results. Finally, Section 7 states the conclusions and explores directions for future work.

2 EXISTING STUDIES

Deep learning approaches are mainly categorized into supervised learning and unsupervised learning. The difference between these two approaches lies in the use of labeled training data. Specifically, convolutional neural networks (CNNs) [14], which use labeled data, fall under supervised learning and employ a special architecture suitable for image recognition. Unsupervised learning methods include the deep belief network (DBN) [15], recurrent neural network (RNN) [16], autoencoder (AE) [17], and their variants. Next, we describe recent studies related to our work; these studies are mainly based on the KDD Cup 99 or NSL-KDD datasets.

Studies on intrusion detection using the NSL-KDD dataset have been reported [18], [19], [20], [21], [22]. Tang et al. [18] used deep neural networks (DNNs) to build an anomaly detection model in the software-defined network (SDN) environment. They trained their model using 6 basic features taken from the 41 features of the NSL-KDD dataset. Salama et al. [19] used DBN to extract features for intrusion detection and SVM to classify the data after dimensionality reduction. Experimental results showed that their hybrid DBN+SVM method improves the detection performance compared with using SVM or DBN as standalone classifiers. Aygun et al. [20] proposed two deep learning-based anomaly detection models using an AE and a denoising AE to detect zero-day attacks with high accuracy, and they used a stochastic approach to determine the threshold value that directly affects the accuracy of the proposed models. Niyaz et al. [21] developed an effective and flexible NIDS referred to as self-taught learning (STL), which combines a sparse AE used for unsupervised dimensionality reduction with softmax regression used to train the classifier. This method achieved satisfactory classification accuracy in 2-class, 5-class, and 23-class classification tasks. Subsequently, Niyaz et al. [22] developed a DDoS intrusion detection system and applied it to the SDN environment. They used the stacked auto-encoder (SAE) for feature reduction and evaluated the detection performance of the SAE-SVM model on network traffic collected from real and private network test beds.

Studies on intrusion detection using the KDD Cup 99 dataset have been reported [23], [24], [25]. Kim et al. [23] specifically targeted advanced persistent threats and proposed a deep neural network (DNN) with 100 hidden units, combined with the rectified linear unit activation function and the ADAM optimizer. Their approach was implemented on a GPU using TensorFlow. Papamartzivanos et al. [24] proposed a novel method that combines the benefits of a sparse AE and the MAPE-K framework to deliver a scalable, self-adaptive, and autonomous misuse IDS. They merged the datasets provided by KDD Cup 99 and NSL-KDD to create a single voluminous dataset. Shone et al. [25] designed a new non-symmetric deep autoencoder (NDAE) model, which, unlike typical AEs, provides non-symmetric data dimensionality reduction. This model was combined with an RF classification algorithm to construct a classifier. This method achieved satisfactory results on the KDD Cup 99 and NSL-KDD datasets.

Studies using private or other public datasets for intrusion detection have also been documented [26], [27], [28]. Loukas et al. [26] used RNN-based deep learning enhanced by long short-term memory (LSTM) to considerably increase intrusion detection accuracy for a robotic vehicle. They demonstrated that their approach achieves high accuracy with considerably more consistency than standard machine learning techniques. Yu et al. [27] proposed a network intrusion detection model based on TCP, UDP, and ICMP sessions, which combines the stacked denoising auto-encoder (SDAE) and a softmax classifier. Comparative experiments showed that the performance of this model is better than that of DBN and SAE in 2-class and 8-class classification tasks. This method was also validated on other public datasets, such as the UNB ISCX IDS 2012 and CTU-13 datasets. Later, Yu et al. [28] proposed a stacked dilated convolutional autoencoder (DCAE) method, which can automatically learn key features from a large amount of raw network traffic. DCAE has fewer parameters than fully connected neural networks such as SAE. However, the limitation of the DCAE model lies in its relatively long training process, and the authors planned to adopt GPU parallelization technology to overcome this problem in the future. Yu et al. trained and tested their model with a private dataset, so the model cannot be directly compared with other schemes.

From the above-mentioned findings, we can conclude that deep learning has been successfully applied to network intrusion detection. However, it remains in its infancy.
Most researchers are still combining it with various algorithms through numerous experiments (structure, training, optimization, etc.) to explore the most effective solution. Hence, we believe that our study can make a significant contribution to cloud intrusion detection research.

The contributions of this study can be summarized as follows:

- We propose a novel stacked contractive autoencoder (SCAE)-based deep learning method for network intrusion detection. The SCAE method can automatically learn essential and robust low-dimensional features from raw network traffic and input them into a shallow SVM classifier. By leveraging the respective strengths of deep and shallow learning techniques, their combination can effectively improve detection performance.
- We design a cloud intrusion detection system (CIDS) that uses an SDN framework for collecting virtual network traffic from the Xen cloud platform and applies the SCAE+SVM model for feature extraction and classification detection. The proposed system attempts to detect attacks on the data plane and is implemented on the application plane.
- We evaluate our SCAE+SVM IDS model by applying it on two well-known datasets frequently used to evaluate the detection performance of IDS. The experimental results show the effect of the network structure depth and the number of extracted features on the detection performance. In addition, they demonstrate that, compared with existing similar models, our model achieves better or similar results.

3 AUTOENCODER AND ITS VARIANTS

3.1 Autoencoder (AE)

An autoencoder (AE) is an unsupervised feature dimensionality reduction technique whose structure consists of an encoder and a decoder, comprising an input layer, a hidden layer, and an output layer. The encoder is used for dimensionality reduction, and the decoder is used for reconstruction, which is regarded as the reverse process of the encoder.

Let a training dataset D have n samples; then D = {x^(i), y^(i) | x^(i) ∈ R^{d_x}, y^(i) ∈ R}, i = 1, ..., n, where each sample x is a d_x-dimensional feature vector and y is the class label. The encoder function f maps input x into a hidden representation h ∈ R^{d_h}, and the decoder function g maps the hidden representation h back to a reconstruction z ∈ R^{d_x}. When the number of hidden-layer neurons is less than the number of input-layer and output-layer neurons, that is, d_h < d_x, we obtain a compressed vector of the input and thus realize dimensionality reduction. The encoding and decoding processes are defined as follows:

h = f(wx + b)    (1)

z = g(w'h + b')    (2)

where f and g are non-linear activation functions (typically sigmoid or hyperbolic tangent functions), w and w' are the weight matrices, where w is a d_x × d_h matrix and w' = w^T, and b and b' are the bias vectors with d_h and d_x dimensions, respectively. Here, we use \theta = \{w, b\} and \theta' = \{w', b'\}, where \theta and \theta' represent the parameters of the encoder and decoder, respectively. The goal of learning is to minimize the reconstruction error between the input x and the output z by adjusting these parameters. The minimization objective function is defined as

J_{AE} = \sum_{x \in D_n} L(x, z) = \sum_{i=1}^{n} L\big(x^{(i)}, g_{\theta'}(f_{\theta}(x^{(i)}))\big)    (3)

where L is the loss function or reconstruction error function, which is typically the mean squared error (4) or the cross-entropy loss (5):

L(x, z) = \frac{1}{n} \sum_{i=1}^{n} \big(x^{(i)} - z^{(i)}\big)^2    (4)

L(x, z) = -\frac{1}{n} \sum_{i=1}^{n} \big[x^{(i)} \log z^{(i)} + (1 - x^{(i)}) \log(1 - z^{(i)})\big]    (5)

In general, the smaller the reconstruction error, the closer the output z is to the input x, which implies that h is an effective low-dimensional feature representation. However, the reconstruction criterion alone can lead to the output simply copying the input, in which case the AE cannot effectively extract features. To address this problem, we can adopt strategies such as adding a constraint on the representation or corrupting the input by adding noise.
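To make the encoder-decoder mapping of Eqs. (1)-(5) concrete, the following minimal NumPy sketch implements a single autoencoder forward pass with tied weights and both reconstruction losses. The 41-to-20 dimensions and the random toy data are illustrative placeholders, not the configuration used in this paper; in practice the parameters are learned by minimizing one of these losses with gradient descent rather than drawn at random.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def encode(x, w, b):
    # Eq. (1): h = f(wx + b), sigmoid activation
    return sigmoid(x @ w + b)

def decode(h, w, b_rec):
    # Eq. (2): z = g(w'h + b') with tied weights w' = w^T
    return sigmoid(h @ w.T + b_rec)

def mse_loss(x, z):
    # Eq. (4): mean squared reconstruction error (averaged over samples)
    return np.mean(np.sum((x - z) ** 2, axis=1))

def cross_entropy_loss(x, z, eps=1e-8):
    # Eq. (5): cross-entropy reconstruction error, assuming inputs scaled to [0, 1]
    return -np.mean(np.sum(x * np.log(z + eps) + (1 - x) * np.log(1 - z + eps), axis=1))

# Toy usage: compress 41 input features into a 20-dimensional hidden code.
rng = np.random.default_rng(0)
x = rng.random((8, 41))                        # 8 fake traffic records
w = rng.normal(scale=0.05, size=(41, 20))      # encoder weights (d_x x d_h)
b, b_rec = np.zeros(20), np.zeros(41)          # encoder / decoder biases
h = encode(x, w, b)                            # low-dimensional representation
z = decode(h, w, b_rec)                        # reconstruction of the input
print(mse_loss(x, z), cross_entropy_loss(x, z))
```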
3.2 Denoising Autoencoder (DAE)

Unlike the conventional AE, the denoising autoencoder (DAE) [29] aims to learn a more effective and robust feature representation from corrupted input data x̂. The DAE first corrupts the input x and then sends the corrupted data x̂ into the auto-encoder for denoising; finally, it reconstructs the clean version x. This process yields the following objective function:

J_{DAE} = \sum_{i=1}^{n} \mathbb{E}_{\hat{x}^{(i)} \sim q(\hat{x}^{(i)} \mid x^{(i)})} \big[ L\big(x^{(i)}, g_{\theta'}(f_{\theta}(\hat{x}^{(i)}))\big) \big]    (6)

where the expectation is over the corrupted versions x̂ of samples x obtained from a corruption process q(x̂ | x). Common corruption approaches include additive isotropic Gaussian noise (GS), i.e., x̂ | x ~ N(x, σ²I), and binary masking noise (MN), which randomly sets a fraction of the input features to 0.
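The two corruption processes mentioned above can be sketched in a few lines of NumPy; the noise level and masking fraction below are illustrative values, not the settings used in this paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def gaussian_corrupt(x, sigma=0.3):
    """Additive isotropic Gaussian noise (GS)."""
    return x + rng.normal(scale=sigma, size=x.shape)

def masking_corrupt(x, frac=0.2):
    """Binary masking noise (MN): randomly set a fraction of input features to 0."""
    keep = rng.random(x.shape) >= frac
    return x * keep

x = rng.random((4, 41))          # toy records
x_gs = gaussian_corrupt(x)       # the DAE encodes the corrupted input ...
x_mn = masking_corrupt(x)        # ... while the loss still targets the clean x
```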
3.3 Contractive Autoencoder (CAE)

Rifai et al. followed up on the DAE and proposed the contractive autoencoder (CAE) [30]. The aim of the CAE is likewise to learn a robust feature representation. Although the DAE and CAE have the same purpose, they adopt two distinct methods: the DAE learns robust features from a relatively intuitive perspective, by randomly adding noise to the input, whereas the CAE learns robust features analytically, by regularization. The minimization objective function is given by

J_{CAE} = \sum_{i=1}^{n} \Big( L\big(x^{(i)}, g_{\theta'}(f_{\theta}(x^{(i)}))\big) + \lambda \, \|J_f(x^{(i)})\|_F^2 \Big)    (7)

\|J_f(x)\|_F^2 = \sum_{t,j} \left( \frac{\partial h_j(x)}{\partial x_t} \right)^2    (8)

\|J_f(x)\|_F^2 = \sum_{j=1}^{d_h} \sum_{t=1}^{d_x} \big(h_j (1 - h_j)\big)^2 \, w_{tj}^2    (9)

where h_j ∈ h is an element of the hidden representation, d_x and d_h are the dimensions of the input x and the hidden representation h, respectively, and w_{tj} is an element of the d_h × d_x weight matrix that connects the input x and the hidden representation h. The overall computational complexity is O(d_x × d_h).

As can be seen, the final objective function consists of two parts. First, the reconstruction error function is used to obtain as much effective information as possible from the input data. Second, the newly added penalty term is used to suppress minor perturbations in the input data; by introducing the Frobenius norm of the Jacobian matrix as the constraint term, the learned features are made locally invariant.

3.4 Contrastive Analysis

The three techniques described in Sections 3.1 to 3.3 can all achieve feature dimensionality reduction. However, the CAE has some advantages over the AE and DAE. In general, there are two criteria for a good feature representation: (1) good reconstruction of the input data, and (2) robustness when the input data is disturbed to a certain extent. The AE satisfies only the first criterion, while the DAE and CAE satisfy both. In some classification tasks, the second criterion is more important; therefore, the DAE and CAE are more suitable for classification detection.

In addition, there are at least three differences between the DAE and the CAE. First, in the CAE the sensitivity of the features is penalized directly rather than indirectly: the DAE encourages a robust reconstruction g(f(x)), so the robustness of f(x) is only partial or indirect, whereas the CAE penalizes f(x) itself instead of g(f(x)). Because the encoder part f(x) is what is used for classification, robustness of the extracted features matters more than robustness of the reconstructed inputs. Second, the DAE improves the robustness of feature extraction by adding random noise to the input data, while the robustness of the CAE against perturbations is obtained by calculation; the robustness is analytic rather than stochastic. Third, the CAE can finely control the trade-off between reconstruction and the penalty by setting the hyper-parameter λ. Thus, the CAE is superior to the DAE, and we believe that the CAE is a better choice than the DAE for learning useful features and achieving higher classification accuracy.

Fig. 1. Structure of stacked contractive autoencoder.

4 PROPOSED METHODOLOGY

In this section, we first describe the SCAE used for feature learning. Subsequently, we describe the training process of the SCAE model and the SVM classifier used for multiclass anomaly detection.

4.1 Designing the SCAE Model for Feature Learning

In general, deep feedforward networks have many advantages, which also apply to the AE and its variants. The SCAE consists of several hidden layers for encoding and a set of symmetrical layers for decoding, in which the output of each layer is fed as the input of the subsequent layer. The detailed structure of the SCAE is presented in Fig. 1. Here, the superscript numbers refer to the hidden-layer identity, and the subscript numbers signify the dimension of the layer.

In the encoding phase, the k-th hidden layer of the SCAE learns k-order features from the output of the (k-1)-th layer. That is, the first hidden layer learns 1-order features from the raw input, the second hidden layer learns 2-order features from the 1-order features, and subsequent higher layers learn higher-order features. Conversely, in the decoding phase, the (k-1)-th-order features are reconstructed from the output of the k-th layer, and so on, until the input is reconstructed.

Thus, the encoding and decoding processes of the SCAE network can be expressed as

x^k = f(w^k x^{k-1} + b^k), \quad k = 1, \ldots, m    (10)

z^{k-1} = g(w'^k z^k + b'^k)    (11)

Here x^0 represents the input data, x^k represents the k-th-order features learned by the k-th hidden layer, and x^m denotes the low-dimensional m-th-order features that will be transported to the classifier, where m is the number of hidden layers, or the depth of the network. Here, we let x^m = z^m and map the m-th-order features layer by layer back to the reconstruction z^0. We use \theta^k = \{w^k, b^k\} and \theta'^k = \{w'^k, b'^k\}, which represent the parameters of the k-th encoder and decoder, respectively. Thus, the minimization objective function of the SCAE is expressed as follows:

J_{SCAE} = \sum_{k=1}^{m} \sum_{i=1}^{n} \Big( L\big(x^{(i,k-1)}, g_{\theta'^k}(f_{\theta^k}(x^{(i,k-1)}))\big) + \lambda_k \, \|J_f(x^{(i,k-1)})\|_F^2 \Big)    (12)
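The sketch below illustrates Eqs. (7)-(12) under the assumption of sigmoid units and tied weights: each contractive layer adds the closed-form Jacobian penalty of Eq. (9) to its reconstruction loss, and the layers are pretrained greedily, each on the features produced by the layer below. TensorFlow is used here only for automatic differentiation; the hidden sizes (28, 16, 8, 5) mirror one of the structures evaluated in Section 6, while the learning rate, epoch count, and λ are placeholders rather than the settings of this paper.

```python
import numpy as np
import tensorflow as tf

class ContractiveLayer(tf.Module):
    """One CAE layer with tied weights (w' = w^T) and sigmoid units."""
    def __init__(self, d_in, d_h, lam=1e-4, name=None):
        super().__init__(name=name)
        self.w = tf.Variable(tf.random.normal([d_in, d_h], stddev=0.05), name="w")
        self.b = tf.Variable(tf.zeros([d_h]), name="b")
        self.b_rec = tf.Variable(tf.zeros([d_in]), name="b_rec")
        self.lam = lam

    def encode(self, x):                       # Eq. (1): h = f(wx + b)
        return tf.sigmoid(tf.matmul(x, self.w) + self.b)

    def decode(self, h):                       # Eq. (2): z = g(w'h + b'), tied weights
        return tf.sigmoid(tf.matmul(h, self.w, transpose_b=True) + self.b_rec)

    def loss(self, x):                         # Eq. (7): reconstruction + contractive penalty
        h = self.encode(x)
        z = self.decode(h)
        recon = tf.reduce_mean(tf.reduce_sum(tf.square(x - z), axis=1))
        # Closed form of ||J_f(x)||_F^2 for sigmoid units, Eq. (9)
        col_norm = tf.reduce_sum(tf.square(self.w), axis=0)          # sum_t w_tj^2 per hidden unit
        penalty = tf.reduce_mean(tf.reduce_sum(tf.square(h * (1.0 - h)) * col_norm, axis=1))
        return recon + self.lam * penalty


def pretrain_scae(x, hidden_dims=(28, 16, 8, 5), epochs=50, lr=1e-3):
    """Greedy layer-wise pretraining (Eq. 12): each layer is trained on the
    k-order features produced by the layer below it."""
    layers, data = [], tf.constant(x, dtype=tf.float32)
    for d_h in hidden_dims:
        layer = ContractiveLayer(int(data.shape[1]), d_h)
        opt = tf.keras.optimizers.Adam(learning_rate=lr)
        for _ in range(epochs):
            with tf.GradientTape() as tape:
                loss = layer.loss(data)
            grads = tape.gradient(loss, layer.trainable_variables)
            opt.apply_gradients(zip(grads, layer.trainable_variables))
        layers.append(layer)
        data = layer.encode(data)              # feed forward to train the next layer
    return layers


def scae_features(layers, x):
    """Map raw records to the low-dimensional m-th-order features."""
    h = tf.constant(x, dtype=tf.float32)
    for layer in layers:
        h = layer.encode(h)
    return h.numpy()


if __name__ == "__main__":
    x_demo = np.random.rand(256, 41).astype("float32")   # placeholder for preprocessed traffic
    scae = pretrain_scae(x_demo)
    feats = scae_features(scae, x_demo)                   # shape (256, 5)
    print(feats.shape)
```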
5 CLOUD INTRUSION DETECTION SYSTEM BASED ON SCAE AND SVM

Here, we use SDN technology to build our CIDS, which decouples the traditional network structure into a data plane, a control plane, and an application plane. The CIDS framework is shown in Fig. 3. An OpenFlow virtual switch (OVS) is used to forward the virtual network flow; this represents the data plane. A network controller (NC) is used to install the flow table, perform routing control, and collect network traffic; this represents the control plane. The NC configures and manages the OVS according to the OpenFlow protocol. The application plane includes various applications that implement different functions, such as the anomaly detection application.

The anomaly detection application achieves three main functions: (1) data preprocessing, where the network traffic is transformed and standardized; (2) classifier training, where the SCAE+SVM model used for feature extraction and classification detection is trained from the preprocessed network traffic; and (3) attack recognition, where the trained classifier is used to detect intrusions in the testing dataset or online network traffic.

5.1 Data Collection

Usually, data collection is the first and a critical step in intrusion detection. In our study, we use the OVS and NC to collect virtual network traffic data from the Xen cloud platform, as shown in Fig. 4. Each physical machine (PM) consists of a privileged domain named dom0 and a non-privileged domain named domU. PMs are connected by a traditional switch, and VMs are connected by the OVS, which is deployed in dom0. A VM can communicate with another VM in the same or a different PM through the OVS. The OVS is used to forward virtual network flow, while the NC is responsible for routing control and network flow collection. The network traffic obtained by the NC is handed over to the anomaly detection application for intrusion detection.

5.2 Data Preprocessing

Data preprocessing mainly includes data transformation and standardization. Data transformation is used to convert the nominal features into numeric values. For example, in the NSL-KDD dataset there are three nominal features: protocol type, service type, and TCP status flag. The attack type is also nominal, so we transform the attack type into numeric values as well; for example, 0, 1, 2, 3, and 4 represent Normal, DOS, Probe, R2L, and U2R, respectively.

In addition, to eliminate the bias in favor of features with greater values, as well as the effect of the large number of sparse features whose values are 0, we use standardization to scale each feature value into a well-proportioned range. Here, we use the Z-score method for standardization.
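A minimal sketch of this preprocessing step is shown below, assuming integer (label) encoding for the nominal columns and scikit-learn's StandardScaler for the Z-score step; the toy records and the choice of label encoding over one-hot encoding are illustrative assumptions.

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder, StandardScaler

# Toy NSL-KDD-style rows: [protocol, service, flag, src_bytes, dst_bytes, attack label]
rows = [["tcp", "http",     "SF", 181, 5450, "normal"],
        ["udp", "domain_u", "SF", 105,  146, "normal"],
        ["tcp", "private",  "S0",   0,    0, "neptune"]]

X = np.array([r[:5] for r in rows], dtype=object)
labels = [r[5] for r in rows]

# 1) Data transformation: map each nominal feature column to integers.
for j in range(3):                                    # protocol, service, flag
    X[:, j] = LabelEncoder().fit_transform(X[:, j].astype(str))
X = X.astype(float)

# Attack labels are mapped to the five classes 0..4 (Normal, DOS, Probe, R2L, U2R);
# "neptune" belongs to the DOS class.
class_of = {"normal": 0, "neptune": 1}
y = np.array([class_of[label] for label in labels])

# 2) Standardization: Z-score scaling (zero mean, unit variance per feature).
X = StandardScaler().fit_transform(X)
print(X.shape, y)
```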
5.3 SCAE+SVM Classifier

When building classifiers or other predictors, incorporating feature learning methods can lead to dimensionality reduction and high detection performance. Here, we use the SCAE deep learning algorithm to extract essential features from raw network traffic. Note that the SCAE is pretrained in an unsupervised mode and fine-tuned by employing a supervised back-propagation algorithm. Once the essential features are extracted, they are used to train the SVM classifier. The SVM classifier exploits the one-versus-all (OVA) approach to distinguish between normal and abnormal data. We consider SCAE+SVM as a whole, or a black box, and the learned features are not visible.
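The classification stage can be sketched with scikit-learn as follows. The random 5-dimensional features stand in for the codes produced by the trained SCAE encoder, and the RBF kernel and one-vs-rest wrapper are assumptions used to illustrate the OVA strategy; the paper's actual kernel and hyper-parameters are not fixed in this excerpt.

```python
import numpy as np
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Placeholder features: in the deployed system these are the low-dimensional
# codes produced by the trained SCAE encoder for each traffic record.
X_train = rng.random((200, 5))
y_train = rng.integers(0, 5, 200)      # 0 = Normal, 1-4 = DOS / Probe / R2L / U2R
X_test = rng.random((20, 5))

# One-vs-all (OVA) SVM: one binary SVM per class, the highest score wins.
clf = OneVsRestClassifier(SVC(kernel="rbf", C=1.0, gamma="scale"))
clf.fit(X_train, y_train)
predictions = clf.predict(X_test)
print(predictions)
```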
TABLE 1
Sample Distribution of KDD Cup 99 and NSL-KDD Datasets

Class | Attack type | KDD Cup 99 Training | KDD Cup 99 Testing | NSL-KDD Training | NSL-KDD Testing
DOS | Back | 2203 | 1098 | 956 | 359
DOS | Land | 21 | 9 | 18 | 7
DOS | Neptune | 107201 | 58001 | 41214 | 4657
DOS | Pod | 264 | 87 | 201 | 41
DOS | Smurf | 280790 | 164091 | 2646 | 665
DOS | Teardrop | 979 | 12 | 892 | 12
Probe | Ipsweep | 1247 | 306 | 3599 | 141
Probe | Nmap | 231 | 84 | 1493 | 73
Probe | Portsweep | 1040 | 354 | 2931 | 157
Probe | Satan | 1589 | 1633 | 3633 | 735
R2L | Ftp_write | 8 | 3 | 8 | 3
R2L | Guess_passwd | 53 | 4367 | 53 | 1231
R2L | Imap | 12 | 1 | 11 | 1
R2L | Multihop | 7 | 18 | 7 | 18
R2L | Phf | 4 | 2 | 4 | 2
R2L | Spy | 2 | 0 | 2 | 0
R2L | Warezclient | 1020 | 0 | 890 | 0
R2L | Warezmaster | 20 | 1602 | 20 | 944
U2R | Loadmodule | 9 | 2 | 9 | 2
U2R | Buffer_overflow | 30 | 22 | 30 | 20
U2R | Rootkit | 10 | 13 | 10 | 13
U2R | Perl | 3 | 2 | 3 | 2
Normal | Normal | 97278 | 60593 | 67343 | 9711
Total | Total | 494021 | 292300 | 125973 | 18794

TABLE 2
Models Considered in This Work for Comparison With the SCAE+SVM Model

Acronym | Description
SVM | Support Vector Machine
SAE+SVM | Stacked Auto-Encoder plus Support Vector Machine
SDAE+SVM | Stacked Denoising Auto-Encoder plus Support Vector Machine
STL [21] | Sparse Auto-Encoder plus Softmax
S-NDAE [25] | Stacked Non-symmetric Auto-Encoder plus Softmax
SCAE+SVM | Stacked Contractive Auto-Encoder plus Support Vector Machine

6 EXPERIMENTAL RESULTS AND ANALYSIS

To verify the effectiveness of the proposed SCAE+SVM model, we conduct several experiments. First, we introduce the NSL-KDD [33] and KDD Cup 99 [34] datasets used in this paper. Second, we describe the experimental design and environment, including the classification performance metrics and model parameters. Finally, we present the experimental results and a comparative analysis.

6.1 Dataset

In our experiments, we use two well-known intrusion detection evaluation datasets that are widely used to validate CIDSs. The KDD Cup 99 dataset contains around five million training records and two million testing records. Here, we use around 10% of the records, i.e., 494,021 training records and 311,029 testing records, to evaluate the SVM classifier. By removing the records that appear only in the testing data but not in the training data, we are left with 292,300 testing records. Each record consists of 41 different features and is labeled as either normal or an attack. The attacks can be classified into four types: DOS, Probe, R2L, and U2R.

As a revised version of the KDD Cup 99 dataset, the NSL-KDD dataset proposed by Tavallaee et al. [35] contains 125,973 training records and 22,544 testing records. Similarly, by removing the records that appear only in the testing data but not in the training data, we are left with 18,794 testing records. Each record in the NSL-KDD dataset is also composed of 41 different features. Table 1 summarizes the exact distribution of the KDD Cup 99 and NSL-KDD datasets.

Although these two datasets have some limitations and 41 is a relatively small number of dimensional features, they are used widely in similar studies. Therefore, we validate the performance of the SCAE+SVM model using these two datasets.

6.2 Experimental Design and Environment

Here, we design two groups of experiments to verify the effectiveness of the proposed SCAE+SVM model. These two experiments answer two questions: (1) whether the SCAE can extract effective features and achieve the objective of dimensionality reduction, and (2) whether the proposed SCAE+SVM model can effectively improve the detection performance of the CIDS.

Therefore, we first examine the ability of the SCAE to generate low-dimensional features, as well as the effect of the parameters in the SCAE+SVM model on intrusion detection efficiency. Then, we compare the results of the SCAE+SVM method with those of three state-of-the-art approaches and those obtained by two similar methods (see Table 2). The structures of these three approaches are similar to that of the SCAE+SVM (here, the SDAE employs 0.3 times Gaussian noise). We perform three classification tasks, i.e., 2-class, 5-class, and 13-class classification, on the NSL-KDD dataset, and two classification tasks, i.e., 2-class and 5-class classification, on the KDD Cup 99 dataset. Specifically, 2-class classification involves normal and attack data; 5-class classification involves normal data and four types of attack traffic (i.e., DOS, Probe, U2R, and R2L); and 13-class classification involves the classes with more than the minimum of 20 training entries in Table 1.

In general, six metrics are used for evaluating detection performance: accuracy rate (ACC), precision rate (P), recall rate (R), f-measure (F), confusion matrix (M), and receiver operating characteristic (ROC) [18]. They are defined as follows:

Accuracy = \frac{TP + TN}{TP + TN + FP + FN}    (15)

Precision = \frac{TP}{TP + FP}    (16)

Recall = \frac{TP}{TP + FN}    (17)

F\text{-}measure = \frac{2 \times precision \times recall}{precision + recall}    (18)

where the accuracy rate is the proportion of records that are correctly identified, and the precision rate is the proportion of records identified as attacks that are truly attacks.
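The following NumPy helper computes Eqs. (15)-(18) for the 2-class (normal vs. attack) case; the sample labels at the bottom are arbitrary toy values used only to exercise the function.

```python
import numpy as np

def binary_metrics(y_true, y_pred):
    """Accuracy, precision, recall, and F-measure (Eqs. 15-18), with label 1 = attack, 0 = normal."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = np.sum((y_true == 1) & (y_pred == 1))
    tn = np.sum((y_true == 0) & (y_pred == 0))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f_measure = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return accuracy, precision, recall, f_measure

print(binary_metrics([1, 1, 0, 0, 1], [1, 0, 0, 1, 1]))
```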
The detection performance does not always improve with the number of network layers, and additional layers will easily lead to over-fitting. In our experiment, five different types of SCAE+SVM models were set up on the NSL-KDD dataset. The comparative analysis results in terms of the accuracy rate are shown in Table 3.

TABLE 3
Comparison of Different SCAE+SVM Structures

Model | ACC | P | R | F
SCAE1+SVM | 75.97 | 73.49 | 75.97 | 71.15
SCAE2+SVM | 85.45 | 86.73 | 85.45 | 81.70
SCAE3+SVM | 85.46 | 84.44 | 85.46 | 80.96
SCAE4+SVM | 87.33 | 87.96 | 87.33 | 85.01
SCAE5+SVM | 83.57 | 82.70 | 83.57 | 79.26

SCAE1+SVM is set as a shallow network, i.e., a 41-5 structure. Similarly, SCAE2+SVM, SCAE3+SVM, SCAE4+SVM, and SCAE5+SVM are set as 41-20-5, 41-28-16-5, 41-28-16-8-5, and 41-28-20-12-6-5 structures, respectively. From Table 3, the SCAE4+SVM model shows the best classification performance. Thus, the SCAE can extract optimal features and achieve the objective of dimensionality reduction. The performance of deep structures is better than that of shallow structures because multi-layer mapping units can extract important structural information.

6.3.2 Classification Performance of the SCAE+SVM

To further explore the validity of the proposed SCAE+SVM model, we evaluate its detection performance and perform a comparative analysis with the other methods mentioned previously. We evaluate our proposed model on three types of classification tasks: 2-class, 5-class, and 13-class classification. The numbers of hidden layers and neurons of the SAE and SDAE are the same as those of the SCAE. Tables 4(a)-(c) show the detection performance of the 2-class, 5-class, and 13-class classification tasks on the NSL-KDD dataset. Tables 5(a)-(b) show the detection performance of the 2-class and 5-class classification tasks on the KDD Cup 99 dataset.

TABLE 5
Two Types of Classification Tasks on KDD Cup 99 Dataset

From the experimental results, we can observe the following:

1) The SVM classifiers combined with different deep learning methods are superior to the standalone shallow SVM. The SAE, SDAE, and SCAE deep learning algorithms all achieve the goal of dimensionality reduction. They can not only capture the essential features but also improve the classification accuracy. This demonstrates that combining deep and shallow learning techniques plays to their respective strengths and achieves better detection performance.
2) In the 2-class classification task on the NSL-KDD dataset, the SCAE+SVM model has the highest accuracy rate and precision rate. The recall rate and f-measure of the SCAE+SVM model are higher than those of the SAE+SVM and SDAE+SVM methods but lower than those of the STL method.
3) In the 2-class classification task on the KDD Cup 99 dataset, the accuracy rate, precision rate, recall rate, and f-measure of the SCAE+SVM model are all the highest, albeit only slightly better than those of the other three methods.
4) In the 5-class classification task on the NSL-KDD dataset, the SCAE+SVM model achieves the highest accuracy rate and recall rate among all the models. The precision rate and f-measure of the SCAE+SVM model are higher than those of all models except the S-NDAE; however, the AUC value of the S-NDAE method, illustrated later, is lower than that of our SCAE+SVM model.
5) In the 5-class classification task on the KDD Cup 99 dataset, the accuracy rate and recall rate of the SCAE+SVM model are higher than those of the SAE+SVM and SDAE+SVM methods and slightly higher than those of the S-NDAE model. However, the precision rate and f-measure are lower than those of the S-NDAE model.
6) In the 13-class classification task on the NSL-KDD dataset, the SCAE+SVM model achieves the highest accuracy rate and recall rate among all the models. However, the precision rate and f-measure of the SCAE+SVM model are lower than those of the S-NDAE model.

In summary, the SCAE+SVM method achieves better detection performance than the three state-of-the-art approaches, and its results are better than or at least match those of the similar studies compared herein.

Table 6 lists the precision rate, recall rate, and f-measure of every class in the 5-class classification task on the NSL-KDD dataset. We can see that our model achieves superior performance in terms of most of the measures except for the U2R class. The results re-emphasize that our model does not handle smaller classes such as "R2L" and "U2R" well; these classes show lower performance because the data size affects the classification results to some degree.

TABLE 6
Precision, Recall, and F-Measure of Various Learning Methods on NSL-KDD Dataset

Attack class | SAE+SVM (P / R / F) | SDAE+SVM (P / R / F) | SCAE+SVM (P / R / F)
Normal | 0.78 / 0.93 / 0.85 | 0.81 / 0.92 / 0.87 | 0.84 / 0.96 / 0.89
DOS | 0.91 / 0.93 / 0.92 | 0.91 / 0.99 / 0.95 | 0.96 / 0.99 / 0.98
Probe | 0.68 / 0.75 / 0.71 | 0.70 / 0.74 / 0.72 | 0.76 / 0.81 / 0.78
R2L | 0.81 / 0.09 / 0.17 | 0.90 / 0.15 / 0.25 | 0.93 / 0.24 / 0.38
U2R | 1.00 / 0.05 / 0.10 | 0.00 / 0.00 / 0.00 | 0.25 / 0.05 / 0.09
Total | 0.82 / 0.82 / 0.78 | 0.84 / 0.86 / 0.81 | 0.88 / 0.87 / 0.85

The confusion matrix of the 5-class classification obtained using our SCAE+SVM model is presented in Table 7. The values on the leading diagonal denote the number of correctly classified records of the testing dataset. "R2L" and "U2R" have more samples that are identified incorrectly.

TABLE 7
Confusion Matrix of 5-Class Classification Task

Actual \ Predicted | Normal | DOS | Probe | R2L | U2R
Normal | 9325 | 123 | 217 | 40 | 6
DOS | 14 | 5667 | 60 | 0 | 0
Probe | 131 | 79 | 896 | 0 | 0
R2L | 1664 | 4 | 9 | 522 | 0
U2R | 29 | 4 | 0 | 2 | 2

TABLE 8
Classification Performance of 13-Class Classification Task on NSL-KDD Dataset

Fig. 5. ROC curves for various methods in 5-class classification tasks on the NSL-KDD dataset.
Fig. 6. ROC curves for various methods in 5-class classification tasks on the KDD Cup 99 dataset.
TABLE 9
AUC Values of Four Methods

TABLE 10
AUC Values of Two Methods in 5-Class Classification Tasks on NSL-KDD Dataset

Method | Normal | DOS | Probe | R2L | U2R
S-NDAE [25] | 0.82 | 0.95 | 0.99 | 0.45 | 0.44
SCAE+SVM | 0.88 | 0.99 | 0.90 | 0.53 | 0.62

TABLE 11
AUC Values of Four Methods in 13-Class Classification Tasks on NSL-KDD Dataset

Fig. 7. ROC curve for the SCAE+SVM method in the 13-class classification task on the NSL-KDD dataset.

Next, we evaluate the detection performance of the proposed SCAE+SVM model on the 13-class classification task. These 13 classes are those with more than the minimum of 20 training entries listed in Table 1. The purpose of the experiment is to determine whether the proposed deep learning model can identify each type of attack with fine granularity and maintain stable detection performance when the number of attack categories increases. The corresponding performance analysis is presented in Table 8. The experimental results show the following:

1) The total detection performance of the SCAE+SVM model is satisfactory and stable; it achieves the highest level compared with the other methods.
2) All classes of our model achieve better detection performance, except for three classes: buffer_overflow, nmap, and pod. Because the number of testing records of the warezclient class is 0, the detection performance for this attack is 0.

Figs. 5a, 5b, 5c, 5d and 6a, 6b, 6c, 6d show the ROC curves of the four different methods in the 5-class classification tasks on the NSL-KDD and KDD Cup 99 datasets. The dotted lines represent the ROC curve of the total classification performance of each method, and the other lines represent the five attack types. A larger area under the ROC curve implies a high true positive rate and a low false positive rate. As can be seen, the area under the ROC curve of the SCAE+SVM model is the largest; thus, it can be considered to have the best performance.

The AUC values of the four methods are listed in Tables 9(a)-(b). In the 5-class classification tasks, the AUC values of the SCAE+SVM model on the NSL-KDD and KDD Cup 99 datasets are 0.92 and 0.98, respectively. This implies that the proposed SCAE+SVM method achieves higher detection performance than the other methods. In addition, in the 2-class classification tasks, the AUC value of the SCAE+SVM on the NSL-KDD dataset is 0.86, which implies higher detection performance than the other methods. On the KDD Cup 99 dataset, the AUC values of the SAE+SVM, SDAE+SVM, and SCAE+SVM methods are all 0.98; that is, all achieve high detection performance. Thus, our method performs well in the 2-class classification task.

Table 10 lists the AUC values of the 5-class classification task for two methods, S-NDAE and SCAE+SVM, on the NSL-KDD dataset. These results indicate that our proposed method has four AUC values superior to those of S-NDAE.

In the 13-class classification task, the AUC values of the four methods are listed in Table 11. The SCAE+SVM achieves an AUC value of 0.95 on the NSL-KDD dataset; thus, it achieves higher detection performance than the other methods. The ROC curve of our method in the 13-class classification task is shown in Fig. 7.

From the above-mentioned results, it is nearly certain that our proposed SCAE+SVM method has a high true positive rate and a low false positive rate. Thus, our method performs well in the 2-class, 5-class, and 13-class classification tasks.

7 CONCLUSION

Security concerns not only lead to severe losses in the cloud computing environment but also cause users to lose confidence in cloud computing itself, which will inevitably have a serious impact on the healthy and sustainable development of cloud computing. Building a cloud intrusion detection system is one of the solutions for protecting cloud computing from malicious attacks. Recently, researchers have demonstrated that an efficient and effective CIDS can be built by combining a deep learning algorithm for feature extraction with a classifier. In this study, we designed a hybrid system that uses a stacked contractive auto-encoder (SCAE) for feature reduction and the SVM classification algorithm for the detection of malicious attacks. Using the NSL-KDD and KDD Cup 99 intrusion detection datasets, we experimentally demonstrated that the proposed SCAE+SVM model achieves promising classification performance in terms of six metrics compared with three state-of-the-art methods.
Although the proposed SCAE+SVM-IDS approach has shown encouraging performance, it can be improved by further optimizing the classifier. The SVM classifier cannot effectively recognize some new attacks that exist only in the testing dataset. Therefore, designing an optimal classifier requires careful consideration in future studies. As a long-term objective for future work, on the one hand, we aim to reduce the controller's bottleneck and implement a CIDS that can detect different kinds of network attacks; on the other hand, we plan to implement our solution in a real cloud environment to evaluate its performance.

ACKNOWLEDGMENTS

The authors are very grateful to the anonymous reviewers and the editor for their helpful and constructive comments and suggestions. This work was supported by the National Key Basic Research Program of China under Grant 2016YFB050190104 and the National Natural Science Foundation of China (61802436, 61702550).

REFERENCES

[1] P. M. Mell and T. Grance, "The NIST definition of cloud computing," National Institute of Standards and Technology, 2011. [Online]. Available: http://dx.doi.org/10.6028/nist.sp.800-145
[2] Cisco Global Cloud Index, "Forecast and methodology, 2016-2021 white paper," 2018. [Online]. Available: https://www.cisco.com/c/en/us/solutions/collateral/service-provider/global-cloud-index-gci/white-paper-c11-738085.html
[3] S. Shin and G. Gu, "CloudWatcher: Network security monitoring using OpenFlow in dynamic cloud networks (or: How to provide security monitoring as a service in clouds?)," in Proc. IEEE Int. Conf. Netw. Protoc., 2012, pp. 1-6.
[4] C. J. Chung, P. Khatkar, T. Xing, J. Lee, and D. Huang, "NICE: Network intrusion detection and countermeasure selection in virtual network systems," IEEE Trans. Depend. Secure Comput., vol. 10, no. 4, pp. 198-211, Jul./Aug. 2013.
[5] T. Xing, Z. Xiong, D. Huang, and D. Medhi, "SDNIPS: Enabling software-defined networking based intrusion prevention system in clouds," in Proc. Int. Conf. Netw. Serv. Manage. Workshop, 2014, pp. 308-311.
[6] J. S. Cui, C. Guo, L. Chen, Y. N. Zhang, and D. Huang, "Establishing process-level defense-in-depth framework for software defined networks," J. Softw., vol. 25, no. 10, pp. 2251-2265, Oct. 2014, doi: 10.13328/j.cnki.jos.004682.
[7] G. E. Hinton and R. Salakhutdinov, "Reducing the dimensionality of data with neural networks," Science, vol. 313, no. 5786, pp. 504-507, 2006, doi: 10.1126/science.1127647.
[8] Z. M. Hira and D. F. Gillies, "A review of feature selection and feature extraction methods applied on microarray data," Adv. Bioinf., vol. 2015, pp. 1-13, 2015, doi: 10.1155/2015/198363.
[9] A. Kannan, G. Q. Maguire Jr., A. Sharma, and P. Schoo, "Genetic algorithm based feature selection algorithm for effective intrusion detection in cloud networks," in Proc. IEEE 12th Int. Conf. Data Mining Workshops, 2012, pp. 416-423, doi: 10.1109/icdmw.2012.56.
[10] A. Kannan et al., "A novel cloud intrusion detection system using feature selection and classification," Int. J. Intell. Inf. Technol., vol. 11, no. 4, pp. 1-15, 2015, doi: 10.4018/ijiit.2015100101.
[11] O. Osanaiye et al., "Ensemble-based multi-filter feature selection method for DDoS detection in cloud computing," EURASIP J. Wireless Commun. Netw., vol. 2016, no. 1, 2016, Art. no. 130, doi: 10.1186/s13638-016-0623-3.
[12] A. Javadpour, S. Kazemi Abharian, and G. Wang, "Feature selection and intrusion detection in cloud environment based on machine learning algorithms," in Proc. IEEE Int. Symp. Parallel Distrib. Process. Appl. / IEEE Int. Conf. Ubiquitous Comput. Commun., 2017, pp. 1417-1421, doi: 10.1109/ispa/iucc.2017.00215.
[13] N. M. Ibrahim and A. Zainal, "A feature selection technique for cloud IDS using ant colony optimization and decision tree," Adv. Sci. Lett., vol. 23, no. 9, pp. 9163-9169, 2017, doi: 10.1166/asl.2017.10045.
[14] Y. LeCun et al., "Handwritten digit recognition: Applications of neural network chips and automatic learning," IEEE Commun. Mag., vol. 27, no. 11, pp. 41-46, Nov. 1989, doi: 10.1109/35.41400.
[15] G. E. Hinton, S. Osindero, and Y.-W. Teh, "A fast learning algorithm for deep belief nets," Neural Comput., vol. 18, no. 7, pp. 1527-1554, 2006, doi: 10.1162/neco.2006.18.7.1527.
[16] D. E. Rumelhart, G. E. Hinton, and R. J. Williams, "Learning representations by back-propagating errors," Nature, vol. 323, no. 6088, pp. 533-536, 1986, doi: 10.1038/323533a0.
[17] G. Alain and Y. Bengio, "What regularized auto-encoders learn from the data-generating distribution," J. Mach. Learn. Res., vol. 15, no. 1, pp. 3563-3593, 2014.
[18] T. A. Tang, L. Mhamdi, D. McLernon, S. A. R. Zaidi, and M. Ghogho, "Deep learning approach for network intrusion detection in software defined networking," in Proc. Int. Conf. Wireless Netw. Mobile Commun., 2016, pp. 258-263, doi: 10.1109/wincom.2016.7777224.
[19] M. A. Salama, H. F. Eid, R. A. Ramadan, A. Darwish, and A. E. Hassanien, "Hybrid intelligent intrusion detection scheme," in Advances in Intelligent and Soft Computing, K. Janusz, Ed., Berlin, Germany: Springer, 2011, pp. 293-303.
[20] R. C. Aygun and A. G. Yavuz, "Network anomaly detection with stochastically improved autoencoder based models," in Proc. IEEE 4th Int. Conf. Cyber Secur. Cloud Comput., 2017, pp. 193-198.
[21] A. Javaid, Q. Niyaz, W. Sun, and M. Alam, "A deep learning approach for network intrusion detection system," in Proc. 9th EAI Int. Conf. Bio-Inspired Inf. Commun. Technol., 2016, pp. 21-26, doi: 10.4108/eai.3-12-2015.2262516.
[22] Q. Niyaz, W. Sun, and A. Y. Javaid, "A deep learning based DDoS detection system in software-defined networking (SDN)," ICST Trans. Secur. Safety, vol. 4, no. 12, 2017, Art. no. 153515, doi: 10.4108/eai.28-12-2017.153515.
[23] J. Kim et al., "Method of intrusion detection using deep neural network," in Proc. IEEE Int. Conf. Big Data Smart Comput., 2017, pp. 313-316.
[24] D. Papamartzivanos, F. G. Marmol, and G. Kambourakis, "Introducing deep learning self-adaptive misuse network intrusion detection systems," IEEE Access, vol. 7, pp. 13546-13560, 2019.
[25] N. Shone, T. N. Ngoc, V. D. Phai, and Q. Shi, "A deep learning approach to network intrusion detection," IEEE Trans. Emerg. Topics Comput. Intell., vol. 2, no. 1, pp. 41-50, Feb. 2018, doi: 10.1109/tetci.2017.2772792.
[26] G. Loukas et al., "Cloud-based cyber-physical intrusion detection for vehicles using deep learning," IEEE Access, vol. 6, no. 1, pp. 3491-3508, 2017.
[27] Y. Yu, J. Long, and Z. Cai, "Session-based network intrusion detection using a deep learning architecture," in Modeling Decisions for Artificial Intelligence, Berlin, Germany: Springer, 2017, pp. 144-155.
[28] Y. Yu, J. Long, and Z. Cai, "Network intrusion detection through stacking dilated convolutional autoencoders," Secur. Commun. Netw., vol. 2017, pp. 1-10, 2017, doi: 10.1155/2017/4184196.
[29] P. Vincent, H. Larochelle, Y. Bengio, and P.-A. Manzagol, "Extracting and composing robust features with denoising autoencoders," in Proc. 25th Int. Conf. Mach. Learn., 2008, pp. 1096-1103, doi: 10.1145/1390156.1390294.
[30] S. Rifai, P. Vincent, X. Muller, X. Glorot, and Y. Bengio, "Contractive auto-encoders: Explicit invariance during feature extraction," in Proc. Int. Conf. Mach. Learn., 2011, pp. 833-840.
[31] J. Weston and C. Watkins, "Multi-class support vector machines," in Support Vector Machines for Pattern Classification. London, U.K.: Springer, 2005.
[32] R. Rifkin and A. Klautau, "In defense of one-vs-all classification," J. Mach. Learn. Res., vol. 5, pp. 101-141, 2004.
[33] NSL-KDD dataset, 2009. [Online]. Available: http://www.unb.ca/cic/datasets/nsl.html
[34] KDD Cup 1999 dataset, 1999. [Online]. Available: http://archive.ics.uci.edu/ml/datasets/kdd+cup+1999+data
[35] M. Tavallaee, E. Bagheri, W. Lu, and A. A. Ghorbani, "A detailed analysis of the KDD CUP 99 data set," in Proc. IEEE Symp. Comput. Intell. Secur. Defense Appl., 2009, pp. 1-6, doi: 10.1109/cisda.2009.5356528.
[36] W. Wang, X. Du, and N. Wang, "Building a cloud IDS using an efficient feature selection method and SVM," IEEE Access, vol. 7, pp. 1345-1354, 2019, doi: 10.1109/access.2018.2883142.
[37] H. Larochelle, Y. Bengio, J. Louradour, and P. Lamblin, "Exploring strategies for training deep neural networks," J. Mach. Learn. Res., vol. 10, pp. 1-40, 2009.
Wenjuan Wang received the master's degree from Zhengzhou University, China, in 2007. She is currently working toward the PhD degree at the PLA Strategic Support Force Information Engineering University. Since 2016, she has been an associate professor with the PLA Information Engineering University. Her primary research interests include intrusion detection and the security of cloud computing.

Ruoxi Qin received the BS degree in information engineering from the Electronic Engineering Institute, Hefei, China, in 2017. He is currently working toward the master's degree at the PLA Strategic Support Force Information Engineering University. His research interests include image processing, deep learning, and artificial intelligence.