01-2020 DL CNN
Received March 22, 2020, accepted April 3, 2020, date of publication April 10, 2020, date of current version April 24, 2020.
Digital Object Identifier 10.1109/ACCESS.2020.2986882
ABSTRACT Deep Learning has been widely applied to problems in detecting various network attacks. However, no cases on network security have shown applications of various deep learning algorithms in real-time services beyond experimental conditions. Moreover, owing to the integration of high-performance computing, it is necessary to apply systems that can handle large-scale traffic. Given the rapid evolution of web-attacks, we implemented and applied our Artificial Intelligence-based Intrusion Detection System (AI-IDS). We propose an optimal convolutional neural network and long short-term memory network (CNN-LSTM) model, normalized UTF-8 character encoding for Spatial Feature Learning (SFL) to adequately extract the characteristics of real-time HTTP traffic without encryption, calculating entropy, and compression. We demonstrated its excellence through repeated experiments on two public datasets (CSIC-2010, CICIDS2017) and fixed real-time data. By training payloads that analyzed true or false positives with a labeling tool, AI-IDS distinguishes sophisticated attacks, such as unknown patterns, encoded or obfuscated attacks from benign traffic. It is a flexible and scalable system that is implemented based on Docker images, separating user-defined functions by independent images. It also helps to write and improve Snort rules for signature-based IDS based on newly identified patterns. As the model calculates the malicious probability by continuous training, it could accurately analyze unknown web-attacks.
INDEX TERMS Computer networks, intrusion detection, neural networks, large-scale systems, intelligent
systems, real time systems, security, CNN-LSTM.
This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
VOLUME 8, 2020 70245
A. Kim et al.: AI-IDS: Application of Deep Learning to Real-Time Web Intrusion Detection
System (NIDS). An NIDS, a surveillance device for monitoring network infringements and policy violations, collects traffic mirrored by network devices such as switches, routers, and network terminal access points (TAP) [4]. Many organizations operate an NIDS together with firewalls and an application firewall (L7) to protect web servers that are on the same network and system. An NIDS mostly runs signature-based detection using Snort IDS rules. The analyst writes a user-defined pattern into the rules to detect an attack. When there is a malicious payload in the network traffic, the rule triggers security events, which include the detection time, source/destination IP (metadata), and some raw packets (payloads). String or pattern matching is reliable and generates very few false alarms, but it does not identify unknown or irregular attack patterns. Recent sophisticated cyber-attacks use irregular patterns, such as encoding and obfuscation, to bypass security systems. To solve these problems, we applied AI-IDS to detect variant attacks that cannot be identified by a legacy signature-based NIDS.
A. LIST OF CONTRIBUTIONS
The main contributions of this paper are summarized as follows:
1) APPLYING DEEP LEARNING TO REAL-WORLD NETWORKS
We have successfully applied AI-IDS to big-data scale traffic. The AI-IDS is a flexible and scalable system that is implemented based on Docker images, and separates user-defined functions by independent images.
B. CONDITIONS AND ASSUMPTIONS
This study uses the following conditions and assumptions:
1) AI-IDS: AN OPEN SOURCE SOFTWARE
The AI-IDS software carries the following license notice: Licensed under the MIT License. The source code can be accessed directly on GitHub in our repositories [5].
2) PARALLEL OPERATIONS: IDS, TAS
An IDS and a Traffic Analysis System (TAS) operate independently and do not affect each other. We used a signature-based NIDS for intrusion detection and a Splunk StreamApp-based TAS for collecting real-time traffic. A TAS is equivalent to a packet monitoring system.
3) APPLICATION-LEVEL PACKETS INSPECTION
We focused on HTTP, which is commonly used in web services, and inspected request headers and payload data. We did not address low-risk attacks from protocols below the application layer, such as the user datagram protocol (UDP).
The remainder of this paper is organized as follows. Section II presents related works, limitations of meta-datasets, and the motivation for this study. Section III introduces the security operations for deep learning. Section IV shows our spatial feature learning algorithms for big-data, the optimal CNN-LSTM model, and the AI-IDS infrastructure. Section V shows the experimental results. Section VI introduces the efficacy and applications. Finally, Section VII presents the conclusion.
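As an illustration of the signature-matching limitation noted in the introduction, the following minimal sketch uses naive substring matching as a stand-in for a signature rule. It is not Snort's matching engine; the signature list, the function name, and the sample requests are hypothetical and serve only to show how an encoded payload can slip past a literal pattern.

```python
# Hypothetical literal patterns, standing in for user-defined signature rules.
SIGNATURES = ["union select", "../"]


def signature_match(payload: str, signatures=SIGNATURES) -> bool:
    """Return True if any literal signature occurs in the lower-cased payload."""
    lowered = payload.lower()
    return any(sig in lowered for sig in signatures)


# A plain SQL injection attempt contains the literal pattern.
plain = "GET /item?id=1 UNION SELECT password FROM users HTTP/1.1"

# The same attempt with URL-encoded spaces and an encoded SQL comment.
obfuscated = "GET /item?id=1%20UNION%2F%2A%2A%2FSELECT%20password HTTP/1.1"

print(signature_match(plain))       # literal pattern is present
print(signature_match(obfuscated))  # encoded variant no longer matches
```

Each new encoding variant would need its own rule or a decoding step, which is the maintenance burden that motivates learning directly from payloads.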
learning methods. An RNN-IDS is suitable for modeling a classification model with high accuracy, and its performance is superior to that of traditional machine learning classification methods in both binary and multiclass classification.
Shone et al. [9] showed a non-symmetric deep auto-encoder (NDAE) for unsupervised feature learning. This study improves the classification performance on KDD99 and NSL-KDD by comparing an auto-encoder with the NDAE. Wu et al. [10] devised CNN and RNN models for attack detection; however, their approach differs from the model used in this study because they performed separate experiments on the CNN and RNN models. Naseer et al. [11] investigated the suitability of deep learning approaches for anomaly-based intrusion detection systems. Ding and Zhai [12] compared the performance of models using multi-class classification with the performance of traditional machine learning methods.
Otoum et al. [13] devised DL for an IDS available on wireless sensor networks (WSNs), and also compared the restricted Boltzmann machine-based clustered IDS (RBC-IDS) and an adaptive machine learning-based IDS: the adaptively supervised and clustered hybrid IDS (ASCH-IDS). Chouhan et al. [14] proposed a Channel Boosted and Residual learning-based deep Convolutional Neural Network (CBR-CNN) architecture for the detection of network intrusions. This study used Stacked Auto-encoders (SAE) and unsupervised training, and the performance of the proposed CBR-CNN technique was evaluated with an NSL-KDD dataset. Vinayakumar et al. [15] developed an IDS to detect and classify unforeseen and unpredictable cyberattacks with a DNN. The performance was tested with the DNN model and compared across the NSL-KDD, UNSW-NB15, Kyoto, WSN-DS, and CICIDS2017 datasets. Chiba et al. [16] proposed a DNN model for a cloud-based anomaly network IDS using recent datasets, such as CICIDS2017, NSL-KDD version 2015, and CIDDS-001, with a hybrid optimization framework (IGASAA) based on the Improved Genetic Algorithm (IGA) and Simulated Annealing Algorithm (SAA). Zhang et al. [17] used a deep belief network (DBN) model to identify SQL injection attacks in
network traffic. Faker and Dogdu [18] experimented with improving the performance of intrusion detection systems on the CICIDS2017 and UNSW-NB15 datasets using a DNN and two ensemble techniques, RF and gradient boosting tree (GBT).
Previous research has suggested new ideas or algorithms for improving deep learning algorithms. Aloqaily et al. [19] introduced an automated secure continuous cloud service availability framework for smart connected vehicles that enables an intrusion detection mechanism against security attacks.
However, most previous deep learning-based studies have difficulty applying intrusion detection in real-world environments because the data were usually pre-processed into metadata formats in an experimental environment. Few studies have shown how to apply the models in real time in the real world.
B. LIMITATIONS OF META-DATASETS
Previous studies [8], [12], [15], [16] mainly focused on extracting or analyzing features from metadata rather than paying attention to exploited raw packets. Because network traffic changes with trends, real-world accuracy without continuous re-training drops significantly even if a system is 99.9% accurate in an experimental environment. The following is a description of the KDD99, NSL-KDD, and PU-IDS datasets that have been widely used in previous works.
1) KDD CUP 1999 DATASETS
The KDD Cup 1999 dataset [20] is the most widespread dataset for intrusion detection and is based on the DARPA dataset. The dataset contains TCP high-level attributes, such as the connection window, but no IP addresses. KDD99 involves more than 20 different types of attacks and comes with redundant records in the test set [21].
2) NSL-KDD DATASETS
NSL-KDD [22] is a dataset that has been enhanced from KDD99, removing much of the duplicated data from KDD99 and creating a more advanced sub-dataset. The dataset consists of separate and predefined training data and test data for intrusion detection. NSL-KDD uses the same attributes as KDD99, and its attack records belong to four categories: R2L, Probe, U2R, and DoS. Normal traffic belongs to the other category [23].
3) PU-IDS DATASETS
PU-IDS [24] is a derivative of the NSL-KDD dataset, and its author has developed a generator that extracts the statistics of the input dataset and then creates a new dataset. The generated traffic has the same attributes and format as the NSL-KDD dataset.
While previous studies mainly used KDD99 and NSL-KDD, they are not suitable as datasets for real-time detection. These datasets deal with metadata and therefore make it difficult to identify invalid attacks in a practical environment, because metadata are not attack attempts. Moreover, most public datasets contain redundant information and an unbalanced number of categories. For instance, Ring et al. [21] compared the characteristics of intrusion detection datasets used in previous works. This study shows that various previously published datasets model repetitive and inefficient attacks, such as DoS, UDP flooding, and brute force, which differ from recent web-attack trends. In fact, types of attacks and trends in the data are constantly changing; therefore, it is necessary to develop a general-purpose model that is not biased toward current trends.
Another problem with most published datasets is that models are often over-fitted due to duplicated or flow-based metadata, and the performance of the model is significantly inflated in experimental conditions. If the model is applied in practical services, it will face a serious problem with false-positive alarms. Likewise, the work of Sabhnani and Serpen [25] has shown that when using the KDD99 dataset, it is not possible to successfully train pattern classification or machine learning algorithms for misuse detection.
Nevertheless, most previous studies measured the model performance of deep learning or machine learning techniques in experiments using KDD99 datasets. Yin et al. [8] also used the KDDTest +- dataset to compare performance with the RNN model, and a recent work by Vinayakumar et al. [15] experimented with the DNN model on public data such as KDD99, NSL-KDD, and UNSW-NB15, comparing it with machine learning techniques such as LR, NB, RF, and DT. Gu et al. [26] demonstrated that validated training data is an essential determinant for successful research that can greatly enhance the detection capability. Moreover, Moustafa et al. [27] compared the characteristics of various public datasets and suggested that datasets that are not based on reality can lead to misguided research.
C. MOTIVATION
One of the challenges faced by security operations is inefficient operation due to false-positive alarm events. It wastes IDS resources and reduces the performance of deep learning; therefore, the issue of false positives should be addressed properly to detect threats in big-data infrastructure. Misuse detection, which is broadly applied in SOCs, uses predefined signatures for filtering and to detect attacks. It relies on human input to constantly update the signature database. This method is accurate in finding known attacks but is completely ineffective for unknown attacks. In most cases in real-world environments, misuse detection generates a high false-positive rate similar to anomaly-based IDS. In the study of Mishra et al. [4], performance optimization was needed during the detection process to deal with false-positive issues. However, most previous works do not adequately address the false-positive issue in the real world because their performance evaluations use limited datasets in experimental environments. To mitigate the false-positive problem, high-quality
training data is a basic determinant for improving DL model performance.
The most common issues with existing solutions based on learning models include:
• First, the learning models produce a high false-positive rate with a wide range of attacks.
• Second, the learning models are not adaptive to the real world, as meta-datasets like KDD Cup 99 were mainly used to evaluate the performance of the learning model.
• Third, previous studies were unable to foresee today's huge network traffic; therefore, scalable solutions are required to maintain high performance with a rapidly increasing high-speed network size.
• Finally, no cases have been published on DL applications for the detection of unknown attacks on real-world computer networks.
These challenges form the primary motivation for the application of deep learning-based NIDS.
III. SECURITY OPERATIONS FOR DEEP LEARNING
This section introduces the security operations for deep learning applications and the data design for practical training.
A. OVERVIEW OF SECURITY OPERATIONS
We detected and analyzed intrusion attempts into financial networks to protect against electronic financial incidents. The SOC also plays the role of an Information Sharing and Analysis Center (KF-ISAC). Fig. 1 shows that the SOC collected real-time network traffic and detected malicious network traffic by directly installing an NIDS, a TAS, a TAP, and a virtual private network (VPN) on the Internet DMZ area of many financial companies in South Korea. The SOC operated continuously, 24 hours a day, 7 days a week, and 365 days a year. About 20 people work in shifts and generate daily analytical information for training. The IDS and TAS data were transferred to the SOC via VPN from financial institutions, and the SOC collected approximately 1 billion real-time HTTP requests per day (Sep. 2019).
network traffic in a user-defined function. A TAS is a type of system that collects network traffic and enables users to analyze traffic by collecting various protocols, including HTTP, SMTP, and SSH. It allows analysts to analyze anomalies by collecting various network protocols from the network layer to the application layer. We use the Splunk StreamApp [28] as the software for traffic collection and analysis. For the effective detection and analysis of cyber-attacks, we recommend running an NIDS and a TAS in parallel. If security events are alerted on an NIDS, a TAS could inspect the same malicious payloads on the network.
The NIDS and the TAS inspected a variety of protocols, such as SSDP, DNS, SMTP, POP3, HTTP, and SSL, from the network layer to the application layer. As UDP-based protocols do not establish a three-way handshake, it is difficult to attribute their traffic to an IP address, and the source can easily be forged. Thus, we did not analyze invalid UDP-type or denial of service (DoS) attacks, to maintain stable performance. The SSL protocol was excluded from our scope because it is not possible for an analyst to review the malicious payload.
In managed security monitoring operations, security managers process security events in the order of Detection, Analysis, and Prevention. "Detection" means collecting security alerts generated by user-defined Snort rules on an NIDS or a TAS, which include the detection time, source IP, destination IP, port information, and signature messages listed in Table 2. "Analysis" refers to classifying events into true or false positives by reviewing the detection information. "Prevention" means registering malicious IP addresses on blacklists, which are then blocked from accessing service websites. Prevention is applied to very obvious attack patterns, and it is recommended to block access only after the attack has been verified by an analyst or system. The proposed AI-IDS is used as a supplementary system alongside legacy signature-based NIDSs for network layer security.
TABLE 2. Attributes of analysis information.
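The Detection, Analysis, and Prevention sequence described in this section can be sketched with a small record type. The field names follow the attributes the paper lists for Table 2; the `flag` values, the `verified` argument, the function names, and the sample addresses are illustrative assumptions and are not taken from the AI-IDS source code.

```python
from dataclasses import dataclass
from typing import Set


@dataclass
class SecurityEvent:
    """One alert record; fields mirror the attributes named for Table 2."""
    detection_time: str
    detection_site: str
    direction: str
    src_ip: str
    src_port: int
    dest_ip: str
    dest_port: int
    signature_name: str
    raw_packet: bytes
    flag: str = "unanalyzed"  # assumption: stores the analyst's verdict


def analyze(event: SecurityEvent, is_attack: bool) -> SecurityEvent:
    """Analysis step: label the event as a true or false positive."""
    event.flag = "true_positive" if is_attack else "false_positive"
    return event


def prevent(event: SecurityEvent, blacklist: Set[str], verified: bool) -> bool:
    """Prevention step: block the source IP only for verified true positives."""
    if verified and event.flag == "true_positive":
        blacklist.add(event.src_ip)
        return True
    return False


# Detection step: an alert arrives from the NIDS or TAS (values are made up).
event = SecurityEvent(
    "2019-09-01T00:00:00", "site-01", "inbound",
    "203.0.113.7", 51234, "198.51.100.10", 80,
    "SQL Injection (SQL)", b"GET /?id=1%27 HTTP/1.1",
)
blacklist: Set[str] = set()
analyze(event, is_attack=True)
blocked = prevent(event, blacklist, verified=True)
```

Keeping the verdict on the record itself means the same labeled events can later be reused as training data.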
per day on legacy signature-based NIDS, and we perform about 10,000 automatic and manual analyses. During general security operations, malicious detection information is triggered by the NIDS when an attack packet occurs in the network communication. Daily training data in the production environment is labeled in real time by security analysts using labeling tools.
The labeled analysis information is shared with AI-IDS and used as training data for prediction in neural networks. We implemented deep learning models for three attack classes in real-time HTTP traffic: "Password guessing and Authentication bypass (AUT)," "SQL Injection (SQL)," and "Application vulnerability attack (APP)." For UDP-type attacks, such as "information gathering" or "denial of service," it is more difficult to identify the attacker's IP address than with TCP, because no complete session is established and the traffic contains meaningless repeated data; therefore, we excluded them from the deep learning model. In addition, HTTP traffic related to malware infection events is often detected when the malware connects to the C2 server after infection. Unlike general intrusion events, in which traffic is sent from an external IP to an internal IP, the traffic of malware events usually flows in the opposite direction.
The security event shown in Table 2 consists of the detection time, detection site, direction, source IP, source port, destination IP, destination port, signature name, raw packet (pcap file), and flag. "Detection Time" is the time the signature generated the event, and "Detection Site" is the location identifier where the IDS and TAS were installed. "Direction" shows the direction of attacks based on assets between the source IP and the destination IP. "Source IP/port (src_ip, src_port)" is the IP/port address that requests a connection from the client to the server, and "Destination IP/port (dest_ip, dest_port)" is the IP/port address from the server to the client. Most of the above metadata are managed as Critical IP or Threat Intelligence by security administrators.
The number of HTTP requests collected per day was approximately 1 billion, of which about ten thousand were analyzed as attack events detected in HTTP. Assuming a normal to abnormal ratio of 5:5, the amount of malicious analysis information is 65 MB for the last year, but benign HTTP traffic is 6 GB per day. To equalize the data rate for training in the deep learning model, the 65 MB HTTP payload, which was analyzed during one year, was multiplied 100 times by concatenation and shuffling, and the ratio of the analysis information and normal traffic was adjusted to be equal at 6 GB per day. Malicious events identified by analysts were used as data for re-training. The training data was approximately 6 GB per day, and the analysis information from the one-year period was updated sequentially like a sliding window.
IV. DESIGN AND IMPLEMENTATION
This section introduces a fast and effective spatial feature learning based on normalized UTF-8 character encoding, the detailed AI-IDS architecture, and the structure of a neural network model for large-scale web traffic.
A. SPATIAL FEATURE LEARNING BASED ON NORMALIZED UTF-8 CHARACTER ENCODING
Feature extraction is one of the most important tasks in designing an efficient learning algorithm. Mamun et al. [29] devised a combined preprocessing technique using attributes of information theory such as entropy, encoding, and compression. Theoretically, the entropy of encrypted or irregular data is high, as there are many uncertainties in the data stream. The entropy value indicates the degree of uncertainty of the information, but it is difficult to extract features by matching the unique characters of the given data 1:1. For example, entropy can express the uncertainty of information as a number in the range of 0 to 1, but collisions occur: different data can yield the same entropy. For this reason, it is difficult to extract the unique features of a given string as it is. Thus, we use a normalized UTF-8 encoding so that the deep learning model can recognize the data with its own characteristics. It is simple and fast, because it does not include unnecessary entropy calculations, compression, encryption, or anything else. Assuming that all data preprocessing for billions of HTTP requests within 1,000 bytes per day is executed, the UTF-8 encoding method can achieve fast data preprocessing at only about 1 × 2^8 × 1,000 billion. The biggest advantage of UTF-8 is that, as a single encoding method, it cannot be confused with others, so there is no possibility of mis-encoding as with national-language encoding methods such as UTF-16, EUC-KR (Korean), and GB2312 (Simplified Chinese). As both browsers and web servers are now developed assuming UTF-8, it is a very efficient way to preprocess HTTP traffic.
UTF-8 encoding in Algorithms 1-2 converts up to 256 characters into floats, so that special characters in the packet, including Simplified Chinese as seen in WebDAV attacks, can be encoded into numbers. When preprocessing only strings of 7 bits or less, it is difficult to handle the variety of characters in a real environment. We used the normalized UTF-8 encoding and the module developed in "parse," as shown in Figs. 2 and 3. The input variable was replaced with a value corresponding to a unique string in the range of 0 to 255 (256 features), and the input string was converted into a float value between −1.000 and 1.000 given that y_s = −(y_s − 128)/128, where y_s on the right-hand side is the value in the range of 0 to 255. The output variables y_s form the transformed set of input data for one training record, with s ∈ [0, 1, 2, . . . , 999].
Fig. 2 shows a preprocessing example for "http://target.com/manager/html/." When comparing our proposed UTF-8 encoding with entropy-based encoding, our proposed method is a single normalized calculation expression. The entropy of a string requires probability calculation, followed by multiplication and logarithm. Entropy-based preprocessing involves two steps of calculating the probability of each string and then calculating
TABLE 3. Samples of prediction output.
application-level strings for AI-IDS, and the "data_backup" module backs up raw-data which has been collected more than 24 hours in the past.
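The normalized UTF-8 encoding of Section IV-A and the entropy collision it avoids can be illustrated with a minimal sketch. The mapping −(b − 128)/128 and the fixed length of 1,000 bytes come from the paper; the function names, the choice to pad with 0.0 after normalization, and the byte-level Shannon entropy used for comparison are assumptions made for this example, not the authors' Algorithms 1-2.

```python
import math
from collections import Counter
from typing import List

MAX_LEN = 1000  # fixed input length in bytes, as stated in the paper


def normalize_utf8(text: str, max_len: int = MAX_LEN) -> List[float]:
    """Map each UTF-8 byte b (0..255) to -(b - 128) / 128, then pad."""
    raw = text.encode("utf-8")[:max_len]
    values = [-(b - 128) / 128 for b in raw]
    # Assumption: padding is applied as 0.0 after normalization.
    values += [0.0] * (max_len - len(values))
    return values


def shannon_entropy(text: str) -> float:
    """Byte-level Shannon entropy in bits, used here only for comparison."""
    raw = text.encode("utf-8")
    counts = Counter(raw)
    n = len(raw)
    return -sum(c / n * math.log2(c / n) for c in counts.values())


# Two different strings share one entropy value but keep distinct encodings.
a, b = "abab", "cdcd"
print(shannon_entropy(a) == shannon_entropy(b))  # same entropy
print(normalize_utf8(a)[:4] != normalize_utf8(b)[:4])  # different features
```

The encoding is a single arithmetic expression per byte, whereas entropy needs a frequency count followed by a logarithm for each distinct symbol.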
CPU, 1 TB RAM, Tesla V100 32 GB RAM × 2EA GPU, 960G × 4(RAID-5) SSD, 10 TB × 4(RAID-5) HDD and 10 Gbps LAN.
As shown in Fig. 1, the sensor systems were located in several financial institutions, and IDS alert events and TAS traffic were collected and transmitted to the SOC via VPN from the financial institutions. The experiment was conducted in a test-bed system, which was deployed to the operational system only when the performance and function verification were completed.
C. OPTIMIZED CNN-LSTM MODEL
Table 4 and Fig. 4 show the CNN-LSTM structure and its hyper-parameters. One UTF-8 encoded HTTP record, including the variable-length HTTP header and payloads, which is the initial input value of the proposed neural network model, is made into a fixed-length input value of 1,000 bytes (1 dimension × 1,000 bytes). Strings corresponding to the header and body of the HTTP request from the 0-th byte to the n-th byte are aligned, and the rest of the data is zero-padded. AI-IDS preprocessing continuously collects data for 3 hours in 1 cycle. AI-IDS runs training 8 times a day and predicts every 3 hours on real-time traffic, which allows for real-time monitoring over 24 hours. In the training phase, the labels indicating "malicious," "benign," and "unknown" are recorded at the end of the 1,000 bytes of an HTTP request, and the model calculates a malicious probability when all neural network operations are completed.
The initial input data at the CNN layer generates 1 × 1,000 × 12 convolved data through an operation with 1 × 4 × 12 filters. After 1/5 max-pooling, 1 × 200 × 12 pieces of data are stored in the memory in normalized form. In the second convolutional layer, 1 × 200 × 60 data are generated through convolution with a 1 × 4 × 5 filter, and then 1 × 40 × 60 data are generated as a result of 1/5 max-pooling and normalization. Data output from the CNN layer is used as an input to the RNN layer, and data processed into cells of 1 × 40 × 60 are sequentially input to the Forward LSTM and Bidirectional LSTM. The first LSTM cell is calculated in the forward direction with 16 cells, the second LSTM cell
is processed in a bidirectional flow, and the final 32 LSTM cells are transferred to the DNN layer by combining the accumulated forward and backward cells.
The output value of the calculated RNN is input into each of the 12 fully connected DNN layers. Until the DNN output layer, dropout was set to 0.1, and the LeakyReLU function was applied. The sigmoid activation function was used at the DNN output layer, and the model was trained for prediction on malicious payloads using the Adam optimizer along with binary cross-entropy (BCE) as the loss function. The probability calculated in the output layer is included in the JSON output file shown in Table 3, and the output files are shared with the Index Cluster, as shown in Figs. 1 and 3.
The analyst reviews the probability calculated by AI-IDS and examines the payload to determine whether an attack warning is valid or not. During the training phase, AI-IDS uses labeled analysis information from an analyst: (i) attack alert events detected by the IDS and (ii) valid attack events that the analyst has confirmed. As AI-IDS aims to detect new threats in the prediction phase, the security events detected by the legacy signature-based IDS are considered duplicate data. It calculates the malicious probability for new, real-time payloads and outputs prediction results.
The composition and depth of each CNN, RNN, and DNN layer were derived as the optimal parameters for the model through repeated experiments in the training phase. The structure and parameters of the neural network are slightly different when iterative experiments are performed on various datasets to select optimal performance parameters. The proposed model is devised with an intuitive design based on the theoretical basis of a previous study, and we proved the model validity through repeated experiments. In the next section, we present the detailed experimental results to demonstrate the validity and performance of our proposed model.
TABLE 4. Summary of proposed CNN-LSTM model.
V. EXPERIMENTS
This section presents the performance measurements, experimental design, and results: a comparison of the performance of the CNN-LSTM, LSTM-CNN, and DNN models and the experimental results on the KF-ISAC, CSIC-2010 HTTP, and CICIDS 2017 datasets.
We have defined the following experimental statements for the deep learning application:
• Selection of experimental data: CSIC-2010, CICIDS 2017 HTTP dataset, real-time HTTP data
• Design of optimal model structure using deep learning: CNN-LSTM model, LSTM-CNN model
• Determination of hyper-parameters: This is required for individual neural networks, such as CNN, RNN, DNN: conv_depth, conv_filter, lstm_units, and dense_units
• Model validation: experiments on two public datasets (CSIC-2010, CICIDS2017) and fixed real-time data
We experimented to select the optimal model by comparing CNN-LSTM with LSTM-CNN based on real-time HTTP traffic on a fixed date. In the second experiment, we validated the model through experiments using two public datasets (CSIC-2010, CICIDS 2017 HTTP dataset) and real-time data on the optimal model. Recently, various models have been introduced that optimize performance by combining CNN, RNN (LSTM), and DNN. Liu et al. [31] and Wu et al. [10] devised CNN and RNN models for intrusion detection, but their approach differs from the model of this study because they ran separate experiments on the CNN and RNN models. In this paper, a DNN was selected as the last layer to output a single result; we chose the model that best characterizes the data between CNN-LSTM and LSTM-CNN.
A. PERFORMANCE MEASUREMENT
We used a confusion matrix to evaluate the performance of the deep learning model. A confusion matrix is a popular indicator of the performance of classification models. The matrix in Table 5 shows the number of correctly and incorrectly classified results, compared to the actual outcomes in the test data. One of the advantages of a confusion matrix as an evaluation tool is that it allows for a more detailed analysis. The matrix is n by n, where n is the number of classes. The simplest classifiers, called binary classifiers, have only two classes: positive/negative. The performance of a binary classifier is summarized in a confusion matrix that cross-tabulates predicted and observed examples into four categories [8], [32].
In our deep learning model, Precision and F-Score are more important performance indicators than others. Moreover, the
• Precision: the proportion of the number of correctly pre- 3) CICIDS2017 HTTP DATASETS
dicted cases as positive to the number of predicted cases The CICIDS2017 datasets [34] generated in 2017 by the
as positive, high precision relates to a low false-positive rate.
• Recall (Sensitivity, Detection Rate): the ratio of the number of cases correctly predicted as positive to the number of cases labeled as positive.
• Specificity: the ratio of the number of cases correctly predicted as negative to the number of cases labeled as negative.
• F-Score: the harmonic mean of Precision and Recall; this score considers both false positives and false negatives.

B. EXPERIMENTAL DESIGN
We describe the details of the experimental datasets in the following paragraphs and in Table 7.

1) REAL-TIME HTTP DATASETS (KF-ISAC)
KF-ISAC HTTP data is real-time HTTP stream data collected from a TAS during fixed dates. The proposed model is trained on a mix of benign HTTP data and labeled malicious payloads that have been analyzed over the past year. It evaluates performance by separating training and test data at an 8:2 ratio. The label in the training data is located at the end of the preprocessed data.

2) CSIC-2010 HTTP DATASETS
CSIC-2010 HTTP data [33] was provided by the Information Security Institute of CSIC (Spanish Research National Council). The contributors collected HTTP packets to detect web attacks. The dataset contains 36,000 normal requests and more than 25,000 anomalous requests. The data consists of normal HTTP data for training and normal/abnormal data for testing. We generated 1,941,300 records by augmenting 25 times from the original 77,652 records and split the set in a ratio of train 8: test 2, except for 6 error records during data import.

Canadian Institute for Cybersecurity overcome these issues. The CICIDS2017 benchmark dataset contains the abstract behavior of 25 users based on HTTP, HTTPS, FTP, SSH, and email protocols. We use only the HTTP datasets, including web attacks, and generated 586,180 records by augmenting 20 times from the original 29,309 records. The dataset consists of the entire abnormal/normal pcap file, the unlabeled HTTP attack, and the metadata, including label data.

C. EXPERIMENTAL RESULTS
1) MODEL SELECTION
We implemented CNN-LSTM and LSTM-CNN structures for an optimal deep learning model selection and then performed 10 iterations using the real-time HTTP data shown in Table 7. The training data of KF-ISAC consisted of approximately 6.6 million records extracted and proposed on a specific date, and each of the normal/attack classes was composed of approximately 3.3 million records. The training and test datasets were split at a ratio of 8:2. The results of the experiment shown in Fig. 5 are the average values of the results of 50 epochs. The overall performance of CNN-LSTM is better than that of LSTM-CNN in areas such as accuracy, precision, and F-Score. In particular, the two models differ considerably in Precision and F-Score because of the True/False Positive Rate. Ranked from highest to lowest performance, the models are CNN-LSTM, LSTM-CNN, and DNN. CNN-LSTM reduces the dimension by max-pooling at the initial step, whereas LSTM-CNN takes more time to train because the dimension and parameters are increased through the LSTM Cell. The DNN is relatively fast, but it scores low on Accuracy and Specificity.

2) DETERMINATION OF HYPER-PARAMETERS
We chose the best-performing deep learning model according to the experimental results. The CNN-LSTM model needs to determine the optimal hyper-parameters for stable operations.
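Model selection and hyper-parameter tuning both rank candidates by the indicators defined above. As a minimal sketch (not the authors' evaluation code), these indicators can be computed from confusion-matrix counts as follows. The names `ConfusionCounts` and `evaluate` and the example counts are illustrative assumptions, and Accuracy is taken in its usual sense of correct predictions over all predictions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Counts of a binary classifier's outcomes (positive = attack)."""
    tp: int  # attacks predicted as attacks
    fp: int  # benign payloads predicted as attacks
    tn: int  # benign payloads predicted as benign
    fn: int  # attacks predicted as benign


def _ratio(numerator: float, denominator: float) -> float:
    # Return 0.0 instead of raising when a class is empty.
    return numerator / denominator if denominator else 0.0


def evaluate(c: ConfusionCounts) -> dict:
    precision = _ratio(c.tp, c.tp + c.fp)    # predicted positive that are correct
    recall = _ratio(c.tp, c.tp + c.fn)       # labeled positive that are found
    specificity = _ratio(c.tn, c.tn + c.fp)  # labeled negative that are found
    accuracy = _ratio(c.tp + c.tn, c.tp + c.fp + c.tn + c.fn)
    # Harmonic mean of precision and recall.
    f_score = _ratio(2 * precision * recall, precision + recall)
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "f_score": f_score,
    }


if __name__ == "__main__":
    # Hypothetical counts, chosen only to exercise the formulas.
    for name, value in evaluate(ConfusionCounts(tp=90, fp=10, tn=80, fn=20)).items():
        print(f"{name}: {value:.4f}")
```

Because a false positive costs analyst time, precision is the indicator to watch most closely when comparing two candidate models with similar accuracy.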
We prioritized high precision so that the time needed to train on or validate events, which depends on the true/false-positive rates, is minimized in a practical environment. The experiment used the real-time HTTP (KF-ISAC) data shown in Table 7.
The CNN layer determines the conv_depth, conv_filter, conv_kernel_width, and conv_pool variables. In detail, one value has to be selected from conv_depth ∈ [2], conv_filter ∈ [2, 4, 8, 12], conv_kernel_width ∈ [4] and conv_pool ∈ [3, 4, 5]. The RNN layer determines the lstm_units and lstm_depth variables. In detail, one of the following values has to be selected from lstm_units ∈ [16] and lstm_depth ∈ [1, 2, 4, 8]. The DNN layer determines dense_depth, dense_units, dense_dropout, and dense_relu_alpha. In detail, one value of dense_depth ∈ [1, 2, 4, 8], dense_units ∈ [4, 8, 12, 16] and dense_dropout ∈ [0.1, 0.5] is selected. The experiment was conducted 270 times; whenever one or more of the five indicators converged to zero or one, we moved on to the next parameters.
Aiming for a high F-Score and high precision, which means minimizing false positives, the hyper-parameters of the optimal model are as follows: 2 for convolution depth, 12 for convolution filter, 4 for convolution kernel size, 5 for max-pooling size, 16 for LSTM Cell, 2 for LSTM depth (1 forward LSTM, 1 Bidirectional LSTM), 12 dense units, 8 for dense depth, and 0.1 for dense dropout.

3) MODEL VALIDATION
To validate the performance of the deep learning models, we used real-time data and public HTTP datasets (CSIC-2010 and CICIDS2017 HTTP datasets), and experimented with 50 epochs on the previously selected model. The experimental results on real-time data showed that the proposed CNN-LSTM model can be used for general HTTP data with high performance. The AI-IDS is a deep learning-based model with no pre-feature extraction, and therefore all strings can be processed.
For all experiments for each dataset, the model parameters were modified to obtain the results above and to optimize the performance on different datasets. Considering that our model has 14,000 trainable parameters, the CSIC-2010 and CICIDS2017 datasets are relatively small, which leads to overfitting and low performance. The experimental results in Fig. 6 showed a high accuracy of 91–93% for each of the CSIC-2010 and CICIDS2017 datasets. The precision was in the range of 86–98%, and the F-Score was in the range of 80–82%, which is lower than the experimental results of the previous real-time data. The experimental results showed that the performance of the model is affected by the number of samples and the diversity of the training data. It was difficult to cross-validate our model with the two published datasets owing to the small samples. If we had a large amount of non-repeated HTTP data, the experimental performance would improve and would return more reliable results. Considering the above results, our model is more suitable for a large amount of data, and we demonstrated the excellence of our model by training with various datasets of more than 6 million HTTP traffic data.

VI. EFFICACY AND APPLICATION
This section describes cases of how AI-IDS detects variant attacks that bypass detection on legacy signature-based NIDS, and how Snort rules can be rewritten or improved.
The AI-IDS in Fig. 7 performs ‘‘predict’’ based on the completed h5-model shown in Fig. 3, and it predicts real-time data by inspecting the attack as a prediction output. When the prediction value is 100%, the NIDS treats the payload as malicious, but the results are not perfectly reliable because an initial AI-IDS result may contain an analysis error. Thus, an analyst needs to confirm the final step until a stable level has been reached. We classified payloads with a prediction value within a range of 50–100% as suspicious, and an average of 100–500 such events occurred every 3 h. We assumed that AI-IDS classifies each payload as normal or malicious: payloads with a prediction value below 50% are labeled as ‘‘benign,’’ and those at 50–100% are classified as ‘‘malicious’’ payloads. In Fig. 7, the analyst labels suspicious payloads on the program as ‘‘benign,’’ ‘‘malicious,’’ or ‘‘unknown’’ using a conditional search. The labeled data is used for daily retraining. The label program shows the prediction value (%) generated by the optimal CNN-LSTM model, and the analyst can use it as a reference for identifying the actual malicious payloads. Some of the analysis information, such as src_ip/port and dest_ip/port, can be used to register a blocking policy in the firewall.

TABLE 8. Detecting variant patterns on AI-IDS.
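The thresholding and labeling workflow described above can be sketched as follows. This is an illustrative outline under stated assumptions, not the AI-IDS label program itself: the `Event` record, its field names, and the helper functions are hypothetical, and the prediction value is represented as a probability in [0, 1] rather than a percentage.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

THRESHOLD = 0.5  # the 50% boundary between "benign" and "malicious"


@dataclass
class Event:
    payload: str
    prediction: float  # malicious probability reported by the model
    src_ip: str = ""
    dest_ip: str = ""
    analyst_label: Optional[str] = None  # "benign", "malicious", or "unknown"


def model_label(prediction: float) -> str:
    """Map a prediction value to the model's provisional label."""
    if not 0.0 <= prediction <= 1.0:
        raise ValueError("prediction must be within [0, 1]")
    return "malicious" if prediction >= THRESHOLD else "benign"


def review_queue(events: List[Event]) -> List[Event]:
    """Suspicious events for analyst review, most confident first."""
    suspicious = [e for e in events if model_label(e.prediction) == "malicious"]
    return sorted(suspicious, key=lambda e: e.prediction, reverse=True)


def retraining_rows(events: List[Event]) -> List[Tuple[str, str]]:
    """(payload, label) pairs for every event an analyst has labeled."""
    return [(e.payload, e.analyst_label) for e in events if e.analyst_label]


if __name__ == "__main__":
    events = [
        Event("GET /a", 0.97, src_ip="198.51.100.7"),
        Event("GET /b", 0.12),
        Event("GET /c", 0.50, analyst_label="benign"),  # analyst overrides the model
    ]
    for e in review_queue(events):
        print(f"{e.prediction:.0%}  {e.payload}")
    print(retraining_rows(events))
```

Keeping the analyst's label separate from the model's provisional label preserves the record of where the two disagree, which is the information the daily retraining relies on.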
The effects of applying AI-IDS are summarized as follows:
–First, it can detect variant bypass attacks that are not
detected on legacy Signature-based NIDS. For all AI-IDS
predictions, security events on the legacy NIDS are automat-
ically excluded, such that no duplicate events can occur.
–Second, it is possible to write or modify Snort rules for new patterns. If a legacy NIDS has existing rules but cannot detect attacks, the cause is likely Snort grammatical errors or missing patterns in the rules. However, it can also be a detection failure due to low performance or functional failure.

A. DETECTION OF OBFUSCATED VARIANTS
Table 8 shows an example of a variant attack detection. A common intrusion pattern is a scan of an admin page or file upload page, usually accessed by an attacker via a known open source path. Suppose that there is an admin page such as ‘‘http://target.com/admin/index.php’’ and a rule that detects ‘‘/admin/index.php’’ in the legacy NIDS. The AI-IDS examines payloads in real time with the trained CNN-LSTM model to detect abnormal URI accesses, including variant attacks on ‘‘index.php’’ parameters and subpaths. It also detects similar and different new variant attacks for all attack types and patterns, as well as the examples shown in Table 8. In the case of SQL Injection, the detection accuracy of variant patterns is close to 100%.
The AI-IDS can also detect unknown variant patterns or obfuscated attacks, as shown in Fig. 8. An attacker can use URL Encoding or base64 to slip arbitrary payloads past the security system and attack web servers effectively. In one case, an attacker uses the Char( ) function to insert code into noticeView.jsp in an attempt to acquire system information. In other cases, the attacker attempts to send spear-phishing mails by communicating with an external SMTP server, injecting irregular or encoded code into AspCms_SiteSetting.asp (AspCMS). Recent malicious HTTP payloads contain irregular patterns that are difficult to detect as simple strings. A commercial NIDS detects most known attacks or patterns but does not detect strings that do not have a registered pattern. By contrast, the AI-IDS can detect variant and obfuscated attacks that cannot be detected with legacy NIDS.
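To make the bypass concrete, the sketch below shows why a literal string signature such as ‘‘/admin/index.php’’ misses URL-encoded variants of the same path. It illustrates the limitation of string matching only; it is not how AI-IDS works, since AI-IDS learns from the payload characters rather than decoding them. The function names and the sample URIs are hypothetical, and a base64-wrapped payload would need an analogous decoding step that is not shown here.

```python
from urllib.parse import unquote

SIGNATURE = "/admin/index.php"  # pattern from the example rule in the text


def literal_match(request_uri: str) -> bool:
    """Mimic a plain content match: the signature must appear verbatim."""
    return SIGNATURE in request_uri


def percent_decode(request_uri: str, max_rounds: int = 3) -> str:
    """Undo up to max_rounds layers of percent-encoding."""
    for _ in range(max_rounds):
        decoded = unquote(request_uri)
        if decoded == request_uri:  # nothing left to decode
            break
        request_uri = decoded
    return request_uri


def match_after_decoding(request_uri: str) -> bool:
    return literal_match(percent_decode(request_uri))


if __name__ == "__main__":
    samples = [
        "/admin/index.php?id=1",    # plain request
        "/admin/%69ndex.php?id=1",  # 'i' percent-encoded once
        "/%2561dmin/index.php",     # 'a' percent-encoded twice
    ]
    for uri in samples:
        print(uri, literal_match(uri), match_after_decoding(uri))
```

Each additional encoding scheme needs its own normalization step in a signature-based pipeline, which is the maintenance burden that a model trained directly on payload characters aims to avoid.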
rewrite detection rules that typically detect PHP webshell code attacks. However, if a general-purpose detection rule is written without regard to the environment, appropriate optimization tasks are required as the number of detections increases.

VII. CONCLUSION
We proposed an optimal CNN-LSTM model based on SFL and successfully applied payload-level deep learning techniques in a high-performance computing environment. The AI-IDS distinguishes between normal and abnormal HTTP traffic that could not be detected in legacy signature-based NIDS because it can formalize unknown patterns and help write or improve signature-based rules for new vulnerabilities, variants, and bypass attacks. From network metadata alone, without the payload, it is usually difficult to identify whether traffic is malicious. Thus, we review the HTTP header and body of web attacks in detail. We also used real-time web traffic for deep learning, but initially, we learned that AI-IDS needed to be re-validated for predicted suspicious events due to false-positive alarms. The AI-IDS performs continuous optimization by re-training on analysis information that is labeled ‘‘benign,’’ ‘‘malicious,’’ and ‘‘unknown.’’ Thus, it should be used as an assistant system until it reaches a high-quality level. If its quality goes beyond the ability of humans through continual learning, it could be executed as an automated analysis. Ultimately, the goal of AI-IDS is to outperform human analysis quality and to help analysts handle large quantities of unknown security events.
Previous works have mainly considered accuracy (ACC) in terms of performance measures, but scalability and precision are also important indicators for applying deep learning in the real world. In practical security services, re-validation of predicted events is a required task because of the low tolerance for analysis errors.

REFERENCES
[1] R. Fielding, J. Gettys, J. C. Mogul, H. Frystyk, and T. Berners-Lee, Hypertext Transfer Protocol—HTTP/1.1, document RFC 2616, IETF, Jun. 1999.
[2] D. Atienza, Á. Herrero, and E. Corchado, ‘‘Neural analysis of HTTP traffic for Web attack detection,’’ in Proc. Comput. Intell. Secur. Inf. Syst. Conf. Cham, Switzerland: Springer, 2015, pp. 201–212.
[3] B. Mukherjee, L. T. Heberlein, and K. N. Levitt, ‘‘Network intrusion detection,’’ IEEE Netw., vol. 8, no. 3, pp. 26–41, May 1994.
[4] P. Mishra, V. Varadharajan, U. Tupakula, and E. S. Pilli, ‘‘A detailed investigation and analysis of using machine learning techniques for intrusion detection,’’ IEEE Commun. Surveys Tuts., vol. 21, no. 1, pp. 686–728, 1st Quart., 2019.
[5] FSI AI-IDS Software for Splunk. [Online]. Available: https://github.com/ackim-fsi/AI-IDS
[6] Y. Liu, S. Liu, and X. Zhao, ‘‘Intrusion detection algorithm based on convolutional neural network,’’ Beijing Ligong Daxue Xuebao/Trans. Beijing Inst. Technol., vol. 37, no. 12, pp. 1271–1275, 2017.
[7] W. Wang, Y. Sheng, J. Wang, X. Zeng, X. Ye, Y. Huang, and M. Zhu, ‘‘HAST-IDS: Learning hierarchical spatial-temporal features using deep neural networks to improve intrusion detection,’’ IEEE Access, vol. 6, pp. 1792–1806, 2018.
[8] C. Yin, Y. Zhu, J. Fei, and X. He, ‘‘A deep learning approach for intrusion detection using recurrent neural networks,’’ IEEE Access, vol. 5, pp. 21954–21961, 2017.
[9] N. Shone, T. Nguyen Ngoc, V. Dinh Phai, and Q. Shi, ‘‘A deep learning approach to network intrusion detection,’’ IEEE Trans. Emerg. Topics Comput. Intell., vol. 2, no. 1, pp. 41–50, Feb. 2018.
[10] K. Wu, Z. Chen, and W. Li, ‘‘A novel intrusion detection model for a massive network using convolutional neural networks,’’ IEEE Access, vol. 6, pp. 50850–50859, 2018.
[11] S. Naseer, Y. Saleem, S. Khalid, M. K. Bashir, J. Han, M. M. Iqbal, and K. Han, ‘‘Enhanced network anomaly detection based on deep neural networks,’’ IEEE Access, vol. 6, pp. 48231–48246, 2018.
[12] Y. Ding and Y. Zhai, ‘‘Intrusion detection system for NSL-KDD dataset using convolutional neural networks,’’ in Proc. 2nd Int. Conf. Comput. Sci. Artif. Intell. (CSAI), 2018, pp. 81–85.
[13] S. Otoum, B. Kantarci, and H. T. Mouftah, ‘‘On the feasibility of deep learning in sensor network intrusion detection,’’ IEEE Netw. Lett., vol. 1, no. 2, pp. 68–71, Jun. 2019.
[14] N. Chouhan, A. Khan, and H.-U.-R. Khan, ‘‘Network anomaly detection using channel boosted and residual learning based deep convolutional neural network,’’ Appl. Soft Comput., vol. 83, Oct. 2019, Art. no. 105612.
[15] R. Vinayakumar, M. Alazab, K. P. Soman, P. Poornachandran, A. Al-Nemrat, and S. Venkatraman, ‘‘Deep learning approach for intelligent intrusion detection system,’’ IEEE Access, vol. 7, pp. 41525–41550, 2019.
[16] Z. Chiba, N. Abghour, K. Moussaid, A. El Omri, and M. Rida, ‘‘Intelligent approach to build a deep neural network based IDS for cloud environment using combination of machine learning algorithms,’’ Comput. Secur., vol. 86, pp. 291–317, Sep. 2019.
[17] H. Zhang, B. Zhao, H. Yuan, J. Zhao, X. Yan, and F. Li, ‘‘SQL injection detection based on deep belief network,’’ in Proc. 3rd Int. Conf. Comput. Sci. Appl. Eng. (CSAE), 2019, p. 20.
[18] O. Faker and E. Dogdu, ‘‘Intrusion detection using big data and deep learning techniques,’’ in Proc. ACM Southeast Conf. (ACM SE), 2019, pp. 86–93.
[19] M. Aloqaily, S. Otoum, I. A. Ridhawi, and Y. Jararweh, ‘‘An intrusion detection system for connected vehicles in smart cities,’’ Ad Hoc Netw., vol. 90, Jul. 2019, Art. no. 101842.
[20] KDD Cup 1999 Data. Accessed: Nov. 17, 2019. [Online]. Available: http://kdd.ics.uci.edu/databases/kddcup99/kddcup99.html
[21] M. Ring, S. Wunderlich, D. Scheuring, D. Landes, and A. Hotho, ‘‘A survey of network-based intrusion detection data sets,’’ Comput. Secur., vol. 86, pp. 147–167, Sep. 2019.
[22] M. Tavallaee, E. Bagheri, W. Lu, and A. A. Ghorbani, ‘‘A detailed analysis of the KDD CUP 99 data set,’’ in Proc. IEEE Symp. Comput. Intell. Secur. Defense Appl., Jul. 2009, pp. 1–6.
[23] A. Shenfield, D. Day, and A. Ayesh, ‘‘Intelligent intrusion detection systems using artificial neural networks,’’ ICT Express, vol. 4, no. 2, pp. 95–99, Jun. 2018.
[24] R. Singh, H. Kumar, and R. K. Singla, ‘‘A reference dataset for network traffic activity based intrusion detection system,’’ Int. J. Comput. Commun. Control, vol. 10, no. 3, pp. 390–402, 2015.
[25] M. Sabhnani and G. Serpen, ‘‘Why machine learning algorithms fail in misuse detection on KDD intrusion detection data set,’’ Intell. Data Anal., vol. 8, no. 4, pp. 403–415, Oct. 2004.
[26] J. Gu, L. Wang, H. Wang, and S. Wang, ‘‘A novel approach to intrusion detection using SVM ensemble with feature augmentation,’’ Comput. Secur., vol. 86, pp. 53–62, Sep. 2019.
[27] N. Moustafa, J. Hu, and J. Slay, ‘‘A holistic review of network anomaly detection systems: A comprehensive survey,’’ J. Netw. Comput. Appl., vol. 128, pp. 33–55, Feb. 2019.
[28] Splunk. (2019). Splunk Stream (STM). Accessed: Mar. 20, 2020. [Online]. Available: https://splunkbase.splunk.com/app/1809/
[29] M. Mamun, R. Lu, and M. Gaudet, ‘‘Tell them from me: An encrypted application profiler,’’ in Proc. Int. Conf. Netw. Syst. Secur., in Lecture Notes in Computer Science, vol. 11928. Cham, Switzerland: Springer, 2019, pp. 456–471.
[30] Y. Zhang and B. Wallace, ‘‘A sensitivity analysis of (and practitioners’ guide to) convolutional neural networks for sentence classification,’’ 2015, arXiv:1510.03820. [Online]. Available: http://arxiv.org/abs/1510.03820
[31] H. Liu, B. Lang, M. Liu, and H. Yan, ‘‘CNN and RNN based payload classification methods for attack detection,’’ Knowl.-Based Syst., vol. 163, pp. 332–341, Jan. 2019.
[32] Y. Xin, L. Kong, Z. Liu, Y. Chen, Y. Li, H. Zhu, M. Gao, H. Hou, and C. Wang, ‘‘Machine learning and deep learning methods for cybersecurity,’’ IEEE Access, vol. 6, pp. 35365–35381, 2018.
[33] C. T. Giménez, A. P. Villegas, and G. Marañón. (2010). Information Security Institute of CSIC (Spanish Research National Council). Accessed: Nov. 17, 2019. [Online]. Available: https://www.isi.csic.es/dataset/
[34] I. Sharafaldin, A. Habibi Lashkari, and A. A. Ghorbani, ‘‘Toward generating a new intrusion detection dataset and intrusion traffic characterization,’’ in Proc. 4th Int. Conf. Inf. Syst. Secur. Privacy, 2018, pp. 108–116. Accessed: Nov. 17, 2019. [Online]. Available: https://www.unb.ca/cic/datasets/ids-2017.html

MOHYUN PARK received the B.S. degree in computer science from Seoul National University, Seoul, South Korea, in 2013. He is currently the Manager with the Financial Security Institute (FSI), Yongin, South Korea. His research interests include applied deep learning and intrusion detection on computer networks.