Web Application Attack Detection using Deep Learning
arXiv:2011.03181v1 [cs.CR] 6 Nov 2020

Tikam Alma and Manik Lal Das

DA-IICT
Gandhinagar, India
Email: {201601030, maniklal das}@daiict.ac.in

Abstract
Modern web applications are dominated by HTTP/HTTPS messages that consist
of one or more headers, into which attackers can inject most exploits and
payloads. According to OWASP, 80 percent of web attacks are carried out
through HTTP/HTTPS request queries. In this paper, we present a deep
learning based model for detecting web application attacks. The model uses
an auto-encoder that learns from sequences of words and weights each word
or character accordingly. The classification engine is trained on the
ECML-KDD dataset to classify anomalous queries by specific attack type.
The proposed detection engine is trained with anomalous and benign web
queries and achieves a receiver operating characteristic (ROC) curve
accuracy of 1. The experimental results show that the proposed model can
successfully detect web application attacks with a low false positive rate.
Keywords: Web Application; Web Security; Machine Learning; Deep
Learning.

1 Introduction
Web applications are among the most frequent targets of attackers. The
Symantec Internet Security Threat Report [9] reported the interesting
statistic that 1 in 10 URLs is identified as malicious. Web applications
typically use the HTTP/HTTPS protocols supported by other backend and
frontend interfaces. According to the Imperva Web Application Vulnerability
Report [9], the high-severity attacks are injection attacks, which are
exploited by injecting payloads into HTTP/HTTPS web queries using the GET,
POST and PUT methods. CSRF (Cross-Site Request Forgery) attacks, SQL
injection attacks, XSS (Cross-Site Scripting) attacks, and widely used
vulnerable JS libraries account for 51 percent, 27 percent, 33 percent and
36 percent, respectively. This paper focuses on the most frequent types of
web-based injection attacks, which include SQL injection, XSS (Cross-Site
Scripting), RFI (Remote File Inclusion), XXE (XML External Entity), CSRF
(Cross-Site Request Forgery), and SSRF (Server-Side Request Forgery).
A Network Intrusion Detection System (NIDS) monitors the network traffic of
web applications. A web IDS acts as an intermediary between the web
application and its users, analyzing web traffic to detect any anomaly or
malicious activity [2]. Generally, there are two types of detection
approaches: anomaly-based detection and signature-based detection. A
signature-based IDS relies on signatures, much as an antivirus detects a
virus only when its database contains the signature of that specific kind
of virus. If attackers create a new virus whose signature/pattern is not
present, the antivirus is of no use. Anomaly-based detection recognizes
behavior patterns and flags any activity that differs from the data
previously fed to the system. Compared with signature-based detection,
anomaly-based detection performs better at detecting unknown attacks, but
at the cost of higher false positive alarm rates. Once a detected anomaly
is stored in the database, it becomes a “signature”. Furthermore, anomaly
detection comprises two methods: adaptive detection and constant detection
[3]. An adaptive detection algorithm analyzes the port 80 traffic of the
web network, that is, HTTP traffic, continuously taking traffic as input
and analyzing it in a timely manner, whereas a constant detection method
stores incoming traffic and analyzes the logs of the collected traffic.
The conventional patching approach to mitigating most network-layer
vulnerabilities does not work well for web application vulnerabilities such
as SQLi, RCE, or XSS. The reason behind these attacks is that modern web
applications are often poorly designed and insecurely coded. One can follow
OWASP’s secure coding guidelines to prevent most of the attacks. An
adaptive detection model is effective at detecting anomalies and
classifying which type of attack each one is, so that the backend developer
can patch or prevent it (e.g., Django uses CSRF tokens in the framework to
prevent CSRF attacks, which account for 51 percent of web attacks). At the
same time, the model should learn patterns over time to detect unknown web
attacks and identify which attack vectors are being exploited.
Anomaly-based detection approaches [2] usually rely on an adaptive model to
identify anomalous web requests, but with a high false positive rate. In
this paper, we propose a solution to handle false positives in which the
IDS monitors a system based on its behavior pattern. There are several
reasons why a conventional IDS or web application firewall does not work
well, as follows:

• Limited Dataset: To collect a large amount of anomalous data, one has to
set up an automated system that captures attack requests and classifies
them as anomalous or normal. This resembles an attack-defense simulation
system, but one smart enough to classify requests without manual labeling,
which could save a lot of time.

• High False Positive: Conventional systems use unsupervised learning
algorithms such as PCA [5] and SVM [7] to detect web attacks. These
approaches require manual selection of attack-specific features [6]. They
may achieve acceptable performance, but they suffer from high false
positive rates.

• Labeled Dataset: Conventional IDSs use rule-based or conditional
strategies, or supervised algorithms such as support vector machines or
decision trees, to separate normal traffic requests from attack requests,
which requires a large labeled database to obtain accurate results [6].

In this paper, we present SWAD, a web application attack detection model
based on deep learning that detects web application attacks autonomously in
real time. The model uses an auto-encoder that learns from sequences of
words and weights each word or character accordingly. The classification
engine is trained on the ECML-KDD dataset to classify anomalous queries by
specific attack type. We have implemented the model as a
sequence-to-sequence model consisting of an encoder and a decoder, which
sets its target values equal to its input values. The proposed SWAD model
first uses 40,000 web requests, both anomalous and benign, for training and
then 20,000 anomalous web requests and responses for training the model.
The experimental results show that the proposed model can detect web
application attacks with a true positive rate of 1 and a low false positive
rate.
The paper is organized as follows: Section 2 summarizes the background and
related work. Section 3 describes the system design. Section 4 evaluates
the performance of the proposed model. Section 5 concludes the paper.

2 Background and Related work

2.1 Deep Learning for Web Attack Detection
There are two categories of machine learning approaches for detecting web
attacks: unsupervised and supervised learning. Supervised learning is fed
labeled data, that is, inputs mapped to their expected outputs, and learns
to predict the learned labels for new inputs. It is the most common
approach for classifying data: the model is trained to identify the data
using the mapped labels. The idea is to learn the mapping function from a
given input to the output, where the output is denoted by the variable Y
and the input by the variable X.

$$Y = f(X)$$

If a labeled web attack dataset is used to train supervised algorithms such
as SVM (Support Vector Machine) [1] and Naive Bayes [8], the resulting
model can separate anomalous web requests from normal ones. However, the
model cannot handle new types of attack requests, and it requires a large
labeled dataset.
Unsupervised learning is used mainly with unlabeled datasets. A model
trained this way finds patterns in previous sequences or data and
identifies or predicts the next one. In exploratory analysis, unsupervised
learning is used to automatically identify structure in the data. Using an
unsupervised method, one can also reduce the dimensions used to represent
the data to fewer columns and features. Principal Component Analysis (PCA)
[11] computes the eigenvectors of the covariance matrix, the “principal
axes”, and sorts them. To achieve dimensionality reduction, the centered
data are projected onto the principal axes. The values obtained from this
projection are the principal component (PC) scores. PCA analyzes the
relationships within a group of variables to create an equal number of new
imaginary variables, the principal components. The first principal
component is the one that is maximally correlated with all of the original
variables; the next is less correlated, the one after that less still, and
so on, until the principal component scores can predict any variable from
the original group.
$$\text{PCA reconstruction} = \text{PC scores} \cdot \text{Eigenvectors}^{\top} + \text{Mean}$$
The data are reconstructed perfectly, with no dimensionality reduction,
when all p eigenvectors are used, so that $V V^{\top}$ is the identity
matrix, where V is the matrix of eigenvectors. When using large datasets,
whether image, text or video data, one cannot apply machine learning
algorithms directly. Preprocessing steps are required to clean the dataset
and reduce the training time. Note that PCA is restricted to a linear map,
whereas autoencoders [10] can have non-linear encoders/decoders. A
single-layer autoencoder with a linear activation function is nearly
equivalent to PCA. We use a sequence-to-sequence autoencoder in our
proposed detection model.
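
The reconstruction formula above can be illustrated with a minimal NumPy sketch. The toy data matrix and the helper name `pca_reconstruct` are assumptions made for illustration only and are not part of the proposed model. With all p eigenvectors the input is recovered up to floating-point error, whereas keeping fewer eigenvectors gives a lossy, lower-dimensional approximation.

```python
import numpy as np

def pca_reconstruct(data, k):
    """Project data onto the top-k principal axes and map it back."""
    mean = data.mean(axis=0)
    centered = data - mean
    # The covariance matrix is symmetric, so eigh is applicable.
    cov = np.cov(centered, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)
    # eigh returns eigenvalues in ascending order; sort them descending.
    order = np.argsort(eigvals)[::-1]
    v = eigvecs[:, order[:k]]      # p x k matrix of principal axes
    scores = centered @ v          # PC scores, one row per sample
    return scores @ v.T + mean     # PC scores * V^T + mean

rng = np.random.default_rng(0)
x = rng.normal(size=(50, 4))       # hypothetical toy data: 50 samples, p = 4
full = pca_reconstruct(x, k=4)     # all p eigenvectors
reduced = pca_reconstruct(x, k=2)  # only the top two eigenvectors
full_error = float(np.abs(x - full).max())
reduced_error = float(np.abs(x - reduced).max())
```

Here `full_error` stays at the level of rounding noise, while `reduced_error` is clearly larger because two of the four directions are discarded.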

Figure 1: Autoencoder basic architecture

An autoencoder becomes equivalent to principal component analysis when
normalized inputs are used with a linear encoder, a linear decoder and a
squared error loss function. In general, however, autoencoders are not
restricted to linear maps. The proposed model is trained to minimize the
loss between the input and the output layer. We have used non-linear
functions in the encoders to obtain a more accurate reconstruction of the
data. The activation functions used in the autoencoder are ReLU and
sigmoid, which are non-linear in nature.

$$\Phi : \chi \rightarrow F \tag{1}$$

$$\Psi : F \rightarrow \chi \tag{2}$$

$$\Phi, \Psi = \arg\min_{\Phi, \Psi} \lVert X - (\Psi \circ \Phi) X \rVert^{2} \tag{3}$$

The encoder function, denoted by Φ, maps the original data X to a latent
space F. The decoder function, denoted by Ψ, maps the latent space F back
to the output. We basically recreate the original input after some
generalized non-linear compression. The encoding network can be represented
by the standard neural network function passed through an activation
function σ, where z is the hidden (latent) representation. The output has
the same form as the input.

$$z = \sigma(W x + b) \tag{4}$$
With slightly different weights, bias and activation function, the decoder
network is represented in the same way.
$$x' = \sigma'(W' z + b') \tag{5}$$
The model is trained with the back-propagation method to minimize the
following loss function.
$$L(x, x') = \lVert x - x' \rVert^{2} = \lVert x - \sigma'(W'(\sigma(W x + b)) + b') \rVert^{2} \tag{6}$$
To reconstruct the input data or input characters, the autoencoder
optimizes the encoder and decoder functions so that a minimal amount of
information is needed to encode the input and reconstruct it at the output.
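
Equations (4) to (6) can be traced with a small forward pass written in plain Python. This is only a sketch under stated assumptions: the layer sizes, the random untrained weights, the use of the sigmoid for both σ and σ′, and the function names are all hypothetical, and the proposed model uses LSTM cells rather than a single dense layer.

```python
import math
import random

def sigmoid(values):
    return [1.0 / (1.0 + math.exp(-v)) for v in values]

def affine(weights, x, bias):
    """Compute Wx + b for a matrix stored as a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]

def encode(x, w, b):
    return sigmoid(affine(w, x, b))          # z = sigma(Wx + b), Eq. (4)

def decode(z, w_p, b_p):
    return sigmoid(affine(w_p, z, b_p))      # x' = sigma'(W'z + b'), Eq. (5)

def reconstruction_loss(x, x_rec):
    return sum((a - r) ** 2 for a, r in zip(x, x_rec))  # ||x - x'||^2, Eq. (6)

random.seed(0)
n_in, n_hidden = 6, 3                        # hypothetical layer sizes
w = [[random.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_hidden)]
b = [0.0] * n_hidden
w_p = [[random.uniform(-1, 1) for _ in range(n_hidden)] for _ in range(n_in)]
b_p = [0.0] * n_in

x = [0.0, 1.0, 0.0, 1.0, 1.0, 0.0]           # toy input vector
z = encode(x, w, b)                          # compressed representation
x_rec = decode(z, w_p, b_p)                  # reconstruction of x
loss = reconstruction_loss(x, x_rec)
```

Back-propagation would adjust `w`, `b`, `w_p` and `b_p` to reduce `loss`; with the untrained weights used here the reconstruction is not expected to be close to the input.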

3 The Proposed Model

The proposed detection engine uses an autoencoder model based on a
sequence-to-sequence architecture made up of LSTM (Long Short-Term Memory)
[3] cells. LSTM networks are recurrent neural networks that are trained on
ordered sequences of inputs so as to remember and re-create them. In the
proposed model, the encoder LSTM reads the input sequence step by step.
After reading the entire input, it outputs an internal learned
representation of the input sequence as a fixed-length vector. This vector
is then fed to the decoder, which interprets it and generates the output
sequence step by step.

Figure 2: The Proposed System Architecture

The proposed detection and classification models work together as follows:

1. For training, a large number of unlabeled normal HTTP requests are
collected from the open-source Vulnbank organization. The collection
contains 40k normal HTTP requests (GET, POST and PUT methods).

2. For the auto-encoder’s (encoder-decoder) architecture, the
hyper-parameters are tuned by framing the problem as a grid search. Each
hyper-parameter combination requires training the neuron weights of the
hidden layer(s), so the computational complexity grows with the number of
layers and the number of nodes within each layer. To deal with these
critical parameters and training issues, stacked auto-encoders have been
proposed, which train each layer separately to obtain pre-trained weights.
The model is then fine-tuned using the obtained weights. This approach
significantly improves the training performance over the conventional mode
of training. For the implementation of the proposed model, we consider the
following parameters.

Batch Size = 128
Embed Size = 64
Hidden Size = 64
Number of Layers = 2
Dropout Rate = 0.7

3. The requests are reconstructed by the decoder, $x' = \sigma'(W' z + b')$,
which attempts to reconstruct the given input as closely as possible. The
loss function and the accuracy are then evaluated.

4. When a new request is given as input to the trained autoencoder, it
encodes and decodes the request vector and calculates the reconstruction
(loss) error. If the loss error is larger than the learned threshold θ, the
request is categorized as anomalous. If the loss error is smaller than θ,
the request is categorized as normal.

5. After the requests are categorized as normal or anomalous, normal
requests are sent to the database for retraining, so that over time the
detection model learns new types of request patterns. Anomalous requests
are sent to the classification model, which further categorizes them by the
type of attack carried in the request, such as SQLi, XSS or CSRF.

6. The classification model is trained on a larger number of labeled HTTP
requests containing attack vectors. The dataset contains 7 classes of
attacks: OsCommanding, PathTraversal, SQLi, X-PathInjection, LDAPInjection,
SSI, and XSS.
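
The thresholding and routing in steps 4 and 5 can be sketched as follows. The paper does not state how the threshold θ is learned, so the rule in `fit_threshold` (mean plus three standard deviations of the reconstruction errors on benign training requests) is purely an assumption, as are the function names and the example error values.

```python
from statistics import mean, pstdev

def fit_threshold(benign_errors, k=3.0):
    """Assumed rule (not from the paper): theta = mean + k * std."""
    return mean(benign_errors) + k * pstdev(benign_errors)

def route(requests_with_errors, theta):
    """Split requests into (normal, anomalous) by reconstruction error.

    Errors larger than theta are anomalous. The text does not say how an
    error exactly equal to theta is handled; it is treated as normal here.
    """
    normal, anomalous = [], []
    for request, error in requests_with_errors:
        (anomalous if error > theta else normal).append(request)
    return normal, anomalous

# Hypothetical reconstruction errors measured on benign training requests.
benign_errors = [0.10, 0.12, 0.09, 0.11, 0.13]
theta = fit_threshold(benign_errors)

# Hypothetical (request, reconstruction error) pairs for new traffic.
scored = [
    ("GET /index.html", 0.11),
    ("GET /search?q=<script>alert(1)</script>", 0.92),
    ("POST /login", 0.10),
]
normal, anomalous = route(scored, theta)
```

In the pipeline described above, the requests in `normal` would be stored for retraining and those in `anomalous` would be passed on to the attack-type classifier.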

We use LSTM layers to train the classification model and fine-tune the
model with hyperparameters. Every LSTM layer is accompanied by a dropout
layer, which helps to prevent over-fitting by ignoring randomly selected
neurons during training, and hence reduces the sensitivity to the specific
weights of individual neurons.
Figure 3 shows a raw anomalous HTTP request with an XSS attack vector. In
the data pre-processing step, the raw HTTP data is converted to a single
string and passed as input to the LSTM cell, which is then used in the
training phase to train the model.

Figure 3: HTTP Requests with XSS attack Vector
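
The pre-processing step described above can be sketched as follows. The paper only states that the raw HTTP data is converted to a single string, so the character-level vocabulary, the reserved ids 0 (padding) and 1 (unknown), the fixed length of 128 and all function names are assumptions made for illustration.

```python
def flatten_request(raw_request):
    """Collapse a multi-line raw HTTP request into a single string."""
    lines = [line.strip() for line in raw_request.splitlines()]
    return " ".join(line for line in lines if line)

def build_vocab(strings):
    """Map each character to an integer id; 0 is padding, 1 is unknown."""
    chars = sorted({ch for s in strings for ch in s})
    return {ch: i + 2 for i, ch in enumerate(chars)}

def encode_chars(text, vocab, max_len):
    """Turn a string into a fixed-length list of character ids."""
    ids = [vocab.get(ch, 1) for ch in text[:max_len]]
    return ids + [0] * (max_len - len(ids))

# Hypothetical raw request carrying an XSS payload in the query string.
raw = """GET /search?q=<script>alert(1)</script> HTTP/1.1
Host: example.com
User-Agent: demo

"""
flat = flatten_request(raw)
vocab = build_vocab([flat])
ids = encode_chars(flat, vocab, max_len=128)
```

The resulting `ids` list is the kind of fixed-length integer sequence that an embedding layer followed by LSTM cells could consume.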

4 Experimental Results and Evaluation

We evaluated the proposed model with 40,000 web requests followed by 20,000
anomalous web requests and responses. The classification engine is trained
on the ECML-KDD dataset to classify anomalous queries by specific attack
type. We evaluated the proposed model using the ROC curve. An ROC curve is
a graph showing the performance of a classification model at all
classification thresholds. It plots two parameters: the true positive rate
and the false positive rate. A false positive (FP), or false alarm, refers
to the detection of benign traffic as an attack. A false negative (FN)
refers to the detection of attack traffic as benign traffic. A key goal of
an intrusion detection system is to minimize both the FP rate and the FN
rate. We use the following parameters to evaluate the proposed model’s
performance:
- True Positive (TP): the number of observations correctly assigned to the
positive class.

- False Positive (FP): the number of observations incorrectly assigned by
the model to the positive class.

- True Positive Rate (TPR) reflects the classifier’s ability to detect
members of the positive class:
$$\mathrm{TPR} = \frac{TP}{TP + FN}$$

- False Positive Rate (FPR) reflects the frequency with which the
classifier makes a mistake by classifying a normal request as an attack:
$$\mathrm{FPR} = \frac{FP}{FP + TN}$$

An ROC curve plots TPR versus FPR at different classification thresholds.
Lowering the classification threshold classifies more items as positive,
thus increasing both false positives and true positives.
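
The effect of the threshold on the two rates can be seen in a short threshold sweep. The scores and labels below are hypothetical toy values, not results from the paper, and `roc_points` is an illustrative helper name; a sample is predicted as an attack when its score exceeds the threshold.

```python
def roc_points(scores, labels, thresholds):
    """Return a (threshold, FPR, TPR) tuple for each threshold."""
    positives = sum(labels)
    negatives = len(labels) - positives
    points = []
    for t in thresholds:
        # Count predicted positives among the true attacks and the benign ones.
        tp = sum(1 for s, y in zip(scores, labels) if s > t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s > t and y == 0)
        points.append((t, fp / negatives, tp / positives))
    return points

# Toy anomaly scores; label 1 marks an attack and label 0 a benign request.
scores = [0.05, 0.10, 0.20, 0.40, 0.60, 0.80, 0.90, 0.95]
labels = [0, 0, 0, 0, 1, 0, 1, 1]
points = roc_points(scores, labels, thresholds=[0.85, 0.5, 0.15, 0.0])
```

As the threshold is lowered from 0.85 to 0.0, both the FPR and the TPR in `points` rise monotonically until every sample is flagged as an attack.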

Figure 4: ROC Curve of the Proposed Model

Because defining normality with a descriptive feature set is difficult, the
detection system can sometimes raise false alarms (false positives) or miss
attacks (false negatives). With the ROC curve, the closer the graph is to
the top and left-hand borders, the more accurate the test. Similarly, the
closer the graph is to the diagonal, the less accurate the test. The
experimental results obtained with the proposed model are as follows:
Precision: 0.9979
Recall: 1.00
Number of True Positives: 1097
Number of Positive Samples: 1097
True Positive Rate: 1.00
Number of False Positives: 7
Number of Negative Samples: 2200
False Positive Rate: 0.0032
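
As a quick arithmetic check, the two reported rates follow from the reported counts, assuming that the 1097 samples are the attack requests and the 2200 samples are the benign requests of the test set. The variable names are illustrative.

```python
tp, positive_samples = 1097, 1097   # detected attacks / all attack samples
fp, negative_samples = 7, 2200      # false alarms / all benign samples

tpr = tp / positive_samples         # TP / (TP + FN)
fpr = fp / negative_samples         # FP / (FP + TN)
```

Rounded to four decimal places, `fpr` matches the reported value of 0.0032, and `tpr` equals 1.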

5 Conclusion
We discussed an intrusion detection model using deep learning. The proposed
model detects web application attacks autonomously in real time. The model
uses an auto-encoder that learns from sequences of words and weights each
word or character accordingly. The experimental results show that the
proposed model can detect web application attacks with a low false positive
rate and a true positive rate of 1. Because of the small volume of labeled,
categorized anomalous data, the proposed classification engine is not 100
percent accurate; however, the classification can be improved through
optimized training on a larger dataset, which is left as future work.

References
[1] C. Cortes and V. Vapnik. Support Vector Machine. In: Machine learning,
20(3):273–297, 1995.
[2] Y. Dong and Y. Zhang. Adaptively Detecting Malicious Queries in Web
Attacks. https://arxiv.org/pdf/1701.07774.pdf
[3] S. Hochreiter and J. Schmidhuber. Long Short-Term Memory. In: Neural
Computation, 1997.
[4] K. L. Ingham and H. Inoue. Comparing Anomaly Detection Techniques for
HTTP. In Proceedings of International Workshop on Recent Advances in
Intrusion Detection, LNCS 4637, Springer, pp. 42–62, 2007.
[5] G. Liu, Z. Yi and S. Yang. A Hierarchical Intrusion Detection Model
based on the PCA Neural Networks. In: Neurocomputing, 70(7):1561–1568, 2007.
[6] Y. Pan, F. Sun, Z. Teng, J. White, D. C. Schmidt, J. Staples and L. Krause.
Detecting Web Attacks with end-to-end Deep Learning. In: Journal of In-
ternet Services and Applications, 10(16), 2019.
[7] X. Xu and X. Wang. An Adaptive Network Intrusion Detection method
based on PCA and Support Vector Machines. In: Advanced Data Mining
and Applications, pp. 731–731, 2005.
[8] S. Russell and P. Norvig. Artificial Intelligence - A Modern Approach. Pear-
son, 2009.
[9] Symantec Internet Security Threat. https://cdn2.hubspot.net/hubfs/4595665/
[10] P. Vincent and H. Larochelle. Stacked Denoising Autoencoders: Learning
useful Representations in a Deep Network with a Local Denoising Criterion.
In: Journal of Machine Learning Research, 11, 2010.
[11] S. Wold and K. Esbensen and P. Geladi. Principal Component Analysis.
In: Chemometrics and Intelligent Laboratory Systems, 2(1-3):37–52, 1987.