Web Application Attack Detection Using Deep Learning
arXiv:2011.03181v1 [cs.CR] 6 Nov 2020
Abstract
Modern web applications communicate through HTTP/HTTPS messages that consist
of one or more headers, into which attackers can inject most exploits and
payloads. According to OWASP, 80 percent of web attacks are carried out
through HTTP/HTTPS request queries. In this paper, we present a deep learning
based model for detecting web application attacks. The model uses an
auto-encoder that learns from sequences of words and weights each word or
character accordingly. The classification engine is trained on the ECML-KDD
dataset to classify anomalous queries by attack type. The proposed web
application detection engine is trained with anomalous and benign web queries
and achieves a receiver operating characteristic (ROC) score of 1. The
experimental results show that the proposed model detects web application
attacks successfully with a low false positive rate.
Keywords: Web Application; Web Security; Machine Learning; Deep
Learning.
1 Introduction
Web applications are among the most frequent targets of attackers. The
Symantec Internet Security Threat report [9] states that 1 in 10 URLs is
identified as malicious. Web applications typically use the HTTP/HTTPS
protocols, supported by other backend and frontend interfaces. According to
the Imperva Web Application Vulnerability Report [9], the high-severity
attacks are injection attacks, which are exploited by injecting payloads into
HTTP/HTTPS web queries using the GET, POST and PUT methods. CSRF (Cross Site
Request Forgery) attacks, SQL injection attacks, XSS (Cross Site Scripting)
attacks, and widely used vulnerable JS libraries account for 51 percent, 27
percent, 33 percent and 36 percent, respectively. This paper focuses on the
most frequent types of web-based injection attacks, which include SQL
injection, XSS (Cross Site Scripting), RFI (Remote File Inclusion), XXE (XML
External Entity), CSRF (Cross Site Request Forgery), and SSRF (Server Side
Request Forgery).
A Network Intrusion Detection System (NIDS) monitors the network traffic of
web applications. A web IDS sits between the web application and its users
and analyzes web traffic to detect any anomaly or malicious activity [2].
Generally, there are two detection approaches: anomaly-based detection and
signature-based detection. A signature-based IDS works like an antivirus,
which detects a virus only when its database holds a signature for that
specific kind of virus. If attackers create a new virus whose signature or
pattern is not present, the antivirus is of no use. Anomaly-based detection
recognizes behavior patterns or activity that differ from the data previously
fed to the system. Compared with signature-based detection, anomaly-based
detection performs better at detecting unknown attacks, but at the cost of a
higher false positive alarm rate. Once a detected anomaly is stored in the
database, it becomes a “signature”. Anomaly detection is further divided into
two methods: adaptive detection and constant detection [3]. An adaptive
detection algorithm continuously receives the HTTP traffic on port 80 of the
web network and analyzes it in a timely manner, while a constant detection
method stores incoming traffic and analyzes the logs of the collected traffic.
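The limitation of signature-based detection described above can be
illustrated with a minimal Python sketch. The two regular expressions and the
example queries are hypothetical and stand in for a real signature database;
they are not taken from any particular IDS.

```python
import re

# Hypothetical signature database: each entry matches one known payload shape.
SIGNATURES = [
    re.compile(r"<script\b", re.IGNORECASE),           # classic XSS tag
    re.compile(r"\bunion\s+select\b", re.IGNORECASE),  # classic SQL injection
]

def matches_signature(query):
    """Return True if any stored signature is found in the query string."""
    return any(sig.search(query) for sig in SIGNATURES)

known_xss = "/search?q=<script>alert(1)</script>"
novel_xss = "/search?q=<img src=x onerror=alert(1)>"  # no stored signature

print(matches_signature(known_xss))  # detected: a signature exists
print(matches_signature(novel_xss))  # missed: the pattern is new
```

The second query is still an XSS payload, yet it passes unnoticed because no
signature covers it. This is the gap that anomaly-based detection aims to
close.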
The conventional patching approach that mitigates most network-layer
vulnerabilities does not work well for web application vulnerabilities such
as SQLi, RCE, or XSS. The root cause of these attacks is that modern web
applications are poorly designed and insecurely coded. One can follow OWASP’s
secure coding guidelines to prevent most of the attacks. An adaptive
detection model is effective at detecting anomalies and classifying them by
attack type, so that the backend developer can patch or prevent the
corresponding vulnerability (e.g., Django uses CSRF tokens in the framework
to prevent CSRF attacks, which account for 51 percent of web attacks). At the
same time, the model should learn patterns over time to detect unknown web
attacks and identify which attack vectors are being exploited. Anomaly-based
detection approaches [2] usually rely on an adaptive model to identify
anomalous web requests, but with a high false positive rate. In this paper,
we propose a solution that handles false positives, in which the IDS monitors
a system based on its behavior pattern. There are several reasons why a
conventional IDS or web application firewall does not work, as follows:
• High False Positive: Conventional systems use unsupervised learning
algorithms such as PCA [5] and SVM [7] to detect web attacks. These
approaches require manual selection of attack-specific features [6]. They
may achieve acceptable performance, but they suffer from high false
positive rates.
Y = f (X)
If a labeled web attack dataset is used to train supervised algorithms such
as SVM (Support Vector Machine) [1] and Naive Bayes [8], the resulting model
can separate anomalous from normal web requests. However, the model cannot
handle new types of attack requests, and it requires a large amount of
labeled data.
Unsupervised learning is used mainly with unlabeled datasets. A model trained
this way finds patterns in the previous sequence or dataset and identifies or
predicts the next one. In exploratory analysis, unsupervised learning is used
to automate the identification of structure in the data. With an unsupervised
method, one can also reduce the number of dimensions, that is, the columns
and features, used to represent the data. Principal Component Analysis (PCA)
[11] computes the eigenvectors of the covariance matrix, called the
“principal axes”, and sorts them by eigenvalue. To achieve dimensionality
reduction, the centered data are projected onto the principal axes. The
resulting values are the principal component (PC) scores. PCA creates as many
new imaginary variables, or principal components, as there are original
variables. The first imaginary variable is optimized to be maximally
correlated with the whole group of original variables; each subsequent
imaginary variable is less correlated than the previous one, and so on, until
the principal component scores can predict any variable of the original group.
$$\text{PCA Reconstruction} = \text{PC Scores} \times \text{Eigenvectors}^{T} + \text{Mean}$$
The data are reconstructed perfectly, and no dimensionality reduction takes
place, when all p eigenvectors are used, so that V V^T is the identity
matrix. With large feature sets, whether from image, text or video data, one
cannot apply a machine learning algorithm directly. Preprocessing steps are
required to clean the dataset and reduce the training time. Note that PCA is
restricted to a linear map, whereas autoencoders [10] can have non-linear
encoders and decoders. A single-layer autoencoder with a linear activation
function is nearly equivalent to PCA. We use a sequence-to-sequence
autoencoder in our proposed detection model.
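The PCA reconstruction formula can be checked on a small example. The sketch
below is a dependency-free illustration restricted to 2-D data, where the
eigenvectors of the covariance matrix have a closed form; the data points and
all function names are hypothetical. It shows that using both principal axes
reproduces the input, while using only the first axis gives a lossy,
lower-dimensional approximation.

```python
import math

def pca_2d(points):
    """Return (mean, axes, scores) for a list of 2-D points."""
    n = len(points)
    mean = (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)
    centered = [(p[0] - mean[0], p[1] - mean[1]) for p in points]

    # Entries of the 2x2 covariance matrix [[a, b], [b, c]].
    a = sum(x * x for x, _ in centered) / (n - 1)
    b = sum(x * y for x, y in centered) / (n - 1)
    c = sum(y * y for _, y in centered) / (n - 1)

    # Largest eigenvalue of a symmetric 2x2 matrix, in closed form.
    lam1 = (a + c) / 2 + math.sqrt(((a - c) / 2) ** 2 + b * b)

    # First principal axis; the second is the perpendicular unit vector.
    if abs(b) > 1e-12:
        v1 = (b, lam1 - a)
    else:
        v1 = (1.0, 0.0) if a >= c else (0.0, 1.0)
    norm = math.hypot(v1[0], v1[1])
    v1 = (v1[0] / norm, v1[1] / norm)
    v2 = (-v1[1], v1[0])
    axes = [v1, v2]

    # PC scores: projection of the centered data onto each principal axis.
    scores = [[x * v[0] + y * v[1] for v in axes] for x, y in centered]
    return mean, axes, scores

def reconstruct(mean, axes, scores, k):
    """PC scores x eigenvectors^T + mean, using only the first k axes."""
    out = []
    for s in scores:
        x = mean[0] + sum(s[j] * axes[j][0] for j in range(k))
        y = mean[1] + sum(s[j] * axes[j][1] for j in range(k))
        out.append((x, y))
    return out

def squared_error(points, approx):
    return sum((p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2
               for p, q in zip(points, approx))

data = [(2.0, 1.9), (0.5, 0.7), (3.1, 3.0), (1.2, 0.9), (2.6, 2.9)]
mean, axes, scores = pca_2d(data)
full = reconstruct(mean, axes, scores, k=2)     # all axes: no reduction
reduced = reconstruct(mean, axes, scores, k=1)  # one axis: lossy
print(squared_error(data, full), squared_error(data, reduced))
```

The first printed error is zero up to floating-point rounding, which matches
the condition that V V^T is the identity when every eigenvector is kept.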
reduce the loss between the input and the output layer. We use non-linear
functions in the encoders to obtain higher accuracy when reconstructing the
data. The activation functions used in the autoencoders are ReLU and sigmoid,
both of which are non-linear.
$$\Phi : \chi \rightarrow F \tag{1}$$
$$\Psi : F \rightarrow \chi \tag{2}$$
$$z = \sigma(Wx + b) \tag{4}$$
With slightly different weights, bias and activation function, the output
function, that is, the decoder network, is represented in the same way.
$$x' = \sigma'(W'z + b') \tag{5}$$
The model is trained with the back-propagation method to minimize the
following loss function.
$$L(x, x') = \|x - x'\|^2 = \|x - \sigma'(W'(\sigma(Wx + b)) + b')\|^2 \tag{6}$$
To reconstruct the input data or input characters, the autoencoder optimizes
the encoder and decoder functions so that minimal information is required to
encode the input data and reconstruct it at the output.
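The encoder, decoder and loss in Equations (4) to (6) can be traced with a
small forward pass. This is a minimal sketch and not the model used in the
paper: the layer sizes, the untrained weights and the choice of ReLU for the
encoder and sigmoid for the decoder are assumptions made for illustration.

```python
import math

def affine(W, x, b):
    """Compute Wx + b for a weight matrix W given as a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) + bi
            for row, bi in zip(W, b)]

def relu(v):
    return [max(0.0, a) for a in v]

def sigmoid(v):
    return [1.0 / (1.0 + math.exp(-a)) for a in v]

def encode(x, W, b):
    return relu(affine(W, x, b))              # z = sigma(Wx + b), Eq. (4)

def decode(z, W_dec, b_dec):
    return sigmoid(affine(W_dec, z, b_dec))   # x' = sigma'(W'z + b'), Eq. (5)

def loss(x, x_rec):
    return sum((a - r) ** 2 for a, r in zip(x, x_rec))  # ||x - x'||^2, Eq. (6)

# Hypothetical, untrained weights: 4 inputs -> 2 latent units -> 4 outputs.
W = [[0.5, -0.2, 0.1, 0.3], [-0.4, 0.6, 0.2, -0.1]]
b = [0.0, 0.1]
W_dec = [[0.7, -0.3], [0.2, 0.5], [-0.6, 0.4], [0.1, 0.9]]
b_dec = [0.0, 0.0, 0.0, 0.0]

x = [1.0, 0.0, 1.0, 0.0]
z = encode(x, W, b)
x_rec = decode(z, W_dec, b_dec)
print(z, x_rec, loss(x, x_rec))
```

Back-propagation would adjust W, b, W' and b' to lower this loss over many
inputs; the sketch only evaluates it once for fixed weights.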
Figure 2: The Proposed System Architecture
1. For training, a large number of unlabeled normal HTTP requests are
collected from the open-source Vulnbank organization; the collection
contains 40k normal HTTP requests using the GET, POST and PUT methods.
over the conventional mode of training. For the implementation of the
proposed model, we consider the following parameters.
3. Requests are reconstructed by the decoder $x' = \sigma'(W'z + b')$, which
reconstructs the given input; the loss function and accuracy are then
evaluated.
We use LSTM layers to train the classification model and fine-tune the model
with hyperparameters. Every LSTM layer is accompanied by a dropout layer,
which helps to prevent over-fitting by ignoring randomly selected neurons
during training, and hence reduces the sensitivity to the specific weights of
individual neurons.
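The effect of a dropout layer can be sketched in a few lines of Python. The
example assumes the common "inverted dropout" variant, in which the surviving
activations are rescaled during training so that nothing changes at inference
time. The drop rate of 0.5 and the activation values are hypothetical, and
the paper does not state which variant or rate it uses.

```python
import random

def dropout(activations, rate, training, rng):
    """Inverted dropout: zero each unit with probability `rate` (rate < 1)
    during training and rescale the survivors by 1 / (1 - rate)."""
    if not training or rate == 0.0:
        return list(activations)
    keep = 1.0 - rate
    return [a / keep if rng.random() < keep else 0.0 for a in activations]

rng = random.Random(0)  # fixed seed so the sketch is repeatable
hidden = [0.2, 1.5, -0.7, 0.9, 0.4, -1.1]

print(dropout(hidden, rate=0.5, training=True, rng=rng))   # some units zeroed
print(dropout(hidden, rate=0.5, training=False, rng=rng))  # unchanged
```

Because a different random subset of neurons is silenced on every training
step, the network cannot rely on any single neuron's weight.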
Figure 3 shows raw anomalous HTTP requests carrying an XSS attack vector. In
the data pre-processing step, the raw HTTP data is converted to a single
string and passed as input to the LSTM cell, which is then passed to the
training phase to train the model.
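The pre-processing step can be sketched as follows. This is one plausible
reading rather than the paper's exact pipeline: the sample request, the URL
decoding, the character-level vocabulary and the fixed length of 128 are all
assumptions made for illustration.

```python
from urllib.parse import unquote_plus

# Hypothetical raw request carrying a URL-encoded XSS payload.
RAW_REQUEST = (
    "GET /search?q=%3Cscript%3Ealert(1)%3C%2Fscript%3E HTTP/1.1\r\n"
    "Host: example.com\r\n"
    "User-Agent: demo\r\n"
    "\r\n"
)

def flatten_request(raw):
    """Join the non-empty lines of a raw HTTP request into one decoded string."""
    lines = [line.strip() for line in raw.splitlines() if line.strip()]
    return unquote_plus(" ".join(lines))

def build_vocab(texts):
    """Map every character seen in `texts` to an integer id starting at 1."""
    chars = sorted({ch for text in texts for ch in text})
    return {ch: i + 1 for i, ch in enumerate(chars)}

def encode_chars(text, vocab, max_len):
    """Turn a string into a fixed-length id sequence.
    Id 0 is used both for padding and for characters missing from the vocab."""
    ids = [vocab.get(ch, 0) for ch in text[:max_len]]
    return ids + [0] * (max_len - len(ids))

flat = flatten_request(RAW_REQUEST)
vocab = build_vocab([flat])
sequence = encode_chars(flat, vocab, max_len=128)
print(flat)
print(sequence[:10])
```

Each request becomes one integer sequence of equal length, which is the shape
of input a sequence model such as an LSTM layer expects.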
Figure 3: HTTP Requests with XSS attack Vector
- True Positive Rate (TPR) reflects the classifier’s ability to detect members
of the positive class
$$TPR = \frac{TP}{TP + FN}$$
- False Positive Rate (FPR) reflects the frequency with which the classifier
makes a mistake by classifying normal state as pathological
$$FPR = \frac{FP}{FP + TN}$$
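Both rates follow directly from the confusion counts. The short Python sketch
below computes them for a hypothetical set of ten labeled requests, where 1
marks an attack and 0 a normal request; the labels and predictions are made
up for illustration and are not results from the paper.

```python
def confusion_counts(y_true, y_pred):
    """Return (TP, FN, FP, TN) for binary labels, with 1 as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fn, fp, tn

def true_positive_rate(tp, fn):
    return tp / (tp + fn)

def false_positive_rate(fp, tn):
    return fp / (fp + tn)

# Hypothetical ground truth and predictions.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]

tp, fn, fp, tn = confusion_counts(y_true, y_pred)
print(true_positive_rate(tp, fn), false_positive_rate(fp, tn))
```

Here three of the four attacks are caught (TPR = 0.75) and one of the six
normal requests is flagged by mistake (FPR = 1/6).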
Figure 4: ROC Curve of the Proposed Model
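A ROC curve such as the one in Figure 4 is obtained by sweeping a decision
threshold over the detector's scores and recording the (FPR, TPR) pair at
each threshold. The sketch below assumes that each request receives a
numeric anomaly score, with higher meaning more suspicious; the scores and
labels are hypothetical, and the area under the curve is estimated with the
trapezoidal rule.

```python
def roc_points(labels, scores):
    """Return (FPR, TPR) points for every distinct threshold, highest first."""
    pos = sum(labels)
    neg = len(labels) - pos
    points = [(0.0, 0.0)]
    for thr in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(labels, scores) if s >= thr and y == 1)
        fp = sum(1 for y, s in zip(labels, scores) if s >= thr and y == 0)
        points.append((fp / neg, tp / pos))
    return points

def area_under_curve(points):
    """Trapezoidal-rule area under a curve given as ordered (x, y) points."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area

# Hypothetical scores: attacks (label 1) all score above normal requests.
separable = roc_points([0, 0, 0, 1, 1, 1], [0.1, 0.2, 0.3, 0.8, 0.9, 0.7])
# Hypothetical scores with one normal request scoring above an attack.
overlapping = roc_points([0, 1, 0, 1], [0.1, 0.4, 0.5, 0.8])

print(area_under_curve(separable), area_under_curve(overlapping))
```

When every attack scores higher than every normal request, the curve passes
through the point (0, 1) and the area is 1; any overlap between the two
groups lowers it.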
5 Conclusion
We discussed an intrusion detection model using deep learning. The proposed
model detects web application attacks autonomously in real time. The model
uses an auto-encoder that learns from sequences of words and weights each
word or character accordingly. The experimental results show that the
proposed model detects web application attacks with a low false positive rate
and a true positive rate of 1. Because only a small volume of labeled,
categorized anomalous data is available, the proposed classification engine
is not 100 percent accurate; however, the classification can be improved by
optimized training on a larger dataset, which is left as future work.
References
[1] C. Cortes and V. Vapnik. Support Vector Machine. In: Machine learning,
20(3):273–297, 1995.
[2] Y. Donga and Y. Zhanga. Adaptively Detecting Malicious Queries in Web
Attacks. https://arxiv.org/pdf/1701.07774.pdf
[3] S. Hochreiter and J. Schmidhuber. Long Short Term Memory. Neural
Computation, 1997.
[4] K. L. Ingham and H. Inoue. Comparing Anomaly Detection Techniques for
HTTP. In Proceedings of International Workshop on Recent Advances in
Intrusion Detection, LNCS 4637, Springer, pp. 42–62, 2007.
[5] G. Liu, Z. Yi and S. Yang. A Hierarchical Intrusion Detection Model based
on the PCA Neural Networks. In: Neurocomputing, 70(7):1561–1568, 2007.
[6] Y. Pan, F. Sun, Z. Teng, J. White, D. C. Schmidt, J. Staples and L. Krause.
Detecting Web Attacks with end-to-end Deep Learning. In: Journal of In-
ternet Services and Applications, 10(16), 2019.
[7] X. Xu and X. Wang. An Adaptive Network Intrusion Detection method
based on PCA and Support Vector Machines. In: Advanced Data Mining
and Applications, pp. 731–731, 2005.
[8] S. Russell and P. Norvig. Artificial Intelligence - A Modern Approach. Pear-
son, 2009.
[9] Symantec Internet Security Threat. https://cdn2.hubspot.net/hubfs/4595665/
[10] P. Vincent and H. Larochelle. Stacked Denoising Autoencoders: Learning
useful Representations in a Deep Network with a Local Denoising Criterion.
In: Journal of Machine Learning Research, 11, 2010.
[11] S. Wold and K. Esbensen and P. Geladi. Principal Component Analysis.
In: Chemometrics and Intelligent Laboratory Systems, 2(1-3):37–52, 1987.