
A_Study_on_SQL_Injection_Detection_AI-based_Perspective

The document presents a study on SQL injection detection using AI techniques, highlighting the vulnerabilities of web applications to SQL injection attacks. It explores various machine learning and deep learning methodologies to enhance the identification of such attacks, emphasizing the importance of improving defense mechanisms. The paper reviews existing algorithms, their performance metrics, and suggests future directions for research in this area.


2023 International Conference on Energy, Materials and Communication Engineering (ICEMCE), December 14-15, 2023, Madurai, India

A Study on SQL Injection Detection: AI-based Perspective
2023 International Conference on Energy, Materials and Communication Engineering (ICEMCE) | 979-8-3503-9337-8/23/$31.00 ©2023 IEEE | DOI: 10.1109/ICEMCE57940.2023.10434216

Dharani Pujitha Manepalli, Alekhya Garugu, Venkata Amrutha Kandarpa, Syam Manohar Kummari, Sanketh Rohi Jangam, Aravind Karrothu
Dept. of Information Technology, GMR Institute of Technology, Rajam, India

Abstract: Web applications have revolutionized the way we interact with and use the internet, offering dynamic and feature-rich experiences. Alongside these benefits, however, web applications also pose security challenges, and SQL injection is one of the most common and harmful weaknesses. SQL injection attacks have become a prominent concern in web application security, as highlighted in the list of network security problems released by OWASP. These attacks present a significant danger to web-based programs, undermining their security and potentially resulting in unauthorized access and data breaches. This work explores the application of DL techniques to improve the identification of SQL injection Attacks (SQLiA) in web-based programs, with the aim of strengthening their defense mechanisms. The work analyzes various ML/DL-based mechanisms and identifies which algorithms have achieved the best accuracy.

Keywords: Structured Query Language (SQL), SQL injection Attack (SQLiA), Web Application Security, Machine Learning (ML), OWASP.

I. INTRODUCTION

Digital platforms host many web applications that let users communicate with one another over the internet. At the same time, these web pages carry sensitive information such as login credentials, credit card details, and other confidential data, which is stored in database centers through the respective web servers. This scenario is depicted in Figure 1.

Fig. 1. WEB Application scenario

According to the Open Web Application Security Project (OWASP) analysis, security vulnerabilities are grouped into various types based on the damage they cause to the e-ecosystem. In this analysis, injection attacks are identified as one of the most severe classes of attack on digital platforms. Within the injection attack paradigm, the SQL injection Attack (SQLiA) is the most common type affecting vulnerable web applications, as shown in Figure 2. In this scenario, the attacker tries to enter the network by submitting crafted queries at the login stage. The web application looks up the login details in the database with a query such as SELECT * FROM Employee WHERE Empid = 3 OR (1=1). Because the attacker appends an OR operation, the condition is always true: an OR expression is true if any of its operands is true, and (1=1) is always true. When the server runs the query, the database therefore reports the person logging in as authenticated. To prevent such techniques, ML/DL algorithms are used to train a model that identifies these queries up front and denies access. A second scenario, showing how SQL injection can affect the normal operation of a web application, is depicted in Figure 3.

Fig. 2. SQL injection Attack (SQLiA) scenario-1

In general, an attacker can use various penetration scanning tools to reach the database through queries, so such malicious queries need to be identified at the entry level. Traditional algorithms, however, lack the knowledge needed to identify these attacks. To improve existing systems, intelligent and statistical algorithms such as Machine Learning (ML) and Deep Learning (DL) are required.

This paper explains various recent ML/DL techniques and analyses their advantages, challenges, and open issues. The paper is organized as follows: Section I is the introduction, followed by a literature review of existing methodologies and then an analysis of state-of-the-art algorithms in terms of their advantages, challenges, and future scope.
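The login-bypass query described earlier in this section can be made concrete with a minimal sketch. It assumes an in-memory SQLite database and a hypothetical Employee table; the table contents and the helper names build_db, lookup_vulnerable, and lookup_parameterized are illustrative and are not taken from the surveyed works. The sketch shows how concatenating user input into the query text lets the OR (1=1) tautology match every row, while a parameterized query treats the same input as a plain value.

```python
import sqlite3


def build_db():
    """Create an in-memory database with a small, hypothetical Employee table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Employee (Empid INTEGER, name TEXT)")
    conn.executemany(
        "INSERT INTO Employee VALUES (?, ?)",
        [(1, "alice"), (2, "bob"), (3, "carol")],
    )
    return conn


def lookup_vulnerable(conn, empid_text):
    """Vulnerable: the user input is concatenated into the SQL text itself."""
    query = "SELECT * FROM Employee WHERE Empid = " + empid_text
    return conn.execute(query).fetchall()


def lookup_parameterized(conn, empid_text):
    """Safer: the user input is bound as a value and is never parsed as SQL."""
    query = "SELECT * FROM Employee WHERE Empid = ?"
    return conn.execute(query, (empid_text,)).fetchall()


conn = build_db()
benign = lookup_vulnerable(conn, "3")               # one matching row
injected = lookup_vulnerable(conn, "3 OR (1=1)")    # tautology: every row matches
safe = lookup_parameterized(conn, "3 OR (1=1)")     # input compared as a string: no rows

print(len(benign), len(injected), len(safe))
```

Parameterized queries are a conventional server-side defense. The ML/DL detectors surveyed in this paper address a different point in the pipeline: recognizing such inputs before they reach the database.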

979-8-3503-9337-8/23/$31.00 ©2023IEEE
Authorized licensed use limited to: Don Bosco Institute of Technology-Bengaluru. Downloaded on March 17,2025 at 07:09:11 UTC from IEEE Xplore. Restrictions apply.
Each method is also characterized by its performance metrics. Finally, this article is concluded based on the above analysis.

Fig. 3. SQL injection Attack (SQLiA) scenario-2

II. LITERATURE SURVEY

The authors of [1] describe SQLi detection with the help of a two-stage deep-learning-based structure. One stage performs offline training and the other performs online testing. In the first stage they use various SQL injection tools to collect vulnerable and genuine queries. These data are then converted from an unstructured to a structured format. Finally, the structured data samples are passed to the testing stage. After the evaluation, the authors conclude that this work may help to reduce the number of false positives and, at the same time, false negatives. They also achieve high accuracy with this mechanism [1].

Muhammad, T., & Ghafory, H. suggest a new ML-based algorithm to identify and categorize SQL injection attacks in web-based applications. The model was created using WEKA 3.8.0 and trained using hold-out and 10-fold cross-validation assessment methods. The authors effectively identified malicious log files by using machine learning to distinguish between malicious and normal online requests derived from access log files. In this process the authors used different types of algorithms, such as Logistic Regression and Multilayer Perceptron [2].

Mondal B. et al. explain various AI methods for detecting and preventing SQLi attacks. They use many classifiers, such as MLP, Support Vector Machine, and Decision Trees, which are evaluated using performance metrics such as the confusion matrix and F1 score [3].

The author of [4] describes the process of a SQL injection attack and how attackers construct malicious user input, crafted to exploit SQL syntax, in order to modify SQL statements and illegally access the database. For attack detection, the paper uses the ITFIDF algorithm from data mining technology to identify SQL injection vulnerabilities. It establishes a SQL injection attack defense model to defend against SQL injection attacks. It also conducts experiments to assess the efficacy of SQL injection attack and defense technologies, using relevant SQL attack traffic sets and hardware and software configurations [4].

The authors of [5] discuss a novel approach involving a client-side model, rather than simply focusing on safeguarding the database against injection attacks on the server side. This model is designed to operate as a rectifier on the client side, with the aim of enhancing security. ML algorithms generally use numerical data for training and evaluating the model, but the input queries are text. To convert the text into numerical values, the authors use text parsing to break the sentences into tokens and then apply preprocessing. The approach employs Term Frequency and Inverse Document Frequency (TF-IDF) as its preprocessing technique, together with the removal of unwanted spaces and sentences. The result is a table that contains the occurrence of each word across the document. After evaluation, the model achieved an accuracy of 97%. Finally, after conducting experiments on various ML models, the authors conclude that CNN is the best model [5].

The authors of [6] state that they used a dataset from the Python library called Lib-injection, containing data on all types of SQL injection queries as well as plain-text data. The paper describes the data preprocessing process, which includes tokenization, stop-word removal, problem-based filtering, stemming, and word case uniformity. The authors use a deep learning model with the embedding layer technique, relying on raw data rather than predefined features [6].

Abdulmalik Yazeed proposes a method to enhance the effectiveness of identifying SQL injection attacks. The approach comprises three stages: Dataset Phase, Static and Dynamic Analysis Phase, and Model Construction Phase. Several algorithms were tested, including Random Forest, ANN, SVM, and Logistic Regression. Tenfold cross-validation is employed to assess and verify the proposed detection model [7].

M. Hasan et al. developed a heuristic algorithm rooted in machine learning to mitigate SQL injection attacks. Feature extraction was carried out with a MATLAB program specifically designed to examine and analyze individual statements. This program generated an array of features, which subsequently served as the predictors for the machine learning classifier. The authors chose the top five classifiers by detection accuracy and created a Graphical User Interface (GUI) application using these selected classifiers. The study's findings indicated that the Ensemble Boosted Trees algorithm outperformed the others, achieving a notably high accuracy (93.8%) in detecting SQL injection attacks.

Gandhi et al. provide a literature study on SQL injection vulnerabilities and their identification. The paper cites statistics from the Open Web Application Security Project (OWASP) and a study on SQL injection attacks to highlight the severity of the issue. It also addresses different conventional approaches and machine learning algorithms employed in the detection of SQL injection attacks, including source code validation, SQL query verification, and classification algorithms such as Naive Bayes and the decision tree classifier. The paper then proposes a hybrid approach combining CNN and BiLSTM for detecting SQL injection attacks and provides a comparative assessment of various algorithms with respect to their machine learning models [9].

K. Zhang proposes an ML-based classifier technique to address SQLiA vulnerabilities in PHP code. The author uses input validation and sanitization features extracted from the program text files to train and test the classification models. The author wrote Python code to determine whether a file invokes PHP library functions to perform input validation or sanitization, or whether it is trying to steal users' information. The study found that DL algorithms perform better than ML algorithms and that, for feature extraction, Bag of Words
(BOW) works better than the word2vec feature. Finally, the author reports that CNN with bag of words achieves a precision of 95% [10].

The authors of [11] discuss web applications that store sensitive information, where no unauthorized access to the databases is allowed. They analyzed and tested various SQL actions on web applications and used techniques such as feature ranking and feed-forward networks for the selection and detection of employee correlation [11].

In [12], the authors suggest an approach based on adaptive deep forest, which involves significantly fewer hyperparameters than deep neural networks and is therefore more robust to hyperparameter settings. The assessment criteria include accuracy (ACC), precision (P), recall (R), and F1-score. The experiments were carried out using Python, with the Scikit-learn library to build the machine learning models, Keras to construct a neural network, and TensorFlow as the underlying computing framework. The method uses adaptive deep forest (ADF) to automatically adjust the model parameters throughout the training process, enhancing the accuracy of detection [12].

The paper in [13] examines the current methods for detecting SQL injection in intelligent transportation systems and proposes a new method based on LSTM networks and positive sample generation. The authors compare the effectiveness of traditional machine learning techniques (SVM, KNN, Decision Tree, NB, RF) and widely employed deep neural networks (CNN, RNN, MLP) in identifying SQL injection statements. The proposed method uses LSTM to harness the benefits of data sequencing and long-term dependencies for the detection of SQL injection attacks, and a specific model training process is given. Additionally, the paper introduces a positive sample generation method to equalize the distribution of input data and enhance the training dataset, which can alleviate the issue of overfitting [13].

The authors of [14] propose an ML-based technique to classify incoming traffic as normal or malicious. Data is captured in two places: HTTP traffic between the traffic generation server and the web app server, and traffic at the remote database server. These two sets of data are then processed and correlated to create a separate dataset containing features from both. The algorithms used are evaluated for classification accuracy as well as for efficiency, in terms of the time to build models and the time to classify the training data with 5-fold cross-validation [14].

Kevin Ross proposes that unknown attacks can be identified by using ML techniques. The work collects traffic from web application host logs to detect attacks and compares the performance of decision tree and neural network algorithms. The dataset captures traffic to the web applications, including TCP and HTTP packets [15].

Asish Kumar Dalai and Sanjay Kumar Jena present a novel method to prevent SQL attacks. They explain different attack types, such as type mismatch and code commenting, and mention the use of UNION SELECT statements for data extraction in injection attacks. The paper also touches on a runtime monitoring approach and on controls applied during the static analysis phase [16].

We have analysed various ML/DL-based techniques in terms of their advantages, limitations, and performance [17]. Where a work leaves scope for future research, this is also highlighted in Table I.

TABLE I. ANALYSIS OF EXISTING METHODOLOGIES

| S. No. | Author Names | Methodology Used | Advantages / Limitations | Future Scope | Performance |
|---|---|---|---|---|---|
| 1 | Sun, H. et al. | 2-stage deep-learning based framework | Provides high accuracy and reduces false positives and negatives | Need to address the robustness of the deep-learning detection models | F1 score - 95.64; Accuracy - 99.57 |
| 2 | Muhammad, T., & Ghafory, H. | LG, MLP, NB, SMO | Requires a large amount of data for model preparation | Integrating real-time detection could further strengthen and improve the detection of SQL injections | Accuracy - 98.79% |
| 3 | Mondal, B. et al. | MLP, SVM, Logistic Regression (LR), Naive Bayes, and Decision Tree | Lacks a comprehensive analysis of the limitations of the proposed AI-based SQLI detection strategies | Improving the performance and precision of machine learning-based SQL injection detection algorithms | Accuracy - 96% |
| 4 | Sheng, Jingyuan | SQL syntax tree, Filtering | Has a lower false alarm rate and faster running time compared to traditional defense technology | Takes more time for large datasets in the pre-processing phase | Accuracy - 97% |
| 5 | Krishnan, S. A. et al. | Naive Bayes, LR, CNN, SVM, Passive Aggressive, K-fold cross-validation | Takes more time because of the manual process | Web-based application codes, HTTP scanning proxies | Accuracy - 97% |
| 6 | Jothi, et al. | SVM, RN, ANN | Relies on single-word tokenization | Can recognize SQL injected statements effectively with AI-based techniques | Accuracy - 98% |
| 7 | Abdulmalik, Yazeed | LG, RN, ANN, SVM | Extracting semantic features makes it easy to detect unauthorized access | Need to focus on an effective input validation attack | - |
| 8 | Hassan, M. M. et al. | CNN and LSTM | A deep learning-powered approach to detect SQL injection vulnerabilities | Innovative techniques and approaches to enhance identifying and thwarting SQL injection vulnerabilities | Accuracy - 98.04% |
| 9 | Gandhi, N. et al. | CNN-Bi-LSTM | ML has led to improvements in detection over traditional methods | The use of DL models for detecting more complex SQL injection attacks | Accuracy - 97% |
| 10 | K. Zhang | Random Forest, LR, SVM, MLP, LSTM, and CNN | Not adopted for large-scale web applications | Use of ensemble methods to improve the performance of the classifier | Accuracy - 95.4% |
| 11 | Hasan, M. et al. | Ensemble Boosted Trees, Ensemble Bagged Trees, Cubic SVM | The combination of both ML and DL can provide more accuracy | Scope to identify non-SQL injected statements along with more features in the dataset | Accuracy - 99% |
| 12 | Li, Q. et al. | Deep neural networks (DNN), decision tree (DT), random forest (RF), and SVM | Better detection accuracy on fewer samples; low computational cost | Allowing it to capture a wide range of attack patterns and adapt to new or unknown types of attacks | Accuracy - 90% |
| 13 | Li, Q. et al. | LSTM | Enhances detection accuracy while minimizing false positives and false negatives | Can provide more efficiency with DL models (CNN and RNN) | Accuracy - 95.64% |
| 14 | Ross, K. et al. | SVM, RF, Decision Tree | The approach is not applicable to all types of datasets | Need more focus on implementation using DL algorithms | Accuracy - 98.04% |
| 15 | Ross, K. | Intrusion detection involving static, signature-based IDS rules | The combination of MLP and SVM techniques has been shown to achieve improved accuracy in intrusion detection | Suggests exploring other ML techniques for both accuracy and performance, indicating a potential for further research in this area | Accuracy - 92.03% |
| 16 | Dalai, A. K., & Jena, S. K. | SQL injection attacks in web applications | Does not involve input filtering, so it is easier to implement and maintain | Command Injection, Code Injection, and File Injection, to provide a more comprehensive solution for web application security | Accuracy - 95% |
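The Performance column reports accuracy and, in one case, F1 score. As a reminder of how these metrics relate to the confusion matrix mentioned in the survey, the following is a minimal sketch using hypothetical labels (1 = injection, 0 = benign). The function names and sample data are illustrative only and are not drawn from any of the surveyed experiments.

```python
def confusion_counts(y_true, y_pred):
    """Count true positives, false positives, false negatives, true negatives."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn


def metrics(tp, fp, fn, tn):
    """Compute accuracy, precision, recall, and F1 from confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical ground truth and predictions for eight queries.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]

scores = metrics(*confusion_counts(y_true, y_pred))
print(scores)
```

Accuracy alone can look high on datasets dominated by benign queries, which is one reason several surveyed works also report precision, recall, or F1.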

III. CONCLUSION

In the web ecosystem, the network must be kept free of attacks while devices communicate with one another. In this scenario, SQL injection attacks are among the most widespread and destructive vulnerabilities on e-platforms. To identify these attacks (SQLiA), which are carried out through various penetration tools, intelligent models need to be adopted in the web environment. In this analysis, we have gathered related information on SQLiA detection models with a range of performance metrics such as Accuracy, Precision, Recall, and F1 score. After analyzing all the works, we conclude that DL models give the best performance in detecting SQLiA, with higher accuracy than ML. In particular, MLP, CNN, and LSTM are well suited to detecting SQLiA in web applications. Of these three models, MLP is particularly suitable for identifying SQLiA, with a high accuracy of 96% to 98.79%. In future, we are going to enhance the existing models and try to provide a new approach that performs well across all performance metrics. We are also planning to study more related articles on ML/DL-based SQLiA detection models in order to provide the most suitable model.

REFERENCES:

[1]. Sun, H., Du, Y., & Li, Q. (2023). Deep Learning-Based Detection Technology for SQL Injection Research and Implementation. Applied Sciences, 13(16), 9466.
[2]. Muhammad, T., & Ghafory, H. (2022). SQL Injection Attack Detection Using Machine Learning Algorithm. Mesopotamian journal of cybersecurity, 2022, 5-17.
[3]. Mondal, B., Banerjee, A., & Gupta, S. (2022). A review of SQLI detection strategies using machine learning. machine learning, 6(S2), 9664-9677.
[4]. Sheng, J. (2022). Research on SQL Injection Attack and Defense Technology of Power Dispatching Data Network: Based on Data Mining. Mobile Information Systems, 2022.
[5]. Krishnan, S. A., Sabu, A. N., Sajan, P. P., & Sreedeep, A. L. (2021). SQL injection detection using machine learning. vol, 11, 11.
[6]. Jothi, K. R., Pandey, N., Beriwal, P., & Amarajan, A. (2021, March). An efficient SQL injection detection system using deep learning. In 2021 International conference on computational intelligence and knowledge economy (ICCIKE) (pp. 442-445). IEEE.
[7]. Abdulmalik, Y. (2021). An improved SQL injection attack detection model using machine learning techniques. International Journal of Innovative Computing, 11(1), 53-57.
[8]. Hassan, M. M., Ahmad, R. B., & Ghosh, T. (2021). SQL injection vulnerability detection using deep learning: a feature-based approach. Indonesian Journal of Electrical Engineering and Informatics (IJEEI), 9(3), 702-718.
[9]. Gandhi, N., Patel, J., Sisodiya, R., Doshi, N., & Mishra, S. (2021, March). A CNN-BiLSTM based approach for detection of SQL injection attacks. In 2021 International conference on computational intelligence and knowledge economy (ICCIKE) (pp. 378-383). IEEE.
[10]. Zhang, K. (2019, November). A machine learning based approach to identify SQL injection vulnerabilities. In 2019 34th IEEE/ACM International Conference on Automated Software Engineering (ASE) (pp. 1286-1288). IEEE.
[11]. Hasan, M., Balbahaith, Z., & Tarique, M. (2019, November). Detection of SQL injection attacks: a machine learning approach. In 2019 International Conference on Electrical and Computing Technologies and Applications (ICECTA) (pp. 1-6). IEEE.
[12]. Li, Q., Li, W., Wang, J., & Cheng, M. (2019). A SQL injection detection method based on adaptive deep forest. IEEE Access, 7, 145385-145394.
[13]. Li, Q., Wang, F., Wang, J., & Li, W. (2019). LSTM-based SQL injection detection method for intelligent transportation system. IEEE Transactions on Vehicular Technology, 68(5), 4182-4191.
[14]. Ross, K., Moh, M., Moh, T. S., & Yao, J. (2018, March). Multi-source data analysis and evaluation of machine learning techniques for SQL injection detection. In Proceedings of the ACMSE 2018 Conference (pp. 1-8).
[15]. Ross, K. (2018). SQL injection detection using machine learning techniques and multiple data sources.
[16]. Dalai, A. K., & Jena, S. K. (2017). Neutralizing SQL injection attack using server-side code modification in web applications. Security and Communication Networks, 2017.
[17]. Brindavathi, B., Karrothu, A., & Anilkumar, C. (2023, August). An Analysis of AI-based SQL Injection (SQLi) Attack Detection. In 2023 Second International Conference on Augmented Intelligence and Sustainable Systems (ICAISS) (pp. 31-35). IEEE.
