A_Study_on_SQL_Injection_Detection_AI-based_Perspective
Abstract: Web applications have revolutionized the way we interact with and utilize the internet, offering dynamic and feature-rich experiences. However, alongside their numerous benefits, web applications also pose security challenges, with the SQL injection vulnerability being one of the most common and harmful weaknesses. SQL injection attacks have become a prominent concern in web application security, as highlighted in the Network Security Problems released by OWASP. SQL injection attacks present a significant danger to web-based programs, undermining their security and potentially resulting in unauthorized access and data breaches. This work explores the application of DL techniques to enhance the identification of SQL injection Attacks (SQLiA) in web-based programs, aiming to strengthen their defense mechanisms. The work analyzes various ML/DL-based mechanisms to determine which algorithms have achieved the best accuracy.

I. INTRODUCTION

On digital platforms, many web applications enable communication between users in different places via the internet. At the same time, web pages carry our sensitive information, such as login credentials, credit card details, and other secure information, to database centers through their respective web servers. This scenario is depicted in Figure 1.

SQL injection Attack (SQLiA) is the most common type of attack on vulnerable web applications, as shown in Figure 2. In this scenario, the attacker may enter the network by submitting crafted queries at the login stage. The web application automatically searches the database for the login details using a query such as SELECT * FROM Employee WHERE Empid = 3 OR (1=1). Because the attacker appends an OR operation, the condition is always true: an OR expression is true if any of its operands is true, and (1=1) is always true. When the server searches the database, it therefore reports that the person logging in is authenticated. To prevent these techniques, ML/DL algorithms are used to train a model that identifies such queries up front and denies access. Another scenario, showing how SQL injection may affect the normal working of a web application, is depicted in Figure 3.
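The always-true condition in the login query discussed in this introduction can be reproduced in a few lines. The following is only a minimal sketch: the table name Employee and the column Empid come from the example query, while the in-memory SQLite database, the sample rows, and the helper names (build_demo_db, lookup_unsafe, lookup_safe) are assumptions made for illustration. The parameter-bound lookup is included purely for contrast and is not one of the ML/DL detection methods surveyed here.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory Employee table (hypothetical schema and rows)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Employee (Empid INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany(
        "INSERT INTO Employee (Empid, name) VALUES (?, ?)",
        [(1, "alice"), (2, "bob"), (3, "carol")],
    )
    return conn


def lookup_unsafe(conn, empid_input):
    # Vulnerable: the user input is concatenated into the SQL text,
    # so anything the user types is parsed as part of the query.
    query = "SELECT * FROM Employee WHERE Empid = " + empid_input
    return conn.execute(query).fetchall()


def lookup_safe(conn, empid_input):
    # The input is bound as a single value and is never parsed as SQL.
    query = "SELECT * FROM Employee WHERE Empid = ?"
    return conn.execute(query, (empid_input,)).fetchall()


conn = build_demo_db()
malicious = "3 OR (1=1)"  # the tautology from the example query

unsafe_benign = lookup_unsafe(conn, "3")        # matches only Empid 3
unsafe_attack = lookup_unsafe(conn, malicious)  # OR (1=1) matches every row
safe_attack = lookup_safe(conn, malicious)      # no Empid equals that string

print(len(unsafe_benign), len(unsafe_attack), len(safe_attack))
```

In the unsafe lookup the injected OR clause makes the WHERE condition true for every row, which is the behavior that a trained detection model is expected to flag before the query reaches the database.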
979-8-3503-9337-8/23/$31.00 ©2023IEEE
Authorized licensed use limited to: Don Bosco Institute of Technology-Bengaluru. Downloaded on March 17,2025 at 07:09:11 UTC from IEEE Xplore. Restrictions apply.
along with types of their performance metrics. Finally, this article is concluded based on the above analysis.

Fig. 3. SQL injection Attack (SQLiA) scenario-2

II. LITERATURE SURVEY

The authors explained SQLi detection with the help of a two-stage deep-learning-based structure. The first stage performs offline training and the other performs online testing. In the first stage, they used various SQL injection tools to identify vulnerable and genuine queries. These data are then converted from unstructured to structured format. Finally, the structured data samples are fed into the testing stage. After the evaluation, the authors concluded that this work may help reduce the number of false positives and, at the same time, false negatives. They also achieved high accuracy with this mechanism [1].

Muhammad, T., & Ghafory, H. suggested a new ML-based algorithm to identify and categorize SQL injection attacks in web-based applications. The model was created using WEKA 3.8.0 and trained using holdout and 10-fold cross-validation assessment methods. The authors effectively identified malicious log files by using machine learning to distinguish between malicious and normal online requests produced from access log files. In this process, the authors used different types of algorithms, such as Logistic Regression and Multilayer Perceptron [2].

Mondal B. et al. explained various AI methods for detecting and preventing SQLi attacks. They used many classifiers, such as MLP, Support Vector Machine, and Decision Trees, which are evaluated using performance metrics such as the confusion matrix and F1 score [3]. The author of [4] describes the process of SQL injection attacks and how attackers construct malicious user input crafted to exploit SQL syntax, modify SQL statements, and illegally access the database. ITFIDF algorithm for attack detection: the paper uses the ITFIDF algorithm in data mining technology to identify SQL injection vulnerabilities. SQL injection attack defense model: the paper establishes a SQL injection attack defense model to defend against SQL injection attacks. Experimental design and testing: the paper conducts experiments to assess the efficacy of SQL injection attack and defense technologies using relevant SQL attack traffic sets and hardware and software configurations [4].

The authors discussed a novel approach involving a client-side model, rather than simply focusing on safeguarding the database against injection attacks on the server side. This model is designed to operate as a rectifier on the client side, aiming to enhance security. Generally, ML algorithms use numerical data for training and evaluating the model, but the inputs for queries are text. So, to convert the text into numerical values, the authors use text parsing to break the sentences into tokens and then perform preprocessing. The approach employs the following preprocessing technique: Term Frequency and Inverse Document Frequency (TF-IDF), applied once unwanted spaces and sentences have been cleared. It produces a table that captures the occurrence of each word across the documents. After evaluation, the model achieved an accuracy of 97%. Finally, the authors concluded that CNN is the best of the various ML models on which they conducted experiments [5]. The authors of [6] mentioned that they utilized a dataset from the Python library called Lib-injection, containing data on all types of SQL injection queries as well as plain text data. The paper describes the process used for data preprocessing, which includes tokenization, stop-word removal, problem-based filtering, stemming, and word case uniformity. The authors mention that they used a deep learning model utilizing the embedding layer technique, relying on raw data rather than predefined features [6].

Abdulmalik Yazeed proposed a method to enhance the effectiveness of identifying SQL injection attacks. This approach comprises three stages: Dataset Phase, Static and Dynamic Analysis Phase, and Model Construction Phase. Many algorithms were tested, such as Random Forest, ANN, SVM, and Logistic Regression. The tenfold methodology is employed to assess and verify the proposed detection model [7]. M. Hasan et al. developed a heuristic algorithm rooted in machine learning to mitigate SQL injection attacks. Feature extraction used a MATLAB program specifically designed to examine and analyze individual statements. This program was employed to generate an array of features, which subsequently served as the predictive elements for the machine learning classifier. The authors chose the top five classifiers by considering their detection accuracy and proceeded to create a Graphical User Interface (GUI) application using these selected classifiers. The study's findings indicated that the Ensemble Boosted Trees algorithm outperformed the others, achieving a notably high accuracy (93.8%) in detecting SQL injection attacks.

Gandhi et al. conducted a literature study on the subject of SQL injection vulnerabilities and their identification. It cites statistics from the Open Web Application Security Project (OWASP) and a study on SQL injection attacks to highlight the severity of the issue. The paper also addresses different conventional approaches and machine learning algorithms employed in the detection of SQL injection attacks, including source code validation, SQL query verification, and classification algorithms such as Naive Bayes and the decision tree classifier. The paper then proposes a hybrid approach combining CNN and BiLSTM for detecting SQL injection attacks and provides a comparative assessment of various algorithms concerning their respective machine learning models [9].

K. Zhang proposed an ML-based classifier technique to address SQLiA vulnerabilities in PHP code. The author uses input validation and sanitization features from the program text file to train and test the classification models. The author scripted Python code to determine if a file invokes PHP library functions to perform input validation or sanitization, or if it is trying to steal the information of the users. The study determined that the DL algorithms are better than the ML algorithms and that, for feature extraction, Bag of Words
(BOW) is better than the word2vec feature. Finally, the author reported that a CNN with bag of words achieved a precision of 95% [10].

The authors of [11] discussed web applications where sensitive information is stored and no unauthorized access to the databases is allowed. They analyzed and tested various SQL actions on web applications and used many techniques, such as feature ranking and feed-forward networks, for selection and detection of employee correlation [11].

In this work, the authors suggested an approach utilizing adaptive deep forest, which involves significantly fewer hyperparameters than deep neural networks, making it more robust to hyperparameter settings. The assessment criteria include accuracy (ACC), precision (P), recall (R), and F1-score. The experiments were carried out using Python, with the Scikit-learn library to build the machine learning models, Keras to construct a neural network, and TensorFlow as the underlying computing framework. The method uses adaptive deep forest (ADF) to automatically adjust the model parameters throughout the training process, enhancing the accuracy of detection [12].

The paper examines the current methods for detecting SQL injection in intelligent transportation systems and proposes a new method based on LSTM networks and positive sample generation. The authors compare the effectiveness of traditional machine learning techniques (SVM, KNN, Decision Tree, NB, RF) and widely employed deep neural networks (CNN, RNN, MLP) in identifying SQL injection statements. The proposed method utilizes LSTM to harness the benefits of data sequencing and long-term dependencies for the detection of SQL injection attacks, and a specific model training process is given. Additionally, the paper introduces a positive sample generation method to equalize the distribution of input data and enhance the training dataset, which can alleviate the issue of overfitting [13]. The authors of [14] proposed an ML-based technique to classify incoming traffic as normal or malicious. Data is captured in two places: the HTTP traffic between the traffic generation server and the web app server, and the traffic at the remote database server. These two sets of data are then processed and correlated to create a separate dataset containing features from both datasets. The algorithms used are evaluated for classification accuracy as well as efficiency, in terms of time to build models and time to classify the training data, with 5-fold cross validation [14]. Kevin Ross proposed the idea that unknown attacks can be identified by using ML techniques. This paper collects traffic from web application host logs to detect attacks and compares the performance of decision tree and neural network algorithms; the dataset captures traffic to the web applications and TCP and HTTP packets [15]. Asish Kumar Dalai and Sanjay Kumar Jena proposed a novel approach to prevent SQL injection attacks. They explain different attacks, including SQL attacks such as type mismatch and commenting out code, and mention UNION SELECT statements used for data extraction in injection attacks. The paper also touches on a runtime monitoring approach in which control during an attack relies on the static analysis phase [16]. We have analysed various ML/DL-based techniques with their advantages and limitations, along with their performance analysis [17]. Where future scope exists for a work, we have also highlighted it in Table 1 below.
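Several of the surveyed works convert query text into numeric features before training, for example by tokenizing the query and applying TF-IDF [5]. The following is a minimal sketch of that idea using only the Python standard library. The tokenizer regex, the unsmoothed formula tf * log(N / df), the function names, and the three sample queries are assumptions made for illustration; they are not the preprocessing pipeline of any cited paper.

```python
import math
import re
from collections import Counter
from typing import Dict, List


def tokenize(query: str) -> List[str]:
    # Lowercase, then split into words, numbers, and single symbols.
    return re.findall(r"[a-z_]+|\d+|[^\sa-z_\d]", query.lower())


def tf_idf(corpus: List[str]) -> List[Dict[str, float]]:
    """Return one {token: weight} mapping per query in the corpus."""
    docs = [tokenize(q) for q in corpus]
    n_docs = len(docs)
    # Document frequency: in how many queries each token appears.
    df = Counter(tok for doc in docs for tok in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        vectors.append(
            {
                tok: (cnt / total) * math.log(n_docs / df[tok])
                for tok, cnt in counts.items()
            }
        )
    return vectors


# Hypothetical sample queries: two benign, one with a tautology.
queries = [
    "SELECT * FROM Employee WHERE Empid = 3",
    "SELECT * FROM Employee WHERE Empid = 3 OR (1=1)",
    "SELECT name FROM Employee WHERE Empid = 7",
]
vectors = tf_idf(queries)

# Tokens shared by every query get weight 0; rarer tokens stand out.
for tok, weight in sorted(vectors[1].items(), key=lambda kv: -kv[1])[:4]:
    print(tok, round(weight, 4))
```

Under this weighting, tokens present in every query (such as select) carry no weight, while tokens that occur only in the injected query (such as or and the parentheses) receive positive weight, which is the kind of signal a downstream classifier can learn from.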
No. | Authors | Techniques | Advantages / limitations | Future scope | Performance
11 | Hasan, M. et al. | Ensemble Boosted Trees, Ensemble Bagged Trees, Cubic SVM | The combination of both ML and DL can provide more accuracy | There is scope to identify non-SQL-injected statements along with more features in the dataset | Accuracy: 99%
12 | Li, Q. et al. | Deep neural networks (DNN), decision tree (DT), random forest (RF), and SVM | Better detection accuracy on fewer samples; low computational cost | Allowing it to capture a wide range of attack patterns and adapt to new or unknown types of attacks | Accuracy: 90%
13 | Li, Q. et al. | LSTM | Enhances detection accuracy while minimizing false positives and false negatives | Can provide more efficiency with DL models (CNN and RNN) | Accuracy: 95.64%
14 | Ross, K. et al. | SVM, RF, Decision Tree | This approach is not applicable to all types of datasets | Needs more focus on implementation using DL algorithms | Accuracy: 98.04%
15 | Ross, K. | Intrusion detection involving static, signature-based IDS rules | The combination of MLP and SVM techniques has been shown to achieve improved accuracy in intrusion detection | Suggests exploring other ML techniques for both accuracy and performance, indicating a potential for further research in this area | Accuracy: 92.03%
16 | Dalai, A. K., & Jena, S. K. | SQL injection attacks in web applications | It does not involve input filtering, so it is easier to implement and maintain | Command Injection, Code Injection, and File Injection, to provide a more comprehensive solution for web application security | Accuracy: 95%
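The performance figures compared in this survey (accuracy, precision, recall, and F1 score) are all derived from the confusion matrix of a binary classifier that labels queries as malicious or benign. The following is a minimal sketch of those standard definitions; the function name classification_metrics and the example counts are hypothetical and are not taken from any of the cited works.

```python
from typing import Dict


def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Compute standard metrics from confusion-matrix counts.

    tp: malicious queries flagged as malicious
    fp: benign queries flagged as malicious
    tn: benign queries passed as benign
    fn: malicious queries passed as benign
    """
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Hypothetical counts for an imbalanced test set of 1000 queries.
m = classification_metrics(tp=90, fp=10, tn=880, fn=20)
for name, value in m.items():
    print(f"{name}: {value:.4f}")
```

With these made-up counts the accuracy is 0.97 while the recall is only about 0.82, which illustrates why works that report accuracy alone are hard to compare with those that also report precision, recall, and F1 score.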
III. CONCLUSION

In the web ecosystem, we need to maintain an attack-free network while communicating from one device to another. In this scenario, SQL injection attacks are one of the most widespread and destructive vulnerabilities in e-platforms. To identify those attacks (SQLiA), various pen-testing tools as well as some sort of intelligent models need to be adopted in the web environment. In this analysis, we have gathered related information on SQLiA detection models with a range of performance metrics such as Accuracy, Precision, Recall, and F1 score. After analyzing all the works, we concluded that DL models have given the best performance in detecting SQLiA, with higher accuracy than ML. Namely, MLP, CNN, and LSTM are well suited to detecting SQLiA in web applications. Of these three models, MLP is the most suitable for identifying SQLiA, with high accuracy of 96% to 98.79%. In future, we are going to enhance the existing model and try to provide a new approach that will perform well in all aspects of the performance metrics. We are also planning to study more related articles on ML/DL-based SQLiA detection models and will provide the most suitable model.

REFERENCES:

[1]. Sun, H., Du, Y., & Li, Q. (2023). Deep Learning-Based Detection Technology for SQL Injection Research and Implementation. Applied Sciences, 13(16), 9466.
[2]. Muhammad, T., & Ghafory, H. (2022). SQL Injection Attack Detection Using Machine Learning Algorithm. Mesopotamian journal of cybersecurity, 2022, 5-17.
[3]. Mondal, B., Banerjee, A., & Gupta, S. (2022). A review of SQLI detection strategies using machine learning. machine learning, 6(S2), 9664-9677.
[4]. Sheng, J. (2022). Research on SQL Injection Attack and Defense Technology of Power Dispatching Data Network: Based on Data Mining. Mobile Information Systems, 2022.
[5]. Krishnan, S. A., Sabu, A. N., Sajan, P. P., & Sreedeep, A. L. (2021). SQL injection detection using machine learning. vol, 11, 11.
[6]. Jothi, K. R., Pandey, N., Beriwal, P., & Amarajan, A. (2021, March). An efficient SQL injection detection system using deep learning. In 2021 International conference on computational intelligence and knowledge economy (ICCIKE) (pp. 442-445). IEEE.
[7]. Abdulmalik, Y. (2021). An improved SQL injection attack detection model using machine learning techniques. International Journal of Innovative Computing, 11(1), 53-57.
[8]. Hassan, M. M., Ahmad, R. B., & Ghosh, T. (2021). SQL injection vulnerability detection using deep learning: a feature-based approach. Indonesian Journal of Electrical Engineering and Informatics (IJEEI), 9(3), 702-718.
[9]. Gandhi, N., Patel, J., Sisodiya, R., Doshi, N., & Mishra, S. (2021, March). A CNN-BiLSTM based approach for detection of SQL injection attacks. In 2021 International conference on computational intelligence and knowledge economy (ICCIKE) (pp. 378-383). IEEE.
[10]. Zhang, K. (2019, November). A machine learning based approach to identify SQL injection vulnerabilities. In 2019 34th IEEE/ACM International Conference on Automated Software Engineering (ASE) (pp. 1286-1288). IEEE.
[11]. Hasan, M., Balbahaith, Z., & Tarique, M. (2019, November). Detection of SQL injection attacks: a machine learning approach. In 2019 International Conference on Electrical and Computing Technologies and Applications (ICECTA) (pp. 1-6). IEEE.
[12]. Li, Q., Li, W., Wang, J., & Cheng, M. (2019). A SQL injection detection method based on adaptive deep forest. IEEE Access, 7, 145385-145394.
[13]. Li, Q., Wang, F., Wang, J., & Li, W. (2019). LSTM-based SQL injection detection method for intelligent transportation system. IEEE Transactions on Vehicular Technology, 68(5), 4182-4191.
[14]. Ross, K., Moh, M., Moh, T. S., & Yao, J. (2018, March). Multi-source data analysis and evaluation of machine learning techniques for SQL injection detection. In Proceedings of the ACMSE 2018 Conference (pp. 1-8).
[15]. Ross, K. (2018). SQL injection detection using machine learning techniques and multiple data sources.
[16]. Dalai, A. K., & Jena, S. K. (2017). Neutralizing SQL injection attack using server-side code modification in web applications. Security and Communication Networks, 2017.
[17]. Brindavathi, B., Karrothu, A., & Anilkumar, C. (2023, August). An Analysis of AI-based SQL Injection (SQLi) Attack Detection. In 2023 Second International Conference on Augmented Intelligence and Sustainable Systems (ICAISS) (pp. 31-35). IEEE.