
Journal of Intelligent Systems 2023; 32: 20230063

Research Article

Omar Salah F. Shareef*, Rehab Flaih Hasan, and Ammar Hatem Farhan

Analyzing SQL payloads using logistic


regression in a big data environment
https://doi.org/10.1515/jisys-2023-0063
received May 15, 2023; accepted July 28, 2023

Abstract: Protecting big data from attacks on large organizations is essential because of how vital such data
are to organizations and individuals. Moreover, such data can be put at risk when attackers gain unauthorized
access to information and use it in illegal ways. One of the most common such attacks is the structured query
language injection attack (SQLIA). This attack is a vulnerability attack that allows attackers to illegally access a
database quickly and easily by manipulating structured query language (SQL) queries, especially when dealing with
a big data environment. To address these risks, this study aims to build an approach that acts as a middle protection
layer between the client and database server layers and reduces the time consumed to classify the SQL payload sent
from the user layer. The proposed method involves training a model by using a machine learning (ML) technique
for logistic regression with the Spark ML library that handles big data. An experiment was conducted using the
SQLI dataset. Results show that the proposed approach achieved an accuracy of 99.04%, a precision of 98.87%, a recall of 99.89%, and an F-score of 99.04%. The time taken to identify and prevent SQLIA is 0.05 s. Our approach can protect
the data by using the middle layer. Moreover, using the Spark ML library with ML algorithms gives better accuracy
and shortens the time required to determine the type of request sent from the user layer.

Keywords: big data, logistic regression, spark ML, SQL injection.

1 Introduction
Security has become a crucial component when developing web apps because of the massive amount of data
sent between businesses and the rising number of everyday users in various areas. Therefore, enterprises’ big
data require a web application architecture that can detect and stop application flaws. The Open Web
Application Security Project considers structured query language injection attack (SQLIA) among the most
dangerous threats to enterprise-scale databases [1,2].
The big data discipline takes a multidisciplinary approach to analyzing and forecasting data, combining computer science, mathematical modeling, and statistics. Access to data, and methods for working with it, have emerged as critical factors. Companies can reliably manage big data by implementing
artificial intelligence and machine learning (ML) techniques [3].
A growing number of security risks are associated with the widespread use and storage of data online.
These risks arise from the proliferation of attacks that try to gain unauthorized access to the private informa-
tion of people and organizations [4].
SQLIA is among the most harmful assaults on database servers. By taking advantage of security holes, attackers
may compromise users’ and businesses’ data by tampering with, reading, erasing, or making copies of it [5].


* Corresponding author: Omar Salah F. Shareef, Computer Center, University of Fallujah Anbar, Fallujah 55621, Iraq,
e-mail: [email protected]
Rehab Flaih Hasan: Computer Sciences Department, University of Technology Baghdad, Baghdad 19006, Iraq,
e-mail: [email protected]
Ammar Hatem Farhan: Computer Center, University of Fallujah Anbar, Fallujah 55621, Iraq, e-mail: [email protected]

Open Access. © 2023 the author(s), published by De Gruyter. This work is licensed under the Creative Commons Attribution 4.0
International License.

Structured query language (SQL) injection flaws can exist in any parameter that an application passes to a database, giving an attacker a channel through which to deliver an attack. An attacker may use various techniques in this kind of attack to gain unauthorized entry to databases and extract information. The injection mechanism describes these procedures. The techniques used
are primarily divided into four categories: injection through cookies, injection through user input, injection
through server variables, and second-order or stored injections [6].
The increasing data exchange between individuals and institutions and daily transactions in various fields
have made data vulnerable to many attacks, such as illegal access. One of the most well-known attacks is
SQLIA. Standard methods for detecting and preventing these attacks can provide good results when dealing
with small data. However, these approaches do not work effectively with big data. Hence, another approach
must be developed to deal with big data and detect attacks against them.
This study was conducted to overcome the problems in previous works, which often did not report the time taken during the testing phase to detect the type of request sent by the user and whether it contains harmful or benign payloads. In addition, previous works did not address how data are protected when the protection model sits alongside the user layer or the data layer. Accordingly, the aims of this study are as follows:
• To create a layer that separates the user layer from the data layer to increase data protection and prevent
unauthorized access.
• To protect user and institutional data, ensuring confidentiality, integrity, and prompt availability of data.
• To reduce the time required to classify the payloads sent to the data layer.

In this research, we present an approach for detecting SQLIA in real time by applying logistic regression (LR) in a big data environment, using the distributed Spark ML library as a middle protection layer between the client and the database server to increase data protection and the classification accuracy of the sent payload. The contributions of this model are as follows:
• The first contribution of this model is that it proposes an approach that uses a middle layer between the data
layer and the user layer to receive SQL payloads from the user layer and analyze them to classify whether
the request is harmful or benign by using the LR approach with the big data framework Spark ML library.
This layer prevents users from directly accessing data, thereby further protecting the data layer from
unauthorized access and from violations of the principles of basic information security.
• The second contribution is that the time taken to classify the request type is reduced by using the Spark ML
library because Spark ML works in the memory in a distributed way, thereby reducing the time taken to
classify the payload type.

The subsequent sections are organized as follows. The second section of this study will address the
proposed methodology for identifying and mitigating SQLIA within a big data setting. The third section
presents the outcomes. The final section provides the conclusions.

2 Methodology
The proposed framework for detecting SQL injection attacks in a big data environment consists of three layers.
The first layer is the user layer, through which user requests are sent. The second layer represents the
protection layer, which includes the proposed framework for classifying requests sent from the first layer.
This layer consists of several stages, as illustrated in the following.
Stage 1: Data that contain both malicious and benign payloads are collected to train the proposed model.
Stage 2: Pre-processing is applied to the acquired data.
Stage 3: The acquired data are divided into two sets for training and testing.
Stage 4: The first dataset is used to train the proposed model using LR.
Stage 5: The second dataset is used to test the model.
Stage 6: The model is evaluated using a confusion matrix and a set of metrics to measure the performance
efficiency.

In the second layer, the LR approach determines whether incoming payloads are harmful or not. The third layer represents the data layer, which must be protected from attacks by unauthorized individuals who try to access this data.
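As an illustration of how such a gating layer might operate, the following minimal sketch wraps a fitted Spark ML pipeline behind a simple request handler. The model path, the input column name "payload," and the helper function names are assumptions made for this sketch rather than details taken from the paper.

```python
from pyspark.sql import SparkSession
from pyspark.ml import PipelineModel

spark = SparkSession.builder.appName("sqli-protection-layer").getOrCreate()

# Hypothetical path to a pipeline fitted as described in Sections 2.2-2.4.
model = PipelineModel.load("models/sqli_lr_pipeline")

def is_malicious(payload: str) -> bool:
    """Classify one SQL payload sent from the user layer (1 = malicious, 0 = benign)."""
    request_df = spark.createDataFrame([(payload,)], ["payload"])
    return model.transform(request_df).select("prediction").first()[0] == 1.0

def forward_to_database(payload: str) -> str:
    # Placeholder for the data layer (third layer); the real system would run the query.
    return "forwarded to data layer"

def handle_request(payload: str) -> str:
    # Middle protection layer: only benign payloads reach the database server.
    if is_malicious(payload):
        return "blocked: suspected SQL injection"
    return forward_to_database(payload)

print(handle_request("SELECT name FROM users WHERE id = 7"))
print(handle_request("' OR '1'='1' --"))
```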
The LR approach involves three variables. The first variable, "features," represents the condition feature. The second variable, "label_col," represents the decision feature. The third variable, maxIteration, is used as a stopping criterion for model training. The flow of the experiment is shown in Figure 1.

Figure 1: Proposed system to classify submitted queries.
The proposed approach is described in the following subsections.

2.1 SQL-i datasets

Collecting data pertaining to the research subject matter is essential in developing an ML approach. This study used a dataset comprising 109,518 instances categorized into two groups on the basis of their payloads: malicious and non-malicious. The dataset is further divided into two parts for training and testing. Table 1 provides an overview of the dataset.

Table 1: Summary of the dataset

| Name of dataset | Number of cases | Learning step | Testing step | Normal | Malicious |
|---|---|---|---|---|---|
| SQLIA | 109,518 | 76,670 | 32,848 | 52,213 | 57,305 |

A sample of the dataset used in this research is shown in Figure 2.

Figure 2: Dataset before pre-processing.

The data were collected from the Kaggle website [7]. The Kaggle challenge originally provided 109,518 samples containing harmful and benign payloads, but the raw data were not clean, so a filtering process was performed to delete invalid entries; without this filtering, pre-processing could not convert the initial dataset into a form usable by ML algorithms.

2.2 Data pre-processing

In the second stage, the dataset is pre-processed and prepared to be used by the learning techniques. Data
preparation aims to reduce data volume; create connections between datasets; standardize data; and eliminate
outliers, duplicates, and missing values [8].
CountVectorizer is a tool used to pre-process the dataset by transforming textual data into numerical vectors. For instance, the terms in a document can indicate the characteristics of a specific class, and a single vector can represent all of its phrases. This process is known as vectorization. Common text vectorization techniques include CountVectorizer and TF-IDF Vectorizer, both of which convert textual data into vector format [9].
CountVectorizer is frequently used to derive numerical properties from texts and generate class features. In the training text, only frequently occurring words are taken into account. Through its fit and transform operations, CountVectorizer converts the text into a word-occurrence matrix, enabling users to calculate the frequency of each word [10].

Algorithm 1: CountVectorizer for data pre-processing

Input: Dataset prior to initial processing


Output: array of words
Begin:
Stage 1: Transform text into a collection of words by using CountVectorizer.
Stage 2: Eliminate frequently used terms.
Stage 3: Remove the least frequently used terms.
Stage 4: Eliminate all stop words.
Stage 5: Convert every word into lowercase letters.
Stage 6: Arrange the vocabulary in ascending order.
If the term is present, then it is indicated by a 1 in the text; if it is absent, then it is indicated by a 0.
Stage 7: Repeat stages 1–6 to convert the text dataset into numbers.
End

The example below shows the process of converting text into numbers using an algorithm to convert text
into an array of words.
Sample 1 “Convert text to an array using CountVectorizer using text datasets”
Sample 2 “Convert text to an array using” (Table 2)

Table 2: Dataset after pre-processing

| Sample | Array | Convert | CountVectorizer | Datasets | Text | Using |
|---|---|---|---|---|---|---|
| Sample 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| Sample 2 | 1 | 1 | 0 | 0 | 1 | 1 |
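As a hedged sketch of this step, the code below applies Spark ML's Tokenizer and a binary CountVectorizer to the two sample sentences above. Removing the most and least frequent terms (e.g., via minDF/maxDF or StopWordsRemover) is omitted here, and the column names are assumptions for the sketch.

```python
from pyspark.sql import SparkSession
from pyspark.ml.feature import Tokenizer, CountVectorizer

spark = SparkSession.builder.appName("countvectorizer-demo").getOrCreate()

samples = spark.createDataFrame([
    (1, "Convert text to an array using CountVectorizer using text datasets"),
    (2, "Convert text to an array using"),
], ["sample", "text"])

# Tokenizer lowercases the text and splits it into words (Stages 1 and 5).
tokenizer = Tokenizer(inputCol="text", outputCol="words")
words = tokenizer.transform(samples)

# binary=True records only presence (1) or absence (0) of each vocabulary term,
# matching the 0/1 encoding shown in Table 2.
vectorizer = CountVectorizer(inputCol="words", outputCol="features", binary=True)
cv_model = vectorizer.fit(words)

print(cv_model.vocabulary)  # learned vocabulary
cv_model.transform(words).select("sample", "features").show(truncate=False)
```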

2.3 Training and testing

The third stage in developing an ML approach is to divide the data into two distinct categories: the training
group and the testing group. The holdout method was used in this investigation, with 80% of the dataset used
for training and 20% for testing and evaluation [11]. The dataset used is approximately balanced, containing 45,051 benign payloads and 40,923 malicious payloads, for a total of 85,974 samples.
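A minimal sketch of this split, assuming the filtered dataset has been saved to a CSV file (the file name here is hypothetical):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("sqli-holdout-split").getOrCreate()

# Hypothetical file name for the filtered SQLi dataset collected from Kaggle [7].
data = spark.read.csv("sqli_payloads.csv", header=True, inferSchema=True)

# Holdout method: roughly 80% of rows for training and 20% for testing.
train_df, test_df = data.randomSplit([0.8, 0.2], seed=42)
print("training rows:", train_df.count(), "testing rows:", test_df.count())
```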

2.4 Prediction approach

The fourth stage in developing an ML approach is selecting a classification approach for SQL requests sent to
web-based databases. This study uses a supervised ML approach. This method categorizes requests into two
categories (0 and 1), which represent harmless and harmful requests. In addition, this technique aims to create
a classification that accurately describes the relationship between dependent and independent variables.

The effectiveness of the LR method is determined by the linear regression strategy in the following
equation:
$$j = h_\theta(i) = \theta^T i. \qquad (1)$$

Equation (1) alone is not well suited to a binary outcome. By using equation (2), we may determine whether the transmitted request carries a harmful payload (class 1) or a harmless payload (class 0) [12].
$$p(j = 1 \mid i) = h_\theta(i) = \frac{1}{1 + \exp(-\theta^T i)} = \sigma(\theta^T i),$$
$$p(j = 0 \mid i) = 1 - p(j = 1 \mid i) = 1 - h_\theta(i). \qquad (2)$$

Equation (3), often referred to as the sigmoid function, maps the value of $\theta^T i$ into the range [0, 1]. We then look for parameters $\theta$ such that $p(j = 0 \mid i) = 1 - h_\theta(i)$ is large when $i$ belongs to the "0" class and small when $i$ belongs to the "1" class [13–15].
$$\sigma(t) = \frac{1}{1 + e^{-t}}. \qquad (3)$$
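A small worked example of equations (2) and (3), with made-up weights $\theta$ and a made-up binary feature vector $i$, illustrates how the sigmoid output is thresholded into the two classes:

```python
import numpy as np

def sigmoid(t):
    # Equation (3): maps any real value into the range (0, 1).
    return 1.0 / (1.0 + np.exp(-t))

theta = np.array([0.8, -1.2, 2.5, 0.3])  # illustrative weights only
i = np.array([1, 0, 1, 1])               # illustrative binary features only

p_malicious = sigmoid(theta @ i)          # p(j = 1 | i) from equation (2)
p_benign = 1.0 - p_malicious              # p(j = 0 | i)
label = 1 if p_malicious >= 0.5 else 0    # decision threshold at 0.5

print(round(p_malicious, 3), round(p_benign, 3), label)  # 0.973 0.027 1
```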

The LR algorithm was chosen to train and test the model because of its highly accurate results and the short time it takes to classify benign and harmful payloads.
The maxIteration variable was used as the stopping criterion for model training; maxIteration = 100 was chosen because it gave the best accuracy and the shortest time.
Two variables were used to build the model. The first variable, "features," represents the condition feature, which contains both harmful and benign payloads. We pre-process these features using CountVectorizer to extract the desired features after removing the least and most frequent words, which do not affect the model training results.
The “Data pre-processing” subsection in the Methodology provided an example of how to capture the desired features.
The features obtained from the pre-processing results will be used as features in model training.
The second variable “label_col” represents the decision feature.
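A minimal sketch of the corresponding Spark ML configuration is shown below. The column names ("payload", "label_col") and the binary CountVectorizer setting are assumptions carried over from the earlier sketches; maxIter = 100 follows the description above.

```python
from pyspark.ml import Pipeline
from pyspark.ml.classification import LogisticRegression
from pyspark.ml.feature import Tokenizer, CountVectorizer

# Feature extraction stages produce the "features" column from the raw payload text.
tokenizer = Tokenizer(inputCol="payload", outputCol="words")
vectorizer = CountVectorizer(inputCol="words", outputCol="features", binary=True)

# LR estimator: "features" as the condition feature, "label_col" as the decision
# feature, and maxIter=100 as the stopping criterion.
lr = LogisticRegression(featuresCol="features", labelCol="label_col", maxIter=100)

pipeline = Pipeline(stages=[tokenizer, vectorizer, lr])
# model = pipeline.fit(train_df)          # train_df from the 80/20 holdout split
# predictions = model.transform(test_df)  # prediction: 0 = benign, 1 = malicious
```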

2.5 Performance evaluation measures of prediction approach

During the final stage of developing the prediction approach, various metrics such as accuracy, time, precision,
and recall were used to evaluate the approach and determine the outcomes.
A confusion matrix was used to calculate these measurements. Table 3 shows the widely used confusion matrix, which consists of four outcomes, namely, false positive (FP), false negative (FN), true negative (TN), and true positive (TP).

Table 3: Confusion matrix

| | Predicted Class X | Predicted Class Y |
|---|---|---|
| True Class X | TN | FP |
| True Class Y | FN | TP |

TP: This term describes malicious payloads that the model correctly predicted as malicious.
FN: This term refers to instances in which the prediction approach categorized a harmful case as benign.
FP: This term refers to instances in which the prediction approach categorized a benign case as harmful.
TN: This term refers to instances that were identified as benign by the prediction approach and are, in fact, benign [15–17].

The following equations represent the metrics used to assess the approach and determine its performance
efficacy.
Accuracy: It represents the proportion of correct predictions, both TP and TN, among all instances. It is mathematically expressed as follows:
$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \times 100. \qquad (4)$$
Precision: It displays the proportion of TP to the sum of TP and FP. It is mathematically expressed as follows:
$$\text{Precision} = \frac{TP}{TP + FP} \times 100. \qquad (5)$$
Recall: It displays the ratio of TP to the total of TP and FN. It is mathematically expressed as follows:
$$\text{Recall} = \frac{TP}{TP + FN} \times 100. \qquad (6)$$

F1-score: This is the harmonic mean of precision and recall. It is mathematically expressed as follows [18]:
$$F1\text{-score} = 2 \times \frac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}. \qquad (7)$$
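The following small helper computes equations (4)–(7) directly from confusion-matrix counts; the counts in the example call are illustrative only, not the paper's results.

```python
def evaluate(tp: int, tn: int, fp: int, fn: int) -> dict:
    """Compute accuracy, precision, recall, and F1-score (equations (4)-(7)) as percentages."""
    accuracy = (tp + tn) / (tp + tn + fp + fn) * 100
    precision = tp / (tp + fp) * 100
    recall = tp / (tp + fn) * 100
    f1 = 2 * precision * recall / (precision + recall)
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

# Illustrative counts only.
print(evaluate(tp=9950, tn=7250, fp=120, fn=50))
```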

3 Results and discussions


This section presents the results of using the LR approach in a big data environment to determine whether the payload sent by the user is malicious or benign.
Building an approach using ML or any other system requires providing a set of basic hardware and
software requirements. Tables 4 and 5 describe the basic requirements used in this study.

Table 4: Software requirement

Software requirement

System type 64-bit operating system, x64-based processor


Programming language Python (Spyder [Anaconda3])

Table 5: Hardware requirement

Hardware requirement

Processor Intel(R) Core (TM) i7-5500U CPU @ 2.40 GHz


Installed RAM 8 GB
Hard disk 500 GB
GPU AMD Radeon Graphics Processor HD (8500 M)

The results were obtained from two experiments for training and testing the model. The purpose was to achieve the best classification accuracy and the shortest time for classifying the type of payload.

3.1 First experiment

The first experiment was conducted using a dataset containing 85,974 malicious and benign payloads divided
into two sections. The first section consists of 45,051 payloads representing benign loads, and the second
section consists of 40,923 payloads representing malicious loads. As for the data division method, the holdout
method was used, where 70% of the dataset was chosen for training, and the remaining portion was used for
testing and evaluation.

3.2 Second experiment

The second experiment was conducted using a dataset containing 85,974 malicious and benign payloads
divided into two sections. The first section, which represents the benign payloads, consists of 45,051 samples,
while the second section, representing the malicious payloads, consists of 40,923 samples. As for the data
division method, the holdout method was used, where 80% of the dataset was chosen for training, and the
remaining portion was used for testing and evaluation.
Table 6 presents the results of the first experiment, and Table 7 presents those of the second experiment, in which the accuracy of the LR approach reached 99.04%.

Table 6: Result of first experiment

Seq Name of parameter Value

1. Time complexity 0.10 s


2. Accuracy 98.025
3. Precision 98.055
4. Recall 98.025
5. F-score 98.02
6. Training dataset 59,938
7. Test dataset 26,036

The results of the second experiment were chosen because they provided better accuracy and a shorter testing time.
The LR approach accurately classified the SQL queries sent to the databases used by web applications. The proportion of correct predictions (TP and TN), 99.04%, indicates that malicious and benign payloads can be discriminated with high accuracy. Detecting the query type took only 0.05 s (Table 7).

Table 7: Result of second experiment

Seq Name of parameter Value

1. Time complexity 0.05 s


2. Accuracy 99.04
3. Precision 98.18
4. Recall 99.89
5. F-score 99.04
6. Training dataset 68,604
7. Test dataset 17,370
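As a rough sketch of how the per-request classification time might be measured, the snippet below times a single transform on one payload; the saved-model path and column name are hypothetical, and measured times depend on the hardware listed in Table 5.

```python
import time

from pyspark.sql import SparkSession
from pyspark.ml import PipelineModel

spark = SparkSession.builder.appName("sqli-timing").getOrCreate()
model = PipelineModel.load("models/sqli_lr_pipeline")  # hypothetical saved pipeline

payload_df = spark.createDataFrame([("' OR '1'='1' --",)], ["payload"])

start = time.perf_counter()
prediction = model.transform(payload_df).select("prediction").first()[0]
elapsed = time.perf_counter() - start

print(f"prediction: {prediction}, elapsed: {elapsed:.3f} s")
```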

Table 8 shows the results of the comparison between previous studies and this study.

Table 8: Result of comparison between previous studies and this study

| Ref | Model | Accuracy | Time complexity | Dataset size |
|---|---|---|---|---|
| [19] | SVM | 98.6 | None | 181,303 |
| [20] | Neural network of direct signal propagation | 95 | None | 30,233 |
| [21] | Long short-term memory (LSTM) | 95.2 | 37.1494 s | 42,212 |
| [22] | Support vector machine | 94.92 | 3.98 s | 20,474 |
| | Gradient boosting | 94.27 | | |
| | Naive Bayes classifier | 70.79 | | |
| | REGEX classifier | 97.48 | | |
| [23] | Naive Bayes | 95 | None | |
| | LR | 92 | | |
| | CNN | 97 | | |
| | SVM | 79 | | |
| | Passive aggressive | 79 | | |
| [24] | CNN-BiLSTM | 98 | 45 s | 4,200 |
| | Proposed model | 99.04 | 0.05 s | 85,974 |

Standard methods for detecting and preventing these attacks can obtain good results when dealing with small data. However, these methods are not optimal for big data. The significant feature of the proposed approach, implemented with Spark ML and the Spark framework, is that it can process large-scale datasets efficiently and reliably. Scalability is achieved by distributing processing tasks, thus enabling the handling of larger, more complex datasets. High performance is achieved by using in-memory resources and executing operations in parallel. However, a limitation of this work is that the proposed approach has difficulty dealing with very large datasets when applying ML models because they require higher computational power. In addition, some of the datasets used contain instances that cannot be processed by ML algorithms, so the dataset must be filtered before it can be used by the proposed approach.

4 Conclusion
This work presented a method for detecting SQL attacks using the LR approach in a big data environment. The
dataset contained malicious and benign SQL payloads. The proposed approach then classified user queries as
containing either malicious or benign payloads. Several experiments were conducted, and the performances
were compared. The proposed method achieved the highest accuracy and the shortest running time when
handling large datasets in every experiment.
One of the main contributions of this work is that the proposed method prevents users from directly accessing the data and maintains the data's confidentiality, integrity, and availability. This protection is achieved by creating a separation layer, which applies a model trained on a large dataset to classify new payloads sent by the user, thus providing additional protection for the data layer before a request from the user layer reaches it.
The second contribution is that the time required to classify the query type submitted by the user is reduced by
using the Spark ML library. Spark ML works in the memory in a distributed manner, thereby reducing the time
required to classify the payload type. Reducing the time to classify the type of request is essential when dealing with
big data because it enables timely access to the data and ensures that the data are available to users and
organizations. This work provides high accuracy and takes a short time to classify requests, thereby achieving
high data protection and maintaining the confidentiality, integrity, and availability of data. However, the proposed approach can classify only SQL injection attacks. Future work will involve building a model that classifies more than one type of attack, such as cross-site scripting or DDoS attacks, using the LSTM algorithm.

Author contributions: Omar Salah F. Shareef conceived of the presented idea. Rehab Flaih Hasan and Ammar
Hatem Farhan designed and performed the experiments, derived the models, and analyzed the data. Omar
Salah F. Shareef supervised the project. Ammar Hatem Farhan wrote the manuscript in consultation with Omar Salah F. Shareef and Rehab Flaih Hasan. All authors discussed the results and contributed to the final
manuscript.

Conflict of interest: Authors state no conflict of interest.

Data availability statement: The data that support the findings of this study are openly available on the Kaggle website at https://www.kaggle.com/datasets/gambleryu/biggest-sql-injection-dataset?resource=download, reference number [7].

References
[1] Farhan AH, Hasan RF. Detection SQL injection attacks against web application by using K-nearest neighbors with principal
component analysis. In: Proceedings of Data Analytics and Management: ICDAM 2022. Springer; 2023. p. 631–42.
[2] Durai KN, Subha R, Haldorai A. A novel method to detect and prevent SQLIA using ontology to cloud web security. Wirel Pers
Commun. 2021;117(4):2995–3014. doi: 10.1007/s11277-020-07243-z.
[3] Haldorai A, Devi S, Joan R, Arulmurugan L. Big data in intelligent information systems. Mob Netw Appl. 2022;27:997–9. doi: 10.1007/s11036-021-01863-w.
[4] Awan MJ, Farooq U, Babar HM, Yasin A, Nobanee H, Hussain M, et al. Real-time ddos attack detection system using big data
approach. Sustain. 2021;13(19):1–19. doi: 10.3390/su131910743.
[5] Alghawazi M, Alghazzawi D, Alarifi S. Detection of SQL injection attack using machine learning techniques: A systematic literature
review. J Cybersecur Priv. 2022;2(4):764–77. doi: 10.3390/jcp2040039.
[6] Crespo-Martínez IS, Campazas-Vega A, Guerrero-Higueras ÁM, Riego-DelCastillo V, Álvarez-Aparicio C, Fernández-Llamas C. SQL
injection attack detection in network flow data. Comput Secur. 2023;127:103093. doi: 10.1016/j.cose.2023.103093.
[7] Kaggle. https://www.kaggle.com/datasets/gambleryu/biggest-sql-injection-dataset?resource=download.
[8] Alasadi SA, Bhaya WS. Review of data preprocessing techniques in data mining. J Eng Appl Sci. 2017;12(16):4102–7.
[9] El Rifai H, Al Qadi L, Elnagar A. Arabic text classification: the need for multi-labeling systems. Neural Comput App.
2022;34(2):1135–59. doi: 10.1007/s00521-021-06390-z.
[10] Yang JS, Zhao CY, Yu HT, Chen HY. Use GBDT to predict the stock market. Procedia Comput Sci. 2020;174:161–71. doi: 10.1016/j.procs.2020.06.071.
[11] Rafało M. Cross validation methods: Analysis based on diagnostics of thyroid cancer metastasis. ICT Express. 2022;8(2):183–8.
doi: 10.1016/j.icte.2021.05.001.
[12] Arif ZH, Cengiz K. Severity Classification for COVID-19 Infections based on Lasso-Logistic Regression Model. Int J Mathematics,
Statistics, Computer Sci. 2023;1:25–32. doi: 10.59543/ijmscs.v1i.7715.
[13] Yassine S, Stanulov A. A comparative analysis of machine learning algorithms for the purpose of predicting Norwegian air
passenger traffic. Int J Mathematics, Statistics, Computer Sci. 2023;2:28–43. doi: 10.59543/ijmscs.v2i.7851.
[14] Zhu C, Idemudia CU, Feng W. Improved logistic regression model for diabetes prediction by integrating PCA and K-means
techniques. Inform Med Unlocked. 2019;17:100179. doi: 10.1016/j.imu.2019.100179.
[15] Shah K, Patel H, Sanghvi D, Shah M. A comparative analysis of logistic regression, random forest and KNN models for the text
classification. Augmented Hum Res. 2020;5(1):1–16. doi: 10.1007/s41133-020-00032-0.
[16] Shaukat K, Luo S, Varadharajan V, Hameed IA, Xu M. A survey on machine learning techniques for cyber security in the last decade.
IEEE Access. 2020;8:222310–54. doi: 10.1109/ACCESS.2020.3041951.
[17] Abuhaiba ISI, Dawoud HM. Combining different approaches to improve Arabic text documents classification. Int J Intell Syst Appl.
2017;9(4):39–52. doi: 10.5815/ijisa.2017.04.05.
[18] Alarfaj FK, Khan NA. Enhancing the performance of SQL injection attack detection through probabilistic neural networks. Appl Sci.
2023 Mar 29;13(7):4365.
[19] Uwagbole SO, Buchanan WJ, Fan L. Applied machine learning predictive analytics to SQL injection attack detection and prevention.
Proc. IM 2017 - 2017 IFIP/IEEE Int. Symp. Integr. Netw. Serv. Manag; 2017. p. 1087–90. doi: 10.23919/INM.2017.7987433.
[20] Hubskyi O, Babenko T, Myrutenko L, Oksiiuk O. Detection of SQL injection attack using neural networks. Advances in Intelligent
Systems and Computing. Vol. 1265 AISC. 2021. p. 277–86. doi: 10.1007/978-3-030-58124-4_27.
[21] Tang P, Qiu W, Huang Z, Lian H, Liu G. Detection of SQL injection based on artificial neural network. Knowl-Based Syst. 2020;190:105528.
doi: 10.1016/j.knosys.2020.105528.
[22] Kranthikumar B, Velusamy RL. SQL injection detection using REGEX classifier. J Xi’an Univ Archit Technol. 2020;7(6):800–9.
[23] Joshi A, Geetha V. SQL Injection detection using machine learning. In: 2014 International Conference on Control, Instrumentation,
Communication and Computational Technologies, ICCICCT 2014; 2014. p. 1111–5. doi: 10.1109/ICCICCT.2014.6993127.
[24] Aggarwal P, Kumar A, Michael K, Nemade J, Sharma S. Random decision forest approach for mitigating SQL injection attacks.
In: 2021 IEEE International Conference on Electronics, Computing and Communication Technologies (CONECCT). 2021. p. 1–5.
