SQL Injection Attack Detection and Preve PDF
©Research India Publications. http://www.ripublication.com
International Journal of Applied Engineering Research ISSN 0973-4562 Volume 15, Number 6 (2020) pp. 569-580
• Extracting data: These attacks employ techniques to extract data values from the database. They present a critical risk to the web application, as the extracted information could be sensitive and highly confidential (for example, customer bank information). Attacks with this intent are the most common type of SQLIA.
• Database alteration: The goal of these attacks is to alter or change information in a database. For illustration, a hacker can pay much less for an online product by modifying its price, which is generally stored in a database. Another possible attack consists of adding a malicious link to an online discussion database in order to launch subsequent Cross-Site Scripting attacks.
• Performing denial of service: The intent of this attack is to deny service to other users. It can take different forms, such as shutting down the database of a web application, or locking or dropping database tables.
• Bypassing authentication: The aim of this attack is to bypass the authentication mechanisms of the web application. If the intruder succeeds in launching such an attack, they can take the rights and privileges of another user, generally one with high rights and privileges [6].
• Executing remote commands: Remote commands are executable code resident on the compromised database server. These commands can be stored procedures or functions available to database users. In this type of attack, hackers attempt to execute arbitrary commands on the database, which can lead to denial of service (by executing the shutdown command) or to database disruption.
• Performing privilege escalation: These attacks try to escalate the privileges of the attacker by taking advantage of implementation errors or logical flaws in the database. As opposed to bypassing-authentication attacks, these attacks focus on exploiting the database user privileges. This attack can have critical consequences, especially when the attacker gains the root privilege.

C. SQLI Attack Types

SQLI attacks can take several types and forms [6]. In this section, we discuss and classify the main SQLI attack types, which are:
• Tautologies: The general goal of a tautology-based attack is to inject code in one or more conditional statements so that they always evaluate to true. The most common usages are to bypass authentication pages and extract data. In the following example, an attacker submits "' or 1=1 --" for the login input field (the values submitted for the other fields are irrelevant). The resulting query is:

    SELECT accounts FROM users WHERE login='' or 1=1 -- AND pass='' AND pin=''

  The code injected in the conditional (OR 1=1) transforms the entire WHERE clause into a tautology, which results in a successful authentication of the attacker, who can then display all the accounts saved in the database.
• Blind SQL injection: Blind SQL injection is a type of SQLI attack that asks the database true-or-false questions and determines the answer based on the application's response. This attack is often used when the web application is configured to show generic error messages but has not mitigated the code that is vulnerable to SQL injection.
• Union query: In a union query attack, the attacker uses the UNION operator to join a malicious query to the original query. The result of the malicious query is joined to the result of the original query, allowing the attacker to obtain the values of columns of other tables. The following query is an example of a union query SQL injection attack:

    SELECT accounts FROM users WHERE login='' UNION SELECT cardNo from CreditCards where acctNo=10032 -- AND pass='' AND pin=

  The original first query returns the null set, whereas the second query returns data from the "CreditCards" table.
• Piggy-backed query: In a piggy-backed query attack, the attacker intends to inject additional queries to extract, modify or add data. Attackers append additional queries to the original query, and as a result the DBMS receives multiple SQL queries. The following query is an example of a piggy-backed query SQL injection attack:

    SELECT * FROM userDetails WHERE userid = '12' and password = 'cle'; drop table userDetails ;

• Stored procedures: In stored procedure attacks, the attacker aims to run stored procedures already saved in the database. Indeed, most existing databases are extended with a standard set of implemented functions, called stored procedures, that even allow interaction with the operating system. These stored procedures are generally invoked by developers in their code to avoid re-writing standard functions. An attacker can exploit this property: once the backend database of the web application is determined, SQLIAs can be crafted to execute stored procedures existing in that database, including procedures that interact with the operating system [6].
• Alternate encoding: In alternate encoding, the attacker tries to conceal the injected text in order to avoid detection by defensive coding practices and automated prevention techniques. More precisely, alternate encodings are enabling techniques that allow attackers to evade detection countermeasures. These evasion techniques are widely used by intruders because they know that most IDS scan the query for certain known "bad characters", such as single quotes and comment operators. By changing the character encoding, malicious code can evade the detection mechanism. The following query shows an example of an alternate encoding SQLIA:

    SELECT accounts FROM users WHERE login='legalUser'; exec(char(0x73687574646f776e)) -- AND pass='' AND pin=

  The char() function, cited in this example, takes an integer or hexadecimal encoding of a character as input and returns the corresponding character. The stream of numbers in the second part of the injection is the ASCII hexadecimal encoding of the string "SHUTDOWN". This results in a database shutdown and might lead to a denial of service attack.
• Illegal/logically incorrect queries: In illegal/logically incorrect query attacks, attackers submit a manipulated query to the database in order to generate an error message that contains information about the cause of the error. In general, the error message gives an idea of what the database schema looks like. Figure 1 shows an example of such an error message.
Table I summarizes the different SQLI attack sources, goals and types.

TABLE I: SQLI Attack Sources, Types and Goals classification

Parameter for Classification    Categories
Attack Sources                  User input
                                Cookies
                                Server variables
                                Second order injection
Attack Goals                    Database fingerprinting
                                Analysing schema
                                Extracting data
                                Amending data
                                Executing DoS
                                Equivocating detection
                                Bypassing authentication
                                Remote control
                                Privilege intensification
Attack Types                    Tautology
                                Illegal/logically incorrect queries
                                Union query
                                Piggyback query
                                Stored procedure
                                Inference
                                Alternate encoding

III. CLASSIFICATION OF SQLI ATTACK DETECTION AND PREVENTION TECHNIQUES

A large number of techniques have been proposed to address the SQLI attack, as it is the most widespread attack. The proposed techniques can be classified according to the web application life cycle: some techniques address the secure development of the web application, while others address the security of the web application at runtime. In this paper, we mainly address the latter techniques and give a classification according to the concepts they use.

A. Query-model based SQLI countermeasures

CANDID, "CANdidate evaluation for Discovering Intent Dynamically" [8], is a tool that checks, at runtime and in a flexible manner, the structure of the user-generated query against query structures that are self-evidently non-attacking. The idea of the CANDID approach is to dynamically generate the structure of the programmer-intended query using candidate benign inputs. Then, this benign query structure is compared with the structure of the query constructed with the user inputs. The main weakness of CANDID is that it cannot deal with external function calls and that it fails to transform some particular code (such as code containing loops). Inspired by the CANDID approach, the authors in [9] proposed SQLStor, an SQLI detection scheme based on semantic comparison. The semantic comparison is done by comparing the syntax tree structure of an original query to the syntax tree structure of a constructed benign query. The scheme first generates a Benign Query from the Original Query generated by the application, by replacing the user inputs to the query with benign inputs. Then, the scheme compares the syntax tree structures of both queries. If the syntax trees of both queries are equivalent, the queries induce equivalent semantic actions and the original query is therefore considered safe; otherwise, it is considered malicious. One limitation of the SQLStor scheme is that it addresses only the stored procedure type of the SQLI attack.
The authors in [10] presented an approach based on XML (eXtensible Markup Language) that detects only the tautology type of the SQLI attack. The approach contains an XML file maker that intercepts the user query and converts it into XML format. Then, the XML file passes the user credentials to an Xschema validator. The latter compares the produced user query with the legitimate query already defined in the Xschema file. The user is granted access to the system only if both queries match; otherwise, access to the system is blocked. Compared to other existing approaches used to detect SQLIA, this method is more suitable in terms of execution time, but it remains limited with respect to other types of SQLIA. Another drawback of this approach is the use of predefined patterns: a hacker can inject malicious patterns that are not addressed by the developers.
The authors in [11], [12] focused on the prevention of injection attacks that target the database management system (DBMS) behind web applications by embedding the protections in the DBMS itself. This avoids the semantic mismatch between how SQL queries are believed to be executed by the DBMS and how they are actually executed. The proposed approach, called SElf-Protecting daTabases preventIng attaCks (SEPTIC), runs in three modes: one for training (training mode) and two for operation (prevention mode and detection mode). The output of the training mode is a set of valid query models stored in SEPTIC. The SEPTIC attack detection mechanism consists of comparing each query structure with the stored query models. If there is no match, an SQLI attack is detected. In prevention mode, the queries considered insecure are dropped and the DBMS stops the query processing. In detection mode, these queries are executed, not dropped. SEPTIC was implemented in MySQL and evaluated experimentally with web applications written in PHP and Java/Spring.

B. Obfuscation based SQLI countermeasures

The authors in [13] presented a technique of Obfuscation/deObfuscation in order to detect SQLIA
before routing the query to the DBMS. This obfuscation technique is based on two phases: a static stage, which rewrites the queries of the application in an obfuscated form, and a dynamic stage, which combines the obfuscated queries with the input data of the user. The original query is retrieved in the deobfuscation step. In Random4 [14], the authors suggested using an encryption algorithm based on randomization to prevent illegal access to the database. The Random4 algorithm converts the user input into a cipher text, incorporating the concept of cryptographic salt. The drawback of this solution is that it limits the possible values of the user input. In [15], a scheme called SQLshield was proposed to prevent SQL injection attacks in web applications. The idea of SQLshield is to randomize the user input data before the SQL query is executed at the database server. The randomization technique used in SQLshield modifies the user input data without diverting the resultant SQL query from its programmer-intended execution. The randomization is achieved by appending a random key to some specific tokens of the user input. Comparison with other schemes such as CANDID [8] shows that SQLshield outperforms these approaches. The drawback of SQLshield is that it requires user intervention to determine which user inputs contribute to the SQL query and therefore need to be randomized. Moreover, the execution time of the algorithm depends on the length of the user input, so it might be inefficient for long user inputs. To overcome the need for user intervention, the authors in [16] proposed AutoRand (Automatic Keyword Randomization) to prevent SQLI attacks. AutoRand transforms all SQL keywords in a Java program into randomized values. Before a query is executed, AutoRand checks that each SQL command is randomized to the appropriate value. If so, the query is de-randomized and executed. One weakness of AutoRand is that it is limited to Java programs.

C. Monitoring and Auditing Based SQLI countermeasures

Preventing SQL Injection Attacks based on Query Optimization Process (PSIAQOP) [17] tries to prevent SQLI by optimizing the queries built from user input. The process begins by analyzing the source code of the web application to find the queries that are vulnerable at execution, then optimizes these vulnerable queries. The optimization engine generates a set of valid executions in accordance with heuristic rules. Finally, the hotspot (the vulnerable part of the code) is replaced in the web application code by its optimized query.
In [18], the authors discussed a novel model based on a dynamic analyzer. The dynamic analyzer plays the role of a user who requests a page and analyzes it: the Tester examines whether the page is vulnerable, then generates a response to the user. If the response shows that the page is not vulnerable, the request is processed; otherwise, the request is rejected and the knowledge base is updated by adding the vulnerabilities found in the page and the corresponding rules.
The authors in [19] proposed a mechanism called Cloud Computing SQLIA Detection (CCSD), applied to cloud computers in order to distinguish SQLIA from legitimate queries. This approach consists of five units: 1- Data Collector: captures information, execution stacks, and the parse tree of a query; 2- Modeler: analyzes the requests, browses the trees and compresses the execution stack captured by the Data Collector using hash functions; 3- Repository: saves the compressed stacks and trees; 4- Comparator: if the request is not vulnerable, sends it to the database, otherwise rejects the request and sends an answer to the log; 5- Log: saves the request sent by the Comparator. CCSD is characterized by a high accuracy rate, a low false positive rate, and low time consumption.
In [20], the authors defined a tool for DEtection and PREvention website from Input Validation Attacks. It consists of two modules: detection and prevention. Detection is accomplished using a multi-layer defense mechanism: removal of illegal characters, validation through regular expression checks, and IP address tracking. Prevention relies on mutation testing and a graphical user interface in which the user selects the attack (SQLI, Cross-site Scripting or Buffer Overflow) against which the website should be protected; the website is then tested for the selected vulnerability.
The authors in [21] presented a Web Defacement and Intrusion Monitoring Tool (WDIMT). This tool aims to detect defacements, degradations and intrusions on the web and allows the modification and deletion of files containing intrusions. WDIMT also proposes a solution to recover the original content of the web page; these commands are executed only through the Linux terminal.
In [22], the authors presented JoanAudit, a new security slicing tool for auditing Java web services and web applications for common injection vulnerabilities. JoanAudit generates a vulnerability report and extracts control and data dependency information from the analysed program. This tool is characterized by its scalability to web systems and its simple configuration for injection vulnerabilities.

D. Entropy Based SQLI countermeasures

The authors in [23] proposed an automated testing approach, namely µ4SQLi. µ4SQLi can produce effective inputs that lead to executable and malicious SQL queries. The idea is to produce inputs that bypass web application firewalls. The goal of this tool is to detect potential SQL vulnerabilities in a given web application. Inspired by the previous work, Kumar et al. proposed in [24] a Detection Block Model for SQL Injection Attacks. The proposed solution is also based on information theory. For this purpose, the entropy of each possible query is computed and saved in a database. Then, for each submitted query, the entropy is computed and compared to the ones saved in the database. The scheme was improved in [25] by adding a Message Authentication Code (MAC) derived from the entropy. This prevents attackers from knowing the value of acceptable entropy. The authors in [26] also proposed a system based on information theory. It measures query entropy using the token probability distribution. Then, during execution, the system computes the complexity to identify any changes from the measured entropy.
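The entropy comparison described in this subsection can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the implementation of [24], [25] or [26]: the tokenizer, the helper names (tokenize, query_entropy, is_suspicious), the query template and the tolerance value are all hypothetical choices made for the example.

```python
import math
import re
from collections import Counter

# Hypothetical tokenizer: identifiers/keywords, numbers, the "--" comment
# marker, and any other single non-space symbol each count as one token.
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|--|[^\sA-Za-z0-9_]")


def tokenize(query):
    """Split a SQL string into upper-cased tokens."""
    return [tok.upper() for tok in TOKEN_RE.findall(query)]


def query_entropy(query):
    """Shannon entropy (in bits) of the token frequency distribution."""
    tokens = tokenize(query)
    if not tokens:
        return 0.0
    total = len(tokens)
    counts = Counter(tokens)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def is_suspicious(query, reference_entropy, tol=1e-9):
    """Flag a query whose entropy deviates from the stored reference value."""
    return abs(query_entropy(query) - reference_entropy) > tol


# Illustrative template; table and column names follow the paper's examples.
TEMPLATE = "SELECT accounts FROM users WHERE login='{}' AND pin={}"

# Reference entropy, assumed to be computed offline from a benign input.
reference = query_entropy(TEMPLATE.format("x", "0"))
benign = TEMPLATE.format("alice", "1234")
injected = TEMPLATE.format("' or 1=1 --", "0")

print(is_suspicious(benign, reference))    # False: token structure unchanged
print(is_suspicious(injected, reference))  # True: extra tokens shift the entropy
```

In this sketch a benign input leaves the token frequency distribution untouched, while the tautology payload adds tokens and changes it. A benign value that happens to repeat an existing token, or that contains several words, would also shift the entropy, so a practical system would need some normalization of literals before comparison.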
A dynamic query with malicious inputs alters its intended structure, and therefore its entropy level changes.
Combining information theory with machine learning, the authors in [27] proposed a scheme to defeat SQLI attacks in authentication. The proposed scheme first generates a path-index of the SQL query. The path-index is obtained by replacing each keyword in the query with its corresponding index in an index table defined by the scheme. Then, the scheme computes the edit distance between the run-time query and the developer-intended query. If this distance is within the threshold range, the query is considered benign; otherwise, it is considered malicious. To determine the optimal value of this threshold, the authors used machine learning techniques. Tested on some scenarios, the scheme gives a detection rate of more than 92%. However, the complexity of the scheme might introduce an overhead that delays the query execution. Moreover, the main drawback of this method is that it requires an effective training phase and has a limitation regarding the maximum number of variables in a dynamic SQL query.

E. Ontology Based SQLI countermeasures

Ontology based solutions try to benefit from the attack vocabulary and its semantic relations. Ontology is a good way to represent the vocabulary of an attack, which is not restricted to its signature but also includes the characteristics, features, and any actors related to the system. By using ontology, systems are able to capture the context of the user input, which makes it possible to design defense mechanisms against web attackers. Ontology also offers an excellent comprehension of metrics such as keywords and can provide a generic solution for various environments. Ontology can also classify the attacks and describe their signatures and their outcomes on systems. Creating an ontology for web applications is similar to building an ontology for human anatomy, which requires precision and systematic labelling of each component. An ontology can be considered as a set of concepts. Taking the example of the ontology cited in [28], the knowledge objects or core concepts are defined as follows: class Assets: can be databases; class Methods: the strategy used to reach a goal, such as antivirus; class Tools: software that supports the methods; class Security Properties: the security goals, such as confidentiality or integrity; class Vulnerabilities: weaknesses that make it possible for a threat to arise; class Threats: a potential violation of security; and class Notation: the characteristics of attacks. In what follows, we present the main SQLI attack countermeasures based on ontology.
Denker et al. [29], [30] addressed the security of the semantic web using the Web Ontology Language (OWL) and the security of web services using the DARPA Agent Markup Language (DAML). The authors make a broad conceptualization of the domain but use few attributes to define concepts. These ontologies need to assign more properties to each concept in order to set the attributes correctly. Kim et al. [31] developed a more general ontology that has more concepts than the one proposed by Denker [29]. It represents security instructions such as attack mechanisms, protocols, algorithms and credentials. Moreover, the proposed ontology can be applied to any electronic resource. The authors used the Web Ontology Language (OWL) to create their security ontology, which consists of several parts, such as main security, algorithms, assurance or semantic web services. Dobson et al. [32] focused on the field of reliability requirements and revised their ontologies for requirements engineering, which present a large number of concepts to model specific domains such as cyber attacks. Undercoffer et al. [33] used ontology to describe computer network attacks for distributed IDS. The proposed ontology stores the victim parameters for each attack, then assigns the attack to a class. Moreover, the authors analyzed around 4000 vulnerabilities with their exploitation strategies and presented several use case scenarios with common attacks. The authors claim that ontology provides IDS with a widespread understanding of information issues. Raskin et al. [34] summarized a large variety of item sets with a precise specification of security knowledge to improve prevention and reaction capabilities.
The main drawback of the previously cited works is that they only model the SQLI attack and do not specify how the prevention or detection of the SQLI attack can be done based on these models. Moreover, they only model a subset of the SQLI attack types. To overcome these limits, Abderazed et al. [35] created an ontological model of attack detection that effectively captures the context of user input. The authors proposed two ontologies: an ontology of attack and an ontology of communication protocol. These ontologies were developed with OWL-DL through interaction with OWL GUI and using the Methontology framework [36], and their consistency was checked with OntoCLean [37]. The proposed ontology of attack covers all attacks mentioned in the Common Vulnerabilities and Exposures [38], which include the SQLIA. It captures the context, source and target of the attacks, the various techniques used by the hackers, the impact on the system components, the vulnerabilities exploited by the attacks, and the controls, in terms of policies, to mitigate these attacks. The Security Web Application Ontology (SecWAO) [28] [39] proposed by Marianna Bush was integrated with Uml-based Web Engineering (UWE). It is an ontology used in the scope of the Software Development Life Cycle, based on UML. It helps web developers specify security requirements and make design decisions. This ontology uses libraries for prepared statements to avoid database query injection; it provides relevant instances and relates them by different kinds of relationships such as: belongs to, uses, and depends on. In [40], the authors proposed a new paradigm called Network Security Situation Awareness (NSSA) in the domain of the Internet-Of-Things (IOT). This work focuses on detecting and predicting attacks in the application and network layers. The authors consider four key domains to describe the IOT security situation: context, attack, vulnerability and network flow. They build the proposed ontology with OWL DL enriched by Semantic Web Rule Language (SWRL) rules to enhance the reasoning ability of the NSSA model. These rules offer more expressive power than OWL alone. The resulting ontology includes six top classes: Context, Sensor, Alert, Attack, Vulnerability and Netflow. They are enriched by the object properties: hasVulnerability, exploited by, exploit, supplyInformation, reflect, generate, and reason. For the evaluation of their work, the authors considered
CVE-2013-0375 [38], a vulnerability that describes a subset of the SQLI attack types. The main weakness of this solution is that it does not detect zero-day attacks, as the detection mechanism is based on the attacks already described in the ontology model.

F. Machine Learning Based SQLIA countermeasures

Machine Learning (ML) is a sub-field of artificial intelligence that gives machines the ability to learn from data and predict the optimum model. ML techniques can be grouped into three main categories: Supervised Learning, Unsupervised Learning and Reinforcement Learning. Supervised Learning is based on a set of input data X that leads to the result Y, while in Unsupervised Learning the system only has the input data X, without being told the expected output Y. Recently, several works based on machine learning have been proposed to detect the SQLI attack. These solutions used different metrics to evaluate their ML models. However, accuracy and precision are the two most significant metrics used. These metrics define the ability and effectiveness of the proposed model to detect SQLI attacks. Accuracy and precision are defined by the following two equations:

\[ \mathrm{Accuracy} = \frac{TP + TN}{TP + FP + FN + TN} \]

\[ \mathrm{Precision} = \frac{TP}{TP + FP} \]

TN (True Negative) is the number of correctly predicted normal requests.
TP (True Positive) is the number of correctly predicted malicious requests.
FN (False Negative) is the number of malicious requests incorrectly predicted as normal.
FP (False Positive) is the number of normal requests incorrectly predicted as malicious.

Table II presents the confusion matrix that describes the metrics needed to evaluate the machine learning classifiers.

TABLE II: Confusion Matrix

                     Predicted as       Predicted as
                     Normal Request     Malicious Request
Normal Request       TN                 FP
Malicious Request    FN                 TP

In [41], the authors presented a Hybrid architecture (HIPS) to detect web application attacks, including the SQLI attack. They proposed a novel method to dissect the HTTP request and to detect anomalies. This method defines a collaboration between a machine learning classifier and a firewall inspection engine based on attack signatures. The authors used the Naive Bayes [42] and Bayesian Multinomial [42] classifiers to distinguish malicious HTTP requests from legitimate ones. They used the Total Cost Rate (TCR) method to extract features from the HTTP request and the Mutual Information method to select the basic features that form the input of the machine learning classifiers. The authors used the Receiver Operating Characteristic (ROC), a standard performance measurement tool that plots the true positive rate against the false positive rate. To evaluate the HIPS, the authors introduced two parameters: λ, which gives more weight to the error of considering a legitimate request as malicious than to other errors, and a threshold α, which is an arbitration parameter between the result of the Bayesian classifier and the result of the inspection engine. Although HIPS achieved a good accuracy of 97.6%, it did not provide a solution to handle false negatives and false positives.
In SQL-IDS [43], [44], the authors proposed to use two different neural networks. In their first paper [43], they proposed to use a Back Propagation Neural Network that aims to detect 7 types of SQLI: Tautology, Illegal/logically incorrect queries, Piggy-backed query, Union Query, Stored Procedures, Inference, and Alternate Encoding. Using a small dataset that contains only 13,000 URLs, the authors classified and tested the different types of SQLIA one by one. Their model achieved an overall accuracy of 96.8%. In their second paper [44], they proposed a Neural Network Based Model that has three elements: a URL generator, a URL classifier, and a Neural Network model. The authors used a new and larger dataset than the first one to train and test their model. The achieved accuracy was 95%.
In [45], the authors proposed a machine learning algorithm to create new rules for a network firewall. These rules differentiate between malicious and normal network traffic. The proposed methodology consists of the following steps: create data, extract features, combine data, and train and test the model. The authors used two computers on the same LAN to generate web traffic. They created their own dataset of HTTP requests because they consider existing open datasets such as KDD Cup [46] and DARPA [47] to be old. Moreover, they used the TCPDump command in UNIX to capture all network traffic; as a result, a PCAP file containing all packets was generated. This PCAP file was loaded into the Wireshark tool to visualize and inspect the data. After extracting the features from the malicious and normal data, the authors combined both sets of data to test the different machine learning techniques. To train and test the proposed model, they chose to use Decision Tree, SVM, Random Tree, Jribber, Neural Network, and Random Forest. To train, validate and test their neural network, the authors used 6 different datasets that contain 1876 malicious packets and 11444 normal packets. The scheme achieved 96.3%, 61.4% and 100% of correct responses for Piggy Backed Query, Union Query, and the remaining types of SQLI attack, respectively.
In [48], the authors proposed to use a Stacked AutoEncoder (SAE) based on protocol traffic classification. This work is based on Neural Networks and Deep Learning and argues that SAE is more efficient in terms of feature extraction and feature selection than artificial neural network methods. The work in [49] used a Decision Tree in IDS systems to classify and detect SQLIA. The authors used the NSL-KDD [50] dataset to train and test their method, which achieved 83.7% accuracy. In [51], the authors used a Multi-layer Feed Forward Neural Network to detect SQLI attacks. With a small dataset of 300 SQLI and 200 XSS attacks used to train and test the proposed neural network, this method achieved a low accuracy rate of 66.67%. In [52], the authors concentrated on detecting three types of SQLIA, namely alternate encoding, union query and tautology. They proposed
to use a Naive Bayes classifier to extract and select key features from the dataset and a Role-Based Access Control mechanism to make the decision. This method achieved 93.3% accuracy.

Based on graph theory and machine learning, the authors in [53] presented SQLiGOT, a novel SQLI attack detection method that uses a graph of tokens. SQLiGOT is mainly composed of a graph generator and an SVM classifier. After converting the SQL query into tokens, the generator constructs a graph whose nodes are the tokens and whose edge weights capture the relations between these tokens. The SVM is then used to classify the query and detect SQLIA. To evaluate SQLiGOT, the authors used a dataset collected from different sources that contains 4610 injected sequences and 4884 genuine sequences. With 80% used for training (3907 genuine + 3688 injected) and 20% for testing (977 genuine + 922 injected), the scheme achieved 99.47% accuracy and a 0.31% false positive rate. Even though the total theoretical processing time introduced by SQLiGOT is within 50 ms per page load, it presents an overhead for a web server handling more than 100 requests per second.

In [54], the authors used machine learning algorithms to tune the rules of the Web Application Firewall (WAF) ModSecurity [55]. They used three classifiers: SVM, KNN (with K = 3), and Random Forest. They showed that they could improve the detection capabilities of the ModSecurity WAF and, more precisely, reduce the false negative rate. However, to work properly, the proposed scheme needs a large dataset of malicious attacks related to the given application. They used three datasets, CSIC-2010 [56], DRUPAL [57], and PKDD2007 [58], and three scenarios to train and test their model. The average precision of SVM, Random Forest, and KNN remains at 97%.

Merging the tokenization technique with a neural network, the authors in [59] proposed a Token based Detection and Neural Network based Reconstruction (TbD-NNbR) framework to detect and block SQLI attacks. The proposed framework consists of two modules, the detection module and the reconstruction module, and is located on a proxy server that sits between the web server and the database server. The detection module (TbD) is token based, meaning that it tries to match the statically generated legal query tokens against the parsed dynamic query tokens at runtime. An SQLI attack is detected if there is no match between the analyzed query and the legal queries stored in the template repository. The main drawback of this technique is that it needs a large repository containing all possible valid queries; otherwise, legitimate queries are flagged as attacks and the false positive rate will be high. For this purpose, the authors added the neural network reconstruction module NNbR. The idea of the reconstruction module is to reduce the denial of service against authenticated users. Indeed, when the detection module TbD flags an authenticated user's query as an attack, it forwards this query to the NNbR module for reconstruction. The reconstruction consists of generating a valid query from the malicious one by either eliminating the injected portions of the actual query or substituting them with a null value. This process reduces the denial of service and provides better system availability for authenticated users. The NNbR module uses the Back Propagated-Neural Network (BP-NN) model to train on the queries and learn the optimal legal query model. Although the authors claim a high accuracy rate, the reported results were obtained on a small dataset (only 1655 queries). Moreover, as already mentioned, the scheme might have a high false positive rate, and this problem is resolved only for authenticated users.

Table III describes and compares the previously discussed works that use machine learning algorithms. It shows the different ML classifiers, which are used in different contexts and trained and tested with different datasets. The different operating conditions of the classifiers explain the varying performance measures reported. The performance of these techniques depends on the choice of the classifier. The best performance reached 100% precision using a Stacked AutoEncoder classifier, while the lowest accuracy, 66.67%, was obtained using a multi-layer feed-forward neural network with a small dataset. The different performance metrics obtained (Accuracy, Precision, Recognition Rate) indicate that machine learning algorithms are among the most suitable techniques for security problems and the detection of SQLI attacks.

IV. CONCLUSION AND DISCUSSION

According to the main security consortiums, OWASP [2] and SANS [60], injection vulnerabilities continue to be the most widespread and dangerous attacks on web applications. A variety of works have been proposed, and new techniques are emerging to deal with this kind of attack. However, this large body of work makes selecting the best solution for a given web application a difficult task. This paper tries to address this latter problem by giving a general overview and taxonomy of the main and recent proposed solutions. Indeed, a classification of the SQLI attack countermeasures was presented and discussed. We also highlighted the main characteristics and weaknesses of each solution.

Table IV presents a comparison between the discussed SQL injection solutions. For each solution, we enumerated its capabilities in terms of Detection (D), Prevention (P), Generate Report (R), Modelization (M), and Classification (C). Moreover, we indicate the SQLI attack types addressed by the proposed solution.

It is worth noting that some techniques, such as [8]–[12], are based on modelling the dynamic SQL query and then comparing it with a set of legal models of SQL queries. These techniques have the advantages of simplicity and easy implementation; however, obtaining all the legal query models is a difficult task. The lack of enough legal query models can lead to a high false positive rate and therefore might put the web application out of service. Another category of techniques, such as [13]–[16], is based on the randomization of the SQL query. The idea of these techniques is to randomize tokens of the SQL query. This randomization can be achieved by concatenating a random number or by adding a cryptographic salt. The randomization is removed once the query is considered valid. The main drawback of these techniques is that the randomization can change the intended user query and also limits the possible user input.
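The randomization idea can be illustrated with a minimal Python sketch. It is not the algorithm of any of [13]–[16]; the keyword list, the numeric key format, and the names `randomize`, `derandomize`, and `InjectionDetected` are assumptions made for this illustration. The application appends a secret numeric key to every SQL keyword of its trusted query template, and a checker in front of the database strips the key again, rejecting any query that contains a keyword without it.

```python
import re

# Small illustrative keyword list (an assumption); a real tool would cover the SQL grammar.
SQL_KEYWORDS = ("SELECT", "FROM", "WHERE", "UNION", "INSERT",
                "UPDATE", "DELETE", "DROP", "AND", "OR")

# A whole-word keyword, optionally followed by digits (the numeric key).
KEYWORD_RE = re.compile(r"\b(" + "|".join(SQL_KEYWORDS) + r")(\d*)\b", re.IGNORECASE)


class InjectionDetected(ValueError):
    """Raised when a query contains a keyword that does not carry the secret key."""


def randomize(template: str, key: str) -> str:
    """Append the secret numeric key to every SQL keyword of a trusted template."""
    return KEYWORD_RE.sub(lambda m: m.group(1) + key, template)


def derandomize(query: str, key: str) -> str:
    """Strip the key from every keyword; reject keywords that lack the key."""
    def strip_key(match: re.Match) -> str:
        if match.group(2) != key:
            raise InjectionDetected(f"keyword {match.group(1)!r} lacks the secret key")
        return match.group(1)
    return KEYWORD_RE.sub(strip_key, query)


if __name__ == "__main__":
    key = "481516"  # hypothetical key; in practice it would be generated randomly
    template = randomize("SELECT * FROM users WHERE name = '{name}'", key)

    # Benign input: all keywords come from the template, so the key is present.
    print(derandomize(template.format(name="alice"), key))

    # Injected input: the attacker's OR does not carry the key.
    try:
        derandomize(template.format(name="x' OR '1'='1"), key)
    except InjectionDetected as err:
        print("rejected:", err)
```

Under these assumptions a harmless value such as `Tom and Jerry` is also rejected, because the bare word `and` is treated as a keyword without the key. This is one concrete form of the drawback noted above: the scheme restricts what users may legitimately enter.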
| Work | ML classifier | Reported performance | Dataset |
|---|---|---|---|
| SQLI-IDS [43] | Backpropagation Neural Network | Overall accuracy = 96.8% | 13,000 URL addresses including 500 benign URLs and 12,500 malicious URLs |
| Sheykhkanloo et al. [44] | Neural Network Based Model | Accuracy = 95% | 25000 URL addresses including 12250 benign URLs and 12250 malicious URLs |
| Verbruggen et al. [45] | Decision Tree, SVM, Random Tree, Jribber, Neural Network, Random Forest | Recognition Rate = 98.6% for Neural Network | 6 different datasets containing 1876 malicious packets and 11444 normal packets |
| Wang et al. [48] | Stacked AutoEncoder | Precision = 100% | 0.3 million TCP flow data collected from internal network |
| Ingre et al. [49] | Decision Tree | Accuracy = 83.7% | NSL-KDD [50] |
| Moosa et al. [51] | Neural network Multi-layer Feed Forward | Accuracy = 66.67% | 300 SQL injection signatures and around 200 XSS signatures collected from different websites |
| Joshi et al. [52] | Naive Bayes | Accuracy = 93.9% | 178 codes including 101 normal codes and 77 malicious codes |
| SQLiGOT [53] | SVM | Accuracy = 96.23% | 4610 injected sequences and 4884 genuine sequences |
| Modsecurity with ML [54] | K-NN (K=3), SVM, Random Forest | Precision = 97% | CSIC-2010 [56], DRUPAL [57], and PKDD2007 [58] |
| TbD-NNbR [59] | Neural Network | Accuracy = 99.23% | 1655 queries tested: 451 malicious queries and 1204 legal queries |
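The works in the table report different metrics (accuracy, precision, recognition rate), which are not directly comparable. The following Python sketch shows how these quantities derive from the same confusion counts. All classifiers and error counts in it are hypothetical; only the class sizes of the first example (500 benign, 12,500 malicious) are borrowed from the SQLI-IDS row, and the classifier shown is not the one evaluated in [43]. Treating "recognition rate" as recall is also an assumption, since the surveyed definition is not given here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    tp: int  # malicious requests flagged as malicious
    fp: int  # benign requests flagged as malicious
    tn: int  # benign requests passed as benign
    fn: int  # malicious requests passed as benign


def accuracy(c: ConfusionCounts) -> float:
    """Share of all requests that were classified correctly."""
    return (c.tp + c.tn) / (c.tp + c.fp + c.tn + c.fn)


def precision(c: ConfusionCounts) -> float:
    """Share of raised alerts that were real attacks."""
    return c.tp / (c.tp + c.fp) if (c.tp + c.fp) else 0.0


def recall(c: ConfusionCounts) -> float:
    """Share of real attacks that were detected (assumed meaning of recognition rate)."""
    return c.tp / (c.tp + c.fn) if (c.tp + c.fn) else 0.0


def false_positive_rate(c: ConfusionCounts) -> float:
    """Share of benign requests that were wrongly blocked."""
    return c.fp / (c.fp + c.tn) if (c.fp + c.tn) else 0.0


if __name__ == "__main__":
    # Hypothetical classifier that flags every request on an imbalanced dataset.
    always_malicious = ConfusionCounts(tp=12500, fp=500, tn=0, fn=0)
    print(f"accuracy={accuracy(always_malicious):.4f}",
          f"fpr={false_positive_rate(always_malicious):.2f}")

    # Hypothetical cautious classifier: never wrong when it alerts, but misses half.
    cautious = ConfusionCounts(tp=50, fp=0, tn=500, fn=50)
    print(f"precision={precision(cautious):.2f}",
          f"recall={recall(cautious):.2f}")
```

The first hypothetical classifier reaches a high accuracy while blocking every benign request, and the second reaches 100% precision while missing half of the attacks. The point is only that a single reported metric says little without the class balance and the remaining metrics.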
Monitoring and auditing based techniques, such as [17]–[22], generally audit and analyse code to detect potentially vulnerable parts. Some techniques provide prevention by replacing these vulnerable parts of the code; others only detect the SQLI attack. The advantage of these techniques is that they can work offline and can therefore prevent and detect SQLI attacks before the web application is deployed. However, these techniques might present a high false negative rate, as some attacks can only be launched at runtime and are therefore undetectable in offline mode.

Some techniques use information theory and detect the presence of attacks by measuring the entropy of the query [23]–[27]. These techniques can be bypassed by attackers who produce malicious queries with an acceptable entropy value.

Ontology based countermeasures, such as [28]–[35], [40], try to avoid the weaknesses of signature based techniques by providing good models of the SQLI attack types. Modelling the SQLIA is the main task of the ontology; some works aim only to model the attack, while others are enriched with semantic rules that help the ontology model prevent or detect the SQLIA. Ensuring the semantic relations between the different parts of the attack signature is the main advantage of using an ontology as a detection technique.

In addition, several researchers rely on machine learning algorithms [41], [43]–[45], [48], [49], [51]–[54], [59]. With ML, the proposed models are able to learn without being explicitly programmed. Choosing a well-structured dataset that is close to reality leads to successful training and testing of the classifier, which can then effectively detect the SQLIA.

Although these techniques can detect or prevent the SQL injection attack, there are still gaps in their effectiveness in handling web based threats. As future work, we are designing an SQLI attack detection scheme based on machine learning that is lightweight and can handle a large number of requests per second. Moreover, we found that there is a lack of an updated, standard dataset that researchers can use to evaluate their work and compare it with existing solutions. Therefore, we are planning to provide such a dataset as a useful tool for researchers working in this field.

REFERENCES

[1] Statista. (2019) Number of web attacks blocked daily worldwide 2015-2018. [Online]. Available: https://www.statista.com/statistics/494961/web-attacks-blocked-per-day-worldwide/
[2] OWASP, “Owasp top ten project,” https://www.owasp.org/index.php/Category:OWASP_Top_Ten_Project, 2019, accessed on April 2019.
[3] G. Deepa, P. S. Thilagam, F. A. Khan, A. Praseed, A. R. Pais, and N. Palsetia, “Black-box detection of xquery injection and parameter tampering vulnerabilities in web applications,” International Journal of Information Security, vol. 17, no. 1, pp. 105–120, 2018.
[4] Y. Fang, J. Peng, L. Liu, and C. Huang, “Wovsqli: Detection of sql injection behaviors using word vector and lstm,” in Proceedings of the 2nd International Conference on Cryptography, Security and Privacy. ACM, 2018, pp. 170–174.
[5] Q. Li, F. Wang, J. Wang, and W. Li, “Lstm-based sql injection detection method for intelligent transportation system,” IEEE Transactions on Vehicular Technology, 2019.
[6] W. G. Halfond, J. Viegas, and A. Orso, “A classification of sql-injection attacks and countermeasures,” in Proceedings of the IEEE International Symposium on Secure Software Engineering, vol. 1. IEEE, 2006, pp. 13–15.
[7] M. S. Aliero, I. Ghani, S. Zainudden, M. M. Khan, and M. Bello, “Review on sql injection protection methods and tools,” Jurnal Teknologi, vol. 77, no. 13, 2015.
[8] P. Bisht, P. Madhusudan, and V. Venkatakrishnan, “Candid: Dynamic candidate evaluations for automatic prevention of sql injection attacks,” ACM Transactions on Information and System Security (TISSEC), vol. 13, no. 2, p. 14, 2010.
[9] S. Mamadhan, T. Manesh, and V. Paul, “Sqlstor: Blockage of stored procedure sql injection attack using dynamic query structure validation,” in Intelligent Systems Design and Applications (ISDA), 2012 12th International Conference on. IEEE, 2012, pp. 240–245.
[10] R. J. Manoj, A. Chandrasekhar, and M. A. Praveena, “An approach to detect and prevent tautology type sql injection in web service based on xschema validation,” International Journal Of Engineering And Computer Science ISSN, pp. 2319–7242, 2014.
[11] I. Medeiros, M. Beatriz, N. Neves, and M. Correia, “Hacking the dbms to prevent injection attacks,” in Proceedings of the Sixth ACM on Conference on Data and Application Security and Privacy. ACM, 2016, pp. 295–306.
[12] ——, “Septic: Detecting injection attacks and vulnerabilities inside the dbms,” IEEE Transactions on Reliability, 2019.
[13] R. Halder and A. Cortesi, “Obfuscation-based analysis of sql injection attacks,” in 2010 IEEE Symposium on Computers and Communications (ISCC). IEEE, 2010, pp. 931–938.
[14] S. Avireddy, V. Perumal, N. Gowraj, R. S. Kannan, P. Thinakaran, S. Ganapthi, J. R. Gunasekaran, and S. Prabhu, “Random4: an application specific randomized encryption algorithm to prevent sql injection,” in IEEE 11th International Conference on Trust, Security and Privacy in Computing and Communications (TrustCom). IEEE, 2012, pp. 1327–1333.
[15] P. Mehta, J. Sharda, and M. L. Das, “Sqlshield: Preventing sql injection attacks by modifying user input data,” in Information Systems Security. Springer, 2015, pp. 192–206.
[16] J. Perkins, J. Eikenberry, A. Coglio, D. Willenson, S. Sidiroglou-Douskos, and M. Rinard, “Autorand: Automatic keyword randomization to prevent injection attacks,” in International Conference on Detection of Intrusions and Malware, and Vulnerability Assessment. Springer, 2016, pp. 37–57.
[17] E. Al-Khashab, F. S. Al-Anzi, and A. A. Salman, “PSIAQOP: preventing sql injection attacks based on query optimization process,” in Proceedings of the Second Kuwait Conference on e-Services and e-Systems. ACM, 2011, p. 10.
[18] R. M. Nadeem, R. M. Saleem, R. Bashir, and S. Habib, “Detection and prevention of sql injection attack by dynamic analyzer and testing model,” International Journal of Advanced Computer Science and Applications, vol. 8, no. 8, pp. 209–214, 2017.
[19] T.-Y. Wu, C.-M. Chen, X. Sun, S. Liu, and J. C.-W. Lin, “A countermeasure to sql injection attack for cloud environment,” Wireless Personal Communications, vol. 96, no. 4, pp. 5279–5293, 2017.
[20] P. P. Churi and K. Mistry, “DE-PRE tool for detection and prevention from input validation attacks on website,” Circulation in Computer Science, vol. 2, no. 5, pp. 23–27, June 2017. [Online]. Available: https://doi.org/10.22632/ccs-2017-252-21
[21] M. Masango, F. Mouton, P. Antony, and B. Mangoale, “Web defacement and intrusion monitoring tool: WDIMT,” in 2017 International Conference on Cyberworlds (CW). IEEE, 2017, pp. 72–79.
[22] J. Thomé, L. K. Shar, D. Bianculli, and L. Briand, “JoanAudit: a tool for auditing common injection vulnerabilities,” in 11th Joint Meeting of the European Software Engineering Conference and the ACM SIGSOFT Symposium on the Foundations of Software Engineering. ACM, 2017.
[23] D. Appelt, C. D. Nguyen, L. C. Briand, and N. Alshahwan, “Automated testing for sql injection vulnerabilities: an input mutation approach,” in Proceedings of the 2014 International Symposium on Software Testing and Analysis. ACM, 2014, pp. 259–269.
[24] D. G. Kumar and M. Chatterjee, “Detection block model for sql injection attacks,” International Journal of Computer Network and Information Security, vol. 6, no. 11, p. 56, 2014.
[25] ——, “Mac based solution for sql injection,” Journal of Computer Virology and Hacking Techniques, vol. 11, no. 1, pp. 1–7, 2015.
[26] H. Shahriar and M. Zulkernine, “Information-theoretic detection of sql injection attacks,” in High-Assurance Systems Engineering (HASE), 2012 IEEE 14th International Symposium on. IEEE, 2012, pp. 40–47.
[27] D. Das, U. Sharma, and D. Bhattacharyya, “Defeating sql injection attack in authentication security: an experimental study,” International Journal of Information Security, vol. 18, no. 1, pp. 1–22, 2019.
[28] M. Busch and M. Wirsing, “An ontology for secure web applications,” Int. J. Software and Informatics, vol. 9, no. 2, pp. 233–258, 2015.
[29] G. Denker, L. Kagal, and T. Finin, “Security in the semantic web using owl,” Information Security Technical Report, vol. 10, no. 1, pp. 51–58, 2005.
[30] G. Denker, L. Kagal, T. Finin, M. Paolucci, and K. Sycara, “Security for daml web services: Annotation and matchmaking,” in International Semantic Web Conference. Springer, 2003, pp. 335–350.
[31] A. Kim, J. Luo, and M. Kang, “Security ontology for annotating resources,” in OTM Confederated International Conferences “On the Move to Meaningful Internet Systems”. Springer, 2005, pp. 1483–1499.
[32] G. Dobson and P. Sawyer, “Revisiting ontology-based requirements engineering in the age of the semantic web,” in Proceedings of the International Seminar on Dependable Requirements Engineering of Computerised Systems at NPPs, 2006, pp. 27–29.
[33] J. Undercoffer, A. Joshi, and J. Pinkston, “Modeling computer attacks: An ontology for intrusion detection,” in International Workshop on Recent Advances in Intrusion Detection. Springer, 2003, pp. 113–135.
[34] V. Raskin, C. F. Hempelmann, K. E. Triezenberg, and S. Nirenburg, “Ontology in information security: a useful theoretical foundation and methodological tool,” in Proceedings of the 2001 workshop on New security paradigms. ACM, 2001, pp. 53–59.
[35] A. Razzaq, Z. Anwar, H. F. Ahmad, K. Latif, and F. Munir, “Ontology for attack detection: An intelligent approach to web application security,” Computers & Security, vol. 45, pp. 124–146, 2014.
[36] M. Fernández-López, A. Gómez-Pérez, and N. Juristo, “Methontology: from ontological art towards ontological engineering,” 1997.
[37] N. Guarino and C. Welty, “Evaluating ontological decisions with ontoclean,” Communications of the ACM, vol. 45, no. 2, pp. 61–65, 2002.
[38] CVE, “Common vulnerabilities and exposures,” https://cve.mitre.org/, 2016, accessed on May 2016.
[39] M. Busch, “Evaluating & engineering: an approach for the development of secure web applications,” 2016.
[40] G. Xu, Y. Cao, Y. Ren, X. Li, and Z. Feng, “Network security situation awareness based on semantic ontology and user-defined rules for internet of things,” IEEE Access, vol. 5, pp. 21 046–21 056, 2017.
[41] A. Makiou, Y. Begriche, and A. Serhrouchni, “Improving web application firewalls to detect advanced sql injection attacks,” in Information Assurance and Security (IAS), 2014 10th International Conference on. IEEE, 2014, pp. 35–40.
[42] N. Friedman, D. Geiger, and M. Goldszmidt, “Bayesian network classifiers,” Machine learning, vol. 29, no. 2-3, pp. 131–163, 1997.
[43] N. M. Sheykhkanloo, “SQL-IDS: evaluation of sqli attack detection and classification based on machine learning techniques,” in Proceedings of the 8th International Conference on Security of Information and Networks. ACM, 2015, pp. 258–266.
[44] ——, “A learning-based neural network model for the detection and classification of sql injection attacks,” International Journal of Cyber Warfare and Terrorism (IJCWT), vol. 7, no. 2, pp. 16–41, 2017.
[45] R. Verbruggen and T. Heskes, “Creating firewall rules