e-ISSN: 2582-5208

International Research Journal of Modernization in Engineering Technology and Science


( Peer-Reviewed, Open Access, Fully Refereed International Journal )
Volume:03/Issue:06/June-2021 Impact Factor- 5.354 www.irjmets.com

SQL INJECTION DETECTION USING HYBRID MODEL


Dr. Sandeep Kumar *1, Tanya Ahuja*2, Bhavya Choudhary*3
*1 Associate Professor, Department of Computer Science, Maharaja Surajmal Institute of Technology,
New Delhi, India.
*2 Student, Department of Computer Science, Maharaja Surajmal Institute of Technology,
New Delhi, India.
*3 Student, Department of Computer Science, Maharaja Surajmal Institute of Technology,
New Delhi, India.
ABSTRACT
Technology is now part of everyday life. Large volumes of information and millions of files are shared across the
Internet through web applications, and online payments and Internet banking have become commonplace.
Web-based applications store crucial user information in databases. Because the backend database is
integrated with the web frontend, user input can reach the database, which makes injection attacks possible.
SQL injection (SQLi) means inserting malicious SQL statements into an application's original query through its
inputs. Testing for SQLi vulnerabilities is therefore important, but it is practically impossible to check
everything without a proper algorithm. This paper attempts to detect SQLi attacks using basic Machine Learning
algorithms. To improve performance, a stacking technique was used in which Logistic Regression was chosen as the
meta model and different combinations of basic algorithms (Logistic Regression, k-nearest neighbor, and Linear
Discriminant Analysis) formed the base models. These basic models were chosen to highlight that improving the
performance metrics does not necessarily require deep learning models, which need large datasets and high
computational power.
Keywords- SQLi, KNN, Logistic regression, LDA, Neural Networks, Stacking.
I. INTRODUCTION
Web applications are very popular these days. To increase their reach and revenue, organizations make these
applications available on the Internet, and this exposure increases the security challenges they face. Most
transactions today are performed online, and the data behind these websites is stored in
databases. One common type is the relational database, in which information is fetched through
Structured Query Language (SQL).
SQL injection is probably the easiest way of stealing data from a database that is queried on the basis
of web inputs: through such attacks, hackers can gain access to the database and make changes to it. According
to the Open Web Application Security Project (OWASP), an “injection attack is a technique used to access
information or unauthorized activity”. As a result, hackers rely on SQL injection for stealing information. Three
things can be done about such attacks: prevention, detection, or correction. Prevention is not an easy task
because it requires a lot of knowledge. SQL injection attacks are carried out mainly for two reasons: to profit by
grabbing others' sensitive data, and to test and prove the attacker's own newly learned skills.
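To make the attack described above concrete, the following minimal sketch shows how a malicious input changes the meaning of a query that is built by string concatenation. The `users` table, the two login functions, and the payload are hypothetical and serve only as an illustration; they are not taken from the paper's experiments.

```python
import sqlite3

# Hypothetical in-memory table, used only for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, password TEXT)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?)",
    [("alice", "s3cret"), ("bob", "hunter2")],
)

def login_unsafe(name, password):
    # Vulnerable: user input is pasted directly into the SQL text.
    query = (
        "SELECT name FROM users WHERE name = '" + name
        + "' AND password = '" + password + "'"
    )
    return conn.execute(query).fetchall()

def login_safe(name, password):
    # Parameterized query: the input is passed as data, never as SQL.
    query = "SELECT name FROM users WHERE name = ? AND password = ?"
    return conn.execute(query, (name, password)).fetchall()

payload = "' OR '1'='1"
unsafe_rows = login_unsafe("alice", payload)  # condition becomes always true
safe_rows = login_safe("alice", payload)      # payload is just a wrong password
```

With the payload, the concatenated query ends in `OR '1'='1'`, so `unsafe_rows` contains every user, while `safe_rows` is empty.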
In the proposed method, three approaches are used to detect SQL injection attacks (SQLIA): Logistic Regression
(LR), Linear Discriminant Analysis (LDA) and k-nearest neighbor (KNN) classification. A stacking technique is
also used to obtain better accuracy. Current work in this domain uses deep learning algorithms to improve
accuracy, but we want to improve accuracy using only machine learning algorithms, so we use a hybrid model that
combines several base models with a meta model.
II. METHODOLOGY
Two datasets are used, “SQL1” and “SQL2”. SQL1 contains 4200 valid/invalid SQL queries and
SQL2 contains 33761 valid/invalid SQL queries. The data is labelled (0 = valid SQL query, 1 = malicious SQL
query) and is taken from the Kaggle website. Each dataset contains the following 2 fields:
1. label: 0 = valid SQL query, 1 = malicious SQL query
2. sentence: the text of the SQL query
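As a rough illustration of the two-field layout described above, the sketch below reads such records with Python's standard `csv` module. The sample rows and the `load_queries` helper are hypothetical; a real run would open the downloaded dataset file instead of the in-memory string.

```python
import csv
import io

# Hypothetical excerpt in the two-column layout described above.
SAMPLE_CSV = """sentence,label
SELECT name FROM products WHERE id = 5,0
' OR '1'='1' --,1
hello world,0
1; DROP TABLE users,1
"""

def load_queries(handle):
    """Return (sentences, labels) from a CSV with 'sentence' and 'label' columns."""
    sentences, labels = [], []
    for row in csv.DictReader(handle):
        sentences.append(row["sentence"])
        labels.append(int(row["label"]))  # 0 = valid, 1 = malicious
    return sentences, labels

sentences, labels = load_queries(io.StringIO(SAMPLE_CSV))
malicious_share = sum(labels) / len(labels)
```

Checking `malicious_share` before training is a cheap way to see how balanced the two classes are.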

We have used various supervised learning algorithms, and stacking methods are used to improve the accuracy.
Supervised learning classifies data using a model trained on labelled data, that is, data for which we already
know whether each query is malicious or not. This dataset is used to train the model. After training with the
various algorithms, the model can be used on unlabeled data to classify queries. The following approaches are
used:
(1) Linear Discriminant Analysis (LDA) - It picks a new dimension (a projection axis) that maximizes the
separation between the means of the projected classes and minimizes the variance within each projected class.
For multiple variables, the same properties (class means and covariance) are calculated over the multivariate
Gaussian. These statistical properties are estimated from the data.
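For the two-class case considered here, the criterion described above is commonly written as the Fisher criterion. The notation is an assumption, since the paper does not define symbols: \mu_0 and \mu_1 are the class means, S_W is the within-class scatter matrix, and w is the projection direction.

```latex
J(\mathbf{w}) =
  \frac{\left(\mathbf{w}^{\top}(\boldsymbol{\mu}_1 - \boldsymbol{\mu}_0)\right)^{2}}
       {\mathbf{w}^{\top}\mathbf{S}_W\,\mathbf{w}},
\qquad
\mathbf{S}_W = \sum_{c \in \{0,1\}} \; \sum_{i:\,y_i = c}
  (\mathbf{x}_i - \boldsymbol{\mu}_c)(\mathbf{x}_i - \boldsymbol{\mu}_c)^{\top},
\qquad
\mathbf{w}^{*} \propto \mathbf{S}_W^{-1}(\boldsymbol{\mu}_1 - \boldsymbol{\mu}_0)
```

The numerator measures the separation between the projected class means, and the denominator measures the variance within the projected classes.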

(2) Logistic regression - Logistic regression is a classification algorithm that models the probability of a
binary outcome from a set of independent variables. Here there are only two possible outcomes: the text
is either plain text or malicious text.
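A minimal sketch of binary logistic regression trained by batch gradient descent is given below. The two features (number of quote characters and number of SQL keywords), the toy data, and the hyperparameters are all hypothetical, and the paper does not state which implementation was used.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic_regression(X, y, lr=0.5, epochs=2000):
    """Fit weights and bias by batch gradient descent on the log-loss."""
    n_features = len(X[0])
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for xi, yi in zip(X, y):
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b)
            err = p - yi  # derivative of the log-loss w.r.t. the logit
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / len(X)
    return w, b

def predict_proba(w, b, x):
    """Estimated probability that x belongs to class 1 (malicious)."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)

# Hypothetical features: [quote characters, SQL keywords] per input text.
X = [[0, 0], [0, 1], [1, 0], [2, 2], [3, 1], [2, 3]]
y = [0, 0, 0, 1, 1, 1]
w, b = train_logistic_regression(X, y)
predictions = [1 if predict_proba(w, b, x) >= 0.5 else 0 for x in X]
```

The 0.5 threshold turns the estimated probability into one of the two outcomes, plain or malicious.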

(3) KNN classifier - It is a memory-based classification algorithm. The steps are as follows:
• First, the K instances nearest to the data being tested are identified.
• Then, the labels of these instances are extracted.
• The label for the data being tested is predicted by combining the labels of these neighbours, for example
by majority vote.

Figure 1: KNN Illustration
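The three KNN steps can be sketched as follows. Euclidean distance and a simple majority vote are assumptions, as the paper does not state the distance measure or the value of K; the toy features are hypothetical.

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Step 1: find the k training instances closest to the query.
    distances = sorted(
        (math.dist(x, query), label) for x, label in zip(train_X, train_y)
    )
    # Step 2: extract the labels of those neighbours.
    neighbour_labels = [label for _, label in distances[:k]]
    # Step 3: combine the labels, here by simple majority vote.
    return Counter(neighbour_labels).most_common(1)[0][0]

# Hypothetical features: [quote characters, SQL keywords] per input text.
train_X = [[0, 0], [0, 1], [1, 0], [2, 2], [3, 1], [2, 3]]
train_y = [0, 0, 0, 1, 1, 1]
label_a = knn_predict(train_X, train_y, [2.5, 2])
label_b = knn_predict(train_X, train_y, [0.5, 0.5])
```

Because KNN stores the training set and defers all work to prediction time, its cost per query grows with the size of the dataset.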


(4) Hybrid model based on stacking algorithm - To improve the accuracy of the machine learning models,
three hybrid models are created in which different models are used at different levels to get the best
results.
Initially, the training data (x) has m observations and n features. The training dataset was split into k
folds, as in k-fold cross-validation. The base model was fitted on k-1 folds and predictions were
made on the kth fold; this process was repeated for each fold. Finally, the base model was fitted on the
whole training data to calculate its performance on the test set. This procedure was followed for the
different base models: KNN, LR, and LDA.
The predictions on the training data set were then used as features for the second-level model, and the
second-level model was used to make predictions on the test set.
A brute-force analysis was followed: all combinations of the existing base models were tried.
For the meta-model, only LR was chosen, because the complexity of such hybrid models increases when
more advanced models are used. We restricted ourselves to two levels because beyond that the model becomes
highly overfitted. Three hybrid models are implemented:
a) Base model - KNN; Meta model - LR
b) Base models - LR, KNN; Meta model - LR
c) Base models - LR, KNN, LDA; Meta model - LR
After trying all the approaches, the highest accuracy was obtained when LR was used at level 1 and KNN,
LDA and LR were used at level 0.
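The stacking procedure above can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it corresponds to hybrid model (b) with KNN and LR as base models and LR as meta model, omits LDA for brevity, uses hard 0/1 predictions as meta-features, and runs on hypothetical, already scaled toy features. All function names are illustrative.

```python
import math
from collections import Counter

def fit_knn(X, y, k=3):
    """Base model: k-nearest neighbours with majority vote."""
    def predict(x):
        nearest = sorted((math.dist(xi, x), yi) for xi, yi in zip(X, y))[:k]
        return Counter(label for _, label in nearest).most_common(1)[0][0]
    return predict

def fit_lr(X, y, lr=0.5, epochs=300):
    """Base and meta model: logistic regression by gradient descent."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * len(w), 0.0
        for xi, yi in zip(X, y):
            z = sum(wj * xj for wj, xj in zip(w, xi)) + b
            err = 1.0 / (1.0 + math.exp(-z)) - yi
            gw = [g + err * xj for g, xj in zip(gw, xi)]
            gb += err
        w = [wj - lr * g / len(X) for wj, g in zip(w, gw)]
        b -= lr * gb / len(X)
    def predict(x):
        return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else 0
    return predict

def out_of_fold_predictions(fit, X, y, k_folds):
    """Predict every training row with a model that never saw that row."""
    oof = [None] * len(X)
    for fold in range(k_folds):
        held_out = [i for i in range(len(X)) if i % k_folds == fold]
        kept = [i for i in range(len(X)) if i % k_folds != fold]
        model = fit([X[i] for i in kept], [y[i] for i in kept])
        for i in held_out:
            oof[i] = model(X[i])
    return oof

def fit_stack(base_fits, X, y, k_folds=3):
    # Level 0: out-of-fold predictions become the meta model's features.
    columns = [out_of_fold_predictions(f, X, y, k_folds) for f in base_fits]
    meta_X = [list(row) for row in zip(*columns)]
    meta = fit_lr(meta_X, y)                  # Level 1: LR meta model.
    bases = [f(X, y) for f in base_fits]      # Refit bases on all the data.
    return lambda x: meta([model(x) for model in bases])

# Hypothetical, already scaled 2-D features; classes are interleaved so
# that every fold contains both labels.
positives = [[1, 1], [2, 1], [1, 2], [2, 2], [1.5, 1], [1, 1.5]]
X, y = [], []
for p in positives:
    X.append(p)
    y.append(1)
    X.append([-v for v in p])
    y.append(0)

stacked = fit_stack([fit_knn, fit_lr], X, y)
```

Training the meta model only on out-of-fold predictions keeps it from seeing predictions that a base model made on its own training rows, which is the point of the k-fold step. The brute-force search over base-model combinations would call `fit_stack` once per combination.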
The machine learning detection method checks whether the parameters entered by the
user contain any malicious code that could be a threat to the security of the system. However, the real-time
performance of machine learning algorithms is poor. Therefore, a stacking technique and a deep learning-based
approach are used to detect SQL injection. The stacking technique first finds suitable features, which are then
used to train the model, and the model is finally checked on unlabeled data.
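The paper mentions a tokenization approach for turning input text into features but gives no details. The sketch below is one plausible reading, offered as an assumption: each input is split into words, numbers and single symbols, and represented by token counts. The regular expression and all function names are hypothetical.

```python
import re
from collections import Counter

# Words, runs of digits, or any single non-space symbol such as ' = ; -
TOKEN_PATTERN = re.compile(r"[A-Za-z_]+|\d+|[^\sA-Za-z_\d]")

def tokenize(query):
    """Split a query into lowercase words, numbers and single symbols."""
    return [tok.lower() for tok in TOKEN_PATTERN.findall(query)]

def build_vocabulary(queries):
    """Map every token seen in the training queries to a column index."""
    vocab = {}
    for query in queries:
        for tok in tokenize(query):
            vocab.setdefault(tok, len(vocab))
    return vocab

def vectorize(query, vocab):
    """Count how often each vocabulary token occurs in the query."""
    counts = Counter(tokenize(query))
    return [counts.get(tok, 0) for tok in vocab]

training_queries = ["SELECT * FROM users", "admin' OR 1=1 --"]
vocab = build_vocabulary(training_queries)
features = vectorize("1=1", vocab)
```

Keeping symbols such as quotes, `=` and `-` as separate tokens matters here, because injection payloads often differ from benign input mainly in their punctuation.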
III. RESULTS AND DISCUSSION
SQL injection attacks were detected using the following methods, with the results shown for each:
(1) Linear discriminant analysis

Figure 2: Performance Metrics of LDA Model


(2) Logistic regression

Figure 3: Performance Metrics of Logistic Regression Model

(3) KNN classifier

Figure 4: Performance Metrics of KNN Model

(4) Neural Network

Figure 5: Performance Metrics of NN with LR Model


(5) Stacking Algorithm

Figure 6: Performance metric when Base model - KNN and Meta model - LR


Figure 7: Performance metric when Base Model - LR, KNN and Meta model – LR

Figure 8: Performance metric when Base Model - LR, KNN, LDA and Meta model – LR
This paper implemented three machine learning approaches, a neural network (deep learning) and stacked hybrid
models to detect SQL injection attacks. The accuracy of each approach is:
Table 1: Comparison of accuracy of models
No.  ALGORITHM                                   ACCURACY
1.   LR                                          93%
2.   LDA                                         73%
3.   KNN                                         71%
4.   NN                                          97%
5.   STACKING (base: KNN; meta: LR)              81%
     STACKING (base: KNN, LR; meta: LR)          97.5%
     STACKING (base: KNN, LDA, LR; meta: LR)     97.7%
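For reference, accuracy and the usual companion metrics can be computed from predicted and true labels as sketched below. The exact set of metrics behind the figures is not stated in the text, so precision, recall and F1 are included here as an assumption, and the example labels are hypothetical.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1, with class 1 (malicious) as positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

# Hypothetical labels: 0 = valid query, 1 = malicious query.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]
metrics = classification_metrics(y_true, y_pred)
```

For a detector, recall on the malicious class is worth reading alongside accuracy, since a missed attack and a false alarm have different costs.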

IV. CONCLUSION
SQL injection is one of the main security issues in many sectors, particularly finance and defence,
where the losses can be huge. In this paper, several machine learning algorithms were tried along with neural
networks to improve accuracy in the detection of SQL injection attacks. The best accuracy was given by the model
in which the base models were Logistic Regression, K-Nearest Neighbour and Linear Discriminant Analysis and the
meta model was Logistic Regression.
Logistic regression was used as the meta model because it is one of the most basic models. Only up to
three base models were used because beyond that the complexity increases with no further
improvement in accuracy.
In future, this project can be extended to detect other types of web attacks, such as DoS and cross-site
scripting (XSS) attacks, and much work can be done to improve accuracy and performance. The project can further
be modified to enhance usability and efficiency. A larger dataset collected from multiple sources can be used to
improve the accuracy of the deep learning model. The machine learning model can also be improved through better
feature selection; currently a tokenization approach is used. Different methods can be tried for training the
model more effectively, and other validation techniques, such as cross-validation, can be used.