XSS Cross-Site Scripting Attack Detection by Machine Learning Classifiers
11th International Conference on System Modeling & Advancement in Research Trends, 16th–17th, December, 2022
College of Computing Sciences & Information Technology, Teerthanker Mahaveer University, Moradabad, India
3) DOM-Based Cross-site Scripting
This cross-site scripting flaw usually occurs when data passes from an attacker-controllable source, such as a URL, to a sink that supports dynamic code execution. As a result, attackers can use malicious JavaScript to gain access to other users' accounts. Because the payload runs entirely in the browser, the web page can be changed without contacting the server [3].

II. Literature Review
Web application security has become a pressing need as new attack techniques make web systems more vulnerable. This section reviews researchers who have proposed machine learning methods for mitigating Cross-site Scripting attacks.

Dimaz Arno Prasetio et al. proposed a hybrid features model to classify XSS attacks and obtained their best result with 99.87 percent accuracy. With this XSS detection model, false positives can be reduced to 0.039 percent. They used the Kaggle dataset as well as datasets from various sources on GitHub, totaling 16,361 samples that typically contain XSS attacks [4].

Shreyas Sudhir Barde compared Random Forest with Gradient Boosting, Decision Tree, and Bagging algorithms, together with a unified knowledge-base graph, and favored Random Forest. The approach showed a number of advantages over traditional methods in the majority of cases, with 97.16 percent accuracy in the worst case on a balanced ensemble dataset [5].

Ravi Pallam et al. reported that data tokenization was the best pre-processing technique and that it significantly reduced computational time. On the Kaggle dataset, LightGBM clearly outperformed the other boosting algorithms, with 99.51 percent and 99.59 percent accuracy for SQLi and XSS, respectively [6].

Fawaz Mahiuob Mohammed Mokbal et al. presented a method for detecting Cross-site Scripting attacks with NLP-SVM. Text payloads were processed using Natural Language Processing, and an SVM model was used to detect the attacks. After a thorough examination of the results, the authors determined that the proposed method can detect XSS-based attacks precisely, with low FN and FP. Compared with eight other algorithms on the same data, the proposed method had several significant advantages. With an accuracy of 99.44 percent, it produced promising, state-of-the-art results on the customized dataset [7].

Nunan et al. proposed a model that detects the presence of XSS scripts using Support Vector Machine and Naive Bayes. Malicious code, duplicated special characters, and keywords are all included in the data preprocessing and classification carried out in Weka. According to their findings, the maximum accuracy reported in similar work is 99.89 percent [8].

Umehara et al. evaluated SVM, SCW, and Random Forest models on samples converted from raw data into 128-dimensional vectors. The models were tested on a five-dimensional feature vector dataset as well as a 128-dimensional feature vector dataset. The five-dimensional feature vectors are classified into five groups. Five-fold cross-validation was used throughout the experiments. They achieved 98.9 percent accuracy using SVM on the 128-dimensional extracted features dataset [9].

To detect XSS attacks, PMD Nagarjun et al. proposed training on a large, labeled, and balanced dataset using supervised ensemble learning techniques. They used several methods, including Random Forest, AdaBoost, SVM bagging, Extra-Trees, gradient boosting, and histogram-based gradient boosting. They obtained an accuracy of 99.89 percent with the histogram-based gradient-boosting classification model [10].

Rathore et al. proposed a model for detecting Cross-site Scripting (XSS) attacks. They used URLs and features extracted from web pages to train their models. The features include domain names in URLs, iframes, external links, and malicious scripts in social networking service (SNS) webpages. They used a dataset of 1,000 malicious XSS samples for testing and scored 97.2 percent on their tests [11].

Mereani and Howe proposed a model for detecting Cross-Site Scripting attacks that combines Random Forest, kNN, and SVM. They used 2,000 samples to train and 13,000 samples to test their model, and achieved up to 99.75 percent accuracy. According to their experiment, the features were extracted successfully [12].

III. Materials and Approaches.
A. Dataset
The most crucial step in detecting cross-site scripting attacks is compiling a dataset that contains Cross-site Scripting attack payloads. In this study, the Kaggle data set is used to test and evaluate the results of various classifiers. It contains 13,600 distinct samples in total and is available in Kaggle's repository. The data set contains several payload types that aid in the detection of Cross-site Scripting.

B. Methodology
Machine learning enables a system to learn decision rules from data by using algorithms, with the aim of judging new inputs as accurately as a human would.
Attackers have continually refined their strategies over the last few decades to gain control over systems. We have applied Logistic Regression, AdaBoost, Naive Bayes, XGBoost, and Decision Tree to identify the Cross-site Scripting payloads in the dataset. The Cross-site Scripting data set from Kaggle is used to find the best classifier for detecting cross-site scripting.
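As a rough illustration of the workflow described in the Methodology, the sketch below tokenizes payload strings, fits one of the listed classifier families (Naive Bayes), and reports accuracy on held-out samples. It is not the authors' implementation. The toy samples, the `tokenize` rule, and the `NaiveBayes` class are assumptions made for illustration only; the Kaggle data set is not reproduced, and in practice a machine learning library would normally supply the classifiers.

```python
import math
import re
from collections import Counter

# Hypothetical toy samples (label 1 = XSS payload, 0 = benign input).
# They stand in for the Kaggle data set, which is not reproduced here.
TRAIN = [
    ("<script>alert(1)</script>", 1),
    ("<img src=x onerror=alert(document.cookie)>", 1),
    ("<svg onload=alert(1)>", 1),
    ("javascript:alert('xss')", 1),
    ("hello world", 0),
    ("search?q=machine+learning", 0),
    ("user profile page", 0),
    ("price list 2022", 0),
]
TEST = [
    ("<script>alert(document.cookie)</script>", 1),
    ("<body onload=alert(1)>", 1),
    ("machine learning page", 0),
    ("hello user", 0),
]

# Assumed tokenization rule: alphabetic words plus a few markup characters.
TOKEN_RE = re.compile(r"[a-z]+|[<>=()/:;]")


def tokenize(payload):
    """Lower-case a payload and split it into words and markup characters."""
    return TOKEN_RE.findall(payload.lower())


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, samples):
        self.token_counts = {0: Counter(), 1: Counter()}
        self.class_counts = Counter()
        for text, label in samples:
            self.class_counts[label] += 1
            self.token_counts[label].update(tokenize(text))
        self.vocab = set(self.token_counts[0]) | set(self.token_counts[1])
        return self

    def _log_score(self, tokens, label):
        # Log prior of the class plus smoothed log likelihood of each token.
        total_tokens = sum(self.token_counts[label].values())
        total_samples = sum(self.class_counts.values())
        score = math.log(self.class_counts[label] / total_samples)
        for token in tokens:
            if token not in self.vocab:
                continue  # skip tokens never seen during training
            count = self.token_counts[label][token]
            score += math.log((count + 1) / (total_tokens + len(self.vocab)))
        return score

    def predict(self, text):
        tokens = tokenize(text)
        return max((0, 1), key=lambda label: self._log_score(tokens, label))


def accuracy(model, samples):
    """Fraction of samples whose predicted label matches the true label."""
    correct = sum(model.predict(text) == label for text, label in samples)
    return correct / len(samples)


model = NaiveBayes().fit(TRAIN)
print(f"accuracy on toy test set: {accuracy(model, TEST):.2f}")
```

The same fit, predict, and accuracy loop would apply to each of the other classifiers; only the model object changes, which is what makes a side-by-side comparison of classifiers on one data set straightforward.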
Table 1: Output of Used Classifiers
(Contd.) Table 2...
Author | Year | Dataset | Method | Accuracy (%)
Shreyas Sudhir Barde [5] | 2020 | Kaggle XSS data set | Random Forest, Bagging, Dataset Ensemble Modelling | 97.16
Fawaz Mahiuob Mohammed Mokbal [7] | 2022 | Kaggle XSS data set | Average Word Embedding and Support Vector Machine | 99.44
Proposed Method | 2022 | Kaggle XSS data set | AdaBoost | 99.92

V. Conclusion
Cross-site Scripting is a critical web vulnerability. Attackers constantly search for hidden endpoints that accept arbitrary arguments and input. If input filtering is lacking, an attacker can quickly compromise the web application and begin attacking the users of the company or organization. To address this issue, we used various machine learning classifiers to detect payloads, using the data set available on Kaggle. AdaBoost detected Cross-site Scripting payloads with 99.92 percent accuracy. It is capable of detecting such payloads and thereby helps protect against Cross-site Scripting attacks.

References
[1] S. Akaishi and R. Uda, "Classification of XSS Attacks by Machine Learning with Frequency of Appearance and Co-occurrence," 2019 53rd Annual Conference on Information Sciences and Systems (CISS), 2019, pp. 1-6, DOI: 10.1109/CISS.2019.8693047.
[2] Mwila, Kingston. (2020). An Assessment of Cyber Attacks Preparedness Strategy for Public and Private Sectors in Zambia.
[3] Vishnu, B. & Kp, Jevitha. (2014). Prediction of Cross-Site Scripting Attack Using Machine Learning Algorithms. 1-5. 10.1145/2660859.2660969.
[4] Prasetio, D., Kusrini, K. and Arief, M. R. (2021) "Cross-site Scripting Attack Detection Using Machine Learning with Hybrid Features", JURNAL INFOTEL, vol: 13(1), pp. 1-6. doi: 10.20895/Infotel.v13i1.606.
[5] Shreyas Sudhir Barde, "Cross-Site Scripting detection using Random Forest and Dataset Ensemble Modelling." pp:1-17, 2020, http://norma.ncirl.ie/4486/1/shreyassudhirbarde.pdf.
[6] Ravi Pallam, Sai Prasad Konda, Lasya Manthripragada, Ram Akhilesh Noone, "Detection of Web Attacks using Ensemble Learning", International Research Journal of Engineering and Technology (IRJET), pp:2931-2939, vol:08(7), July 2021.
[7] Mokbal, Fawaz & Dan, Wang & Wang, Xiaoxi. (2022). Detect Cross-Site Scripting Attacks Using Average Word Embedding and Support Vector Machine. International Journal of Network Security. 24. 20-28. 10.6633/IJNS.202201.
[8] Nunan, Angelo & Souto, Eduardo & Santos, Eulanda & Feitosa, Eduardo. (2012). Automatic Classification of Cross-Site Scripting in Web Pages Using Document-based and URL-based Features. 10.1109/ISCC.2012.6249380.
[9] A. Umehara, T. Matsuda, M. Sonoda, S. Mizuno and J. Chao, Consideration on the Cross-Site Scripting Attacks Detection Using Machine Learning, IPSJ SIG Technical Report, Vol.2015-CSEC-71, No.13, pp. 1–4, 2015 (Japanese).
[10] PMD Nagarjun and Shaik Shakeel Ahamad, "Ensemble Methods to Detect XSS Attacks" International Journal of Advanced Computer Science and Applications (IJACSA), 11(5), 2020. http://dx.doi.org/10.14569/IJACSA.2020.0110585.
[11] S. Rathore, P. K. Sharma, and J. H. Park, "XSSClassifier: An Efficient XSS Attack Detection Approach Based on Machine Learning Classifier on SNSs," JIPS, vol. 13, no. 4, pp. 1014–1028, 2017.
[12] F. A. Mereani and J. M. Howe, "Detecting cross-site scripting attacks using machine learning," in International Conference on Advanced Machine Learning Technologies and Applications, 2018, pp. 200–210.
[13] Prince Roy, Rajneesh Kumar, and Pooja Rani, "SQL Injection Attack Detection By Machine Learning Classifier," International Conference on Applied Artificial Intelligence and Computing ICAAIC 2022, pp: 396-401, Proceedings Paper.
[14] S., Shilpashree. (2019). Decision Tree: A Machine Learning for Intrusion Detection. International Journal of Innovative Technology and Exploring Engineering. 8. 5. 10.35940/ijitee.F1234.0486S419.
[15] S. Akaishi and R. Uda, "Classification of xss attacks by machine learning with frequency of appearance and co-occurrence," in The 53rd Annual Conference on Information Sciences and Systems (CISS'19), pp. 1–6, 2019.