Machine Learning-Based Detection of SQL Injection and Data Exfiltration Through Behavioral Profiling of Relational Query Patterns

This document reviews the application of machine learning techniques for detecting SQL injection and data exfiltration by profiling relational query patterns. It highlights the limitations of traditional detection methods and emphasizes the advantages of machine learning models in identifying anomalous behaviors in SQL queries. The review synthesizes recent advancements in various machine learning approaches, evaluates their effectiveness, and discusses emerging challenges in the field.

Volume 10, Issue 8, August – 2025 International Journal of Innovative Science and Research Technology

ISSN No:-2456-2165 https://doi.org/10.38124/ijisrt/25aug324

Machine Learning-Based Detection of SQL Injection and Data Exfiltration Through Behavioral Profiling of Relational Query Patterns
Semirat Abidemi Balogun (1); Onuh Matthew Ijiga (2); Nonso Okika (3); Lawrence Anebi Enyejo (4); Ogboji James Agbo (5)
(1) Department of Information Science, North Carolina Central University, Durham, North Carolina, USA
(2) Department of Physics, Joseph Sarwan Tarka University, Makurdi, Benue State, Nigeria
(3) Network Planning Analyst, University of Michigan, USA
(4) Department of Telecommunications, Enforcement Ancillary and Maintenance, National Broadcasting Commission Headquarters, Aso-Villa, Abuja, Nigeria
(5) School of Engineering and the Built Environment, Birmingham City University, United Kingdom

Publication Date 2025/08/08

Abstract: SQL injection and data exfiltration remain among the most severe threats to relational database security, often
leading to critical data breaches in enterprise systems. This review explores the application of machine learning techniques
for detecting such threats by profiling the behavioral patterns of relational SQL queries. Unlike traditional rule-based
approaches, machine learning models enable the dynamic identification of anomalous query structures and access behaviors
indicative of malicious intent. The study synthesizes recent advancements in supervised, unsupervised, and deep learning
methods tailored for query classification, anomaly detection, and user behavior modeling. Furthermore, it evaluates the
efficacy of these techniques in detecting stealthy exfiltration attacks under evolving threat landscapes. Emphasis is placed
on data preprocessing strategies, feature extraction from SQL logs, and the use of graph-based and sequence-aware models
for enhanced detection accuracy. The review concludes by outlining emerging challenges such as adversarial query
generation, concept drift, and the need for explainable models in high-assurance environments.

Keywords: SQL Injection Detection, Data Exfiltration, Machine Learning, Behavioral Profiling, Relational Query Patterns,
Anomaly Detection.

How to Cite: Semirat Abidemi Balogun; Onuh Matthew Ijiga; Nonso Okika; Lawrence Anebi Enyejo; Ogboji James Agbo (2025)
Machine Learning-Based Detection of SQL Injection and Data Exfiltration Through Behavioral Profiling of Relational
Query Patterns. International Journal of Innovative Science and Research Technology,
10(8), 49-63. https://doi.org/10.38124/ijisrt/25aug324

I. INTRODUCTION

Background on SQL Injection and Data Exfiltration
SQL injection (SQLi) continues to rank among the most prevalent and damaging web-based vulnerabilities, often serving as a primary conduit for unauthorized data exfiltration. This exploit leverages flaws in input validation to manipulate database queries, enabling attackers to gain access to restricted datasets, alter records, or execute administrative operations. In modern relational database environments, SQLi techniques have evolved beyond basic attacks into sophisticated forms, including blind SQLi, time-based attacks, and compound methods that evade conventional security filters (Alshammari et al., 2021). These vulnerabilities are exacerbated in cloud-native and web-facing systems, where dynamic content and complex input parsing heighten exposure to injection-based attacks (Sajjad et al., 2022). Data exfiltration, the act of illegally extracting confidential data, often follows a successful SQLi attack. Modern attackers exploit SQLi vectors not just for access but also for systematically extracting information using advanced enumeration and obfuscation strategies (Zheng et al., 2020). With organizations increasingly relying on database-driven services, the attack surface has expanded, making it critical to understand the nuanced interplay between SQLi vectors and systemic data loss. Moreover, multi-staged exfiltration campaigns now involve lateral movement across interconnected systems, highlighting the necessity of early detection based on query behavior analysis rather than static pattern recognition (Liu et al., 2023). As the boundary between structured and semi-structured data becomes blurred, traditional security mechanisms struggle to delineate benign queries from malicious ones with high fidelity. This necessitates more adaptive and intelligent defense paradigms capable of behavioral reasoning over query patterns.

Limitations of Traditional Detection Techniques
Traditional SQL injection detection approaches primarily rely on signature-based mechanisms, static analysis, and rule sets to identify known attack patterns. While effective against basic and well-documented attack forms, these techniques often fail to detect polymorphic or novel SQLi payloads that do not match existing patterns. Static scanners and web application firewalls (WAFs) are typically limited to string-matching heuristics, which provide minimal resilience against obfuscation and encoding tricks used by attackers to disguise malicious input. Moreover, most rule-based systems operate under predefined threat models that fail to adapt to emerging attack vectors or novel data exfiltration techniques. Dynamic or black-box testing techniques have similarly demonstrated poor coverage and high false positive rates, especially in production systems where performance overheads and environmental constraints limit the scope of runtime inspection. These techniques are inherently reactive and do not account for contextual variations in user behavior or query execution patterns. Even with the integration of contextual filters, rule-based models offer limited granularity in differentiating malicious from legitimate but unusual database operations. Additionally, traditional models struggle to enforce continuous learning, making them susceptible to degradation over time due to concept drift or system updates. This rigid architecture impedes proactive defense and is ill-suited for environments where threat landscapes evolve rapidly. Consequently, there is a critical need for more intelligent, context-aware systems that utilize adaptive profiling rather than static detection thresholds.

Motivation for Machine Learning-Based Profiling
The limitations of conventional SQLi detection systems have accelerated interest in machine learning (ML) techniques that leverage query behavior profiling and adaptive analytics. ML-based detection models have shown significant promise in identifying nuanced anomalies in SQL transaction logs that elude traditional rule-based mechanisms. Behavioral profiling enables the modeling of normal query patterns across time, users, and applications, allowing for real-time detection of deviations that may signal SQL injection or data exfiltration attempts (Adebayo & Al-Dubai, 2020). Unlike static rule sets, ML systems can continuously learn from new data, improving their ability to detect novel threats over time. Advanced models such as LSTM-based sequence learners, graph neural networks, and hybrid deep learning frameworks have demonstrated superior precision in analyzing the semantic and syntactic structure of SQL queries (Chatterjee et al., 2021). These methods treat SQL logs not merely as static inputs but as evolving behavioral artifacts capable of encoding user intent and application context. By capturing latent behavioral features, machine learning models can distinguish subtle differences in query payloads and access patterns, enabling detection even when queries are syntactically valid yet semantically suspicious (Liu et al., 2022). Additionally, machine learning supports real-time stream processing, making it suitable for operational environments where rapid response is crucial. The ability to adapt to changing workloads, recognize rare but impactful anomalies, and minimize false positives underscores the growing motivation to integrate ML-driven profiling into core database security architectures (Bashir et al., 2023).

Scope and Contributions of the Review
This review aims to synthesize the latest advancements in machine learning-based detection of SQL injection and data exfiltration through behavioral profiling of relational query patterns. By consolidating state-of-the-art research between 2020 and 2025, the paper offers a structured analysis of supervised, unsupervised, and hybrid ML techniques that model SQL query behavior to detect malicious activity. It highlights core innovations in feature engineering, anomaly scoring, sequence learning, and role-based activity profiling, offering a multi-dimensional view of behavioral modeling approaches. In doing so, the review responds to emerging security needs in increasingly dynamic and distributed database environments. The paper also identifies critical research gaps in existing literature, particularly in adversarial robustness, model explainability, and operational integration. Furthermore, it outlines the implications of privacy-preserving learning paradigms such as federated learning in sensitive data environments, offering future research directions for scalable and secure database protection. By bridging theory with real-world implementations, the review informs both academic and industry stakeholders of practical strategies for enhancing SQLi and data exfiltration detection capabilities using machine learning. Finally, it establishes a foundational roadmap for future system designs that combine behavioral profiling with explainable AI to achieve both accuracy and trust in database threat detection systems.

Structure of the Paper
The structure of the paper systematically explores the evolution of SQL injection (SQLi) and data exfiltration detection through machine learning-driven behavioral profiling. It begins with an Introduction that presents the background on SQLi, outlines the shortcomings of traditional detection techniques, and highlights the motivation for adopting machine learning (ML) methods, culminating in a clear statement of the review’s scope and contributions. In Section 2, the paper establishes foundational ML concepts relevant to database security, covering supervised and unsupervised learning strategies, feature engineering from SQL logs, appropriate evaluation metrics, and requirements for real-time detection systems. Section 3 delves into behavioral profiling by examining structural, temporal, contextual, role-based, and graph-based query analysis methods, supported by tables and diagrams to illustrate advanced detection strategies. Section 4 focuses on practical detection models and system architectures, evaluating supervised classifiers (e.g., SVM, CNN), unsupervised techniques (e.g., autoencoders, Isolation Forests), sequence-aware models (e.g., LSTM, GRU), and integration with database management systems and SIEM platforms. Section 5 addresses prevailing challenges, including adversarial evasion, concept drift, explainability, and privacy-preserving detection through federated learning, while also proposing recommendations for future research. This cohesive structure not only contextualizes the technological landscape but also bridges theory with practice in securing relational databases against evolving SQL-based threats.

II. MACHINE LEARNING FOUNDATIONS FOR DATABASE SECURITY

Overview of Supervised and Unsupervised Learning
Machine learning (ML) offers two principal paradigms, supervised and unsupervised learning, for detecting SQL injection and data exfiltration activities based on relational query behaviors. Supervised learning models are trained on labeled datasets where normal and malicious queries are explicitly defined, allowing classifiers such as support vector machines, random forests, and neural networks to learn discriminative patterns, as shown in Figure 1. These models have demonstrated high precision and recall when sufficient labeled training data are available (Sabottke & Abraham, 2022). In contrast, unsupervised learning techniques, which detect anomalies based on deviations from learned normal behavior, are especially valuable in scenarios lacking labeled attack data. Algorithms such as Isolation Forests (Liu et al., 2021) and clustering-based outlier detection (Aggarwal & Sathe, 2020) enable the dynamic profiling of unknown or stealthy threats. Hybrid methods, which combine unsupervised pretraining with supervised fine-tuning, have gained traction in high-variability environments such as streaming SQL logs. Li, Yang, and Jiang (2023) propose a hybrid neural model that uses an unsupervised autoencoder for anomaly scoring, followed by a supervised classifier to refine predictions in real time. These approaches are essential when dealing with concept drift or evolving attacker behavior that may render fixed-label models obsolete. The adaptive capacity of unsupervised models to generalize across unseen query structures offers a scalable advantage in monitoring dynamic, multi-tenant database systems. Therefore, selecting between supervised and unsupervised strategies should consider factors such as dataset availability, expected attack variability, and computational constraints within the target deployment environment.

Figure 1 illustrates the framework of supervised learning, where labeled input data (such as images of cows, elephants, and camels) are paired with their corresponding labels and provided to an algorithm through a training dataset under the guidance of a supervisor. This structured learning process enables the algorithm to map input features to specific output classes, allowing it to accurately classify new, unseen examples based on patterns learned during training. In contrast, unsupervised learning operates without labeled data; the system autonomously analyzes input data to uncover hidden structures or groupings, such as clustering similar animal images without being told what each one represents. While supervised learning excels in scenarios where labeled datasets are available and precise classification is critical, unsupervised learning is better suited for exploratory analysis or anomaly detection, especially in environments where labeled data are scarce or non-existent. For instance, in detecting data exfiltration in databases, supervised models require predefined examples of malicious queries, whereas unsupervised models can identify suspicious behavior purely by recognizing deviations from learned normal patterns. Thus, the supervised learning pipeline shown in Figure 1 demonstrates a guided, label-driven training method, whereas unsupervised learning relies on self-organized discovery of patterns without explicit instruction.

Fig 1 Picture of Workflow of Supervised Learning in Machine Learning Classification Tasks (Geeksforgeeks, 2024).
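The unsupervised side of this contrast can be made concrete with a minimal sketch. It fits a statistical baseline on benign queries only and scores new queries by how far they deviate from it. The feature set, the `BaselineProfile` class, the sample log, and the standard-deviation floor of 1.0 are all illustrative assumptions; a simple z-score profile stands in here for the Isolation Forest and autoencoder models named in the text.

```python
import re
from statistics import mean, pstdev


def features(query: str) -> list[float]:
    """Hypothetical hand-picked features of one SQL statement."""
    q = query.upper()
    return [
        float(len(q)),                                       # statement length
        float(q.count("'")),                                 # quote characters
        float(len(re.findall(r"--|/\*|#", q))),              # comment markers
        float(len(re.findall(r"\b(?:OR|AND|UNION)\b", q))),  # boolean/set keywords
    ]


class BaselineProfile:
    """Learns per-feature mean and spread from benign queries only (no labels)."""

    def fit(self, queries):
        columns = list(zip(*(features(q) for q in queries)))
        self.mu = [mean(c) for c in columns]
        # Floor the spread at 1.0 (an arbitrary assumption) so that features
        # that never vary in the benign log do not cause division by zero.
        self.sigma = [max(pstdev(c), 1.0) for c in columns]
        return self

    def score(self, query: str) -> float:
        """Largest absolute z-score across features; higher means more unusual."""
        return max(
            abs(x - m) / s
            for x, m, s in zip(features(query), self.mu, self.sigma)
        )


# Illustrative benign log used to learn "normal" behavior.
benign_log = [
    "SELECT name FROM users WHERE id = 7",
    "SELECT email FROM users WHERE id = 12",
    "UPDATE users SET name = 'bob' WHERE id = 3",
    "SELECT total FROM orders WHERE user_id = 5 AND status = 'paid'",
]
profile = BaselineProfile().fit(benign_log)

for candidate in (
    "SELECT name FROM users WHERE id = 9",
    "SELECT name FROM users WHERE id = 1 OR '1'='1' --",
):
    print(f"{profile.score(candidate):.2f}  {candidate}")
```

No malicious examples are needed to fit the profile, which is the property the text attributes to unsupervised methods. A supervised classifier would instead need both benign and malicious labels at training time.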

Feature Engineering from SQL Query Logs
Feature engineering serves as the foundational step in modeling SQL query behavior for anomaly detection. It involves transforming raw SQL logs into structured representations that can effectively characterize both syntactic and semantic patterns of query activity. One prominent method is token-based embedding, which treats SQL keywords, operators, and table names as linguistic tokens to be vectorized using embedding models such as word2vec or BERT-based encoders (Sharma et al., 2022). These semantic representations enable the detection of context-sensitive attacks such as tautology-based injections and piggybacked queries. Another approach involves dependency graph modeling, where queries are parsed into directed acyclic graphs to reflect their logical execution flow (Zhang & Yu, 2021). These graphs are then transformed into features that capture structural anomalies and data access hierarchies. Unsupervised encoders such as autoencoders or variational encoders can learn latent behavioral signatures from high-dimensional query logs, facilitating the extraction of abstract features not immediately observable in raw syntax (Mehrotra & Thakur, 2023). Additionally, frequency-based features (such as token n-grams, temporal histograms, and inter-query latency) offer complementary insights into repetitive versus novel access behaviors. Kaushik and Joshi (2020) further emphasize the use of session-level aggregations, such as total write operations or join complexity, to profile users’ access patterns. These engineered features play a critical role in improving model interpretability and reducing false positives. As SQL queries exhibit structured formats with varying parameterizations, robust feature engineering is essential to distinguish between benign variations and malicious payloads embedded in otherwise syntactically valid statements.

Evaluation Metrics and Benchmarking Datasets
A key element in assessing the effectiveness of machine learning-based detection systems is the choice of evaluation metrics and datasets. For SQL injection and data exfiltration scenarios, traditional metrics like accuracy may be misleading due to class imbalance. More reliable indicators include precision, recall, F1-score, and Area Under the Receiver Operating Characteristic Curve (AUC-ROC), which better capture the trade-off between false positives and false negatives (Banerjee & Singh, 2022). Given the rarity of real-world attack samples, datasets are often synthetically augmented or derived from simulated environments. Vakharia and Patel (2020) underscore the need for diverse benchmarking datasets incorporating multiple types of injection attacks and user behaviors to prevent overfitting. Zhang and Wang (2021) advocate for stratified sampling and temporal partitioning when evaluating model performance on production-like logs, thereby simulating real-world deployment scenarios. They note that models trained and tested on data that are not temporally disjoint often produce inflated results. Xu and Tan (2023) propose using graph-structured query datasets to benchmark models designed for graph neural networks and sequence-based profiling. Furthermore, standardized testbeds such as the CICIDS and Web Application Attack and Audit Framework (WAAF) provide labeled SQL injection instances and contextual metadata, enabling comparative benchmarking across studies. Ultimately, the reliability and generalizability of detection models hinge on transparent evaluation protocols, comprehensive benchmarking, and clearly defined performance criteria tailored to the adversarial nature of SQL-based threats.

Real-Time Detection Requirements
Effective real-time detection of SQL-based intrusions requires models and system architectures optimized for latency, adaptability, and minimal overhead. Traditional batch-processing detection techniques are ill-suited for high-throughput, transactional environments. In contrast, streaming models and low-latency neural networks offer the responsiveness needed for just-in-time threat mitigation. Akhtar and Farooq (2020) demonstrate the use of deep autoencoders with Apache Kafka stream processors for microsecond-level anomaly detection in enterprise-grade databases. These frameworks not only ingest live query logs but also integrate anomaly scores into decision-making engines with minimal delay. Chen and Zhao (2021) propose a recurrent neural network model that processes query streams in sliding windows, capturing temporal dependencies and inter-query relationships without sacrificing speed. This is essential for detecting time-sensitive exfiltration patterns, such as slow data leaks via time-based SQL injection. Ouyang, Lin, and Zhang (2023) introduce attention mechanisms into online detection models to prioritize high-risk query segments in real time, further enhancing performance. Meanwhile, Adekunle and Zhang (2024) argue for the integration of event-driven triggers within database management systems that invoke detection models upon specific execution contexts, e.g., large-volume data transfers or schema modifications. Scalability remains a persistent challenge, particularly in multi-tenant cloud-hosted environments. Therefore, detection systems must be lightweight, horizontally scalable, and capable of operating with partial context. These requirements necessitate a shift toward edge-computing paradigms and hardware-accelerated inference to support sub-second detection without compromising accuracy. Collectively, these innovations underscore the vital importance of real-time architectural design in safeguarding relational databases against agile and evasive SQL-based attacks.
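The sliding-window idea behind streaming detection can be illustrated without any neural model. The sketch below keeps a running total of rows returned within a fixed time window and raises a flag when the total crosses a limit. The class name, the 10-second window, and the 1,000-row limit are hypothetical choices, and a fixed threshold is substituted for the learned RNN or autoencoder scorers described in the cited work.

```python
from collections import deque


class SlidingWindowMonitor:
    """Tracks rows returned by recent queries and flags unusually large totals."""

    def __init__(self, window_seconds: float = 10.0, max_rows: int = 1000):
        self.window = window_seconds
        self.max_rows = max_rows
        self.events = deque()  # (timestamp, rows_returned), oldest first
        self.total = 0         # running sum of rows inside the window

    def observe(self, timestamp: float, rows_returned: int) -> bool:
        """Record one query result; return True if the window total is suspicious."""
        self.events.append((timestamp, rows_returned))
        self.total += rows_returned
        # Evict events that have fallen out of the time window.
        cutoff = timestamp - self.window
        while self.events and self.events[0][0] <= cutoff:
            _, old_rows = self.events.popleft()
            self.total -= old_rows
        return self.total > self.max_rows


monitor = SlidingWindowMonitor(window_seconds=10.0, max_rows=1000)
alerts = [
    monitor.observe(0.0, 400),
    monitor.observe(3.0, 400),
    monitor.observe(6.0, 400),   # three bulk reads within ten seconds
    monitor.observe(20.0, 100),  # earlier reads have expired by now
]
print(alerts)
```

Each observation costs amortized constant time because every event is appended and evicted at most once, which is the kind of low per-query overhead the real-time requirements above call for. In a fuller design the threshold rule would be replaced by a model score computed over the window contents.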

Table 1 Summary of Real-Time Detection Requirements for SQL-Based Intrusion Detection

Aspect | Description | Techniques/Models | Challenges & Considerations
Latency Optimization | Need for sub-second response time to detect intrusions in real-time systems. | Deep autoencoders with Apache Kafka; streaming models; low-latency neural networks. | Traditional batch models are too slow for high-throughput SQL environments.
Temporal Sensitivity | Capturing inter-query relationships and detecting time-based attacks. | Recurrent Neural Networks (RNNs) with sliding window mechanisms. | Required to detect slow exfiltration or time-dependent SQL injection patterns.
Context-Aware Prioritization | Emphasizing risky queries for focused analysis. | Attention mechanisms in online detection models. | Real-time prioritization without delay adds computational complexity.
Scalability & Deployment | Efficient operation in cloud or edge-hosted, multi-tenant environments. | Event-driven triggers; hardware-accelerated inference; edge-computing architectures. | Lightweight and horizontally scalable models must function with incomplete data.
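The point made in this section about evaluation metrics, that accuracy can mislead under class imbalance, can be checked with a small calculation. The sketch below computes accuracy, precision, recall, and F1 from binary labels for a hypothetical log of 1,000 queries of which 10 are malicious. The label vectors are invented purely for illustration and are not results from any cited study.

```python
def detection_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 for binary labels (1 = malicious)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    # Ratios with an empty denominator are reported as 0.0 by convention here.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Hypothetical imbalanced log: 10 malicious queries followed by 990 benign ones.
y_true = [1] * 10 + [0] * 990

# A useless detector that labels every query benign.
always_benign = [0] * 1000

# A detector that catches 8 of the 10 attacks and raises 4 false alarms.
detector = [1] * 8 + [0] * 2 + [1] * 4 + [0] * 986

baseline = detection_metrics(y_true, always_benign)
model = detection_metrics(y_true, detector)
print(baseline)
print(model)
```

The all-benign baseline reaches 99% accuracy while detecting nothing, so its recall and F1 are zero. The accuracy of the second detector is only marginally higher, and it is precision, recall, and F1 that separate the two.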

III. BEHAVIORAL PROFILING OF RELATIONAL QUERY PATTERNS

Query Structure Analysis and Normalization Techniques
Machine-learning approaches for detecting SQL injection attacks increasingly rely on detailed query structure analysis and normalization. Alomari and Wang (2021) demonstrated that decomposing query syntax into abstract representations, such as AST subtrees, enables supervised classifiers to learn structural abnormalities rather than relying on surface-level patterns. This method improves resilience to obfuscated injections by capturing structural deviations (Alomari & Wang, 2021). Complementarily, Khan and Ahmad (2022) introduced grammar-based normalization, mapping SQL statements into canonical formats by rewriting literals, removing whitespace, and applying grammar rules. This process yields high-quality features for downstream models, as redundant variations are collapsed and model complexity is reduced (Khan & Ahmad, 2022).

Lee, Kim, and Park (2023) extended normalization by implementing an adaptive pipeline that learns rewriting rules from benign and malicious corpora, dynamically updating normalization mappings. They observed over 8% improvement in true-positive detection rates with continued model retraining, indicating that iterative pipelines can maintain performance under evolving attack strategies (Lee, Kim, & Park, 2023). Moreover, Zhang et al. (2024) focused on structural feature extraction, identifying relational operators, nested subqueries, and join structures as key predictive attributes. By coupling these features with ensemble learning models, near 99% detection accuracy was achieved on benchmark datasets (Zhang, Xu, & Li, 2024). Taken together, structural analysis and normalization are foundational to robust SQL injection detection. They abstract away superficial syntax and expose meaningful deviations, enabling adaptive, high-accuracy machine learning applications. They also support efficient feature engineering pipelines, reducing variance and computational cost, and facilitate integration across domains and data sources (Khan & Ahmad, 2022; Zhang et al., 2024).

Table 2 Structural Analysis and Normalization Techniques for Machine Learning-Based SQL Injection Detection

Technique | Key Contribution | Impact on Detection | Reference
Abstract Syntax Tree (AST) Decomposition | Query syntax is parsed into AST subtrees to enable classifiers to detect structural irregularities beyond surface-level syntax. | Enhances detection of obfuscated injections by exposing deep structural deviations. | Alomari & Wang (2021)
Grammar-Based Normalization | SQL statements are rewritten into canonical formats by removing literals and whitespaces and applying grammar transformation rules. | Improves feature consistency and reduces model complexity, enabling more robust downstream classification. | Khan & Ahmad (2022)
Adaptive Normalization Pipeline | Rewriting rules are dynamically learned from benign and malicious query corpora; normalization mappings are updated iteratively. | Achieves over 8% increase in true-positive rates and adapts to evolving injection strategies. | Lee, Kim, & Park (2023)
Structural Feature Extraction + Ensembles | Key relational components like joins, subqueries, and operators are extracted and combined with ensemble learning algorithms for detection. | Attains near 99% accuracy by enhancing model interpretability and precision across complex SQL injection types. | Zhang, Xu, & Li (2024)
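The normalization step summarized above (rewriting literals, collapsing whitespace, mapping statements to a canonical form) can be sketched in a few lines. This is a regex-based approximation under stated assumptions, not the grammar-driven method of the cited authors: the token pattern, the `?` placeholder, and the `normalize` function are illustrative, and a production system would use a real SQL parser to handle dialect-specific quoting and comments embedded in literals.

```python
import re

# Token pattern (an illustrative assumption, not a full SQL grammar).
_TOKEN = re.compile(
    r"""
      '(?:[^']|'')*'      # single-quoted string literal ('' escapes a quote)
    | --[^\n]*            # line comment
    | /\*.*?\*/           # block comment
    | \d+(?:\.\d+)?       # numeric literal
    | \w+                 # keyword or identifier
    | [^\s\w]             # any other single character (operator, punctuation)
    """,
    re.VERBOSE | re.DOTALL,
)


def normalize(query: str) -> str:
    """Map a SQL statement to a canonical template string."""
    out = []
    for tok in _TOKEN.findall(query):
        if tok.startswith("--") or tok.startswith("/*"):
            continue                      # drop comments
        if len(tok) >= 2 and tok[0] == "'":
            out.append("?")               # string literal -> placeholder
        elif tok[0].isdigit():
            out.append("?")               # numeric literal -> placeholder
        else:
            out.append(tok.upper())       # case-fold keywords and identifiers
    return " ".join(out)                  # single spaces between tokens


print(normalize("select * from users where id = 42"))
print(normalize("SELECT  *\nFROM users WHERE id=7 -- lookup"))
print(normalize("SELECT * FROM users WHERE name = '' OR '1'='1'"))
```

The first two statements differ in case, spacing, literal value, and a trailing comment, yet reduce to the same template, while the tautology-based injection yields a template with an extra `OR ? = ?` clause. That difference is the kind of structural deviation a downstream classifier can learn from once superficial variation has been collapsed.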

Temporal and Contextual Profiling of Query Behavior
Temporal and contextual profiling enhances detection systems by capturing execution patterns over time rather than analyzing queries in isolation. Chen et al. (2020) used time-series models to learn sequences of SQL operations such as SELECT-UPDATE-INSERT, modeling them with recurrent neural networks to detect deviations from normal execution flows. Their LSTM-based approach demonstrated strong sensitivity to temporal anomalies, detecting sequence-level manipulations indicative of data exfiltration (Chen, Rao, & Zhao, 2020). Complementing temporal analysis, Gomez, Sánchez, and Molina (2022) incorporated contextual metadata, including user location, device fingerprints, and session duration, to enrich query representations. Their supervised model combining contextual embeddings with query features significantly reduced false positives by 12%, evidencing the value of layered behavior modeling (Gomez, Sánchez, & Molina, 2022). Singh and Kumar (2023) implemented behavioral profiling over streaming SQL activity using sliding windows and an LSTM classifier. This allowed detection of abnormal query bursts or timing irregularities, such as rapid repeated SELECT queries designed to pull large data volumes. The sliding-window technique gave early warnings within seconds of suspicious behavior (Singh & Kumar, 2023). Wang and Li (2024) advanced this further with context-aware data exfiltration detection, applying transformers to query contexts including previous operations, query results size, and resource access patterns. Their hybrid feature set enabled 94% detection of stealthy extraction campaigns that mimic normal user behavior, without causing unacceptable false alerts (Wang & Li, 2024).

In summary, combining temporal sequence learning with user and session context significantly improves detection capabilities. These methods allow security systems to identify attack patterns, detect exfiltration acts in progress, and adapt to shifts in legitimate usage over time.

User and Role-Based Activity Modeling
User- and role-based modeling introduces a personalized layer for anomaly detection. Garcia and Watts (2021) developed a system that profiles user groups based on their assigned roles (e.g., admin, analyst). By learning normal query distributions per role, it flags queries that deviate from role-based norms, reducing false positives and contextualizing alerts (Garcia & Watts, 2021). Hussain, Ahmed, and Nazir (2022) proposed dynamic profiling of users and roles using statistics such as average query length, access frequencies for sensitive tables, and query language features. Their clustering approach grouped users with similar behavioral patterns according to roles; queries falling outside cluster boundaries were marked anomalous. This method achieved high detection precision in enterprise deployment (Hussain, Ahmed, & Nazir, 2022). Patel, Sharma, and Mehta (2023) built supervised models that incorporate user and role IDs as categorical embeddings alongside SQL query features. This approach allowed the model to learn user-specific nuances and flag unusual deviations, resulting in reduced false positive rates by ~15% compared to role-agnostic models (Patel, Sharma, & Mehta, 2023). Yoon, Park, and Han (2024) developed a hybrid framework combining role-based baselines and user-specific anomaly detection. Initially, queries are compared to role-level expectations; if anomalous, they are further individualized based on user history. This multi-tiered strategy offers both context-

constructed query graphs where nodes represent tables and attributes, and edges represent joins and predicates as shown in Figure 2. They applied graph neural networks to these structures, enabling detection of structural anomalies typical of injection or exfiltration attempts. Their approach achieved over 97% detection accuracy across datasets (Bianchi, Grana, & Rossi, 2021). Lu, Chen, and Fang (2022) extended graph modeling to multi-query sessions, creating session-level graphs that aggregate multiple queries into interaction networks. By analyzing graph centrality and motif patterns, they captured coordinated attack behaviors, such as low-frequency multi-step exfiltration sequences, with high detection fidelity (Lu, Chen, & Fang, 2022). Sharma and Desai (2023) introduced query–table interaction graphs that integrate user-session contexts and table access metadata. They used subgraph matching to identify abnormal traversal patterns across tables. This approach effectively detected attempts to access sensitive data across unrelated relational tables, achieving low false alarm rates (Sharma & Desai, 2023). Finally, Zhou, Li, and Xu (2024) applied graph stream learning, processing continuous streams of query graphs with incremental graph neural networks. Their system detected emerging structural patterns indicative of new attack vectors with minimal latency, suitable for real-time environments (Zhou, Li, & Xu, 2024). Graph-based representations transform relational query behavior into expressive, structural data amenable to advanced deep learning. This enables robust detection of both injection and stealthy data exfiltration attacks through structural prior exploitation.

Figure 2 is structured into two main branches: Graph Structures and Representations and Detection Capabilities and Outcomes. The first branch outlines the various graph modeling techniques used to represent SQL query behavior. It includes query graphs, which map tables and attributes as nodes and joins/predicates as edges to detect structural anomalies; session-level graphs, which aggregate multiple queries to uncover coordinated, low-frequency attack sequences using motif and centrality analysis; query–table interaction graphs, which incorporate user-session context and access metadata to detect abnormal traversals between unrelated tables; and streaming query graphs, which employ incremental graph neural networks to process real-time data and identify emerging attack vectors. The second branch
sensitive detection and granularity, effectively identifying focuses on the detection performance enabled by these
both general and highly targeted internal threats (Yoon, Park, models, highlighting their ability to detect both SQL injection
& Han, 2024). User and role–based activity modeling and stealthy exfiltration attacks by exploiting structural
enhances detection systems by embedding identity context patterns. These methods demonstrate low false alarm rates
into behavioral analysis, crucial for identifying privileged due to enhanced contextual modeling, enable real-time
misuse and internal exfiltration efforts while reducing noise. suitability through low-latency streaming analysis, and
leverage deep learning integration—specifically graph neural
 Graph-Based Query Behavior Representations networks—for robust structural feature learning. Together,
Graph-based representations model queries as the diagram illustrates how transforming relational queries
structured graphs, encoding relationships between tables, into graph-based representations empowers intelligent,
fields, and operations. Bianchi, Grana, and Rossi (2021) adaptive security systems in database environments.


Fig 2 Diagram Illustration of Hierarchical Overview of Graph-Based Representations and their Role in Anomaly Detection for
SQL Query Behavior
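
The idea behind query–table interaction graphs can be illustrated with a minimal Python sketch. It is not the method of any cited paper: the regex-based table extractor, the function names (`tables_in`, `build_baseline`, `unseen_edges`), and the sample queries are all assumptions made for illustration. Tables are treated as nodes, co-access within one query as an edge, and a query is flagged when it links tables that never co-occurred in a benign baseline.

```python
import re
from itertools import combinations

# Simplistic extractor (assumption): identifiers that follow FROM or JOIN.
TABLE_RE = re.compile(r"\b(?:from|join)\s+([A-Za-z_][A-Za-z0-9_]*)", re.IGNORECASE)


def tables_in(query):
    """Return the set of table names referenced after FROM or JOIN."""
    return {name.lower() for name in TABLE_RE.findall(query)}


def build_baseline(queries):
    """Collect table pairs (graph edges) that co-occur in known-benign queries."""
    edges = set()
    for q in queries:
        edges.update(combinations(sorted(tables_in(q)), 2))
    return edges


def unseen_edges(query, baseline_edges):
    """Return table pairs in `query` that never co-occurred in the baseline."""
    pairs = set(combinations(sorted(tables_in(query)), 2))
    return pairs - baseline_edges


# Hypothetical sample data.
benign = [
    "SELECT o.id FROM orders o JOIN customers c ON o.cust_id = c.id",
    "SELECT * FROM orders JOIN order_items ON orders.id = order_items.order_id",
]
baseline = build_baseline(benign)
suspicious = "SELECT * FROM orders JOIN payroll ON 1 = 1"
print(unseen_edges(suspicious, baseline))
```

A production system would use a real SQL parser and richer edge attributes (predicates, user, session); the set difference here only stands in for the subgraph matching described above.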

IV. DETECTION TECHNIQUES AND SYSTEM ARCHITECTURES

• Supervised Classification Models (e.g., SVM, Random Forest, CNN)
Supervised learning models such as Support Vector Machines (SVM), Random Forests (RF), and Convolutional Neural Networks (CNN) have proven highly effective for SQL injection detection by learning discriminative patterns between malicious and benign queries. Demilie and Deriba (2022) demonstrated an ensemble framework combining supervised classifiers with traditional techniques, achieving high detection rates (>98%) by leveraging Random Forest and SVM alongside handcrafted features extracted from query logs. This hybrid approach mitigates the limitations of single models and enhances resilience against variant SQL attacks (Demilie & Deriba, 2022). Deep learning models, especially CNNs, provide another layer of robustness due to their ability to perform automated feature learning from raw SQL text. Falor et al. (2022) implemented a CNN-based model that parsed encoded SQL queries and identified complex malicious patterns, achieving >95% accuracy even on heterogeneous datasets. The model's convolutional filters were effective in capturing token-level anomalies and syntactic irregularities. Recurrent Neural Networks (RNNs), Long Short-Term Memory (LSTM) units in particular, also show excellent performance. Tang et al. (2020) reported near-perfect detection accuracy (~99%) for SQL injections by modeling sequences of tokenized queries. Similarly, Ibrahim and Suryani (2023) explored ensemble approaches that combine SVM and Naïve Bayes, indicating that boosted ensemble models can outperform isolated classifiers by balancing trade-offs between false positives and detection rates. In operational environments, latency and real-time requirements necessitate lightweight implementations. Random Forests offer fast inference with interpretable feature importance, as seen in Table 3, whereas CNNs offer high accuracy with slightly greater computational cost. Hybrid ensemble systems combining RF for fast detection and CNN/LSTM for confirmatory analysis provide a robust pipeline capable of minimizing detection delays while maintaining high accuracy.
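
To make "handcrafted features extracted from query logs" more concrete, the following Python sketch computes a few lexical features of the kind that could be fed to an SVM or Random Forest. The feature set, the keyword list, and the function name `extract_features` are illustrative assumptions; the cited studies do not specify these exact features.

```python
import re

# Illustrative keyword list (assumption), not taken from the cited work.
SQL_KEYWORDS = ("union", "select", "drop", "sleep", "benchmark", "or", "and")
# Matches simple tautologies such as 1=1 or id = id (quoted forms are not covered).
TAUTOLOGY_RE = re.compile(r"\b(\w+)\s*=\s*\1\b")


def extract_features(query):
    """Map a raw SQL string to a small dictionary of numeric features."""
    q = query.lower()
    tokens = re.findall(r"[a-z_]+", q)
    return {
        "length": len(query),
        "single_quotes": query.count("'"),
        "comment_markers": q.count("--") + q.count("/*") + q.count("#"),
        "semicolons": query.count(";"),
        "keyword_hits": sum(tokens.count(k) for k in SQL_KEYWORDS),
        "has_tautology": int(bool(TAUTOLOGY_RE.search(q))),
    }


benign_query = "SELECT name FROM users WHERE id = 42"
injected_query = "SELECT * FROM users WHERE name = '' OR 1=1 --"
print(extract_features(benign_query))
print(extract_features(injected_query))
```

Each dictionary can be turned into a fixed-order numeric vector before training; which features actually separate malicious from benign traffic has to be established on labeled data.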

Table 3 Comparative Summary of Supervised Classification Models for SQL Injection Detection

| Model Type | Core Strengths | Performance Highlights | Operational Considerations |
|---|---|---|---|
| Support Vector Machine (SVM) | Strong at classifying high-dimensional, linearly separable data | Achieved >98% accuracy when combined with Random Forest and handcrafted features (Demilie & Deriba, 2022) | Lightweight, interpretable; best when paired in ensembles for complex query structures |
| Random Forest (RF) | Fast inference, high interpretability, ensemble resilience | Demonstrated high detection rates in hybrid frameworks with minimal latency impact (Demilie & Deriba, 2022) | Suitable for real-time systems; used as a first-pass filter in layered detection frameworks |
| Convolutional Neural Network (CNN) | Automated feature learning, captures local token anomalies | >95% accuracy on heterogeneous datasets using encoded SQL inputs (Falor et al., 2022) | Higher computational cost than RF; effective in confirmatory stages of hybrid models |
| RNN/LSTM and Hybrid Ensembles | Sequence modeling, memory retention for temporal query patterns | ~99% detection accuracy; ensemble with Naïve Bayes showed superior false-positive trade-off (Tang et al., 2020; Ibrahim & Suryani, 2023) | Best for modeling long SQL sequences; requires GPU acceleration for real-time performance |
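
The layered arrangement in Table 3 (a fast first-pass filter followed by a costlier confirmatory model) can be sketched as a two-stage cascade. Everything below is hypothetical: the thresholds `low` and `high`, the function `cascade`, and the two toy scorers merely stand in for a trained Random Forest and a CNN/LSTM.

```python
def cascade(query, fast_score, slow_score, low=0.2, high=0.8):
    """Return (label, stage): use the cheap scorer first, the costly one only when unsure."""
    score = fast_score(query)
    if score < low:
        return "benign", "fast"
    if score > high:
        return "malicious", "fast"
    # Borderline case: defer to the more expensive confirmatory model.
    label = "malicious" if slow_score(query) >= 0.5 else "benign"
    return label, "slow"


def toy_fast_score(query):
    """Hypothetical cheap scorer: fraction of suspicious markers present."""
    markers = ("'", "--", " or ", "union")
    q = query.lower()
    return sum(m in q for m in markers) / len(markers)


def toy_slow_score(query):
    """Hypothetical costly scorer; a tautology check standing in for a deep model."""
    return 1.0 if "1=1" in query.replace(" ", "") else 0.0


for q in (
    "SELECT name FROM users WHERE id = 42",
    "SELECT * FROM t WHERE a = '' OR 1=1 -- ",
    "x' UNION SELECT pw FROM users -- ' OR 'a'='a",
):
    print(cascade(q, toy_fast_score, toy_slow_score), q)
```

The design choice is that only borderline scores pay the latency of the second stage, which is the trade-off the table describes between RF inference speed and CNN/LSTM accuracy.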

• Unsupervised and Semi-Supervised Anomaly Detection (e.g., Autoencoders, Isolation Forests)
Autoencoder-based approaches have gained attention in SQL security research. Alghawazi et al. (2023) trained RNN-autoencoders on benign query sequences, flagging deviations as anomalies and subsequently classifying them using an LSTM classifier, achieving 94% accuracy and demonstrating strong generalization to unseen SQLi variants. Singh and Jang-Jaccard (2022) further highlighted that multiscale convolutional recurrent autoencoders capture both local and temporal query patterns, outperforming classical autoencoders when combined with Isolation Forest. Isolation Forest stands out for its efficiency in high-dimensional data. In web log anomaly detection, it frequently identifies anomalous SQL query features without requiring labeled attack data (MDPI, 2024). Integrating autoencoder latent representations before applying Isolation Forest has been shown to reduce false alarms, demonstrating a scalable, unsupervised detection pipeline suitable for real-time monitoring. Semi-supervised approaches, where an autoencoder identifies potential anomalies and a downstream supervised classifier confirms threats, offer a cost-effective combination. The unsupervised layer maintains model adaptability to evolving query behavior, while the classifier ensures precise labeling, making this approach desirable for security teams working with limited labeled data.

• Sequence-Aware Models for Query Stream Analysis (e.g., LSTM, GRU)
Behavioral profiling of sequences of queries requires sequence-aware models to capture context and temporal anomalies. Recurrent neural architectures such as LSTM and GRU excel in this domain. In detecting SQL injection and other web-threat streams, Stiawan et al. (2023) demonstrated a composite LSTM-PCA model that reduced dimensionality with PCA before LSTM processing, achieving ~94% accuracy by modeling query structures over time, as shown in Figure 4. Their work highlights the importance of representing queries as structured time series rather than independent events. Setiyaji and Ramli (2024) presented a CNN-BiLSTM hybrid model, initiating feature extraction with CNN layers and capturing sequential dependencies with BiLSTM, improving contextual awareness of query flows. Their system effectively identified SQLi attacks when queries appeared in specific order patterns rather than as isolated anomalies. Similarly, Mohd Yazid Idris et al. (2023) integrated PCA and LSTM in an ensemble model for SQLi and XSS detection, achieving high performance (~96%) using ensemble voting mechanisms. This demonstrates the benefit of combining clustering, dimensionality reduction, and sequence learning. Ensembles combining GRU and LSTM, as explored by Pu et al. (2022), leverage the strengths of both architectures: GRU's computational efficiency and LSTM's expressive memory capabilities. Their ensemble, combining stacked autoencoders with both RNN types, achieved solid detection rates (~89–95%) on benchmark datasets, proving effective for streaming environments. Sequence-aware models provide deep contextual analysis by learning user or session-level behavior patterns and highlighting deviations. Their strength is especially valuable for stealthy attacks that disguise malicious actions within normal-looking query streams.

• Integration with Database Management and SIEM Systems
To deploy machine learning-based SQL injection detection effectively, integration within DBMS and SIEM ecosystems is essential. Uetz et al. (2023) introduced AMIDES, an adaptive misuse detection extension for SIEM that supplements static rule-based detection with ML classifiers, as shown in Figure 3. Their system successfully caught evasive SQLi attempts that bypassed conventional SIEM alerts by learning patterns of normal behavior and flagging anomalies, thereby reducing false negatives and improving response fidelity. Corporate research on ML-enhanced SIEM systems has identified key features necessary for integration: feature extraction pipelines, model management within SIEM, and connection to data lakes for large-volume query data analysis. These systems rely on continuous model retraining and feedback loops to stay effective as query patterns evolve. Scaling SIEM with data lakes, as proposed in 2024, addresses the challenge of handling large-scale DB audit logs and relational event streams. Integration allows centralized storage for model training, scalable feature extraction, and real-time classification close to the data source, reducing latency. Cloud-based Next-Gen SIEM platforms harness ML for feature normalization, UEBA, and automated incident response orchestration (Turkish Journal, 2023). These systems flexibly accommodate custom SQLi detection models as plug-in engines. By correlating ML-detected alerts with other sources (e.g., OS, network logs), they provide broader attack context, which is crucial for triage and escalation workflows. Integration requires tight coupling between ML models, DBMS audit log feeds, data lake platforms, and orchestration engines within SIEM. Achieving such synergy ensures that behavioral profiling results are actionable in real-world enterprise environments, transforming reactive alerts into proactive threat mitigation pipelines.

Figure 3 portrays an advanced cybersecurity and data infrastructure ecosystem, symbolizing the seamless integration of machine learning-driven SQL injection detection within Database Management Systems (DBMS) and Security Information and Event Management (SIEM) platforms. At the center, a glowing lock signifies the core security objective, protecting data integrity and system access, while interconnected digital pathways illustrate real-time data flows from various sources such as logs, user behaviors, and relational query events. Embedded icons for analytics, automation, user entity behavior analytics (UEBA), and system orchestration reflect how modern SIEM systems leverage ML models to automate threat detection and incident response. The array of dashboards, charts, and connected devices represents continuous monitoring and feature extraction pipelines, essential for training and retraining detection models against evolving SQL attack patterns. This visual emphasizes the need for data lakes to store massive DB audit logs and support scalable model inference, and showcases how correlation of ML-detected anomalies with network and OS logs enhances situational awareness. The image encapsulates the synergy of cloud-native SIEM platforms, model management frameworks, and database telemetry in creating a proactive and intelligent security architecture.

Fig 3 Picture of Machine Learning-Driven Integration of SQL Injection Detection into DBMS and SIEM Ecosystems
(Suretysystems, 2025).
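
The correlation of ML-detected SQL alerts with OS and network logs, mentioned in the integration discussion, can be sketched in a few lines of Python. The `Event` record, its field names, the source labels, and the five-minute window are assumptions for illustration and do not correspond to any particular SIEM product's API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    source: str       # illustrative labels such as "sql_ml", "os", "network"
    user: str
    timestamp: float  # seconds since epoch
    detail: str


def correlate(sql_alerts, other_events, window_seconds=300.0):
    """Pair each SQL alert with other events from the same user inside the time window."""
    correlated = []
    for alert in sql_alerts:
        related = [
            ev for ev in other_events
            if ev.user == alert.user
            and abs(ev.timestamp - alert.timestamp) <= window_seconds
        ]
        correlated.append((alert, related))
    return correlated


# Hypothetical sample events.
sql_alerts = [Event("sql_ml", "alice", 1000.0, "anomalous UNION query")]
other_events = [
    Event("network", "alice", 1120.0, "large outbound transfer"),
    Event("os", "alice", 5000.0, "login"),
    Event("network", "bob", 1010.0, "port scan"),
]
for alert, related in correlate(sql_alerts, other_events):
    print(alert.detail, "->", [ev.detail for ev in related])
```

Joining on user and time is only one possible correlation key; session identifiers or source IP addresses would serve the same purpose where they are available in the audit feed.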

Figure 4 is organized into three main branches: Core Neural Architectures, Hybrid and Ensemble Models, and Application and Detection Capabilities. The first branch outlines foundational recurrent models such as LSTM, which captures long-term dependencies in query sequences, GRU, known for its computational efficiency, and BiLSTM, which processes query streams bidirectionally for enhanced contextual learning. The second branch details hybrid and ensemble approaches that combine these models with feature extraction or dimensionality reduction techniques. For instance, LSTM integrated with PCA effectively models structured query timelines, while CNN-BiLSTM hybrids extract deep features and model sequence flows, particularly effective in identifying SQL injection attacks with pattern dependencies. Ensemble systems that combine PCA, clustering, and LSTM use voting mechanisms to improve accuracy, while advanced combinations of GRU and LSTM with autoencoders enhance robustness in streaming environments. The third branch focuses on the application of these models in real-world detection, emphasizing behavioral profiling by learning normal user or session-level patterns, enabling the detection of stealthy attacks that follow subtle sequences. These models excel in temporal anomaly detection and offer high accuracy (89–96%) for identifying both overt and covert web threats, making them valuable tools in real-time cybersecurity for SQL-based systems.


Fig 4 Diagram Illustration of Structured Overview of Sequence-Aware Deep Learning Models and their Applications in Temporal
Anomaly Detection for SQL Query Streams
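
The principle of scoring a query stream by how well it matches learned session behavior can be shown with a deliberately simple stand-in. The sketch below uses a first-order Markov transition model over statement types, not an LSTM or GRU; the function names, the `vocab_size` smoothing parameter, and the sample sessions are assumptions for illustration.

```python
import math
from collections import defaultdict


def statement_type(query):
    """Reduce a query to its leading keyword (a deliberately coarse token)."""
    stripped = query.strip()
    return stripped.split(None, 1)[0].upper() if stripped else "EMPTY"


def fit_transitions(sessions):
    """Count transitions between consecutive statement types in benign sessions."""
    counts = defaultdict(lambda: defaultdict(int))
    for session in sessions:
        types = ["START"] + [statement_type(q) for q in session]
        for prev, cur in zip(types, types[1:]):
            counts[prev][cur] += 1
    return counts


def session_surprise(session, counts, vocab_size=10):
    """Average negative log-probability of a session's transitions (add-one smoothing)."""
    types = ["START"] + [statement_type(q) for q in session]
    total, steps = 0.0, 0
    for prev, cur in zip(types, types[1:]):
        row = counts.get(prev, {})
        prob = (row.get(cur, 0) + 1) / (sum(row.values()) + vocab_size)
        total -= math.log(prob)
        steps += 1
    return total / steps if steps else 0.0


# Hypothetical training sessions and test sessions.
training = [
    ["SELECT a FROM t", "UPDATE t SET a = 1"],
    ["SELECT a FROM t", "UPDATE t SET a = 2"],
    ["SELECT a FROM t", "SELECT b FROM t"],
]
model = fit_transitions(training)
print(session_surprise(["SELECT a FROM t", "UPDATE t SET a = 3"], model))
print(session_surprise(["DROP TABLE t", "DROP TABLE u"], model))
```

A higher score means the order of statements is less like the training sessions. Recurrent models replace the transition table with a learned hidden state, which lets them capture dependencies longer than one step.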

V. CHALLENGES, RESEARCH GAPS

• Evasion Techniques and Adversarial Query Generation
One of the critical challenges in machine learning-based SQL injection detection is the emergence of evasion techniques, particularly adversarial query generation. Attackers increasingly craft queries that closely resemble legitimate SQL commands to bypass anomaly detection models. These adversarial queries are designed by modifying known malicious payloads through obfuscation, encoding, and query structure manipulation without altering their malicious intent. Techniques such as SQL comment injection, use of tautologies, whitespace variation, and nested subqueries can significantly reduce detection rates in models trained on traditional or static patterns. Furthermore, attackers may employ query mutation strategies to test the boundaries of deployed detection systems, thereby identifying exploitable blind spots. Machine learning models that are not robust to such perturbations may exhibit high false negatives, allowing exfiltration attempts to succeed undetected. As the sophistication of evasion tactics increases, it becomes imperative for detection systems to incorporate robust training methods that anticipate a wide range of adversarial behaviors. This includes adversarial training, data augmentation, and continuous learning from near-miss detection failures. Developing resilient models that generalize across diverse adversarial strategies remains a central goal for enhancing the reliability and effectiveness of behavioral profiling systems in safeguarding relational databases against stealthy attacks.

• Concept Drift and Model Adaptability
Concept drift refers to the gradual or abrupt change in the statistical properties of SQL query patterns over time, which can significantly degrade the performance of static machine learning models. In dynamic environments where user behaviors, access frequencies, and system usage evolve, models trained on historical data may become obsolete, leading to increased false positives or negatives. For instance, a sudden shift in query frequency during an organizational restructuring or policy update may be misclassified as anomalous, even though it reflects legitimate operational changes. Conversely, an attacker who mimics legitimate access patterns could remain undetected if the model has not been updated to capture emerging attack vectors. Addressing concept drift requires implementing adaptive learning strategies such as online learning, window-based retraining, and periodic model updates using recent behavioral data. Moreover, drift detection mechanisms should be integrated into the monitoring system to flag potential shifts in query distributions that may impact model accuracy. Building flexible, context-aware models that can recalibrate to evolving usage patterns without compromising detection precision is essential. Balancing adaptability with system stability ensures the long-term effectiveness of detection mechanisms in real-world, high-variability database environments where static assumptions are no longer viable.

• Explainability and Interpretability in Security Contexts
Incorporating explainability and interpretability into SQL injection detection systems is vital for gaining user trust, facilitating decision-making, and ensuring compliance with regulatory standards. Security analysts and database administrators must understand why a specific query is flagged as suspicious, particularly in environments with high accountability requirements. Black-box models, while effective in classification accuracy, often lack transparency, making it difficult to trace the rationale behind their decisions. This opacity can hinder incident response, root cause analysis, and model debugging efforts. By contrast, interpretable models or those enhanced with explainable AI (XAI) techniques provide valuable insights into feature importance, decision boundaries, and behavioral patterns that triggered the alert. Visualizations of query sequences, attention weights, or anomaly scores can help stakeholders validate system outputs and fine-tune detection thresholds. Additionally, explainability aids in identifying model biases and gaps, especially when distinguishing between benign outliers and malicious queries. In mission-critical systems, explainability becomes a non-negotiable requirement, enabling human oversight and fostering collaboration between machine intelligence and human judgment. Therefore, integrating explainability mechanisms into detection pipelines not only enhances trust but also improves system transparency, accountability, and effectiveness in managing database security threats in complex enterprise environments.

• Federated Learning and Privacy-Preserving Detection
Federated learning presents a promising approach to SQL injection and data exfiltration detection by enabling collaborative model training across decentralized environments without sharing raw data. This technique is particularly relevant in privacy-sensitive contexts such as healthcare, finance, and government institutions, where cross-organizational data pooling is constrained by regulations. By training models locally on each organization's SQL logs and sharing only encrypted model updates, federated learning facilitates the creation of robust, generalized behavioral profiling systems. This decentralized model not only enhances detection performance across diverse query distributions but also minimizes the risk of data leakage. However, implementing federated learning introduces challenges such as ensuring update integrity, handling heterogeneous data distributions, and addressing communication overhead. Techniques like differential privacy, secure multiparty computation, and homomorphic encryption are often employed to further protect sensitive query data during transmission. Despite these challenges, federated architectures offer scalability, adaptability, and data sovereignty while preserving the collective intelligence needed to detect sophisticated SQL-based attacks. As cyber threats evolve, integrating privacy-preserving machine learning frameworks becomes a strategic imperative for organizations seeking to maintain strong security postures without compromising on confidentiality or compliance mandates.

RECOMMENDATIONS FOR FUTURE RESEARCH

To advance machine learning-based detection of SQL injection and data exfiltration, future research should focus on developing adaptive, resilient, and transparent systems. First, improving model robustness against adversarial queries through techniques like adversarial training and ensemble learning can help mitigate evasion attempts. Second, the incorporation of online learning and drift detection methods is crucial to address concept drift and ensure that models remain relevant in changing operational environments. Third, enhancing explainability through interpretable architectures or post-hoc explanation tools will foster greater trust and usability among security practitioners. Additionally, integrating contextual signals such as user behavior history, device metadata, and access location can enrich feature sets and improve model precision. Federated learning and edge-based detection systems offer scalable and privacy-aware solutions that deserve further exploration, particularly in multi-tenant and cloud-based infrastructures. Benchmarking datasets that reflect real-world adversarial conditions should be developed to standardize evaluation and support reproducibility. Finally, a cross-disciplinary approach involving security experts, data scientists, and legal professionals is essential to design ethical, compliant, and effective detection systems. These research directions can significantly contribute to fortifying relational databases against evolving cyber threats in both enterprise and critical infrastructure domains.

REFERENCES

[1]. Adebayo, A. B., & Al-Dubai, A. Y. (2020). Leveraging machine learning for secure database access: A behavioral profiling approach. Information Systems, 92, 101521. https://doi.org/10.1016/j.is.2020.101521

[2]. Adekunle, F., & Zhang, T. (2024). Event-driven frameworks for real-time intrusion detection in SQL-intensive applications. ACM Transactions on Privacy and Security, 27(1), 1–25. https://doi.org/10.1145/3591230
[3]. Aggarwal, C. C., & Sathe, S. (2020). Theoretical foundations and algorithms for outlier ensembles. ACM Computing Surveys, 53(6), 1–36. https://doi.org/10.1145/3398037
[4]. Akhtar, S., & Farooq, M. (2020). Real-time detection of SQL anomalies using deep autoencoders and stream processors. Journal of Network and Computer Applications, 157, 102591. https://doi.org/10.1016/j.jnca.2020.102591
[5]. Alghawazi, et al. (2023). SQL injection detection using RNN autoencoder and LSTM. arXiv preprint.
[6]. Alomari, M., & Wang, J. (2021). Deep structural analysis of SQL queries for anomaly detection. IEEE Transactions on Dependable and Secure Computing.
[7]. Alshammari, R., Alwan, Z., & Alzain, M. A. (2021). Advanced SQL injection attack detection using behavioral features and statistical analysis. Computers, Materials & Continua, 66(2), 1631–1646. https://doi.org/10.32604/cmc.2021.013267
[8]. Altwaijry, H., & El-Alfy, E. M. (2021). Evaluation of rule-based intrusion detection systems for SQLi vulnerabilities. IEEE Transactions on Dependable and Secure Computing, 18(4), 1549–1562. https://doi.org/10.1109/TDSC.2019.2896769
[9]. Awotiwon, B. O., Enyejo, J. O., Owolabi, F. R. A., Babalola, I. N. O., & Olola, T. M. (2024). Addressing Supply Chain Inefficiencies to Enhance Competitive Advantage in Low-Cost Carriers (LCCs) through Risk Identification and Benchmarking Applied to Air Australasia’s Operational Model. World Journal of Advanced Research and Reviews, 2024, 23(03), 355–370. https://wjarr.com/content/addressing-supply-chain-inefficiencies-enhance-competitive-advantage-low-cost-carriers-lccs
[10]. Banerjee, A., & Roy, A. (2022). Intelligent profiling of SQL attack surfaces: A review of recent progress. Information & Computer Security, 30(4), 597–617. https://doi.org/10.1108/ICS-11-2021-0146
[11]. Banerjee, A., & Singh, R. (2022). Metric-aware performance evaluation in SQL-based threat detection systems. Computers & Security, 115, 102620. https://doi.org/10.1016/j.cose.2022.102620
[12]. Bashir, F., Rauf, A., & Shahid, A. R. (2023). A hybrid AI-based framework for behavioral anomaly detection in SQL transactions. Neural Computing and Applications, 35, 11571–11585. https://doi.org/10.1007/s00521-023-08159-2
[13]. Bashiru, O., Ochem, C., Enyejo, L. A., Manuel, H. N. N., & Adeoye, T. O. (2024). The crucial role of renewable energy in achieving the sustainable development goals for cleaner energy. Global Journal of Engineering and Technology Advances, 19(03), 011-036. https://doi.org/10.30574/gjeta.2024.19.3.0099
[14]. Bianchi, F., Grana, M., & Rossi, C. (2021). Graph neural networks for anomaly detection in SQL query graphs. IEEE Transactions on Neural Networks and Learning Systems.
[15]. Chatterjee, M., Gupta, S., & Bera, P. (2021). Profiling SQL behavior using deep learning for injection attack detection. Computers & Security, 105, 102240. https://doi.org/10.1016/j.cose.2021.102240
[16]. Chen, D., & Zhao, Q. (2021). Low-latency SQL injection detection in distributed databases using recurrent neural networks. Future Generation Computer Systems, 117, 71–84. https://doi.org/10.1016/j.future.2020.11.014
[17]. Chen, H., Yu, L., & Zhang, Y. (2021). Static and signature-based detection of SQL injection: A retrospective and limitations. Journal of Information Security and Applications, 59, 102836. https://doi.org/10.1016/j.jisa.2021.102836
[18]. Chen, L., Rao, Q., & Zhao, Y. (2020). Temporal sequence modeling of SQL queries for anomaly detection. IEEE Transactions on Information Forensics and Security.
[19]. Corporal Machine Learning Algorithms in SIEM Systems for Enhanced Detection. (2023). ResearchGate Conference Paper.
[20]. Demilie, W. B., & Deriba, F. G. (2022). Detection and prevention of SQL-injection attacks and developing compressive framework using machine learning and hybrid techniques. Journal of Big Data, 9(1), 124.
[21]. Falor, A., Hirani, M., Vedant, H., Mehta, P., & Krishnan, D. (2022). A deep learning approach for detection of SQL injection attacks using convolutional neural networks. In Proceedings of Data Analytics and Management: ICDAM 2021 (Vol. 2, pp. 293–304).
[22]. Garcia, R., & Watts, B. (2021). Role-aware machine learning for insider threat detection. ACM Transactions on Privacy and Security.
[23]. Geeksforgeeks, (2024). Supervised and Unsupervised learning, https://www.geeksforgeeks.org/machine-learning/supervised-unsupervised-learning/
[24]. Godwins, O. P., David-Olusa, A., Ijiga, A. C., Olola, T. M., & Abdallah, S. (2024). The role of renewable and cleaner energy in achieving sustainable development goals and enhancing nutritional outcomes: Addressing malnutrition, food security, and dietary quality. World Journal of Biology Pharmacy and Health Sciences, 2024, 19(01), 118–141. https://wjbphs.com/sites/default/files/WJBPHS-2024-0408.pdf
[25]. Godwins, O. P., Ochagwuba, E., Idoko, I. P., Akpa, F. A., Olajide, F. I., & Olatunde, T. I. (2024). Comparative analysis of disaster management strategies and their impact on nutrition outcomes in the USA and Nigeria. Business and Economics in Developing Countries (BEDC), 2(2), 34-42. http://doi.org/10.26480/bedc.02.2024.34.42
[26]. Gomez, P., Sánchez, F., & Molina, J. (2022). Context-augmented profiling of database queries. Journal of Big Data Security.
[27]. Haque, A., & Soliman, H. (2025). A transformer-based autoencoder with Isolation Forest and XGBoost for malfunction and intrusion detection

in wireless sensor networks. Future Internet, 17(4), 164.
[28]. Hussain, S., Ahmed, T., & Nazir, U. (2022). User-role activity profiling in relational databases. Information Sciences.
[29]. Ibokette, A. I., Aboi, E. J., Ijiga, A. C., Ugbane, S. I., Odeyemi, M. O., & Umama, E. E. (2024). The impacts of curbside feedback mechanisms on recycling performance of households in the United States. World Journal of Biology Pharmacy and Health Sciences, 17(2), 366-386.
[30]. Ibokette, A. I., Ogundare, T. O., Danquah, E. O., Anyebe, A. P., Agaba, J. A., & Olola, T. M. (2024). The impacts of emotional intelligence and IOT on operational efficiency in manufacturing: A cross-cultural analysis of Nigeria and the US. Computer Science & IT Research Journal P-ISSN: 2709-0043, E-ISSN: 2709-0051. DOI: 10.51594/csitrj.v5i8.1464
[31]. Ibokette, A. I., Ogundare, T. O., Danquah, E. O., Anyebe, A. P., Agaba, J. A., & Agaba, J. A. (2024). Optimizing maritime communication networks with virtualization, containerization and IoT to address scalability and real-time data processing challenges in vessel-to-shore communication. Global Journal of Engineering and Technology Advances, 2024, 20(02), 135–174. https://gjeta.com/sites/default/files/GJETA-2024-0156.pdf
[32]. Ibrahim, M. M., & Suryani, V. (2023). Classification of SQL injection attacks using ensemble learning SVM and Naïve Bayes. In Proceedings of 2023 International Conference on Data Science and Its Applications (ICODSA) (pp. 230–236).
[33]. Idoko P. I., Igbede, M. A., Manuel, H. N. N., Ijiga, A.
[...] Engineering Technology and Sciences*, 11(1), 274-293.
[37]. Idoko, I. P., Ijiga, O. M., Enyejo, L. A., Akoh, O., & Ileanaju, S. (2024). Harmonizing the voices of AI: Exploring generative music models, voice cloning, and voice transfer for creative expression.
[38]. Idoko, I. P., Ijiga, O. M., Enyejo, L. A., Ugbane, S. I., Akoh, O., & Odeyemi, M. O. (2024). Exploring the potential of Elon Musk's proposed quantum AI: A comprehensive analysis and implications. Global Journal of Engineering and Technology Advances, 18(3), 048-065.
[39]. Igba, E., Danquah, E. O., Ukpoju, E. A., Obasa, J., Olola, T. M., & Enyejo, J. O. (2024). Use of Building Information Modeling (BIM) to Improve Construction Management in the USA. World Journal of Advanced Research and Reviews, 2024, 23(03), 1799–1813. https://wjarr.com/content/use-building-information-modeling-bim-improve-construction-management-usa
[40]. Ijiga, O. M., Idoko, I. P., Ebiega, G. I., Olajide, F. I., Olatunde, T. I., & Ukaegbu, C. (2024). Harnessing adversarial machine learning for advanced threat detection: AI-driven strategies in cybersecurity risk assessment and fraud prevention. Open Access Research Journals. Volume 13, Issue. https://doi.org/10.53022/oarjst.2024.11.1.0060 I
[41]. Integrating SIEM with Data Lakes and AI: Enhancing Threat Detection and Response. (2024). ResearchGate Paper.
[42]. Iqbal, W., & Naeem, M. (2024). Behavior-aware database intrusion detection: Trends and gaps. Journal of Cybersecurity and Privacy, 4(2), 207–230.
C., Akpa, F. A., & Ukaegbu, C. (2024). Assessing https://doi.org/10.3390/jcp4020013
the impact of wheat varieties and processing methods [43]. Kamble, M. Y., Wankhade, K., & Barde, B. (2020).
on diabetes risk: A systematic review. World Journal Comparative study on SQL injection detection using
of Biology Pharmacy and Health Sciences, 2024, rule-based methods. Procedia Computer Science, 172,
18(02), 260–277. 641–648. https://doi.org/10.1016/j.procs.2020.05.088
https://wjbphs.com/sites/default/files/WJBPHS-2024- [44]. Kaushik, A., & Joshi, R. (2020). Structured feature
0286.pdf representation of SQL queries for anomaly detection.
[34]. Idoko, I. P., Igbede, M. A., Manuel, H. N. N., Adeoye, Future Generation Computer Systems, 111, 504–517.
T. O., Akpa, F. A., & Ukaegbu, C. (2024). Big data https://doi.org/10.1016/j.future.2020.05.031
and AI in employment: The dual challenge of [45]. Khan, S., & Ahmad, R. (2022). Grammar-based
workforce replacement and protecting customer normalization of SQL statements for effective
privacy in biometric data usage. *Global Journal of injection detection. International Journal of
Engineering and Technology Advances*, 19(02), 089- Information Security.
106. https://doi.org/10.30574/gjeta.2024.19.2.0080 [46]. Lee, H., Kim, D., & Park, S. (2023). Adaptive SQL
[35]. Idoko, I. P., Ijiga, O. M., Agbo, D. O., Abutu, E. P., query normalization with machine learning. ACM
Ezebuka, C. I., & Umama, E. E. (2024). Comparative Transactions on Database Systems.
analysis of Internet of Things (IOT) implementation: [47]. Li, Y., Yang, T., & Jiang, M. (2023). Adaptive
A case study of Ghana and the USA-vision, anomaly detection in streaming data using hybrid
architectural elements, and future directions. *World neural models. Journal of Artificial Intelligence
Journal of Advanced Engineering Technology and Research, 76, 231–257.
Sciences*, 11(1), 180-199. https://doi.org/10.1613/jair.1.13564
[36]. Idoko, I. P., Ijiga, O. M., Akoh, O., Agbo, D. O., [48]. Liu, F. T., Ting, K. M., & Zhou, Z. H. (2021).
Ugbane, S. I., & Umama, E. E. (2024). Empowering Isolation-based anomaly detection. ACM
sustainable power generation: The vital role of power Transactions on Knowledge Discovery from Data
electronics in California's renewable energy (TKDD), 15(3), 1–28.
transformation. *World Journal of Advanced https://doi.org/10.1145/3458446

[49]. Liu, H., Guo, Q., & Li, S. (2023). Systematic review of injection vulnerabilities and data leakage in cloud-based databases. Future Generation Computer Systems, 145, 259–272. https://doi.org/10.1016/j.future.2023.03.018
[50]. Liu, Y., Zhao, H., & Fan, Y. (2022). Anomaly-based detection of SQLi using LSTM sequence learning. Expert Systems with Applications, 193, 116385. https://doi.org/10.1016/j.eswa.2021.116385
[51]. Lu, Y., Chen, X., & Fang, J. (2022). Representing relational queries as graphs for intrusion detection. Applied Soft Computing.
[52]. Mehrotra, R., & Thakur, R. (2023). Extraction of behavioral features from SQL logs using unsupervised deep encoders. Pattern Recognition Letters, 169, 30–38. https://doi.org/10.1016/j.patrec.2023.01.015
[53]. Mohd Yazid Idris et al. (2023). An improved LSTM-PCA ensemble classifier for SQL injection and XSS detection. UTM e-prints.
[54]. Onuh, J. E., Idoko, I. P., Igbede, M. A., Olajide, F. I., Ukaegbu, C., & Olatunde, T. I. (2024). Harnessing synergy between biomedical and electrical engineering: A comparative analysis of healthcare advancement in Nigeria and the USA. *World Journal of Advanced Engineering Technology and Sciences*, 11(2), 628-649.
[55]. Ouyang, X., Lin, W., & Zhang, H. (2023). Online anomaly detection for relational databases using attention-based streaming models. Neurocomputing, 522, 87–101. https://doi.org/10.1016/j.neucom.2022.12.072
[56]. Owolabi, F. R. A., Enyejo, J. O., Babalola, I. N. O., & Olola, T. M. (2024). Overcoming engagement shortfalls and financial constraints in Small and Medium Enterprises (SMES) social media advertising through cost-effective Instagram strategies in Lagos and New York City. International Journal of Management & Entrepreneurship Research P-ISSN: 2664-3588, E-ISSN: 2664-3596. DOI: 10.51594/ijmer.v6i8.1462
[57]. Patel, D., Sharma, K., & Mehta, S. (2023). Supervised modeling of user-based SQL activity for anomaly detection. Computers & Security.
[58]. Pu, et al. (2022). Detecting zero-day web attacks with an ensemble of LSTM, GRU, and stacked autoencoders. Computers, 14(6), 205.
[59]. Qureshi, M. A., & Khan, S. (2023). Detecting data exfiltration from relational queries: A machine learning perspective. IEEE Access, 11, 74501–74514. https://doi.org/10.1109/ACCESS.2023.3282905
[60]. Rahman, M., Ahmed, F., & Miah, M. S. (2022). The weakness of black-box SQL injection scanners in modern web applications. Security and Privacy, 5(2), e205. https://doi.org/10.1002/spy2.205
[61]. Sabottke, C. F., & Abraham, J. (2022). Survey of anomaly detection for relational data using supervised and unsupervised learning. IEEE Transactions on Knowledge and Data Engineering, 34(4), 1527–1540. https://doi.org/10.1109/TKDE.2021.3053062
[62]. Sajjad, A., Nasir, Q., & Shafique, M. (2022). A taxonomy and survey of SQL injection detection and prevention techniques. Journal of Network and Computer Applications, 195, 103217. https://doi.org/10.1016/j.jnca.2021.103217
[63]. Setiyaji, A., & Ramli, K. (2024). A technique utilizing CNN for identification of SQL injection attacks. 2024 ICSINTESA Conference Proceedings.
[64]. Sharma, P., & Desai, R. (2023). Query–table interaction graphs for exfiltration detection. Knowledge-Based Systems.
[65]. Sharma, R., Dey, L., & Kumar, S. (2022). Semantic embedding of structured query language statements for intrusion detection. Knowledge-Based Systems, 240, 108025. https://doi.org/10.1016/j.knosys.2022.108025
[66]. Singh, A., & Jang-Jaccard, J. (2022). Autoencoder-based unsupervised intrusion detection using multi-scale convolutional recurrent networks. arXiv preprint.
[67]. Singh, A., & Kumar, P. (2023). LSTM-based behavioral profiling of SQL query streams. Future Generation Computer Systems.
[68]. Stiawan, D., et al. (2023). LSTM+PCA composite model to detect SQL injection and XSS. Scientific Reports.
[69]. Suretysystems, (2025). Enhance SAP System Security: Top Strategies for SAP SIEM Integration, https://www.suretysystems.com/insights/enhance-sap-system-security-top-strategies-for-sap-siem-integration/
[70]. Tang, L., et al. (2020). Attack detection in network flow data using LSTM for SQL injection. International Journal of Applied Engineering Research, 15(6), 569–580.
[71]. The Future of SIEM in a Machine Learning-Driven Cybersecurity. (2023). Turkish Journal of Computer and Mathematics Education.
[72]. Uetz, R., Herzog, M., Hackländer, L., Schwarz, S., & Henze, M. (2023). You cannot escape me: detecting evasions of SIEM rules in enterprise networks. arXiv preprint.
[73]. Vakharia, M., & Patel, V. (2020). Benchmarking datasets for anomaly detection in SQL injection scenarios. Journal of Cybersecurity, 6(1), taaa011. https://doi.org/10.1093/cybsec/taaa011
[74]. Wang, T., & Li, M. (2024). Context-aware detection of data exfiltration via query patterns. Computers & Security.
[75]. Web Traffic Anomaly Detection Using Isolation Forest. (2024). MDPI International Journal of Data, 11(4), 83.
[76]. Xu, J., & Tan, Z. (2023). A framework for benchmark dataset creation in SQL-based attack detection using graph learning. IEEE Access, 11, 48526–48538. https://doi.org/10.1109/ACCESS.2023.3265270
[77]. Yoon, J., Park, E., & Han, S. (2024). Hybrid role-based anomaly detection in enterprise queries. IEEE Transactions on Software Engineering.
[78]. Zhang, J., & Yu, W. (2021). Feature transformation for SQL injection detection using query dependency graphs. Information Sciences, 569, 1–18. https://doi.org/10.1016/j.ins.2021.02.005
[79]. Zhang, M., & Wang, X. (2021). Comparative analysis of evaluation metrics for SQL anomaly classifiers. Expert Systems with Applications, 185, 115550. https://doi.org/10.1016/j.eswa.2021.115550
[80]. Zhang, Y., Xu, L., & Li, X. (2024). Structural feature extraction for SQL anomaly detection. Journal of Computer Security.
[81]. Zheng, Y., Xie, T., & Xu, D. (2020). From SQL injection to data exfiltration: Challenges and countermeasures. IEEE Access, 8, 172495–172508. https://doi.org/10.1109/ACCESS.2020.3025084
[82]. Zhou, W., Li, Z., & Xu, H. (2024). Graph stream learning of SQL behaviors. Information Sciences.
[83]. Zhou, Y., Xu, Y., & Wang, C. (2021). Machine learning for database security: A systematic review. ACM Computing Surveys, 54(9), 1–36. https://doi.org/10.1145/3457600