Intrusion Detection Method Based On SMOTE Transformation For Smart Grid Cybersecurity
Abstract—Real-time Intrusion Detection Systems (IDSs) have attracted greater attention for
secured and resilient smart grid operations. IDSs are employed to identify unknown
cyberattacks and malware from network traffic. In this paper, an efficient machine
learning-based model is proposed to detect a variety of cyberattacks. The proposed method,
an enhanced Extremely Randomized Trees (ET) classifier based on the Synthetic Minority
Oversampling Technique (SMOTE), accurately classifies imbalanced IDS data. The proposed
ET-SMOTE uses a series of data processing blocks to enable multi-layer network
cyber-security assessment in smart grids by acquiring the essential knowledge of attack
dynamics. The proposed computing framework provides an accurate multiclass classification
of five network traffic categories: denial of service attacks, probing attacks, remote to
local attacks, user to root attacks, and normal. The experimental results demonstrate the
high accuracy of the proposed ET-SMOTE algorithm in detecting various types of cyber
threats compared to benchmark models, with an accuracy of 99.79% using the NSL-KDD
network data set.

Keywords—Intrusion detection, multi-layer cybersecurity, machine learning, network traffic,
smart grid vulnerability.

I. INTRODUCTION
The integration of Information and Communications Technologies (ICT) in Smart Grids (SG)
enables communications for Internet of Things (IoT) devices and smart meters to participate
in power system operations [1]. However, integrating electrical power systems with
burgeoning ICT technologies exposes them to more complicated threats from various types of
cyber-attacks than ever before [2]. Since an IoT-based SG relies on potentially millions of
nodes, the multi-layer cybersecurity infrastructure becomes readily exposed to a large
variety of cyberattacks and rapidly evolving masquerades, leading to devastating financial
and economic consequences [3]. With the aggregation of communication technologies, handling
fast-changing cyber threats, and thus the increased complexity of detection, becomes a
challenging issue [4]. Therefore, Intrusion Detection Systems (IDSs), a so-called
defense-in-depth measure, have brought great convenience to this intelligent infrastructure
by helping to prevent catastrophic damage to power supplies and widespread power outages.
An IDS is a network-level security mechanism that includes monitoring, discovering,
determining, and identifying unauthorized use, duplication, alteration, and destruction of
ICT networks [5]. For the SG, an IDS is essentially employed to detect malicious traffic
inputs based on internet traffic record data [6]. IDSs are mainly classified into five
types: host IDSs, protocol-based IDSs, application protocol-based IDSs, hybrid IDSs, and
Network IDSs (NIDSs). With the rapid growth of Web 3.0, advanced metering infrastructure,
artificial intelligence, and big data technologies, the demand for efficient NIDSs is
arguably on the rise [7]. Thus, this paper focuses primarily on NIDSs for SG network data
to strengthen the SG security infrastructure [8].

From an intrusion detection viewpoint, network attacks generally fall into four classes:
Denial of Service attacks (DoS), Probing attacks (Probe), Remote to Local attacks (R2L),
and User to Root attacks (U2R) [3]. DoS is an attack that renders a machine or network
unavailable to its intended users [9]. This prevents normal traffic from reaching a
network, leading to massive damage such as power outages [8]. A probing attack determines
the weaknesses and vulnerabilities of a networking device by scanning the data to perturb
the system. U2R is a malicious attempt by an unauthorized party to escalate the access
rights of a normal host account to root permission on the system. R2L is characterized by
a remote intruder attempting to gain local access to the target’s system as a local user.
Thus, the information integrity can be jeopardized [3].

In the SG era, the combination of two main networks, a power network and a communication
network, has made this cyber-physical system potentially susceptible to cyber-attacks [10].
In fact, the communication between these different elements makes the grid prone to attacks
and compromises the availability of systems. Traditional Intrusion Detection and Prevention
Systems (IDPSs) based on signature and anomaly techniques are insufficient and outdated to
secure the grid. These legacy systems are deemed inadequate given the ever-increasing
cybersecurity risks [10]. This gap intensifies the need for intrinsically embedding
cybersecurity systems to protect the utility grids. Sophisticated IDSs based on
well-established Machine Learning (ML) algorithms are recognized as viable solutions to
successfully classify network anomalies and intrusions. ML models usually include three
steps: data preprocessing, feature selection, and classification of attacks according to
the learned characteristics [11]. Knowing the attacks’ patterns can help detect anomalies
early and take corresponding preventive measures. Intelligent IDSs based on soft computing
ML perform an in-depth inspection of malicious activities and suspicious behaviors.

ML methods in the intrusion detection context have gained considerable momentum among
research scholars and technology industries. Pioneering work in [12] applied a five-layer
autoencoder-based model to solve the problem of intrusion detection. The proposed model,
trained on the Network Security Laboratory-Knowledge Discovery and Data mining (NSL-KDD)
dataset, provided an F1-score of 92.26%. However, the generalizability and practicability
of the proposed technique are not demonstrated, especially for the selected hyperparameter
configuration. In [13], an Attack, Bonafide, Train, Realization, and Performance (AB-TRAP)
framework for ML-based NIDS has been proposed to counter deliberate vandalism and
unexpected damage by hackers. The
Authorized licensed use limited to: University of Florida. Downloaded on December 30,2022 at 17:40:32 UTC from IEEE Xplore. Restrictions apply.
assessment of this end-to-end framework yields great reliability, with an F1-score of 95%
for the trained attacks, but its weak point lies in the inability of AB-TRAP to deal with
unknown attacks. An improved Deep Belief Network (DBN) has been employed for IDS [14]. The
DBN model is characterized by its efficiency in handling nonlinear behavior [15]. Still,
the DBN model is inefficient at recognizing unknown cyberattacks. A multi-objective
approach for longer model lifespans has been introduced to classify 20 TB of data using
high computational resources and CPU power to manage the high volume of traffic data [16].
Thus, the proposed method overlooks exhausted network channels for analysis and storage,
making the realization of the proposed approach questionable. The authors of [17]
capitalized on Deep Learning (DL) methods to detect malicious cyber-attacks based on
distributed edge devices. A distributed DL system for web attack detection on edge devices
has been proposed and assessed on three different datasets. It is worth mentioning that
this model can be updated when needed to meet the dynamic representation for URLs.
Nevertheless, the aforementioned work does not provide any guidance on the model deployment
and configurations, especially with the large parameter search space of the residual neural
network model. Recently, in [18], the authors introduced an IDS modeling architecture for
Supervisory Control and Data Acquisition (SCADA) network-based power grids. The proposed
system coupled recursive feature elimination using extreme gradient boosting with the
majority rule ensemble approach. From the simulation results, the proposed end-to-end IDS
framework built from nine classifiers was found efficient for intrusion detection compared
to individual classifiers. However, this high performance is accompanied by increased
complexity. Further, the proposed solution might be infeasible as it does not provide any
action strategies in response to the findings or follow-up practices to harden the already
vulnerable system.
In this article, the main contributions are threefold as follows:
• First, an enhanced Extra Trees (ET) classifier is developed to tackle the unbalanced
classification problem for the tree-based ensemble method.
• Second, an intrusion detection method is proposed. The proposed model can not only detect
attacks but also identify the types of attacks. This reinforces the security and
intelligent capabilities of the proposed IDS.
• Third, the proposed technique demonstrated its high performance through multiple
simulations on an open-source data portal and comprehensive comparisons with existing
benchmarks.
II. METHODS AND PROPOSED ARCHITECTURE
ML-based IDSs rely on supervised and unsupervised techniques to identify and classify
anomalies that the SG could encounter. This section presents the methods employed for the
IDS. First, the Synthetic Minority Over-sampling Technique (SMOTE) transformation is
presented. Second, the ET mechanism is described. Last, the proposed ET-SMOTE is
introduced.
A. Synthetic Minority Over-sampling Technique
The misclassification dilemma occurs when the number of samples for each class is
remarkably different. SMOTE is an efficient oversampling technique for solving imbalanced
data problems [19]. The SMOTE technique synthetically increases the minority class from its
nearest neighbors, using the Euclidean distance, to balance the data set. This technique
randomly chooses the neighbors from the K-Nearest Neighbors (KNN), depending upon the
amount of data to be resampled. To synthetically increase the minority class, the SMOTE
technique uses the following equation [19]:

$D_{syn} = D_i + (D_{KNN} - D_i) \times r$    (1)

where $D_{syn}$, $D_i$, and $D_{KNN}$ denote the synthetic sample, a minority sample, and
one of its nearest neighbors among the minority samples, respectively, and $r$ is a random
number between 0 and 1. The SMOTE algorithm is used to balance the minority intrusion
detection classes (particularly the U2R and R2L classes).
B. Extra Trees classifier
The ET estimator is an optimized tree-based ensemble method that focuses on the
bias-variance tradeoff. For classical ensemble trees, the generation of decision trees
often leads to overfitting caused by the extreme splits in the recursive process of the
decision trees. Thus, the ET employs random splits of all observations, which makes the
bootstrapping more diversified [20]. The ET mechanism relies on aggregating the results of
a collection of decorrelated decision trees to produce a classification result. The ET
algorithm adds random thresholds for each feature to reduce overfitting. This makes the
algorithm less computationally expensive than classical decision trees.
C. Proposed method
The goal of this work lies in building an efficient classifier capable of distinguishing
between four types of attacks (Probe, DoS, U2R, and R2L) and normal traffic. This work
incorporates the ET-SMOTE model to provide an efficient IDS. The flowchart of the proposed
method is illustrated in Fig. 1.
Fig. 1: The full flowchart of the proposed method.
The processing units start with data acquisition, data processing, and data split. The
information is cleaned from
missing values and outliers to avoid any model misunderstanding. Next, the SMOTE
transformation is computed to balance the data classes. Then, Z-score-based scaling is
conducted. One-hot encoding is done to make the data recognizable by the classification
model by converting categorical features to numerical ones. Finally, multicollinearity
removal is done to avoid any unnecessary computing. The data is partitioned into training
and testing sets. Afterward, the model is trained using ET and assessed on the remaining
testing set.
III. CASE STUDY
This section evaluates the performance of the different ML methods to demonstrate the
effectiveness and practicability of the proposed method.
A. Data set
NSL-KDD is a data set created to overcome the shortcomings of the KDD’99 intrusion data set,
such as bias problems and duplicate records [21]. The NSL-KDD is collected in a simulated
US air force base network. This data set is a well-established benchmark that better
represents the current network environment, even though it still displays some undesirable
characteristics. Indeed, the shortage of available IDS datasets and the complexity of
collecting the databases promote the use of NSL-KDD as the most credible source for
intrusion detection/malware detection topics. The set contains 125,973 training samples,
which makes it suitable for ML yet not so large that researchers are forced to pick parts
of the set randomly. The set contains 42 features listed in Table I.
TABLE I: THE NSL-KDD’42 FEATURES AND NUMBER OF ENTRIES.
L#   Features             L#   Features              L#   Features
L1   Duration             L15  Su. attempted         L29  Same srv. rate
L2   Protocol Type        L16  Num. root             L30  Diff. srv. rate
L3   Service              L17  Num. file creation    L31  Srv. Diff. host rate
L4   Flag                 L18  Num. shells           L32  Dst. host count
L5   Src. bytes           L19  Num. access files     L33  Dst. host srv. count
L6   Dst. bytes           L20  Num. outbound cmds    L34  Dst. host same srv. rate
L7   Land                 L21  Is host login         L35  Dst. host diff. srv. rate
L8   Wrong fragment       L22  Is guest login        L36  Dst. host same srv. port rate
L9   Urgent               L23  Count                 L37  Dst. host srv. diff host rate
L10  Hot                  L24  Srv. count            L38  Dst. host serror rate
L11  Num. failed logins   L25  Serror rate           L39  Dst. host srv. serror rate
L12  Logged in            L26  Srv serror rate       L40  Dst. host rerror rate
L13  Num. comp            L27  Rerror rate           L41  Dst. host srv. rerror rate
L14  Root shell           L28  Srv rerror rate       L42  Class label

The data is saved in a Comma-Separated Values (CSV) file. In particular, the largest share
of samples in each data set is normal traffic, while U2R and R2L samples are scarce.
Despite this unbalanced classification, this is an actual reflection of the distribution of
daily internet traffic attacks, where the most habitual attacks are DoS. The NSL-KDD is
cleared of redundancy to prevent ML algorithm bias, which is an improvement over the KDD’99
dataset.
B. Feature engineering
As ML performance is tightly linked to the data quality, the data set requires refining.
The acquired data is cleaned of infinity values and misleading samples. A multicollinearity
threshold has been set to a 0.9 ratio [22]. Since the data has different scales, data
normalization is essential to unify the range of data. In our problem, the data is scaled
using Z-score normalization according to the following formula [12]:

$Z = \frac{x_i - \bar{x}}{SD}$    (2)

where $Z$ denotes the Z-score transformation and $x_i$ is the individual value. Here,
$\bar{x}$ and $SD$ denote the mean of the data population and its standard deviation,
respectively. The final data is divided into three parts: 70% of the data is devoted to
training and validation, and the remaining 30% is used for testing purposes. The
Scikit-learn library in Python was used to implement the simulations [23].
C. Evaluation measures
The assessment of the proposed method is twofold: first, the confusion matrix is computed.
Then, the performance measures are computed in terms of accuracy, precision, recall, and
F1-score. The mathematical equations are given as follows [20]:

$Accuracy = \frac{TP + TN}{TP + FP + TN + FN}$    (3)

$Precision = \frac{TP}{TP + FP}$    (4)

$Recall = \frac{TP}{TP + FN}$    (5)

$F\text{-}score = 2 \times \frac{Precision \times Recall}{Precision + Recall}$    (6)

where $TP$ denotes True Positive, $TN$ denotes True Negative, $FN$ denotes False Negative,
and $FP$ denotes False Positive.
D. Results and discussion
Various simulations are performed using the Python programming language (version 3.6) on a
Lenovo laptop (i7, 9th generation, NVIDIA GeForce GTX 1650). The hyperparameters of the
proposed model and the benchmarks are tuned using the Random Search method to yield the
most effective solutions, which are close to those of brute force. The assessment of the
ET-SMOTE was conducted with ten-fold Cross-Validation (CV). The scores employed to assess
our model in each one of the ten splits of CV are all based on the confusion matrix that
each of the splits produced. The score criteria are accuracy, precision, recall, F1-score,
and false-positive ratio. For each one of the scores, the average over the ten splits of
the CV has been calculated to be as reliable and accurate as possible in the evaluation of
our
model. After applying the SMOTE technique on the whole data, the ten-fold CV results are
provided in Table II.
TABLE II: CLASSIFICATION PERFORMANCE ON NSL-KDD DATASET. FN, M, AND SD PRESENT THE FOLD
NUMBER, MEAN, AND STANDARD DEVIATION RESPECTIVELY.
FN   Acc (%)  Recall (%)  Prec. (%)  F1 (%)  Kappa (%)
0    99.8     89.3        99.81      99.8    99.64
1    99.81    91.79       99.81      99.81   99.66
2    99.78    78.78       99.75      99.77   99.62
3    99.86    93.21       99.86      99.86   99.76
4    99.84    92.07       99.84      99.84   99.72
5    99.8     98.74       99.81      99.8    99.64
6    99.72    87.74       99.72      99.72   99.5
7    99.81    87.91       99.81      99.8    99.66
8    99.83    94.01       99.83      99.83   99.7
9    99.76    87.9        99.78      99.77   99.58
M    99.79    90.15       99.8       99.8    99.65
SD   0.04     5           0.04       0.04    0.07

The overall accuracy of our model is 99.79%, precision equals 99.80%, recall rate is
90.15%, Kappa ratio is 99.65%, and F1-score is 99.8%. Table III presents the confusion
matrix of the proposed model.
TABLE III: CONFUSION MATRIX FOR THE PROPOSED MODEL.
         Probe   Normal  DoS    R2L   U2R
Probe    13771   7       0      0     0
Normal   2       20177   4      16    4
DoS      0       19      3478   0     0
R2L      0       11      0      287   0
U2R      0       10      0      0     5

According to Table III, the proposed ET-SMOTE model performs excellently on an extremely
unbalanced dataset. From the total number of probe attack instances, 13771 instances are
accurately classified as true positives and 2 as true negatives. For the normal cases, the
proposed model misclassifies only 47 instances. In summary, the proposed classifier is
found efficient in distinguishing Probe, DoS, and R2L attacks from normal traffic. However,
it can be remarked that U2R is more difficult to detect than the other attacks. The
detection accuracy of U2R attacks decreased due to the low number of samples used for
training. This reveals that the minority class samples of the NSL-KDD are still lacking,
and its samples are out-of-date. In order to verify the competitiveness of the proposed
model, a series of tests have been conducted. The proposed model is compared to single
models, including the Ridge classifier, Multi-Layer Perceptron (MLP), Support Vector
Machines (SVM), and Linear Discriminant Analysis (LDA). Note that all the benchmark
techniques are conducted through simulations. Table IV summarizes the statistical results
of the different models in detecting the cyber-attacks.
TABLE IV: COMPARATIVE STUDY OF VARIOUS PREDICTION MODELS.
Model      Accuracy (%)  Recall (%)  Precision (%)  F1 score (%)
ET-SMOTE   99.79         90.15       99.8           99.8
MLP        95.87         65.74       96.39          96.03
Ridge      94.22         55.12       93.47          93.83
LDA        93.97         68.64       94.25          94.09
SVM        55.38         31.1        67.63          58.64

Regarding Table IV, the proposed model clearly outperforms the rest of the models. The
proposed method achieved more than 4% higher accuracy when compared to the Ridge
classifier. The closest performance to the proposed technique is achieved by the MLP model,
with 95.87% overall accuracy. The SVM classifier achieves the worst classification results,
with a classification accuracy of 55.38%. The superiority of the proposed ET-SMOTE can also
be presented through a bar graph representation in Fig. 2.
Fig. 2: Score performance comparison.
As clearly seen in Fig. 2, the proposed ET-SMOTE model achieved the best accuracy, recall,
and precision results. Table V presents the confusion matrices of the proposed model
compared to benchmarks. From this table, the promising results of the ET-SMOTE reflect a
better classification and detection effect than the rest of the models.
The proposed model is compared to several benchmarks from the literature. Specifically, the
ET-SMOTE is evaluated against Self Organization Map (SOM) [24], Bidirectional Long Short
Term Memory (BiLSTM) [25], Convolutional Neural Network-Bidirectional Long Short Term
Memory (CNN-BiLSTM) [25], and Naive Bayes [26]. For better visualization of the accuracy
scores, Fig. 3 is presented.
TABLE V: CONFUSION MATRICES OF VARIOUS PREDICTION MODELS.
Model      Class    Probe   Normal  DoS    R2L   U2R
MLP        Probe    13354   421     3      0     0
           Normal   39      19672   306    32    154
           DoS      13      168     3314   0     2
           R2L      0       143     11     124   20
           U2R      0       14      2      0     0
Ridge      Probe    13140   536     102    0     0
           Normal   210     19524   469    0     0
           DoS      94      503     2900   0     0
           R2L      8       287     3      0     0
           U2R      0       14      2      0     0
LDA        Probe    13097   535     118    5     23
           Normal   170     19265   539    218   11
           DoS      66      466     2963   1     1
           R2L      5       124     1      166   2
           U2R      0       7       0      6     3
SVC        Probe    6928    1184    6      0     5660
           Normal   125     18914   720    37    407
           DoS      657     1319    330    2     1189
           R2L      3       278     9      8     8
           U2R      0       13      0      3     3
ET-SMOTE   Probe    13771   7       0      0     0
           Normal   2       20177   4      16    4
           DoS      0       19      3478   0     0
           R2L      0       11      0      287   0
           U2R      0       10      0      0     5
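The per-class behaviour discussed for Table III can be checked directly from the confusion matrix. The following is a minimal sketch, not the authors' implementation: it copies the ET-SMOTE counts from Table III and applies the one-vs-rest forms of Eqs. (4)-(6), together with the multiclass accuracy (correct predictions over all samples). It assumes that rows are true classes and columns are predicted classes, which the paper does not state explicitly. The names `CM`, `per_class_metrics`, and `overall_accuracy` are illustrative.

```python
# Confusion matrix of the proposed ET-SMOTE model, copied from Table III.
# Assumption: rows are true classes and columns are predicted classes.
LABELS = ["Probe", "Normal", "DoS", "R2L", "U2R"]
CM = [
    [13771, 7, 0, 0, 0],
    [2, 20177, 4, 16, 4],
    [0, 19, 3478, 0, 0],
    [0, 11, 0, 287, 0],
    [0, 10, 0, 0, 5],
]


def per_class_metrics(cm):
    """Return {class index: (precision, recall, f1)} from one-vs-rest TP, FP, FN."""
    n = len(cm)
    metrics = {}
    for k in range(n):
        tp = cm[k][k]
        fn = sum(cm[k]) - tp                       # row total minus diagonal
        fp = sum(cm[i][k] for i in range(n)) - tp  # column total minus diagonal
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        metrics[k] = (precision, recall, f1)
    return metrics


def overall_accuracy(cm):
    """Multiclass accuracy: correctly classified samples over all samples."""
    correct = sum(cm[k][k] for k in range(len(cm)))
    total = sum(sum(row) for row in cm)
    return correct / total


METRICS = per_class_metrics(CM)
ACCURACY = overall_accuracy(CM)
MACRO_RECALL = sum(m[1] for m in METRICS.values()) / len(METRICS)

print(f"accuracy = {ACCURACY:.4f}")
print(f"macro-averaged recall = {MACRO_RECALL:.4f}")
for k, (p, r, f1) in METRICS.items():
    print(f"{LABELS[k]:>6}: precision={p:.4f} recall={r:.4f} f1={f1:.4f}")
```

Under this row/column assumption, the sketch gives an overall accuracy of about 99.81% for this matrix and a U2R recall of 5/15 (about 33%), in line with the observation that U2R is the hardest class to detect.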
Fig. 3: Radar plot for the simulated models based on accuracy score.
According to Fig. 3, it is clearly shown that the proposed technique has the highest
performance metrics. In order to show the computational timing, Table VI summarizes the
training and testing time of the proposed model and benchmarks.
TABLE VI: COMPUTATIONAL TIME (SECOND) FOR CLASSIFICATION.
Model     MLP   SVC  LDA  Ridge  ET-SMOTE
Time (s)  2351  102  108  58     1018

According to Table VI, the times spent for ten-fold CV training are 2351 s (MLP), 102 s
(SVC), 108 s (LDA), 58 s (Ridge), and 1018 s (ET-SMOTE). The Ridge classifier requires a
considerably shorter training time (58 seconds). The introduced ET-SMOTE model has a much
longer training time compared to the SVM, LDA, and Ridge models. Therefore, the most
significant shortcoming of the proposed model lies in its computational burden. However,
this burden is compensated by its high efficiency and effectiveness in network intrusion
detection, which can be helpful in various power system aspects.
IV. CONCLUSION
This study proposed an efficient ML methodology that is well suited to distinguishing
intrusions from normal activities. The proposed approach employs the ET-SMOTE model. To
verify the proposed technique’s accuracy and feasibility, the NSL-KDD database is used for
training and testing. The numerical results demonstrate the high accuracy of the proposed
classifier. The efficacy of SMOTE in improving the performance of the ET model has been
demonstrated with an accuracy of 99.79%. The proposed model is well tailored to perform
intrusion detection and classification. However, this model requires more investigations on
other datasets to validate its generalization performance. Future directions of this work
may include analyzing and evaluating the proposed technique with other databases and
applications.
ACKNOWLEDGMENT
This publication was made possible by NPRP grant [NPRP12C-33905-SP-213] from the Qatar
National Research Fund (a member of Qatar Foundation). The statements made herein are
solely the responsibility of the authors.
REFERENCES
[1] O. Ellabban, H. Abu-Rub, and F. Blaabjerg, “Renewable energy resources: Current status,
future prospects and their enabling technology,” Renewable and Sustainable Energy Reviews,
vol. 39, pp. 748–764, 2014.
[2] H. Karimipour, A. Dehghantanha, R. M. Parizi, K.-K. R. Choo, and H. Leung, “A deep and
scalable unsupervised machine learning system for cyber-attack detection in large-scale
smart grids,” IEEE Access, vol. 7, pp. 80778–80788, 2019.
[3] G. Fernandes, J. J. P. C. Rodrigues, L. F. Carvalho, J. F. Al-Muhtadi, and M. L.
Proença, “A comprehensive survey on network anomaly detection,” Telecommunication Systems,
vol. 70, no. 3, pp. 447–489, 2019, doi: 10.1007/s11235-018-0475-8.
[4] A. Ameli, A. Hooshyar, E. F. El-Saadany, and A. M. Youssef, “Attack Detection and
Identification for Automatic Generation Control Systems,” IEEE Transactions on Power
Systems, vol. 33, no. 5, pp. 4760–4774, 2018, doi: 10.1109/TPWRS.2018.2810161.
[5] A. L. Buczak and E. Guven, “A Survey of Data Mining and Machine Learning Methods for
Cyber Security Intrusion Detection,” IEEE Communications Surveys and Tutorials, vol. 18,
no. 2, pp. 1153–1176, 2016, doi: 10.1109/COMST.2015.2494502.
[6] C. C. Sun, D. J. Sebastian Cardenas, A. Hahn, and C. C. Liu, “Intrusion Detection for
Cybersecurity of Smart Meters,” IEEE Transactions on Smart Grid, vol. 12, no. 1,
pp. 612–622, 2021, doi: 10.1109/TSG.2020.3010230.
[7] Q. Liu, V. Hagenmeyer, and H. B. Keller, “A Review of Rule Learning-Based Intrusion
Detection Systems and Their Prospects in Smart Grids,” IEEE Access, vol. 9,
pp. 57542–57564, 2021, doi: 10.1109/ACCESS.2021.3071263.
[8] P. I. Radoglou-Grammatikis and P. G. Sarigiannidis, “Securing the Smart Grid: A
Comprehensive Compilation of Intrusion Detection and Prevention Systems,” IEEE Access,
vol. 7, pp. 46595–46620, 2019, doi: 10.1109/ACCESS.2019.2909807.
[9] A. Huseinović, S. Mrdović, K. Bicakci, and S. Uludag, “A survey of denial-of-service
attacks and solutions in the smart grid,” IEEE Access, vol. 8, pp. 177447–177470, 2020,
doi: 10.1109/ACCESS.2020.3026923.
[10] M. Z. Gunduz and R. Das, “Cyber-security on smart grid: Threats and potential
solutions,” Computer Networks, vol. 169, p. 107094, 2020, doi:
10.1016/j.comnet.2019.107094.
[11] H. Hindy et al., “A Taxonomy of Network Threats and the Effect of Current Datasets on
Intrusion Detection Systems,” IEEE Access, vol. 8, pp. 104650–104675, 2020, doi:
10.1109/ACCESS.2020.3000179.
[12] W. Xu, J. Jang-Jaccard, A. Singh, Y. Wei, and F. Sabrina, “Improving Performance of
Autoencoder-Based Network Anomaly Detection on NSL-KDD Dataset,” IEEE Access, vol. 9,
pp. 140136–140146, 2021, doi: 10.1109/ACCESS.2021.3116612.
[13] G. de Carvalho Bertoli et al., “An End-to-End Framework for Machine Learning-Based
Network Intrusion Detection System,” IEEE Access, vol. 9, pp. 106790–106805, 2021, doi:
10.1109/ACCESS.2021.3101188.
[14] Z. Wang, Y. Zeng, Y. Liu, and D. Li, “Deep Belief Network Integrating Improved
Kernel-Based Extreme Learning Machine for Network Intrusion Detection,” IEEE Access,
vol. 9, pp. 16062–16091, 2021, doi: 10.1109/ACCESS.2021.3051074.
[15] M. Massaoudi, H. Abu-Rub, S. S. Refaat, M. Trabelsi, I. Chihi, and F. S. Oueslati,
“Enhanced Deep Belief Network Based on Ensemble Learning and Tree-Structured of Parzen
Estimators: An Optimal Photovoltaic Power Forecasting Method,” IEEE Access, vol. 9,
pp. 150330–150344, 2021, doi: 10.1109/ACCESS.2021.3125895.
[16] E. Viegas, A. O. Santin, and V. Abreu, “Machine Learning Intrusion Detection in Big
Data Era: A Multi-Objective Approach for Longer Model Lifespans,” IEEE Transactions on
Network Science and Engineering, vol. 8, no. 1, pp. 366–376, 2021, doi:
10.1109/TNSE.2020.3038618.
[17] Z. Tian, C. Luo, J. Qiu, X. Du, and M. Guizani, “A Distributed Deep Learning System
for Web Attack Detection on Edge Devices,” IEEE Transactions on Industrial Informatics,
vol. 16, no. 3, pp. 1963–1971, 2020, doi: 10.1109/TII.2019.2938778.
[18] D. Upadhyay, J. Manero, M. Zaman, and S. Sampalli, “Intrusion Detection in SCADA Based
Power Grids: Recursive Feature Elimination Model with Majority Vote Ensemble Algorithm,”
IEEE Transactions on Network Science and Engineering, vol. 8, no. 3, pp. 2559–2574, 2021,
doi: 10.1109/TNSE.2021.3099371.
[19] N. V. Chawla, K. W. Bowyer, L. O. Hall, and W. P. Kegelmeyer, “SMOTE: Synthetic
minority over-sampling technique,” Journal of Artificial Intelligence Research, vol. 16,
no. 1, pp. 321–357, 2002, doi: 10.1613/jair.953.
[20] M. Massaoudi, H. Abu-Rub, S. S. Refaat, I. Chihi, and F. S. Oueslati, “An effective
ensemble learning approach-based grid
stability assessment and classification,” in IEEE Kansas Power and
Energy Conference (KPEC), 2021, pp. 1–6.
[21] “NSL-KDD dataset.” http://www.unb.ca/cic/datasets/nsl.html
(accessed Feb. 18, 2022).
[22] A. S. Fotheringham and T. M. Oshan, “Geographically weighted
regression and multicollinearity: dispelling the myth,” Journal of
Geographical Systems, vol. 18, no. 4, pp. 303–329, 2016, doi:
10.1007/s10109-016-0239-5.
[23] F. Pedregosa et al., “Scikit-learn: Machine learning in Python,”
Journal of Machine Learning Research, vol. 12, pp. 2825–2830,
2011.
[24] L. M. Ibrahim, D. B. Taha, and M. S. Mahmod, “A comparison
study for intrusion database (KDD99, NSL-KDD) based on self
organization map (SOM) artificial neural network,” Journal of
Engineering Science and Technology, vol. 8, no. 1, pp. 107–119,
2013.
[25] K. Jiang, W. Wang, A. Wang, and H. Wu, “Network Intrusion
Detection Combined Hybrid Sampling with Deep Hierarchical
Network,” IEEE Access, vol. 8, no. 3, pp. 32464–32476, 2020, doi:
10.1109/ACCESS.2020.2973730.
[26] K. Yang, J. Liu, C. Zhang, and Y. Fang, “Adversarial Examples
Against the Deep Learning Based Network Intrusion Detection
Systems,” Proceedings - IEEE Military Communications
Conference MILCOM, vol. 2019-Octob, pp. 559–564, 2019, doi:
10.1109/MILCOM.2018.8599759.
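As a supplementary illustration of the SMOTE interpolation in Eq. (1) of Section II-A, the following minimal sketch generates synthetic minority samples with the Python standard library only. It is not the authors' implementation. The function names, the toy data, and the choices `k=2` and `seed=0` are assumptions made for the example, and the sketch reads $D_i$ as a minority sample and $D_{KNN}$ as one of its nearest minority neighbors.

```python
import math
import random


def k_nearest_neighbors(sample, minority, k):
    """Return the k minority samples closest to `sample` (Euclidean distance)."""
    others = [m for m in minority if m is not sample]
    others.sort(key=lambda m: math.dist(sample, m))
    return others[:k]


def smote_oversample(minority, n_new, k=5, seed=0):
    """Create n_new synthetic samples via D_syn = D_i + (D_KNN - D_i) * r."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        d_i = rng.choice(minority)                                 # minority sample
        d_knn = rng.choice(k_nearest_neighbors(d_i, minority, k))  # one neighbor
        r = rng.random()                                           # r in [0, 1)
        synthetic.append(tuple(a + (b - a) * r for a, b in zip(d_i, d_knn)))
    return synthetic


# Toy minority class: four points on the corners of the unit square (hypothetical data).
MINORITY = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
SYNTHETIC = smote_oversample(MINORITY, n_new=6, k=2)

for point in SYNTHETIC:
    print(point)
```

Each synthetic point lies on the segment between a minority sample and one of its neighbors, so for this toy data all generated points stay inside the unit square. In practice a maintained library implementation would normally be preferred over a hand-written one.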