
Cyber1 Power System

The document discusses using ensemble learning methods to detect cyber attacks on power systems. It evaluates various ensemble learning techniques on 15 power system attack datasets to determine their effectiveness at classification. The datasets contain measurements and logs from a model power system with around 5000 instances and 129 features representing different types of events labeled as attacks or normal operations.


2018 the 3rd IEEE International Conference on Cloud Computing and Big Data Analysis

Ensemble Learning Methods for Power System Cyber-Attack Detection

Xiayang Chen, Lei Zhang, Yi Liu, Chaojing Tang


College of Electronic Science, National University of Defense Technology
Changsha, China
e-mail: [email protected]

Abstract—The power system is one of the most important industrial control systems in today's society. In recent years, power systems have been extensively researched and have developed rapidly. To integrate systems optimally and reduce costs, many advanced information technologies have been introduced into power systems, and the traditional power system is rapidly changing into the smart power grid. As a result, modern power systems are now exposed to the public network, and information security is becoming a new threat to their resilience. In this work, we explore the suitability of ensemble learning methods as a means of detecting power system cyber-attacks. We evaluate various ensemble learning methods as cyber-attack detectors and discuss the practical implications of deploying them as an enhancement to existing power system architectures.

Keywords-ensemble learning; power system; industrial control system; intrusion detection; cyber-attack detection

I. INTRODUCTION

The main task of a power system is transmitting electricity to customers. Power systems have been designed with fault tolerance mechanisms and redundancy to perform this task, but information security was not a consideration at the time. Formerly physically isolated power systems were then connected to the Internet for integrated control and management. This increased the potential for unauthorized access and exposed these systems to the same vulnerabilities that trouble traditional computer systems and networks.

In this work, we explore the suitability of ensemble learning methods as a means of detecting power system cyber-attacks. We evaluate the classification performance of various ensemble learning methods in terms of classification accuracy, precision, recall, and F-measure. Fifteen power system attack datasets are used to evaluate these methods. The datasets were collected from a model power system constructed by Mississippi State University and Oak Ridge National Laboratory [1]. Each dataset has about five thousand instances and 129 columns:
- 116 columns of phasor measurements of electrical waves;
- 12 columns for control panel logs, Snort alerts, and relay logs;
- one column for the label.

In total, 6 kinds of event scenarios are recorded in the datasets. To simplify the experiment, we grouped these event scenarios as either attack or normal operation.

The remainder of this paper is organized as follows. Section II reviews related work. Section III describes our methodology. Section IV presents our results. Finally, Section V concludes the paper.

II. RELATED WORK

Ensemble learning has distinguished itself as an outstanding detector of malicious and anomalous events in intrusion detection for traditional computer systems and networks, because it performs better than single machine learning methods [2]. It combines several base classifiers to produce the final class output.

Many previous researchers have applied classifier ensembles to intrusion detection systems (IDSs). In [3], an ensemble classifier based on majority voting over three base classifiers (a neural network, a support vector machine, and multivariate regression splines) was proposed for anomaly detection. Its performance was evaluated on the KDDCup99 dataset, and feature selection was also applied to reduce the computational overhead. An ensemble of a decision tree and a support vector machine using a weighted ensemble approach was discussed in [4]. It was implemented on the full feature set of the KDDCup99 dataset, with accuracy as the performance evaluation metric. In [5], the classifier ensemble AdaBoost was applied to improve the performance of decision stumps. False alarm rate and precision were used to evaluate the method on a reduced-feature version of the KDDCup99 dataset. A product-rule combination was demonstrated in [6], where it served as the combiner that produces the final class prediction. The area under the ROC curve (AUC) was used as the performance evaluation metric, and this scheme was also evaluated on the KDDCup99 dataset. The work in [7] proposed a tree-based classifier ensemble with hybrid feature selection for intrusion detection systems. Three feature selection techniques, i.e., PSO, ACO, and GA, were used to obtain the best subset of features. Four tree-based classifier algorithms, i.e., RF, NBT, LMT, and REPT, were then grouped for classification. This scheme was evaluated on the NSL-KDD dataset.

In recent years, developments in power systems have motivated research into many kinds of intrusion detection techniques. One type of IDS research focuses on the security of Intelligent Electronic Devices (IEDs) within power systems. An anomaly-based detection technique for IEDs was proposed by Chee-Wooi Ten et al. in [8]. This detection technique was

978-1-5386-4301-3/18/$31.00 ©2018 IEEE 613


host-based. Therefore, it only identified attacks against a single IED in the substation, based on the log from that IED. Another IDS, which provided a protection mechanism for smart household appliances, was proposed by Chen et al. in [9]. This IDS created security rules for individual appliances by modeling three factors of each appliance with homogeneous functions: device security, usability, and electricity pricing. Other IDSs of this kind perform system-level detection based on the behaviors of multiple devices within the system. In [10], a specification-based IDS for the electric grid was built from the behaviors of three kinds of physical devices:
- head-ends;
- data aggregation points/distribution access points;
- subscriber energy meters.

In that paper, readings from 22 sensors of these devices were used as state components, and these components were quantized into a limited number of ranges. Three state machines, with 3456, 1728, and 3456 states respectively, were built for the three devices in conjunctive normal form. Such IDSs are very expensive to build because of the large state space. In addition, this IDS can detect only a small number of attacks because it uses a limited number of sensors.

Another type of IDS for power systems uses communication traffic in the information infrastructure to detect cyber-attacks. Yang et al. proposed an IDS for synchrophasor systems in [11]. It detected cyber-attacks based on access control whitelists, protocol-based whitelists, and network behavior-based rules. However, this IDS was limited to cyber-attacks such as Man-in-the-Middle (MITM) and Denial of Service (DoS). Similar to Yang's IDS, Zhang et al. [12] proposed a distributed IDS that analyzed communication traffic at different network levels of the smart grid: home area networks, neighborhood area networks, and wide area networks. An intelligent module was applied at each level to classify possible cyber-attacks and malicious data using data mining algorithms. Through communication among these modules, a system-level view of the whole communication network was then constructed, with the aim of improving detection accuracy. An anomaly detection technique for industrial control systems was proposed by Hadeli et al. in [13]. It extracted behavior patterns from protocols used in industrial control systems, such as GOOSE messages, Modbus/TCP, Manufacturing Message Specification, IEEE 61850, and redundant network routing protocols.

III. METHODOLOGY

A. The Base Learning Methods Used for Detection

As base methods, we select 5 representative learning methods that are widely applied to classification problems: Bayesnet, OneR, Ripper, C4.5, and SVM. These 5 classifiers can be grouped under four main categories:
1. Probabilistic classification (Bayesnet)
2. Rule induction (OneR, Ripper)
3. Decision tree learning (C4.5)
4. Non-probabilistic binary classification (SVM)

OneR. OneR is a very simple method. It builds a one-level rule for each feature and chooses the rule with the lowest error among all the feature rules [14].

Bayesnet. This method is designed for pattern classification based on the Bayesian decision rule with a neural network architecture [15]. It can learn the probability density functions (PDFs) of individual pattern classes from a set of learning samples.

C4.5. This is a decision support approach that uses a tree-like graph or model of decisions and their possible consequences [16]. It is a development of the earlier ID3 algorithm [17] and builds a decision tree based on the information gain ratio.

SVM. Support vector machines [19] are trained using sequential minimal optimization [18]. An SVM classifies instances by constructing a hyperplane or set of hyperplanes in a high-dimensional space. New instances are then assigned to a class based on their position in that space relative to the hyperplanes.

Ripper. Ripper (Repeated Incremental Pruning to Produce Error Reduction) builds on the Incremental Reduced Error Pruning algorithm developed in [20], which applies a separate-and-conquer methodology. Cohen's improvements [21] allow it to generate a more sophisticated rule set.

B. The Ensemble Learning Methods Used for Detection

Adaptive boosting. This algorithm is applied to improve the performance of other learning methods [22]. It is an ensemble learning method in which each new model focuses on the training examples that were misclassified by the previous models.

Bagging. Bagging (bootstrap aggregating) is an ensemble learning method that trains each estimator in the ensemble on a bootstrap sample of the training instances and combines their predictions by voting [23]. It differs from the random subspace method, which reduces the correlation between estimators by training them on random samples of features instead of the entire feature set.

Majority voting. In this paper, all of the base classifiers mentioned above vote for one class label. The final output class label is the one that receives more than half of the votes; otherwise, a rejection option is given.

Random forest. Random forest constructs a multitude of decision trees at training time and outputs the final result by combining the results of all the trees.

C. The Experiment Setup

The experiments were run on a computer with Windows 10, 32 GB RAM, and an Intel Core i7 CPU at 2.6 GHz. The overall performance of the classifiers is evaluated in a Java environment using the Waikato Environment for Knowledge Analysis (Weka) [24]. BayesNet, OneR, LibSVM, JRip, J48, AdaBoostM1, Bagging, Vote, and RandomForest are adopted as the Weka implementations of Bayesnet, OneR, SVM, Ripper, C4.5, adaptive boosting, bagging, majority voting, and random forest, respectively. All of these classifiers use Weka's default parameters.

In total, 17 learning methods are tested on the power system attack datasets [1]. 3-fold cross-validation is used in our experiments. Each dataset is randomly partitioned into 3 sets. The model is built on a two-thirds selection

from the dataset and tested on the remaining one third to evaluate the performance of the learning method. To reduce the effect of randomness, this procedure is repeated 5 times for each learning method and each dataset.

IV. RESULTS

A. Performance Metrics

Four metrics are used to evaluate the proposed approach: average accuracy, precision, recall, and F-score. They are defined as follows:

$$\text{average accuracy} = \frac{TP + TN}{TP + FP + FN + TN} \tag{1}$$

$$F\text{-score} = \frac{2 \cdot \text{recall} \cdot \text{precision}}{\text{recall} + \text{precision}} \tag{2}$$

$$\text{precision} = \frac{TP}{TP + FP} \tag{3}$$

$$\text{recall} = \frac{TP}{TP + FN} \tag{4}$$

The four counts are defined as follows:
- TP is the number of instances correctly identified as the normal class.
- TN is the number of instances correctly identified as the abnormal class.
- FP is the number of instances incorrectly marked as the normal class.
- FN is the number of instances incorrectly marked as the abnormal class.

Precision is the proportion of relevant instances among the retrieved instances. Recall is the proportion of all relevant instances that have been retrieved. The F-score is the harmonic mean of precision and recall, which makes it a good metric for evaluating anomaly detection approaches.

B. Ensemble Learning Method Evaluation

Figure 1 shows the classification accuracy averaged over the 15 datasets for 17 different algorithms:
- 5 basic machine learning methods;
- 5 ensemble learning methods based on bagging;
- 5 ensemble learning methods based on adaptive boosting;
- majority voting over the 5 basic learning methods;
- a random forest.

Figure 1 shows that the ensemble methods obtain better average accuracy than the basic learning methods.

Figure 1. The average accuracy

Figure 2 shows the precision of the various learning methods averaged over the 15 datasets. Precision measures positive predictive value, so it indicates how many false positives occur when predicting a specific class such as cyber-attack. As with accuracy, the ensemble methods have a higher average precision than the basic learning methods. For precision, the three SVM-based methods (basic, bagging, and adaptive boosting SVM) score higher than all the others.

Figure 2. The average precision

Figure 3. The average recall

Figure 4. The average F-score

Figure 3 shows the averaged recall. Recall reflects the true positive rate, so this evaluation identifies the learning methods that detected cyber-attacks most successfully. In contrast to precision, SVM performs significantly worse in recall. Some learning methods combine high precision with low recall, which indicates that these methods are biased towards one class. That means SVM may correctly classify malicious power system disturbances, but at the cost of a disproportionate

number of false negatives. Classifiers like this would not be reliable. Figure 4 supports this conclusion: the three SVM methods have low F-scores.

The F-score is displayed in Figure 4. It reflects classification performance in terms of both precision and recall. As expected, the learners that performed better in terms of both precision and recall have higher F-scores. Figure 4 shows that the random forest performs best. Except for Bayesnet, the ensemble methods based on bagging and adaptive boosting perform better than the basic methods.

According to the results of our experiments, ensemble learning is an effective enhancement to basic learning methods. It is also a feasible approach for providing reliable cyber-attack detection results in industrial control system (ICS) environments such as power systems.

V. CONCLUSION

Machine learning classification approaches are still not widely used as intrusion detection systems in ICS [25]. In particular, using ensemble learning methods in an ICS environment is a relatively new topic. Based on the results of applying ensemble learning methods to these power system datasets, we conclude that ensemble learning is a feasible approach for providing reliable decision support to power system operators on whether the system is under attack.

Despite these results, we consider that further work is necessary to make ensemble learning systems deployable in an operational environment. These results need to be tested on a broader set of power system data with a wider variety of classification schemes, learning approaches, and amounts of labeled data. This work can be treated as an initial set of evidence for applying ensemble learning methods in ICS environments and as motivation for further research.

ACKNOWLEDGMENT

Our research is supported by the National Science Foundation of China (Project No. 61672527).

REFERENCES
[1] Thomas H. Morris and Wei Gao. Industrial control system cyber attacks. In International Symposium on ICS and SCADA Cyber Security Research, pages 22–29, 2013.
[2] Bayu Adhi Tama and Kyung Hyune Rhee. A combination of PSO-based feature selection and tree-based classifiers ensemble for intrusion detection systems. In Advances in Computer Science and Ubiquitous Computing, pages 489–495, 2015.
[3] Srinivas Mukkamala, Andrew H. Sung, and Ajith Abraham. Intrusion detection using an ensemble of intelligent paradigms. Journal of Network and Computer Applications, 28(2):167–182, 2005.
[4] Sandhya Peddabachigari, Ajith Abraham, Crina Grosan, and Johnson Thomas. Modeling intrusion detection system using hybrid intelligent systems. Journal of Jiangxi University of Science and Technology, 30(1):114–132, 2007.
[5] W. Hu, W. Hu, and S. Maybank. AdaBoost-based algorithm for network intrusion detection. IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics), 38(2):577, 2008.
[6] João B. D. Cabrera, Carlos Gutiérrez, and Raman K. Mehra. Ensemble methods for anomaly detection and distributed intrusion detection in mobile ad-hoc networks. Information Fusion, 9(1):96–119, 2008.
[7] Bayu Adhi Tama and Kyung Hyune Rhee. HFSTE: Hybrid feature selections and tree-based classifiers ensemble for intrusion detection system. IEICE Transactions on Information and Systems, E100.D(8):1729–1737, 2017.
[8] Chee-Wooi Ten, Junho Hong, and Chen-Ching Liu. Anomaly detection for cybersecurity of the substations. IEEE Transactions on Smart Grid, 5(4):1643–1653, 2014.
[9] Yuxin Chen and Bo Luo. S2A: Secure smart household appliances. In ACM Conference on Data and Application Security and Privacy, pages 217–228, 2012.
[10] Robert Mitchell and Ing-Ray Chen. Behavior-rule based intrusion detection systems for safety critical smart grid applications. IEEE Transactions on Smart Grid, 4(3):1254–1263, 2013.
[11] Y. Yang, K. McLaughlin, S. Sezer, T. Littler, B. Pranggono, P. Brogan, and H. F. Wang. Intrusion detection system for network security in synchrophasor systems. In IET International Conference on Information and Communications Technologies, pages 246–252, 2013.
[12] Yichi Zhang, Lingfeng Wang, Weiqing Sun, Robert C. Green II, and Mansoor Alam. Distributed intrusion detection system in a multi-layer network architecture of smart grids. IEEE Transactions on Smart Grid, 2(4):796–808, 2011.
[13] Hadeli Hadeli, Ragnar Schierholz, Markus Braendle, and Cristian Tuduce. Leveraging determinism in industrial control systems for advanced anomaly detection and reliable security configuration. In IEEE International Conference on Emerging Technologies & Factory Automation, pages 1189–1196, 2009.
[14] Robert C. Holte. Very simple classification rules perform well on most commonly used datasets. Machine Learning, 11(1):63–90, 1993.
[15] Sukhan Lee and S. Shimoji. BAYESNET: Bayesian classification network based on biased random competition using Gaussian kernels. In IEEE International Conference on Neural Networks, vol. 3, pages 1354–1359, 1993.
[16] J. Ross Quinlan. C4.5: Programs for Machine Learning. 1993.
[17] J. R. Quinlan. Induction of decision trees. Machine Learning, 1(1):81–106, 1986.
[18] John C. Platt. Sequential minimal optimization: A fast algorithm for training support vector machines. 1998.
[19] Corinna Cortes and Vladimir Vapnik. Support-vector networks. Machine Learning, 20(3):273–297, 1995.
[20] Johannes Fürnkranz and Gerhard Widmer. Incremental reduced error pruning. In Machine Learning Proceedings 1994, pages 70–77, 1994.
[21] William W. Cohen. Fast effective rule induction. Machine Learning Proceedings 1995, 46(2):115–123, 1995.
[22] Yoav Freund and Robert E. Schapire. A decision-theoretic generalization of on-line learning and an application to boosting. In European Conference on Computational Learning Theory, pages 23–37, 1995.
[23] Leo Breiman. Bagging predictors. Machine Learning, 24(2):123–140, 1996.
[24] Muamer N. Mohammad, Norrozila Sulaiman, and Osama Abdulkarim Muhsin. A novel intrusion detection system by using intelligent data mining in Weka environment. Procedia Computer Science, 3(1):1237–1242, 2011.
[25] Raymond C. Borges Hink, Justin M. Beaver, Mark A. Buckner, Tommy Morris, Uttam Adhikari, and Shengyi Pan. Machine learning for power system disturbance and cyber-attack discrimination. In International Symposium on Resilient Control Systems, pages 1–8, 2014.
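The evaluation procedure of Section III-C (3-fold cross-validation repeated 5 times) and the metrics of Eqs. (1)-(4) in Section IV-A can be sketched in a few lines of Python. This is a minimal illustration, not the Weka code used in the experiments. It rests on two assumptions:
- The folds come from a plain random shuffle and are not stratified by class.
- An undefined ratio (zero denominator) is reported as 0.0.

The `positive` argument selects which class is counted as TP, because that choice changes the precision and recall values.

```python
import random
from dataclasses import dataclass


@dataclass
class Confusion:
    """Confusion-matrix counts for one binary test fold."""
    tp: int
    tn: int
    fp: int
    fn: int


def repeated_kfold(n, k=3, repeats=5, seed=0):
    """Yield (train, test) index lists for k-fold CV repeated `repeats` times.

    Each repeat reshuffles the indices, so every instance is tested exactly
    once per repeat. Folds are NOT stratified (an assumption of this sketch).
    """
    rng = random.Random(seed)
    for _ in range(repeats):
        idx = list(range(n))
        rng.shuffle(idx)
        folds = [idx[i::k] for i in range(k)]
        for i in range(k):
            # Train on the other k-1 folds (two thirds when k=3), test on fold i.
            train = [j for f, fold in enumerate(folds) if f != i for j in fold]
            yield train, folds[i]


def confusion(y_true, y_pred, positive):
    """Count TP/TN/FP/FN, treating `positive` as the positive class."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp = tn = fp = fn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive:
            if t == positive:
                tp += 1
            else:
                fp += 1
        elif t == positive:
            fn += 1
        else:
            tn += 1
    return Confusion(tp, tn, fp, fn)


def metrics(c):
    """Eqs. (1)-(4); undefined ratios (zero denominators) are reported as 0.0."""
    total = c.tp + c.tn + c.fp + c.fn
    accuracy = (c.tp + c.tn) / total if total else 0.0        # Eq. (1)
    precision = c.tp / (c.tp + c.fp) if c.tp + c.fp else 0.0  # Eq. (3)
    recall = c.tp / (c.tp + c.fn) if c.tp + c.fn else 0.0     # Eq. (4)
    pr = precision + recall
    f_score = 2 * recall * precision / pr if pr else 0.0      # Eq. (2)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f_score": f_score}


if __name__ == "__main__":
    y_true = ["attack", "attack", "normal", "normal", "attack"]
    y_pred = ["attack", "normal", "normal", "attack", "attack"]
    c = confusion(y_true, y_pred, positive="attack")
    print(c)           # Confusion(tp=2, tn=1, fp=1, fn=1)
    print(metrics(c))  # accuracy 0.6; precision, recall and F-score each about 0.667
```

Averaging `metrics(...)` over the 15 splits produced by `repeated_kfold(n)` gives one per-dataset figure. Averaging those figures over the datasets then corresponds to the averaged values reported in Figures 1-4.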

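The majority-voting combiner described in Section III-B picks a label only if it receives more than half of the votes and otherwise rejects the instance. It can be sketched as below, with these assumptions:
- Base classifiers are plain callables.
- The threshold lambdas in the example are hypothetical stand-ins for the trained Weka models.
- Rejection is signalled by returning `None`.

```python
from collections import Counter
from typing import Callable, Hashable, Optional, Sequence


def majority_vote(votes: Sequence[Hashable]) -> Optional[Hashable]:
    """Return the label backed by more than half of the votes, else None (reject)."""
    if not votes:
        return None
    label, count = Counter(votes).most_common(1)[0]
    # A strict majority is unique, so if one exists it is the most common label.
    return label if 2 * count > len(votes) else None


def vote_predict(classifiers: Sequence[Callable[[object], Hashable]],
                 x: object) -> Optional[Hashable]:
    """Ask every base classifier for a label on x and combine by majority vote."""
    return majority_vote([clf(x) for clf in classifiers])


if __name__ == "__main__":
    # Hypothetical stand-ins for five trained base classifiers.
    base = [
        lambda x: "attack" if x > 0.5 else "normal",
        lambda x: "attack" if x > 0.2 else "normal",
        lambda x: "attack" if x > 0.8 else "normal",
        lambda x: "normal",
        lambda x: "attack",
    ]
    print(vote_predict(base, 0.6))  # attack (3 of 5 votes)
```

With five voters and two classes, one label always gets at least three votes. The rejection branch can therefore only fire when the number of voters is even or there are more than two classes; for example, `majority_vote(["attack", "normal"])` returns `None`.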
