Machine Learning For Cyber Physical System by Janmenjoy Nayak
Janmenjoy Nayak
Bighnaraj Naik
Vimal S.
Margarita Favorskaya Editors
Machine Learning
for Cyber Physical
System: Advances
and Challenges
Intelligent Systems Reference Library
Volume 60
Series Editors
Janusz Kacprzyk, Polish Academy of Sciences, Warsaw, Poland
Lakhmi C. Jain, KES International, Shoreham-by-Sea, UK
The aim of this series is to publish a Reference Library, including novel advances
and developments in all aspects of Intelligent Systems in an easily accessible and
well structured form. The series includes reference works, handbooks, compendia,
textbooks, well-structured monographs, dictionaries, and encyclopedias. It contains
well integrated knowledge and current information in the field of Intelligent Systems.
The series covers the theory, applications, and design methods of Intelligent Systems.
Virtually all disciplines such as engineering, computer science, avionics, business,
e-commerce, environment, healthcare, physics and life science are included. The list
of topics spans all the areas of modern intelligent systems such as: Ambient intelli-
gence, Computational intelligence, Social intelligence, Computational neuroscience,
Artificial life, Virtual society, Cognitive systems, DNA and immunity-based systems,
e-Learning and teaching, Human-centred computing and Machine ethics, Intelligent
control, Intelligent data analysis, Knowledge-based paradigms, Knowledge manage-
ment, Intelligent agents, Intelligent decision making, Intelligent network security,
Interactive entertainment, Learning paradigms, Recommender systems, Robotics
and Mechatronics including human-machine teaming, Self-organizing and adap-
tive systems, Soft computing including Neural systems, Fuzzy systems, Evolu-
tionary computing and the Fusion of these paradigms, Perception and Vision, Web
intelligence and Multimedia.
Indexed by SCOPUS, DBLP, zbMATH, SCImago.
All books published in the series are submitted for consideration in Web of Science.
Janmenjoy Nayak · Bighnaraj Naik · Vimal S. ·
Margarita Favorskaya
Editors
© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature
Switzerland AG 2024
This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether
the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse
of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and
transmission or information storage and retrieval, electronic adaptation, computer software, or by similar
or dissimilar methodology now known or hereafter developed.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication
does not imply, even in the absence of a specific statement, that such names are exempt from the relevant
protective laws and regulations and therefore free for general use.
The publisher, the authors, and the editors are safe to assume that the advice and information in this book
are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or
the editors give a warranty, expressed or implied, with respect to the material contained herein or for any
errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional
claims in published maps and institutional affiliations.
This Springer imprint is published by the registered company Springer Nature Switzerland AG
The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland
The rush of new developments in Cyber-Physical Systems (CPS) poses new challenges for data scientists and business delegates seeking smarter perception, demanding a real-time dashboard of information extracted from data in motion. In CPS,
network and system security is of supreme importance in the present data commu-
nication environment. Hackers and intruders can make many successful attempts to disrupt the operation of networks and web services through unauthorized intrusion.
CPS exists everywhere in different sizes, with different functionalities and capabili-
ties. Moreover, IoT is responsible for the communication between connected devices
while exchanging data that requires Internet, wireless connections, and other commu-
nication mediums. Mainly, CPS makes use of IoT devices to fetch data and efficiently
process it for implementing it in a particular area. The sensors and connected devices
in CPS collect data from various gateways installed in the network and then analyze
it for better decision-making. CPS comprises a new cohort of sophisticated systems
whose normal operation depends on vigorous communications between their phys-
ical and cyber components. As we increasingly rely on these systems, asserting their
correct functionality has become essential. Inefficient planning and control strategies
may lead to harmful consequences.
The last decade has seen enormous research in the field of deep learning and
neural networks in most engineering domains. Nowadays, various aspects of our lives depend on complex cyber-physical systems; consequently, automated anomaly detection and the development of general models for security and privacy applications are crucial. For
accurate and efficient data analysis, ML-based approaches are the best suitable way
to protect and secure the network from any uncertain threats. The real cyber-physical
system should have both physical and digital parts inter-connected in each part and
process, and the system itself should have the capacity to change its behavior to adapt
to changing requirements. Machine learning plays a major role in estimating the cyberattacks that target cyber-physical systems, and such attacks pose challenges throughout the world. Machine learning for anomaly detection in CPS includes tech-
niques that provide a promising alternative for the detection and classification of
anomalies based on an initially large set of features.
In recent years, various applications and algorithms have been proposed to mitigate those attacks through inferential data analysis. Data-driven capabilities for securing the cyber-physical system are possible through emerging ML approaches. The need for security in integrated components is a major criterion for CPS, and I can find some chapters contributing good approaches for security mitigations, such as "An In-Depth Analysis of Cyber-Physical Systems: Deep Machine Intelligence-Based Security Mitigations." Also, risk assessment in CPS and security using ML offer a holistic approach to the challenges to be overcome in the coming years.
Due to the hastening progress in machine learning and its application in dealing with various issues of CPS, this book's publication is much needed at the present time. This volume provides a literature review covering the complete history of CPS security, an overview of the state of the art, and many valuable references. Also, all the fourteen peer-
reviewed chapters report the state of the art in CPS and anomaly detection research
as it relates to smart city and other areas such as IoT, covering many important
aspects of security. In my opinion, this book is a valuable resource for graduate
students, academicians, and researchers interested in understanding and investigating
this important field of study.
comparing and utilizing various classifiers, such as the KNeighbors Classifier, Deci-
sion Tree Classifier, Support Vector Classifier, AdaBoost Classifier, Random Forest
with Extreme Gradient Boost Classifier, Random Forest Classifier, Gradient Boosting
Classifier, Gradient Boosting Machine Classifier, and XGB Classifier. The outcomes
demonstrated that, when compared to alternative methods, the Gradient Boosting
Classifier algorithm employing random search achieved improved detection accu-
racy, suggesting a considerably lesser vulnerability to such changes.
Chapter 5 develops a useful collection of machine learning models for easy
offshore wind industry deployment, with the goal of addressing a major gap in
the current literature. The decision-making process on safety precautions, such as
when to schedule maintenance or repairs or alter work procedures to lower risk,
will subsequently be guided by these models. Furthermore, the models with the best performance for the majority and the minority classes of the imbalanced dataset have been highlighted by Barouti and Kadry in this chapter.
From the experimental results, the authors concluded that the classifiers outperformed
neural networks and deep neural networks. Furthermore, the chapter also emphasizes
the possible effects of these tools on the industry’s long-term profitability and the
significance of creating efficient machine learning models and enforcing stricter
data records to increase safety in the offshore wind sector. The chapter also points
out that the excellent performance of a few chosen models indicates the validity of
the anticipated predictions and shows how machine learning models work well for
safety-related decision-making in the offshore wind sector.
In Chap. 6, a Convolutional Neural Network (CNN) model for attack detection
has been created by Ravi Kishore et al. The proposed method has been verified
with the latest V2X dataset in order to investigate several attributes, such as the
source and destination vehicle addresses, network service kinds, network connection
status, message type, and connection duration. Initially, the authors have performed
preprocessing of data in order to create the desired detection system. In summary, the
simulation results show that the proposed CNN performs better than the state-of-the-
art machine learning techniques, including Random Forest (RF), Adaptive Boosting
(AdaBoost), Gradient Boosting (GBoost), Bagging, and Extreme Gradient Boosting
(XGBoost), and reaches an exceptional degree of accuracy when applying anomaly
detection.
In Chap. 7, the use of breakthroughs in autonomous systems to challenge the
foundations of the human cognitive linguistic process is unpacked in order to stim-
ulate the development of cyber-physical system models and algorithms. In order to
accomplish this, Monte-Serrat and Cattani employed an argumentation technique to
demonstrate that a particular structure, or pattern, frequently arises in the cognitive
language processes of both intelligent systems and humans. The authors use this
to demonstrate not only that the pattern ensures coherence in the decision-making
process of cognitive computing, but also to highlight the issues surrounding the biases
of AI’s black box and the intelligence of autonomous vehicles. Thus, it is feasible to
control the interpretative activity of cyber-physical systems and the way they make
decisions by elucidating the dynamics of the distinct cognitive linguistic process as
a shared process for people and machines, resulting in the development of safe and
sustainable autonomous cars.
In Chap. 8, Kumar et al., have introduced a potential approach to enhance
the security of CPS in smart city environments. This was accomplished by using
under-sampling ensemble approaches to overcome the class imbalance problem that
machine learning algorithms faced. Class imbalance is resolved using the under-
sampling-based ensemble technique, which lowers the majority class and creates a
balanced training set. The suggested approach promotes minority performance while
decreasing bias toward the majority class. Additionally, the proposed method resolves
the issue of class imbalance and increases accuracy without the disadvantages asso-
ciated with complex model development. The MSCA benchmark IDS dataset is used
for the tests, and the results show that the under-sampling classifiers such as Self-
Paced Ensemble Classifier, Bagging Classifier, and Balance Cascade Classifier are
remarkably accurate in identifying network anomalies.
Chapter 9 is about the application of deep learning approaches in medical cyber-
physical system due to the large dimensionality and noticeable dynamic nature of
the data in these kinds of systems. Swapnarekha and Manchala have built an intel-
ligent security architecture in this chapter that uses deep neural networks to detect
cyberattacks in the healthcare industry. The WUSTL-EHMS 2020 dataset, which
is made up of network traffic indicators gathered from the patient’s biometric data,
was then used to validate the proposed framework. Since the features in the dataset
had fluctuating values, min-max normalization was first applied to the data. Further,
authors have used the Synthetic Minority Oversampling Technique (SMOTE) because the dataset
included in this study is unbalanced, with 2046 network attack samples and 14,272
normal samples. Finally, the effectiveness of the proposed framework in comparison
to a number of conventional machine learning and ensemble learning approaches has
been verified, and the results of the experiments show that the proposed DNN model
outperforms the examined machine learning and ensemble learning approaches in
terms of precision, recall, F1-score, AUC-ROC, and accuracy.
Chapter 10 explains about safeguarding sensitive industrial data, and averting
safety risks using advanced machine learning approaches. In this chapter, an
ensemble learning-based model is designed by Geetanjali Bhoi et al., to detect
anomalies in an Industrial IoT network. The authors used a gradient-boosted decision tree whose hyperparameters were optimized using a gravitational search algorithm. The
suggested approach has been validated using the X-IIoTID dataset. Then the perfor-
mance of the proposed model has been compared with various machine learning
and ensemble approaches such as Linear Regression, Linear Discriminant Anal-
ysis, Naïve Bayes, Decision Tree, Stochastic Gradient Descent, Quadratic Discrimi-
nant Analysis, Multilayer Perceptron, Bagging, Random Forest, AdaBoost, Gradient
Boosting, and XGBoost, and the experimental findings show that the suggested
approach attained superior performance in comparison with other approaches.
by Maity et al. The authors have carried out the survey by categorizing datasets
into two types: still image-based and video-based. The datasets based on
still images are additionally divided into datasets based on front images and aerial
imaging. An extensive comparison of the various dataset types, with particular
attention to their properties, has been presented in this chapter. Additionally, the
chapter lists difficulties and research gaps pertaining to automatic vehicle classifi-
cation datasets. Along with offering a thorough examination of every dataset, this
chapter also makes several important recommendations for future automatic vehicle
classification research directions.
Abstract Network abnormalities may occur for numerous reasons, such as irregular user behavioral patterns, network system failure, malicious attacker activities, botnets, or malicious software. The enormous volume of data and its incremental growth have changed the importance of information management and data processing systems. An IDS monitors and examines data to detect unauthorized entries into
a system or network. In this article, the Ada-Boost ensemble learning technique is
proposed with SMOTE to identify the anomalies in the network. The Ada-Boost
algorithm is utilized mainly in the classification task, and SMOTE handles the class
imbalance problem. The suggested approach outperformed various ML algorithms
and ensemble learning approaches in terms of precision, recall, F1-score, and accuracy, with values of 0.999 and 99.97%, respectively, when investigated with the NSL-KDD
dataset.
S. K. Pemmada (B)
Department of Computer Science and Engineering, GITAM (Deemed to be University), GST,
Visakhapatnam, Andhra Pradesh 530045, India
e-mail: [email protected]; [email protected]
K. S. Naidu
Department of CSE-IoT, Malla Reddy Engineering College, Medchal-Malkajgiri, Hyderabad,
Telangana State 500100, India
D. K. K. Reddy
Department of Computer Science Engineering, Vignan’s Institute of Engineering for Women(A),
Visakhapatnam, Andhra Pradesh 530046, India
1.1 Introduction
request. Saint attack screens every system live on the network for TCP and UDP
services. It launches a series of probes for any service it finds running to detect
something that might allow an intruder to gain unlawful access. Ftp Write attack is
an FTP protocol exploit where an attacker can use the PORT command to indirectly
request access to ports by using the victim’s computer, which acts as a request proxy.
Warez Master (WM), and Warez Client (WC) attacks are two types of assaults that
take advantage of flaws in “anonymous” FTP on both Windows and Linux. Rootkit
attacks are stealthy programs designed to obtain a network device’s administrative
rights and access. In a Mailbomb attack, an attacker sends a massive number of emails to a specific person or computer.
Over half of the global population resides in cities, and it is anticipated that
this figure will increase as people continue to move to urban regions seeking
improved employment prospects and educational resources. Smart city facilities
can be extended to several fields, such as transportation, tourism, health, environ-
ment, safety, home energy management, and security [4]. Several components of a
smart city include various sensors in applications such as structural health aware-
ness, real-time noise mapping, smart parking, smart street lights, route optimization, etc. With the emergence of these applications, wireless technologies have reached the public eye and have gradually been incorporated into every corner. So, there is always scope for unauthorized access to such devices, which may lead to data inconsistency and the emergence of suspicious activities. An IDS is designed to assist ongoing moni-
toring and detection of cyber-attacks over the smart city (especially IoT networks)
to supplement the security protocol provision. IDS is a security approach used to
discover suspicious device behavior and intercept the attacking source promptly to
secure the network [5].
Three distinct categories can be used to classify ML algorithms employed in
anomaly detection systems: those that utilize supervised learning, those that apply
unsupervised learning techniques, and those that incorporate a combination of both
in a hybrid approach [6]. Supervised anomaly detection techniques train the classifiers with labeled information; the training and testing data utilized in these methods must carry the necessary labels for both anomalous and normal data. ML algorithms that are unsupervised do not require labeled datasets.
They focus on analyzing and discovering the structure of the data. Hybrid strategies
are made up of two or more aspects, each of which performs a certain function.
One component is used for classification, clustering, and preprocessing, and another
component for optimization tasks. Hybrid approaches are used to make the best of
each of the algorithms mentioned above and boost machine efficiency. A variety of
ML techniques have been employed to identify different kinds of network threats [7].
In particular, ML algorithms such as decision tree [8], k-nearest neighbor [9], random forest [10], support vector machine [8], multi-layer perceptron [11], etc., are the most widely applied methods for intrusion detection. Different techniques, learning processes, and different input features do not provide the same results for the various classes of attacks. However, such algorithms have various disadvantages, such as data acquisition (collected data may be bogus and imbalanced in nature), error-proneness (data must be cleaned, which requires data preprocessing), and algorithm selection (a difficult task
The capacity to detect network abnormalities is critical for ensuring network stability.
The majority of intrusion detection research for predictive approaches is done using
comparable training and test datasets.
Adhi Tama et al., [12] aimed to study and highlight the usefulness of stacking
ensemble-based approach for anomaly-based IDS, where the base learner model
uses a Deep Neural Network (DNN). It presents a stacking-based DNN model for intrusion detection as a two-class problem of normal versus malicious data.
They validated the proposed model on NSL-KDD, UNSW-NB15, and CICIDS 2017
datasets using different evaluation metrics. According to the results, the suggested
model outperformed the underlying DNN model and other current ML algorithms
in the literature.
Jain and Kaur [13] explored distributed AI-based ensemble methods to identify the presence of drift in network traffic and to recognize network-based attacks. The investigation was conducted in three stages. Initially, Random Forest (RF) and LR classifiers are utilized as first-level learners, and Support Vector Machines (SVM) are used as the second-level learner. Next, K-means clustering based on a sliding window is used to handle the concept drift. Finally, techniques based on ensemble learning are used to identify the attacks in the network. Experimentation has been conducted on CIDDS-2017, generated testbed data, and NSL-KDD. The assessment was carried out on different machines by varying the number of agent centers to measure the learning-time latency in the distributed environment. The test results demonstrated that the SVM-based model showed better accuracy.
Several methods have been suggested to identify normal data with anomalies to
detect network intrusions. Zhong et al. [14] discussed the integration framework
of several ML techniques. They utilized a damped incremental statistics method to
abstract features from network traffic and then used an autoencoder with labeled data to identify network traffic anomalies. The proposed algorithm combines the LSTM classifier and the autoencoder classifier, and the experimental results for the combined model are then reported.
Khammassi and Krichen [15] suggested a multi-objective Feature Selection
method as a learning algorithm using a logistic regression wrapper approach and
a non-dominated sorting genetic algorithm (NSGA-II) as a search methodology. The proposed method
is tested in two phases, and the results are compared for both binary-class and multi-
class classifiers using Naive Bayes, RF, and C4.5 classifiers on UNSW-NB15, CIC-
IDS2017, and NSL-KDD data sets. The binary class datasets display better accu-
racy compared to multi-class datasets. Table 1.1 outlines a variety of strategies and
evaluative studies that have been suggested by different researchers.
From Table 1.1, it is seen that most of the research has been focused on the use
of the KDD-CUP 99 and NSL-KDD datasets. However, many of these works still struggle with issues such as complexity and finding highly accurate solutions.
Adaptive Boosting (AdaBoost) was proposed by Freund and Schapire [25]. The base learners are built on a weighted distribution of the dataset, where the instance weights depend on the predictions of the previous base learners. If a particular instance is misclassified, the subsequent model assigns a higher weight to that instance; otherwise, if the classification is correct, the weight is left unaltered. The final decision is made by a weighted vote of the base learners, where the weights are determined by the misclassification rates of the models. In AdaBoost, DTs serve as the foundational classifiers, and the models that achieve higher predictive accuracy are assigned greater weights, whereas those with lower accuracy are given lesser weights.
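For reference, the weight-update rule described above can be written compactly in the common binary AdaBoost formulation; the notation below (weighted error ε_t of base learner h_t, vote weight α_t, instance weights w_i) is ours and is not reproduced from the chapter:

\epsilon_t = \sum_{i=1}^{N} w_i^{(t)} \, \mathbf{1}\big[h_t(x_i) \neq y_i\big], \qquad \alpha_t = \frac{1}{2}\ln\frac{1-\epsilon_t}{\epsilon_t}, \qquad w_i^{(t+1)} \propto w_i^{(t)} \exp\big(-\alpha_t\, y_i\, h_t(x_i)\big)

The final prediction H(x) = \operatorname{sign}\big(\sum_t \alpha_t h_t(x)\big) is exactly the weighted vote of base learners mentioned above, with more accurate learners (smaller \epsilon_t) receiving larger vote weights \alpha_t.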
Figure 1.1 depicts the proposed approach framework. An IDS has the capability to
scrutinize both user behaviors and system activities, identify established patterns of
attacks, and spot nefarious activities within the network. The primary objective of
an IDS lies in overseeing the network and its individual components, recognizing a
range of network breaches, and alerting the respective personnel upon the detection of
such security incidents. The smart city sensor data have been preprocessed in several steps, encoding non-numerical labels and balancing the class labels of the target variable. The prepared data is then fed into Ada-Boost, an intelligent ensemble
framework. If the proposed method detects an attack, the network administrator
will be notified, and the monitoring system will be alerted. In addition, intrusion
prevention systems scan incoming network packets to detect malicious or anomalous
activity and provide alerts.
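The workflow of Fig. 1.1 can be sketched in a few lines of Python. This is only a minimal illustration under assumptions: the file name, the label column name, and all hyperparameters (test size, number of estimators, random seeds) are ours, not the authors' exact configuration.

```python
# Minimal sketch of the Fig. 1.1 workflow: encode categorical features,
# balance the training data with SMOTE, train AdaBoost, report test accuracy.
import pandas as pd
from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import train_test_split
from sklearn.ensemble import AdaBoostClassifier
from sklearn.metrics import accuracy_score
from imblearn.over_sampling import SMOTE

df = pd.read_csv("nsl_kdd.csv")                       # assumed file name

# Encode non-numerical columns (e.g. protocol_type, service, flag)
for col in df.select_dtypes(include="object").columns:
    df[col] = LabelEncoder().fit_transform(df[col])

X = df.drop(columns=["label"])                        # "label" column name assumed
y = df["label"]

# 80/20 train/test split, as in the experimental setup
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)

# Balance only the training split so the test set keeps its natural distribution
X_tr_bal, y_tr_bal = SMOTE(random_state=42).fit_resample(X_tr, y_tr)

# AdaBoost with its default decision-tree base learners
clf = AdaBoostClassifier(n_estimators=100, random_state=42)
clf.fit(X_tr_bal, y_tr_bal)

print("Test accuracy:", accuracy_score(y_te, clf.predict(X_te)))
```

Applying SMOTE only to the training split keeps synthetic samples out of the evaluation data, so the reported accuracy reflects the original class distribution.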
Different statistical studies have exposed the inherent disadvantages of KDD Cup 99, which affected the detection accuracy of many researchers' intrusion detection models [26]. NSL-KDD represents an enhanced iteration of the original KDD, incorporating essential records from the predecessor KDD Cup 99 dataset. This work is simulated on NSL-KDD [27] using an ensemble learning algorithm called Ada-Boost, and the proposed method is validated by comparison with different state-of-the-art ML algorithms: SGD, KNN, RF, LDA, QDA, DT, GNB, LR, and MLP. The dataset
contains 41 features referring to ‘basic features’, ‘features related to the content’,
‘traffic features related to time’, and ‘traffic features based on the host of each network
connection vector’. The detailed feature and its type are presented in Table 1.2.
This dataset has 148,517 instances, distributed across the following attack types: 'normal'–77,054; 'back'–1315; 'land'–25; 'Neptune'–45,871; 'pod'–
242; ‘smurf’–3311; ‘worm’–2; ‘teardrop’–904; ‘processtable’–685; ‘apache2’–
737; ‘udpstorm’–2; ‘satan’–4368; ‘ipsweep’–3740; ‘nmap’–1566; ‘portsweep’–
3088; ‘mscan’–996; ‘saint’–319; ‘guess–passwd’–1284; ‘ftp–write’–11; ‘imap’–
965; ‘phf’–6; ‘multihop’–25; ‘warezmaster’–964; ‘warezclient’–890; ‘spy’–2;
‘xlock’–9; ‘xsnoop’–4; ‘snmpguess’–331; ‘snmpgetattack’–178; ‘httptunnel’–133;
‘sendmail’–14; ‘named’–17; ‘buffer-overflow’–50; ‘loadmodule’–11; ‘rootkit’–
23; 'perl'–5; 'sqlattack'–2; 'xterm'–13; 'ps'–15; 'mailbomb'–293. These labels form the dependent variable. Except for the 'normal' class label, the remaining attack types are grouped into four class labels: 'DoS'—'pod', 'smurf', 'back', 'land',
‘udpstorm’, ‘processtable’, ‘Neptune’, ‘teardrop’, ‘apache2’, ‘worm’, ‘mailbomb’.
‘U2R’— ‘xterm’, ‘ps’, ‘buffer-overflow’, ‘perl’, ‘sqlattack’, ‘loadmodule’, ‘rootkit’.
‘Probe’— ‘nmap’, ‘satan’, ‘mscan’, ‘ipsweep’, ‘portsweep’, ‘saint’. ‘R2L’—
‘xsnoop’, ‘named’,‘snmpguess’, ‘imap’, ‘multihop’, ‘warezclient’, ‘spy’, ‘xlock’,
‘snmpgetattack’, ‘phf’, ‘guess-passwd’, ‘ftp-write’, ‘httptunnel’, ‘warezmaster’,
‘sendmail’. So, the dependent variable has 5 classes which are normal, R2L, U2R,
DoS, Probe.
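The grouping described above can be written directly as a lookup table. The sketch below is only a convenience mapping of the labels listed in the text; the exact spellings in the NSL-KDD files (hyphens versus underscores, capitalisation) vary between releases and are an assumption here.

```python
# Mapping of raw NSL-KDD attack labels (lower-cased) to the five classes used here.
ATTACK_TO_CLASS = {
    **dict.fromkeys(['pod', 'smurf', 'back', 'land', 'udpstorm', 'processtable',
                     'neptune', 'teardrop', 'apache2', 'worm', 'mailbomb'], 'DoS'),
    **dict.fromkeys(['xterm', 'ps', 'buffer_overflow', 'perl', 'sqlattack',
                     'loadmodule', 'rootkit'], 'U2R'),
    **dict.fromkeys(['nmap', 'satan', 'mscan', 'ipsweep', 'portsweep', 'saint'],
                    'Probe'),
    **dict.fromkeys(['xsnoop', 'named', 'snmpguess', 'imap', 'multihop',
                     'warezclient', 'spy', 'xlock', 'snmpgetattack', 'phf',
                     'guess_passwd', 'ftp_write', 'httptunnel', 'warezmaster',
                     'sendmail'], 'R2L'),
    'normal': 'normal',
}

def to_five_classes(label: str) -> str:
    """Map a raw attack label to one of: normal, DoS, U2R, Probe, R2L."""
    return ATTACK_TO_CLASS[label.strip().lower()]
```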
The research was carried out on a computer equipped with Windows 10 Pro (64-
bit), powered by an Intel(R) Core(TM) processor with 8 GB of RAM. Simulations of the
suggested models, alongside those for comparison, were executed within an envi-
ronment based on Python. This setup encompassed the Numpy and Pandas libraries
(utilized for data manipulation and analysis); the sklearn library (employed for the
implementation of machine learning classifiers and data preprocessing tasks); pycm
(used for evaluating multiclass classification metrics); Matplotlib and Seaborn (for
graphical representation of data); and imblearn (applied for addressing class imbal-
ance through random oversampling). Additionally, the classification-metrics library
was used for assessing performance and analysis. The techniques under considera-
tion, including the novel approach and those for comparison, underwent evaluation
on a dataset partitioned in an 80% training to 20% testing split. The parameters for
both the novel technique and the benchmark ensemble and ML methods are detailed
in Table 1.3.
Experimentation is carried out on the proposed approach and with several ML tech-
niques. The proposed method and the comparative approaches are validated using various
evaluation metrics such as true negative (TN), true positive (TP), false negative (FN),
false positive (FP), false-positive rate (FPR), recall (TPR), f1-score, precision, accu-
racy, micro and macro average roc curve concerning every class, and overall accuracy
[28].
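The per-class quantities listed above can be derived from the multiclass confusion matrix in a one-vs-rest fashion. The helper below is a sketch of one way to do this; it is not the pycm/classification-metrics code used by the authors.

```python
from sklearn.metrics import confusion_matrix

def per_class_metrics(y_true, y_pred, labels):
    """One-vs-rest TP/FP/FN/TN, TPR, FPR, precision and F1 for each class."""
    cm = confusion_matrix(y_true, y_pred, labels=labels)
    results = {}
    for i, cls in enumerate(labels):
        tp = cm[i, i]
        fn = cm[i, :].sum() - tp          # true class i, predicted as something else
        fp = cm[:, i].sum() - tp          # other classes predicted as class i
        tn = cm.sum() - tp - fn - fp
        tpr = tp / (tp + fn) if (tp + fn) else 0.0      # recall / sensitivity
        fpr = fp / (fp + tn) if (fp + tn) else 0.0
        prec = tp / (tp + fp) if (tp + fp) else 0.0
        f1 = 2 * prec * tpr / (prec + tpr) if (prec + tpr) else 0.0
        results[cls] = dict(TP=tp, FP=fp, FN=fn, TN=tn,
                            TPR=tpr, FPR=fpr, precision=prec, f1=f1)
    return results
```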
The study demonstrates the performance of the AdaBoost classifier relative to various ML and ensemble learning techniques, presented in Tables 1.4, 1.5, 1.6, 1.7, 1.8, 1.9, 1.10, 1.11, 1.12, 1.13 and 1.14. The SGD, GNB, and LR classifiers show a large misclassification rate for all the classes. The complete in-depth results of
these classifiers are shown in Tables 1.4, 1.5 and 1.6, where the accuracy of SGD,
GNB, and LR are 26.26, 30.58, and 33.15, respectively. This shows the inability of conventional ML methods to interpret and classify such large data.
The overall performance of the QDA classifier and its performance on each individual class are shown in Table 1.7. The DoS, U2R, and Probe classes produce a TPR of 99.4, 98.5, and
97.5. The classes Normal, R2L, and U2R produced the least FPR with 0.001, 0.004,
and 0.083. The overall accuracy is 65.63, whereas individual accuracy of 93.1 and
91.9% is achieved for U2R and DoS classes.
The LDA classifies the DoS, Probe, and Normal classes precisely to some extent,
i.e., these classes obtain TPR and individual accuracy with 98, 97, and 96.7%, as
shown in Table 1.8. The DoS class shows an FPR of 0.007, an F1-score of 94.9,
and 97.2 precision. The individual accuracy of U2R and R2L classes produces an
accuracy of 95.2 and 92. The LDA gives an overall accuracy of 89.4.
Table 1.9 shows the MLP classifier’s result analysis in the Probe, DoS, and U2R
classes, with accuracy of 99.5, 98.8, and 98.3, respectively. 15,266 instances are
correctly classified, and 268 are wrongly classified with FPs for the Probe class. The
DoS and U2R class show that 14,736 and 14,523 are correctly classified, and 238 and
482 are misclassified and given false positives. Each class shows a TPR greater than
93% and an FPR of less than 0.025. The F1-score and precision values of individual
classes derive precise values, which leads the MLP classifier with an overall accuracy
of 95.94.
Table 1.10 shows the result analysis of the k-NN, where ‘DoS’ and ‘R2L’ classes
are classified precisely and with an FPR of 0.003 and 0.002. The ‘Probe’ and
‘U2R’ classes are properly predicted with 15,309 and 15,284 instances, respectively,
whereas 158 and 303 instances are misclassified as false positives. The k-NN clas-
sifier categorizes almost every class correctly and achieves an overall accuracy rate of 98.69, whereas the class 'Normal' achieves a distinct accuracy of 99.2, and 'DoS' and 'R2L' achieve an accuracy of 99.6. The classes 'Probe' and 'U2R' achieve a distinct accuracy of 99.5.
The GB classifier predicted almost all the classes precisely, with recall, precision, F1-score, and accuracy greater than 99% for these classes. Table 1.11 shows the results of each per-class performance metric for the GB classifier. The classes 'DoS' and 'U2R' achieved an individual accuracy of 99%, 'Normal' and 'Probe' achieved an individual accuracy of 99.8, and 'R2L' an individual accuracy of 99.7. The FPR is less than 0.01 for all the classes, and the classifier achieved an overall accuracy of 99.67.
Table 1.12 shows the result analysis of the Bagging classifier. The classes show
high true positive instances; it can be concluded that very few instances of individual
classes are misclassified. Thirty-two instances of the Normal class are predicted as
attacks, and 14 instances of attacks have been classified as Normal. 11 instances
of the DoS class are classified with Normal and with other attack classes, and 11
instances of other classes have been predicted as DoS attacks. The U2R attack is
classified well compared to other classes, as it has 12 FP and 4 FN instances. The
F1-score, TPR, precision, and individual accuracy show greater than 99% for each
class, where the classifier achieves an overall accuracy of 99.9.
The stacking classifier result analysis is presented in Table 1.13, which illustrates that the classes are classified precisely except for a few misclassifications,
with an overall accuracy of 99.95. The FP of the Normal class shows that 6 instances
of attacks are predicted as Normal, and the false-negative instances with 24 are
predicted as attacks. 7 instances of the DoS class are classified with Normal and
with other attack classes, and 7 instances of other classes have been predicted as
DoS attacks. The false-positive of the Probe class shows that 9 instances of other
classes are predicted as Probe and the false-negative instances with 1 predicted as
DoS attack. The U2R attack shows 5 false positives and 1 false negative instance i.e.,
5 instances of other classes are predicted as U2R, and 1 instance of U2R is predicted
as R2L.
Table 1.14 shows the analysis of the Ada-Boost classifier where the classes are
classified precisely. The ‘U2R’ attack shows 3 FP and 0 FN instances i.e., 2 instances
of ‘Normal’ and 1 instance of ‘R2L’ is predicted as ‘U2R’. The false-positive of the
R2L class shows that 6 instances of the ‘Normal’ class are predicted as ‘R2L’ attack,
and the FN with 2 instances is predicted with one ‘Normal’ and one ‘DoS attack’.
The FP of the ‘Probe’ class shows that 10 instances of other classes are predicted
as ‘Probe’, and the FN with 1 instance is predicted as ‘DoS attack’. The FP and FN
of the 'DoS' class are derived with 1 and 8 instances, respectively. Only very few instances of the Normal, DoS, Probe, R2L, and U2R classes are wrongly classified.
The overall accuracy of the AdaBoost classifier is 99.97.
Illustrated in Fig. 1.4 is the AUC-ROC curve for each model under consideration.
A macro-average is determined by evaluating the metric separately for each class
before taking the mean, whereas a micro-average compiles the contributions from all classes to calculate the overall average metric used in the ROC curve. The micro-average and macro-average ROC values for SGD, GNB, LR, QDA, LDA, MLP, and KNN are 0.52, 0.57, 0.57, 0.79, 0.93, 0.98 and 0.99, respectively, while Bagging, Stacking, and the proposed AdaBoost method each attain a value of 1.0.
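Micro- and macro-averaged AUC values such as those quoted above can be computed by binarizing the labels in a one-vs-rest fashion; the helper below is a brief sketch (the fitted-model and test-set variable names are assumptions).

```python
from sklearn.metrics import roc_auc_score
from sklearn.preprocessing import label_binarize

def micro_macro_auc(model, X_test, y_test):
    """Micro- and macro-averaged one-vs-rest ROC AUC for a fitted classifier."""
    y_true_bin = label_binarize(y_test, classes=model.classes_)  # one column per class
    y_score = model.predict_proba(X_test)                        # per-class scores
    return (roc_auc_score(y_true_bin, y_score, average="micro"),
            roc_auc_score(y_true_bin, y_score, average="macro"))

# e.g. micro, macro = micro_macro_auc(clf, X_te, y_te)
```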
Figure 1.5 represents the classification measures of all the models. In all cases,
the proposed approach performed well compared to various EL and ML models.
Figure 1.6 presents the respective overall accuracies of different ML and EL methods,
such as SGD, GNB, LR, and QDA with 26.26, 30.58, 33.15, and 65.63% accuracy.
The other methods, such as LDA, MLP, KNN, GB, Bagging, and Stacking, have an overall accuracy of 89.42, 95.94, 98.69, 99.67, 99.90, and 99.95%, respectively, while the proposed AdaBoost approach reaches 99.97%.
Fig. 1.4 AUC-ROC curves of a SGD, b GNB, c LR, d QDA, e LDA, f MLP, g K-NN, h GB, i Bagging, j Stacking, k Ada-Boost
Fig. 1.5 a TPR against different models, b FPR against different models, c F1-score against different models, d Precision against different models, e Class accuracy against different models
Table 1.15 Comparison of performance of the proposed method with previous articles

Intelligent method | Datasets | Evaluation factors | Ref
Union and quorum techniques | UNSW-NB15 and NSL-KDD | Accuracy: 99%; Random forest with union: 99.34%; Random forest with quorum: 99.21% | [23]
Autoencoder model trained with optimum hyperparameters | NSL-KDD | Accuracy: 96.36% | [29]
Hybrid supervised learning algorithm | NSL-KDD | Accuracy: 98.9% | [30]
KNN, SVM | NSL-KDD | Accuracy: 84.25% | [31]
SVM, KNN, NB, RF | NSL-KDD | Accuracy: 99.51%; F1-Score: 99.43% | [32]
KNN, MLP, RF | NSL-KDD | Accuracy: 85.81% | [33]
SMOTE-integrated Ada-Boost (this chapter) | NSL-KDD | Accuracy: 99.97% | Proposed method
Table 1.15 presents previous research results on network anomaly detection using various existing algorithms, tabulating the accuracy and other parameters calculated on the NSL-KDD dataset. The proposed Ada-Boost classifier obtained the highest accuracy compared to the various previous studies.
1.6 Conclusion
Data mining and ML approaches are actively trying to speed up the mechanism
of discovering information. There is a greater volume of ubiquitous data streams
produced from different digital applications. As computer network traffic is growing
rapidly every year, managing the network in real time is a difficult task. Hence to
reduce the potential risk and segregate normal data instances from anomalous ones,
an EL approach is suggested to integrate the effects of individual techniques with
the support of the established ML algorithms. The Ada-Boost ML technique is used for the classification task to boost performance, and the SMOTE method is applied to
overcome the class imbalance problem.
Analysis of this experiment shows that the performance of the proposed method is relatively better than that of the existing traditional ML algorithms in terms of precision, accuracy, and recall values. The proposed approach is compared to several ML methods such as KNN (98.69%), MLP (95.94%), LDA (89.42%), QDA (65.63%), GNB (30.58%), and SGD (26.24%). When compared to the accuracy of the various techniques, the proposed Ada-Boost algorithm achieved an accuracy of 99.97%. The results are evidence that the proposed approach for anomaly detection is better than other previously proposed methods.
Most conventional ML algorithms perform poorly because they prefer the majority
class samples, resulting in low prediction accuracy for the minority class in the case of
the unbalanced dataset. As a result, learning the critical instances becomes difficult.
In fact, in order to reduce the overall error rate, most algorithms assume equal misclassification costs for all samples, and oversampling increases the number of training instances, which increases computing time. The assumption of identical misclassification costs for each class in an unbalanced dataset does not hold in practice and pushes the computing limits when identifying different attacks. AdaBoost combined with SMOTE produces an ideal set of synthetic samples by correcting for skewed distributions and modifying the weight updates. Although this approach can address class
imbalance issues well, it may use a significant amount of system resources. Further-
more, studies may concentrate on improving the efficacy and efficiency of IDS by
taking into account the many difficulties that ML-based IDS encounters and utilizing
the newest ensemble learning algorithms, such as XGBoost, LightGBM, etc. It is crucial
to remember that these approaches would have to take the possibility of higher system
resource usage into account.
References
1. Salehi, M., Rashidi, L.: A survey on anomaly detection in evolving data. ACM SIGKDD Explor.
Newsl. 20(1), 13–23 (2018). https://doi.org/10.1145/3229329.3229332
2. Reddy, D.K.K., Behera, H.S., Nayak, J., Routray, A.R., Kumar, P.S., Ghosh, U.: A fog-based
intelligent secured IoMT framework for early diabetes prediction. In: Ghosh, U., Chakraborty,
C., Garg, L., Srivastava, G. (eds.) Internet of Things, pp. 199–218. Springer, Cham (2022)
3. Nayak, J., Kumar, P.S., Reddy, D.K.K., Naik, B., Pelusi, D.: Machine learning and big data
in cyber-physical system: methods, applications and challenges. In: Cognitive engineering for
next generation computing, Wiley, pp. 49–91 (2021)
4. Baig, Z.A., et al.: Future challenges for smart cities: Cyber-security and digital forensics. Digit.
Investig., 22 (September 2019), 3–13 (2017). https://doi.org/10.1016/j.diin.2017.06.015
5. Elsaeidy, A., Munasinghe, K.S., Sharma, D., Jamalipour, A.: Intrusion detection in smart cities
using Restricted Boltzmann Machines. J. Netw. Comput. Appl., 135(September 2018), 76–83
(2019). https://doi.org/10.1016/j.jnca.2019.02.026
6. Chkirbene, Z., Erbad, A., Hamila, R.: A combined decision for secure cloud computing based
on machine learning and past information. In: 2019 IEEE Wireless Communications and
Networking Conference (WCNC), vol. 2019-April, pp. 1–6 (2019). https://doi.org/10.1109/
WCNC.2019.8885566
7. Tun, M.T., Nyaung, D.E., Phyu, M.P.: Network anomaly detection using threshold-based sparse.
In: Proceedings of the 11th International conference on advances in information technology,
pp. 1–8 (2020). https://doi.org/10.1145/3406601.3406626
8. Peddabachigari, S., Abraham, A., Thomas, J.: Intrusion detection systems using decision trees
and support vector machines. Int. J. Appl. Sci. Comput. 11(3), 118–134 (2004)
9. Liao, Y., Vemuri, V.R.: Use of K-nearest neighbor classifier for intrusion detection. Comput.
Secur. 21(5), 439–448 (2002). https://doi.org/10.1016/S0167-4048(02)00514-X
10. Negandhi, P., Trivedi, Y., Mangrulkar, R.: Intrusion detection system using random forest on
the NSL-KDD dataset, pp. 519–531 (2019)
11. Guezzaz, A., Asimi, A., Asimi, Y., Tbatous, Z., Sadqi, Y.: A global intrusion detection system
using PcapSockS sniffer and multilayer perceptron classifier. Int. J. Netw. Secur. 21(3), 438–450
(2019). https://doi.org/10.6633/IJNS.201905
12. Adhi Tama, B., Nkenyereye, L., Lim, S.: A Stacking-based deep neural network approach
for effective network anomaly detection. Comput. Mater. Contin., 66(2), 2217–2227 (2021).
https://doi.org/10.32604/cmc.2020.012432
13. Jain, M., Kaur, G.: Distributed anomaly detection using concept drift detection based hybrid
ensemble techniques in streamed network data. Cluster Comput., 1–16 (2021). https://doi.org/
10.1007/s10586-021-03249-9
14. Zhong, Y., et al.: HELAD: A novel network anomaly detection model based on heterogeneous
ensemble learning. Comput. Networks 169, 107049 (2020). https://doi.org/10.1016/j.comnet.
2019.107049
15. Khammassi, C., Krichen, S.: A NSGA2-LR wrapper approach for feature selection in network
intrusion detection. Comput. Networks, 172(February), 107183(2020). https://doi.org/10.1016/
j.comnet.2020.107183
16. Kaur, G.: A comparison of two hybrid ensemble techniques for network anomaly detection in
spark distributed environment. J. Inf. Secur. Appl., 55(September), 102601(2020). https://doi.
org/10.1016/j.jisa.2020.102601
17. Othman, D.M.S., Hicham, R., Zoulikha, M.M.: An efficient spark-based network anomaly
detection. Int. J. Comput. Digit. Syst. 9(6), 1175–1185 (2020). https://doi.org/10.12785/ijcds/
0906015
18. Nagaraja, A., Boregowda, U., Khatatneh, K., Vangipuram, R., Nuvvusetty, R., Sravan Kiran,
V.: Similarity based feature transformation for network anomaly detection. IEEE Access, 8,
39184–39196 (2020). https://doi.org/10.1109/ACCESS.2020.2975716
19. Thaseen, I.S., Chitturi, A.K., Al-Turjman, F., Shankar, A., Ghalib, M.R., Abhishek, K.: An
intelligent ensemble of long short-term memory with genetic algo-
rithm for network anomaly identification. Trans. Emerg. Telecommun. Technol., (September),
1–21(2020). https://doi.org/10.1002/ett.4149
20. Truong-Huu, T., et al.: An empirical study on unsupervised network anomaly detection using
generative adversarial networks. In: Proceedings of the 1st ACM workshop on security and
privacy on artificial intelligence, pp. 20–29 (2020). https://doi.org/10.1145/3385003.3410924
21. Gurung, S., Kanti Ghose, M., Subedi, A.: Deep learning approach on network intrusion detec-
tion system using NSL-KDD dataset. Int. J. Comput. Netw. Inf. Secur., 11(3), 8–14 (2019).
https://doi.org/10.5815/ijcnis.2019.03.02
22. Zhang, C., Ruan, F., Yin, L., Chen, X., Zhai, L., Liu, F.: A deep learning approach for network
intrusion detection based on NSL-KDD dataset. In: 2019 IEEE 13th International Conference
on Anti-counterfeiting, Security, and Identification (ASID), vol. 2019-Octob, pp. 41–45. https://
doi.org/10.1109/ICASID.2019.8925239
23. Doreswamy, Hooshmand, M.K., Gad, I.: Feature selection approach using ensemble learning
for network anomaly detection. CAAI Trans. Intell. Technol., 5(4), 283–293. https://doi.org/
10.1049/trit.2020.0073
24. Bagui, S., Kalaimannan, E., Bagui, S., Nandi, D., Pinto, A.: Using machine learning techniques
to identify rare cyber-attacks on the UNSW-NB15 dataset. Secur. Priv. 2(6), 1–13 (2019).
https://doi.org/10.1002/spy2.91
25. Freund, Y., Schapire, R.E.: Experiments with a new boosting algorithm. In: Proceedings of the Thirteenth International Conference on Machine Learning (ICML), pp. 148–156 (1996)
26. Dhanabal, L., Shantharajah, S.P.: A study on NSL-KDD dataset for intrusion detection system
based on classification algorithms. Int. J. Adv. Res. Comput. Commun. Eng. 4(6), 446–452
(2015). https://doi.org/10.17148/IJARCCE.2015.4696
27. University of New Brunswick.: Canadian Institute for Cybersecurity. Research|Datasets|UNB.
unb.ca, (2018)
28. Nayak, J., Kumar, P.S., Reddy, D.K., Naik, B.: Identification and classification of hepatitis C
virus: an advance machine-learning-based approach. In: Blockchain and machine learning for
e-Healthcare systems, Institution of Engineering and Technology, pp. 393–415
29. Kasim, Ö.: An efficient and robust deep learning based network anomaly detection against
distributed denial of service attacks. Comput. Networks 180, 107390 (2020). https://doi.org/
10.1016/j.comnet.2020.107390
30. Hosseini, S., Azizi, M.: The hybrid technique for DDoS detection with supervised learning
algorithms. Comput. Networks 158, 35–45 (2019). https://doi.org/10.1016/j.comnet.2019.
04.027
31. Su, T., Sun, H., Zhu, J., Wang, S., Li, Y.: BAT: Deep learning methods on network intrusion
detection using NSL-KDD dataset. IEEE Access 8, 29575–29585 (2020). https://doi.org/10.
1109/ACCESS.2020.2972627
32. Kasongo, S.M., Sun, Y.: A deep long short-term memory based classifier for wireless intrusion
detection system. ICT Express 6(2), 98–103 (2020). https://doi.org/10.1016/j.icte.2019.08.004
33. Illy, P., Kaddoum, G., Moreira, C.M., Kaur, K., Garg, S.: Securing fog-to-things environment
using intrusion detection system based on ensemble learning. arXiv, no. April, pp. 15–18
Chapter 2
An In-Depth Analysis of Cyber-Physical
Systems: Deep Machine Intelligence
Based Security Mitigations
B. K. Tripathy (B)
School of Computer Science Engineering and Information Systems, VIT, Vellore, Tamil
Nadu 632014, India
e-mail: [email protected]
G. K. Panda
MITS School of Biotechnology, Bhubaneswar, Odisha 751024, India
A. Sahu
Research Scholar, Utkal University, Bhubaneswar, Odisha 751004, India
2.1 Introduction
In late 2008, the National Science Foundation (NSF, US) acknowledged the impor-
tance of Cyber Physical Systems (CPS) as a significant domain for exploration and
welcomed collaborative research proposals in 2009 [1]. Initially, this system involved
with the amalgamation of computational and physical resources. Over time, it became
more prominent in the research community and evolved into a rising technology for
integrating into research and industrial applications. It can be described as systems
involving tangible, biological and engineered elements, where their functions are
seamlessly merged, observed and regulated through a computing device. Compo-
nents are interconnected at all levels and computing is deeply ingrained in each
physical element, potentially even within substances. These integrated computa-
tional units operate within a distributed environment, providing real-time responses.
The behavior of a CPS represents a fully integrated fusion of computational algorithms
and physical actions.
In CPS, an array of advanced physical devices plays a pivotal role in enabling
seamless interaction with and control of the physical world. Beyond the familiar
devices like light-proximity sensors, microphones, GPS chips, GSM chips, cameras,
touch screens, WiFi, Bluetooth, EDGE, and 4G/5G connectivity, there is a diverse
spectrum of specialized hardware. This includes ultraviolet (UV) sensors for moni-
toring UV radiation, piezoelectric sensors for precise stress and strain measurements,
Geiger-Muller counters for radiation detection, and colorimeters and spectrometers
for in-depth color and spectral analysis. Additionally, devices like strain gauges,
gas chromatographs, and mass spectrometers find applications in stress analysis,
chemical analysis, and composition assessment, respectively. Further, sensors such
as sonar sensors, seismic sensors, and turbidity sensors are deployed for under-
water distance measurement, earthquake monitoring, and water quality assessment.
Capacitive touch sensors, thermal imaging cameras, and Global Navigation Satellite
System (GNSS) receivers enhance human–machine interaction, thermal analysis,
and precise positioning. The list extends to hygrometers for humidity measurement,
time-of-flight (ToF) sensors for 3D imaging, and accelerated stress testing (AST)
chambers for extreme component testing, collectively forming the robust arsenal of
technical components underpinning the functionality of CPS.
When dealing with spatially-distributed physical processes, the task of sensing
can present significant challenges, particularly when deploying sensor devices across
expansive areas. Design difficulties in CPS are discussed in [2]. To seamlessly inte-
grate these devices into the system, a substantial number of sensors, actuators, or
analogous physical components must be distributed over vast geographical regions.
The use of wired sensors, while effective, can incur massive deployment costs and
essential features and recent challenges. Section 2.3 delves into the integral aspects of
WSN in conjunction with CPS and the corresponding intricacies of MAC protocols in
this context. Section 2.4 is dedicated to the discussion of threats and security concerns
in CPS and the utilization of machine intelligence and deep learning techniques to
address these issues. In Sect. 2.5, we present the results of experiments through ML-
based and DL-based models, substantiating these outcomes through experimental
analysis and subsequent discussions.
In this section, we dive into the foundational elements of CPS. This entails an in-depth
look at the key components, operational structure, technological progress, domain
applications, and the harmonious integration of hardware, software, and real-world
processes. We emphasize the creation of intelligent systems and delve into the essen-
tial characteristics and challenges, particularly in ensuring secure computational and
control processes within CPS.
Cyber refers to elements such as computation, communication, and control, which
are discrete, based on logic, and operate in a switched manner. Physical pertains to
systems, whether natural or human-made, that adhere to the laws of physics and
function continuously over time. CPS represent systems in which the cyber and
physical aspects are closely intertwined across all scales and levels. This marks a shift
from merely applying cyber to the physical realm, moving away from the mindset
of treating computing as off-the-shelf commodity “parts,” and transitioning from
ad-hoc approaches to one that is grounded and assured in its development. Figure 2.1
represents an overview of these three terminologies.
In the context of a general overview, a CPS typically consists of a monitoring
system, which usually includes one or more microcontrollers responsible for control-
ling and transmitting data obtained from sensors and actuators that interact with the
physical environment. These embedded systems also require a communication inter-
face to exchange information with other embedded systems or cloud-based platforms.
The central and most crucial aspect of a CPS is the exchange of information, as data
and drug delivery systems [35, 35, 37]. Moreover, the manufacturing industry has
been revolutionized by Industry 4.0, incorporating CPS to enhance efficiency and
automation in production processes [15].
Advancements in Industry 4.0: The integration of CPS into the fourth industrial
revolution, also known as Industry 4.0, signifies a significant shift in manufacturing
[17, 18]. It emphasizes the use of smart technology, data analytics, and automation to
create ‘smart factories’ where machines, products, and systems communicate with
each other [14]. This leap in technology enhances productivity, quality control, and
cost-efficiency. It’s transforming the manufacturing sector and is poised to become
a fundamental aspect of modern industrial production.
In this part, we address the challenges and emerging trends in CPS, such as security
and privacy concerns, as well as the increasing role of machine intelligence in shaping
the future of these systems. It’s crucial to understand that the design of CPS encom-
passes three primary facets. The first aspect focuses on the hardware, embedded in
the system, with the goal of expanding available computational resources (such as
processing power, memory, sensors, and actuators) while keeping costs, size, and
energy consumption in check [41]. In [2], key considerations and hurdles faced
in the development of CPS are explored, providing insights into the complexities
associated with integrating computational and physical elements. The second aspect
deals with communication, whether wired or wireless, aiming to efficiently transmit
messages between distributed devices, quickly and with minimal energy usage. In
[3], efforts have been made to trace the evolutionary path from WSN to the broader
domain of CPS. Their discussion encompasses the transition, implications, and
advancements as sensor networks become integral to the broader concept of CPS.
Researchers in [41], focus on energy consumption and optimization analysis, within
the context of energy efficiency aspects of wireless CPS.
The third aspect centers on the design of a distributed system, enabling the
implementation of CPS functions like remote monitoring and control of distributed
processes. However, achieving perfect communication, such as a 100% packet recep-
tion rate, isn’t the sole objective. Instead, it necessitates real-time guarantees of secure
communication and distribution.
In these scenarios, distributed applications often provide transportation mecha-
nisms for collected sensor data. The primary challenge lies in reliably aggregating
or disseminating messages across the network. Single-hop communication occurs
when a source node is within the communication range of its destination, which is a
straightforward case. However, deployed networks often cover large areas, and low-
power radios typically have a limited communication range of just tens of meters
(Table 2.2). Hence, multi-hop (MHp) communication becomes necessary, where a
source node relies on other network nodes to forward its messages, hop by hop, until
they reach the destination.
34 B. K. Tripathy et al.
MAC sublayer within the data link layer is responsible for controlling access to the
physical network medium, addressing the diverse needs of the sensor network and
minimizing or preventing packet collisions in the medium. Numerous advancements
in MAC protocols have been specifically tailored for WSNs; we pick a few related aspects, detailed in Table 2.3.
What we’ve come to understand about CPS is that a central objective is to seamlessly
merge physical components equipped with sensors and communication capabilities,
both in the physical and virtual realms, in order to create automated and intelligent
systems. Setting aside the various other aspects and challenges associated with CPS,
when we focus on its physical components, many developers aspire to incorporate
sensory devices like light sensors, proximity sensors, microphones, GPS chips, GSM
chips, cameras, and touch screens. In addition to these sensory components, commu-
nication units such as WiFi, Bluetooth, EDGE, and 4G/5G are integral parts of the
system.
It’s important to note that most of these physical units are readily available to the
public, though some may have proprietary features. Furthermore, the communication
infrastructures that the integrators heavily rely on are predominantly public, such
as the internet and cloud services, with the exception of defense or highly secure
solutions.
As a result, the integrated CPS system effectively exposes its identity to the
public, becoming a potentially attractive target for unauthorized access. This open-
ness gives rise to a broad spectrum of concerns, including security vulnerabilities,
privacy compliance issues, and the risk of data breaches. This begs the question:
How can we ensure the security and privacy of these interconnected systems in an
environment where so much is publicly accessible?
Some well-known real-world incidents, such as Stuxnet (in 2010), BlackEnergy (in
2015), Industroyer (in 2016), Triton (in 2017), WannaCry (in 2017), NotPetya (in
2017), Colonial Pipeline Ransomware Attack (in 2021) serve as reminders of the
vulnerabilities in our interconnected systems. In the following section, we delve
into comprehensive hypothetical scenarios related to security breaches and employ
machine learning and deep learning techniques to address these challenges.
These digital threats can be classified based on the intruder’s objectives. In the first
category, their goal is to completely disable the target device. In the second category,
they seek admin or unauthorized access privileges to the target devices. Broadly
speaking, these vulnerabilities can be classified into an exhaustive list. We highlight
eight principal types of attacks: physical, network-based, software-driven, data
breaches, side-channel, cryptographic analysis, access-level, and strategic attacks.
Table 2.4 outlines the current cyber-world attacks specifically associated with
software and network-based incidents only [52–54].
offer a more comprehensive toolkit for tackling intrusion detection challenges. Brief
overviews of these methods are outlined.
The Gaussian-Naïve-Bayes learning procedure (Colab-Python: GaussianNB)
centers around the assumption that the distribution of features follows a Gaussian
(normal) distribution, which is a crucial statistical method used to compute the condi-
tional probability of an event. The values of μb and σb are determined through
maximum likelihood estimation as shown in Eq. 2.1.
P(a_i \mid b) = \frac{1}{\sqrt{2\pi\sigma_b^2}} \exp\left(-\frac{(a_i - \mu_b)^2}{2\sigma_b^2}\right) \quad (2.1)
• Entropy = -\sum_{i=1}^{n} p_i \log(p_i) \quad (2.2)
• Gini\ Index = 1 - \sum_{i=1}^{n} p_i^{2} \quad (2.3)
\min \frac{\|w\|}{2} \quad \text{s.t.} \quad y_i(w \cdot x_i + b) - 1 \ge 0 \ \text{ and } \ y_i(w \cdot x_i + b) + 1 \ge 0 \quad (2.4)
The logistic regression model (Colab-Python: LogisticRegression) employs a
linear model to handle classification tasks. It uses a logistic function (sigmoid curve)
to model the probabilities associated with potential outcomes in a single trial, as
shown in Eq. 2.5. In this context, a0 represents the midpoint of the function, k
indicates the logistic growth rate or the steepness of the curve and L signifies the
maximum value attained by the function.
f(a) = \frac{L}{1 + e^{-k(a - a_0)}} \quad (2.5)
Begin
Step 1. Pre-Processing:
Convert CPS raw data into a 2-level classification (normal and abnormal).
Handle missing values (with coverage).
Step 2. Feature Selection:
Identify relevant features from processed data for threat detection.
Step 3. Select ML Model:
Choose ML models suitable for CPS threat detection.
Step 4. Data Splitting:
Split labelled data into training and testing sets for model evaluation.
Step 5. Model Training:
Train specified ML model using labelled training data.
Step 6. Model Evaluation:
Evaluate model specific performance measures.
Step 7. Real-time monitoring and alerting:
Implement real-time monitoring using trained ML models
Generate mitigation strategies (alerts or response mechanisms) for detected threats
of attacks.
End
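To make these steps concrete, the following is a minimal, illustrative Python sketch of such a pipeline built with scikit-learn. The DataFrame, the 'label' column name and the choice of two of the candidate classifiers (GaussianNB and LogisticRegression, both mentioned in this chapter) are assumptions for illustration rather than the exact experimental setup.

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.naive_bayes import GaussianNB
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report

def detect_threats(df: pd.DataFrame, label_col: str = "label"):
    # Steps 1-2: pre-processing and a very simple feature selection
    # (keep numeric columns only, fill missing values).
    X = df.drop(columns=[label_col]).select_dtypes("number").fillna(0)
    y = df[label_col]
    X = pd.DataFrame(StandardScaler().fit_transform(X), columns=X.columns)

    # Step 4: split the labelled data for model evaluation.
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=42)

    # Steps 3 and 5: choose and train candidate ML models.
    models = {"GaussianNB": GaussianNB(),
              "LogisticRegression": LogisticRegression(max_iter=1000)}
    for name, model in models.items():
        model.fit(X_tr, y_tr)
        # Step 6: evaluate model-specific performance measures.
        print(name)
        print(classification_report(y_te, model.predict(X_te)))

Real-time monitoring and alerting (Step 7) would then apply the trained model to streaming records and trigger a response when an abnormal prediction is produced.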
Deep learning (DL) is a specialized field within machine learning that revolves around
the development and training of artificial neural networks with multiple layers,
commonly referred to as deep neural networks. The term “deep” reflects the incorpo-
ration of numerous interconnected layers in these networks. These deep architectures
empower machines to autonomously learn and comprehend intricate patterns and
features from input data, eliminating the need for explicit programming. Character-
ized by the use of neural networks with multiple hidden layers, DL models are adept
at learning hierarchical representations of data. The core principle involves the auto-
matic extraction of relevant features during the training process, a concept known
as representation learning. Employing end-to-end learning, these models directly
learn complex representations from raw input to produce predictions or decisions.
The training process relies on back propagation, where the model iteratively adjusts
its parameters based on the disparity between predicted and actual outcomes. This
learning approach finds successful applications across diverse domains, including
computer vision, natural language processing, speech recognition, and medical diag-
nosis. Notable architectures such as Convolutional Neural Networks (CNNs), Recur-
rent Neural Networks (RNNs), and Transformer models have propelled the field’s
advancements, showcasing the versatility and power of deep learning in tackling
complex tasks. Table 2.5 provides a snapshot of popular deep learning models and
their respective architectures and applications. More detailed and comprehensive
explanations can be found in [55, 56].
Autoencoders (AEs) exhibit distinct strengths and characteristics within the realm
of unsupervised learning. AE is a type of artificial neural network with at least
an encoder and a decoder, considered a DL method. AEs are a class of unsu-
pervised learning algorithms employed for efficiently learning representations of
data, typically for dimensionality reduction or feature learning purposes. AEs are
renowned for their simplicity and adaptability, with training occurring in an end-to-
end manner, optimizing the reconstruction error by adjusting both the encoder and
decoder weights simultaneously. In Algorithm-2 we explain the operations of this
DL approach.
The undertaken dataset categorizes attacks into four primary types: (a) DoS: Denial
of Service attacks, (b) R2L: Unauthorized access, especially from remote to local, (c)
U2R: Unauthorized access aimed at obtaining local super-user privileges (referred to
as User to Root) and (d) PROBE: Activities related to surveillance and probing, such
as port scanning. These groups correspond to different types of attacks. Table 2.6
represents the mapping of dataset attributes to the type of attacks. In addition there
are 97,278 normal instances.
within these datasets have been organized into attack groups, grouping similar attack
types together.
The goal is to identify highly correlated feature sets in this high-dimensional data
and to correlate the relevant attribute values with the least-correlated feature sets. By
comparing each variable with the highest correlation factor, considering differences
of up to 0.01343 from the highest value, we categorize the records into five class labels
corresponding to the four dominant attack types (DoS, R2L, U2R and PROBE) and normal cases.
Fig. 2.6 ML based classification accuracy for normal and abnormal attacks
This process helps highlight and classify critical attacks based on their correlation
patterns within the dataset.
As the Auto Encoder model excels in unsupervised representation learning,
autonomously capturing meaningful features, we leverage its working principles to
enhance k-level classification tasks of the undertaken dataset. Accordingly, we split
the dataset 75% for training and 25% for testing for AE deep learning experiments.
This model comprises an input layer, an encoding layer with 50 neurons, and decoding
and output layers, and is defined with the 'mean-squared-error' loss function and the
'adam' optimizer. Figure 2.7a and b show loss versus epoch and accuracy versus
epoch for the 5-level class train and test datasets, respectively.
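As an illustration, the following is a minimal Keras sketch of the reconstruction part of such an autoencoder, following the configuration described above (a 50-neuron encoding layer, 'mean-squared-error' loss and the 'adam' optimizer). The input dimension and training details are assumptions, and the classification stage built on top of the learned representation is not shown.

import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

n_features = 41  # assumption: number of features after pre-processing

inputs = keras.Input(shape=(n_features,))
encoded = layers.Dense(50, activation="relu")(inputs)               # 50-neuron encoding layer
decoded = layers.Dense(n_features, activation="linear")(encoded)    # decoding / output layer

autoencoder = keras.Model(inputs, decoded)
autoencoder.compile(optimizer="adam", loss="mean_squared_error")

# 75% / 25% train/test split as in the experiment; X is the feature matrix.
# X_train, X_test = ...
# history = autoencoder.fit(X_train, X_train, epochs=50,
#                           validation_data=(X_test, X_test))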
In Fig. 2.8, we demonstrate a 3D scatter plot depicting the accuracy of k-level
classification (four cases of attack and normal) by the AE-Model across various
combinations of hidden layers. Comparative analysis reveals that the AE classifier
with a hidden-layer configuration of 50, 20 and 10 neurons exhibited superior performance
in classifying attacks into five distinct levels. Further insight into the ROC analysis,
specifically focusing on behavior in the network, normal and attack types is presented
in Fig. 2.9a–e.
Fig. 2.8 AE-DL model Accuracy with varied hidden layers (3D Scatter Plot)
Fig. 2.9 DL based ROC in respect to a Normal and b–e Attacks in the network
issues. Machine learning and deep learning approaches were presented, substanti-
ated with experimental analysis and comprehensive discussions. The examination
of CPS attack classifications and prediction has revealed that employing a two-level
class structure is most effective when utilizing machine intelligence processes. In
this context, seven MI-based classification models exhibit commendable accuracy;
however, their efficiency diminishes when tasked with handling more than two levels
References
11. Rad, C.R., Hancu, O., Takacs, I., Olteanu, G.: Smart monitoring of potato crop: A cyber-physical
system architecture model in the field of precision agriculture. Agric. Agric. Sci. Procedia 6,
73–79 (2015)
12. Ahmad, I., Pothuganti, K.: Smart field monitoring using toxtrac: a cyber-physical system
approach in agriculture. In: Proceedings of the 2020 Intl conf on smart electronics and
communication (ICOSEC), Trichy, India, pp.10–12 (2020)
13. Abid, H., Phuong, L.T.T., Wang, J., Lee, S., Qaisar, S.: V-Cloud: vehicular cyber-physical
systems and cloud computing. In: Proc of 4th Intl symposium on applied sciences in
biomedical and communication technologies, Spain, (2011)
14. Work, D., Bayen, A., Jacobson, Q.: Automotive cyber physical systems in the context of
human mobility. In: Proceedings of the national workshop on high-confidence cyber-physical
systems, Troy, Miss, USA, (2008)
15. Dafflon, B, Moalla, N, Ouzrout, Y.: The challenges, approaches, and used techniques of CPS for
manufacturing in Industry 4.0: A literature review. Int. J. Adv. Manuf. Technol. 113, 2395–2412
(2021)
16. He, G., Dang, Y., Zhou, L., Dai, Y., Que, Y., Ji, X.: Architecture model proposal of innovative
intelligent manufacturing in the chemical industry based on multi-scale integration and key
technologies. Comput. Chem. Eng. 141, 106967 (2020)
17. Ren, S., Feng, D., Sun, Z., Zhang, R., Chen, L.: A framework for shop floor material delivery
based on real-time manufacturing big data. J. Ambient. Intell. Humaniz. Comput. 10, 1093–
1108 (2019)
18. Majeed, A., Lv, J., Peng, T.: A framework for big data driven process analysis and optimization
for additive manufacturing. J. Rapid Prototyp. 24, 735–747 (2018)
19. Sampigethaya, K., Poovendran, R.: Aviation cyber–physical systems: foundations for future
aircraft and air transport. Proc. of IEEE 101, 1823–1855 (2013)
20. Ying, D.S.X., Venema, D.S., Corman, D.D., Angus, D.I., Sampigethaya, D.R.: Aerospace
cyber physical systems-challenges in commercial aviation, Cyber-Physical Systems Virtual
Organization
21. Sampigethaya, K., Poovendran, R.: Aviation cyber–physical systems: Foundations for future
aircraft and air transport. Proc. IEEE 101, 1834–1855 (2013)
22. Huang, Y., Zhao, M., Xue, C.: Joint WCET and update activity minimization for cyber-physical
systems. ACM Transactions, TECS 14, 1–21 (2015)
23. Broo, D.G., Boman, U., Törngren, M.: Cyber-physical systems research and education in 2030:
Scenarios and strategies. J. Ind. Inf. Integr. 21, 100192 (2021)
24. Perry-Hazan, L., Birnhack, M.: Privacy CCTV and school surveillance in the shadow of
imagined law. Law Soc. Rev. 50, 415–449 (2016)
25. Singh, K., Sood, S.: Optical fog-assisted cyber-physical system for intelligent surveillance in
the education system. Comput. Appl. Eng. Educ., 692–704 (2020)
26. Marwedel, P., Engel, M.: Flipped classroom teaching for a cyber-physical system course-an
adequate presence-based learning approach in the internet age. In: Proc of the 10th European
Workshop on Microelectronics Education (EWME), Tallinn, Estonia, pp.14–16 (2014)
27. Taha, W., Hedstrom, L., Xu, F., Duracz, A., Bartha, F.Y., David, J., Gunjan, G.: Flipping a first
course on cyber-physical systems: An experience report. In: Proc of the 2016 workshop on
embedded and cyber-physical systems education. Association for Computing Machinery, New
York, NY, USA (2016)
28. Singh, V.K., Jain, R.: Situation based control for cyber- physical environments. In: Proc of the
IEEE military communications conf (MILCOM ’09), Boston, Mass, USA, (2009)
29. Meng, W., Liu., Xu, W., Zhou, Z.: A cyber-physical system for public environment perception
and emergency handling. In: Proc of the IEEE Intl Conf on high performance computing and
communications, (2011)
30. Hackmann, G., Guo, W., Yan, G., Sun, Z., Lu, C., Dyke, S.: Cyber-physical co-design of
distributed structural health monitoring with wireless sensor networks. IEEE Trans. Parallel
Distrib. Syst. 25, 63–72 (2013)
31. Lin, J., Yu, W., Yang, X., Yang, Q., Fu, X., Zhao, W.: A real-time en-route route guidance
decision scheme for transportation-based cyber physical systems. IEEE Trans. Veh. Technol.
66, 2551–2566 (2016)
32. Kantarci, B.: Cyber-physical alternate route recommendation system for paramedics in an urban
area. In: Proc of the 2015 IEEE Wireless Communications and Networking Conf (WCNC),
USA, (2015)
33. Ko, W.H., Satchidanandan, B., Kumar, P.: Dynamic watermarking-based defense of transporta-
tion cyber-physical systems. ACM Trans. Cyber-Phys. Syst. 4, 1–21 (2019)
34. Raisin, S.N., Jamaludin, J., Rahalim, F.M., Mohamad, F.A.J., Naeem, B.: Cyber-Physical
System (CPS) application-a review. REKA ELKOMIKA J. Pengabdi. Kpd. Masy. 1, 52–65
(2020)
35. Wang, J., Abid, H., Lee, S., Shu, L., Xia, F.: A secured health care application architecture for
cyber-physical systems. Control Eng Appl Inform 13(3), 101–108 (2011)
36. Lounis, A., Hadjidj, A., Bouabdallah, A., Challal, Y.: Secure and scalable cloud-based
architecture for e-health Wireless sensor networks. In: Proc of the Intl Conf on Computer
Communication Networks (ICCCN ’12), Munich, Germany, (2012)
37. Bocca, M., Tojvola, J., Eriksson, L.M., Hollmen, J., Koivo, H.: Structural health monitoring
in wireless sensor networks by the embedded goertzel algorithm. In: Proc of the IEEE/ACM
2nd Intl Conference on Cyber-Physical Systems (ICCPS ’11), pp.206–214. Chicago, Ill, USA
(2011)
38. Jindal, A., Liu, M.: Networked computing in wireless sensor networks for structural health
monitoring. In: Proceeding of the IEEE/ACM transactions on networking (TON ’12), vol. 20.
pp.1203–1216 (2012)
39. Akter, F., Kashem, M.A., Islam, M.M., Chowdhury, M.A., Rokunojjaman, M., Uddin, J.: Cyber-
Physical System (CPS) based heart disease’s prediction model for community clinic using
machine learning classifiers. J. Hunan Univ. Nat. Sci. 48, 86–93 (2021)
40. Feng, J., Zhu, F., Li, P., Davari, H., Lee, J.: Development of an integrated framework for cyber
physical system (CPS)-enabled rehabilitation system. Int. J. Progn. Health Manag 12, 1–10
(2021)
41. Liu, J., Wang, P., Lin, J., Chu, C.H.: Model based energy consumption analysis of wireless
cyber physical systems. In: Proc of 3rd IEEE Inl Conf on Big data security on cloud, IEEE Intl
Conf on High Performance and Smart Computing (Hpsc), and IEEE Intl Conf on intelligent
data and security, pp. 219–224. China (2017)
42. Panda, G.K., Tripathy, B.K., Padhi, M.K.: Evolution of social IoT world: security issues and
research challenges, Internet of Things (IoT), pp.77–98. CRC Press, (2017)
43. Panda, G.K., Mishra, D., Nayak, S.: Comprehensive study on social trust with xAI: tech-
niques, evaluation and future direction, (Accepted), explainable, interpretable and transparent
AI system, pp.1–22 (Ch-10). CRC Press, (2023)
44. Ye, W., Heidemann, J., Estrin, D.: An energy-efficient MAC protocol for wireless sensor
networks. In: 21st Annual joint Conf of the IEEE computer and communications societies,
vol. 3. pp.1567–1576 (2002)
45. Van, T.D., Langendoen, K.: An adaptive energy-efficient MAC protocol for wireless sensor
networks. In: Proc of the 1st Intl Conf on embedded networked sensor systems, pp. 171–180.
ACM, New York, USA (2003)
46. Liu, Z., Elhanany, I.: RL-MAC: A reinforcement learning based MAC protocol for wireless
sensor networks. Intl. J. Sensor Networks 1(3), 117–124 (2006)
47. Shen, Y.J., Wang, M.S.: Broadcast scheduling in wireless sensor networks using fuzzy hopfield
neural network. Expert Syst. Appl. 34(2), 900–907 (2008)
48. Kim, M., Park, M.G.: Bayesian statistical modeling of system energy saving effectiveness for
MAC protocols of wireless sensor networks. In: Software engineering, artificial intelligence,
networking and parallel/distributed computing, studies in computational intelligence, vol. 209,
pp. 233–245. Springer. (2009)
49. Chu, Y., Mitchell, P., Grace, D.: ALOHA and q-learning based medium access control for
wireless sensor networks. In: Intl symposium on wireless communication systems, pp. 511–515
(2012)
50. Sha, M., Dor, R., Hackmann, G., Lu, C., Kim, T.S., Park, T.: Self adapting MAC layer for
wireless sensor networks. Technical Report WUCSE-2013–75, Washington University in St.
Louis. Tech Rep (2013)
51. Dash, S., Saras, K., Lenka, M.R., Swain, A.R.: Multi-token based MAC-Cum-routing protocol
for WSN: A distributed approach. J. Commun. Softw Syst., 1–12 (2019)
52. Kumar, L.S., Panda, G.K., Tripathy, B.K.: Hyperspectral images: A succinct analytical deep
learning study. In: Deep learning applications in image analysis. Studies in big data, vol. 129,
pp.149–171. Springer, (2023)
53. Mpitziopoulos, A., Gavalas, D., Konstantopoulos, C., Pantziou, G.: A survey on jamming
attacks and countermeasures in WSNs. IEEE. Commun. Surv & Tutor. 11(4), 42–56 (2009)
54. Yin, D., Zhang, L., Yang, K.: A DDoS attack detection and mitigation with software-defined
Internet of Things framework. IEEE Access 6, 24694–24705 (2018)
55. Buduma, N., Locascio, N.: Fundamentals of deep learning: Designing next-generation machine
intelligence algorithms. O’Reilly Media, Inc. (2017)
56. Sarker, I.H.: Deep learning: a comprehensive overview on techniques, taxonomy, applications
and research directions. SN Comput. Sci. 2, 420 (2021)
57. The UCI KDD Archive, University of California, Irvine: KDD Cup 1999 Data, http://www.
kdd.ics.uci.edu/databases/kddcup99/kddcup99/html/ [Accessed 20 April 2023]
Chapter 3
Unsupervised Approaches in Anomaly
Detection
Abstract Industry 4.0 is a new industrial stage based on the revolution brought about
by the integration of information and communication technologies (ICT) in conven-
tional manufacturing systems, leading to the implementation of cyber-physical
systems. With Industry 4.0 and cyber-physical systems, the number of sensors and
thus the data from the monitoring of manufacturing machines is increasing. This
implies an opportunity to leverage this data to improve production efficiency. One
of these ways is by using it to detect unusual patterns, which can allow, among other
things, the detection of machine malfunctions or cutting tool wear. In addition, this
information can then be used to better schedule maintenance tasks and make the
best possible use of resources. In this chapter, we will study unsupervised clustering
techniques and others such as nearest neighbor methods or statistical techniques for
anomaly detection that can be applied to machining process monitoring data.
3.1 Introduction
One of the applications of machine learning is anomaly detection. This task requires
being able to identify anomalous behavior from non-anomalous behavior, which is
not always trivial. The normal operating conditions of the industry 4.0 machines can
vary, machines allow working with a multitude of parts as well as with different mate-
rials and with different production sizes that will produce different monitoring data.
This makes it impossible to know a priori which data are outside normal behavior.
In addition, the data from this monitoring of machine operation are unbalanced,
with much more data corresponding to normal operating behavior than to unusual
operating behavior, which complicates their analysis. This situation means that within
machine learning, anomaly detection is normally treated as an unsupervised or
semi-supervised learning problem (having only a few labelled examples, which are
usually of normal behavior).
Unsupervised learning is an important branch of machine learning with several
applications. Techniques that fall under the umbrella of unsupervised learning do
not assume that samples are labeled (for classification tasks) or have one or more
associated values to predict (for regression tasks). Therefore, they cannot be used to
design classifiers or regressors; they are used to find groupings of the data based on
one or more criteria (e.g. Euclidean distance). That is why they can help us to divide
data sets into two or more groups, as well as to detect outliers [1]. In anomaly detection
tasks, unsupervised learning techniques help to identify patterns that are considered
normal. For each regularly observed pattern associated with the normal operation
of the system under observation, the unsupervised learning technique used may find
several clusters. The idea is that unsupervised outlier detection approaches score data
based solely on the inherent properties of the dataset. In all unsupervised learning
tasks, we want to learn the natural structure of our data without using specially
given features. Unsupervised learning is useful for exploratory analysis because
it can automatically identify data structure. For example, if analysts are trying to
segment consumers, unsupervised consumers, unsupervised clustering techniques
would be a good starting point for their analysis. In situations where it is impossible
or impractical for humans to suggest trends in data, unsupervised learning can provide
initial insights that can then be used to test hypotheses. Unsupervised learning avoids
the need to know which of the collected data are anomalous and requires less data to
train. These algorithms allow unusual data to be defined dynamically and avoid the
need for extensive knowledge of the application domain.
In addition, it is not only necessary to be able to identify atypical behavior. As
mentioned above, industry 4.0 machines make different parts during their opera-
tion that will result in different measurements and the data does not always contain
information in this regard. This makes it useful to be able to identify common,
repeating patterns that correspond to specific part-processing signatures. One of the
most used unsupervised machine learning techniques is clustering, where data is
grouped according to a similarity measure.
Clustering techniques have been widely used for both static and dynamic data.
Within the dynamic data, we find time series. This type of data is very common in
a multitude of domains, including industry. The characteristics of time series can
vary the way of tackling the problem. There is no one learning technique that is
better than another for any given problem, which implies that it is required to test
which technique is more effective for each problem. The final objective of this study
is to detect anomalies in the data flow of a software-defined network. Initially, an
unsupervised dataset is available, with different observations on the traffic flow of
the software-defined network, which is examined and analysed. For this purpose,
feature engineering is employed on the set, using certain technologies, applying a
transformation on the data, and obtaining as a result a valid set for analysis in the
following phases.
3.2 Methodology
Unsupervised learning does not know what class the data belongs to and its objective
is to discover hidden patterns in the data. It has no direct feedback and one of the
tasks of unsupervised learning is clustering.
Clustering is one of the most widely used techniques for pattern discovery. Clustering
is the process of unsupervised partitioning of a dataset D = {F1 , F2 , …, Fn } into
k groups C = {C1 , C2 , …, Ck } according to a similarity measure
that maximizes the similarity between objects in the same group and minimizes the
similarity with the data of the rest of the groups.
Objects within the same group must share characteristics, have small differences
between them, or at least be related to other objects in the group. The data set
is considered to be groupable when there are continuous regions with a relatively
high density surrounded by other continuous regions with a lower density.
In numerical data clustering, two types of groups can be distinguished.
– Compact Groups: all objects in the group are similar to each other and the group
can be represented by its center.
– Chained Groups: each object in the group is more similar to another member of
the group than to any other object in the other groups and can connect two objects
of the group using a path.
In the modeling of the problem, the definition of the group as well as the separation
criteria must be determined. Clustering methods are composed of several elements
(see Fig. 3.1).
Clustering
components
Evaluation
Representation Distant Algorithm
measures
Characteristi
Original data Clustering
cs selection Results
algorithm Validation
and visualization
design
extraction
Different techniques can be used to represent the data, to measure the
similarity between pairs of data, and to form clusters of elements, as well as to eval-
uate the results. All these techniques are not always compatible with each other or
work equally well. Some algorithms may present various configurations of clusters,
depending on some other criterion such as the order in which the data are analyzed.
The criteria for grouping the data, the spacing between clusters, the similarity
measure, or the space in which they work are often used to compare clustering
methods. The choice of these components (representation method, algorithm, simi-
larity measure, and evaluation measure) will depend on the problem. A method of
clustering that works equally well for any situation does not exist.
Clustering algorithms can be classified into different types, the best known of which
are
– Partition
– Density
– Grid
– Hierarchical
– Model-based
In addition, several algorithms of different types can be combined to perform multi-
step clustering. Each of these types of algorithms has several advantages and disad-
vantages, which make them more or less suitable depending on the problem. The
choice of algorithm will depend on the data to be clustered. In Table 3.1 all types of
algorithms considered in this chapter are summarized.
Partitioning methods divide the data into k groups where each group contains at
least one element. The clusters are created all at once and there are no hierarchical
relationships between the groups obtained.
To create these groups, a set of representative elements, also called prototypes
of each of the groups, is used. These representatives can belong to the group or be
created from the elements that compose it. However, the choice of these prototypes
to achieve an optimal partitioning of the elements is unknown. Therefore, partition-
based algorithms follow a two-step iterative approach. From the initially chosen
prototypes, the elements are assigned to the cluster of the closest prototype (Assign-
ment step) and after that, the prototypes are recalculated (Optimization step). These
steps are repeated until a predefined requirement is met, such as an error or a limit
on the number of iterations. The effectiveness of the method used depends not only
on the prototype that is defined but also on the update method used to recalculate the
prototypes after each iteration of the algorithm. These algorithms are divided into
hard clustering when each element belongs to one and only one group, and fuzzy
clustering, when each element is assigned a percentage of probability of belonging
to each of the clusters. They have low complexity, are fast, and usually give good
efficiency, however, they are not suitable for non-convex data and require knowledge
of the number of partitions. In addition, their efficiency is determined by the proto-
type used. The best-known algorithms in this category are k-means, where the group
mean is used as the prototype, and k-medoids, where the group medoid is used as the
prototype, along with their fuzzy variants such as fuzzy c-means. In Fig. 3.3, an example of clustering
partition using the k-means algorithm can be examined.
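As an illustration of the two-step assignment/optimization iteration described above, the following is a minimal NumPy sketch of a k-means-style partition method; initialization, the stopping rule and the handling of empty clusters are deliberately simplified assumptions.

import numpy as np

def kmeans(X: np.ndarray, k: int, n_iter: int = 100, seed: int = 0):
    rng = np.random.default_rng(seed)
    # Initial prototypes: k elements chosen at random from the data.
    prototypes = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Assignment step: each element joins the cluster of its closest prototype.
        dists = np.linalg.norm(X[:, None, :] - prototypes[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Optimization step: recalculate each prototype as the mean of its group.
        # (Empty clusters are not handled in this simplified sketch.)
        new_prototypes = np.array([X[labels == j].mean(axis=0) for j in range(k)])
        if np.allclose(new_prototypes, prototypes):
            break
        prototypes = new_prototypes
    return labels, prototypes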
These algorithms group data according to their connectivity and density; regions
with high density belong to the same group. In other words, an element can continue
to expand the group with its nearby elements when its neighborhood, which is the
number of elements close to it, exceeds the threshold. They have high efficiency and
Fig. 3.3 Example of partition-based clustering using the k-means algorithm [2]
are capable of clustering data with different shapes, but their results worsen when
the density of the data space is not uniform and depends on the input parameters.
There are two approaches to density-based clustering; density-based connectivity
using algorithms such as DBSCAN and OPTICS and a second type which is based
on the density function, which is applied in algorithms such as DENCLUE, which
also uses the influence function.
group and selecting the minimum (this allows us to obtain clusters with a more
elongated shape), it is defined as in Eq. (3.1).
• Complete Link. When the distance between 2 clusters is the longest distance
between each pair of the elements that compose the groups (obtaining clusters
with a more spherical shape). It is defined as shown in Eq. (3.2).
• Average Link. In this case, the distance between the groups is the average distance
between each pair of objects in both groups.
• Distance to centroid. The center of each group (centroid) is determined and
the distance between the two groups is calculated as the distance between their
centroids.
• Ward link. Merges the two groups that account for a minimal increase in variance.
This is calculated by comparing the variance of the groups before merging and
after merging to find the pair of groups with the minimum increase in variance.
To determine which groups to merge, the Lance-Williams formula can be used.
Depending on the type of linkage used, the parameters of the formula are represented
in Table 3.2.
Table 3.2 Parameters used according to the type of linkage in the Lance-Williams formula
Methods    αi                               αj                               β                           γ
Single     1/2                              1/2                              0                           −1/2
Complete   1/2                              1/2                              0                           1/2
Average    |Ci|/(|Ci|+|Cj|)                 |Cj|/(|Ci|+|Cj|)                 0                           0
Centroid   |Ci|/(|Ci|+|Cj|)                 |Cj|/(|Ci|+|Cj|)                 −|Ci||Cj|/(|Ci|+|Cj|)²      0
Ward       (|Ci|+|Ck|)/(|Ci|+|Cj|+|Ck|)     (|Cj|+|Ck|)/(|Ci|+|Cj|+|Ck|)     −|Ck|/(|Ci|+|Cj|+|Ck|)      0
The main problem with these algorithms is that once a cluster is merged or split, it
is not possible to go backward, which negatively affects the quality of the clustering
and makes them often used in hybrid clustering approaches. Although the complexity
of these algorithms is high, they are deterministic algorithms that do not require
knowledge of the number of clusters nor do they require the use of a prototype, and
have a high visualization capacity, allowing the representation of different clusters
and their relationships using dendrograms (see Fig. 3.4). These dendrograms allow
us to visualize the hierarchy of the clusters. However, they are not suitable from a
moderate number of objects onwards, as the tree loses visualization capacity as the
number of objects increases. Within this type, some of the best-known algorithms
are Birch and Cure.
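As an illustration, the following is a minimal SciPy sketch of agglomerative clustering with different linkage criteria and a dendrogram for visualizing the hierarchy; the random data and the chosen number of clusters are purely illustrative.

import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import linkage, dendrogram, fcluster

X = np.random.default_rng(0).normal(size=(30, 2))   # illustrative data

# Build the hierarchy; other options: "single", "complete", "average", "centroid".
Z = linkage(X, method="ward")

# Cut the tree into a chosen number of clusters.
labels = fcluster(Z, t=3, criterion="maxclust")

# Visualize the cluster hierarchy as a dendrogram.
dendrogram(Z)
plt.show()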
These types of algorithms divide or quantize the space of the clustering elements
into cells and perform a clustering of the cells (see Fig. 3.5). In other words, these
algorithms focus on the data space instead of the data to perform the clustering.
They tend to have low complexity and are highly scalable and can take advantage
of parallel processing. However, they are sensitive to the number of cells into which
the space is divided. The smaller the number of cells, the higher the counting speed,
but the lower the clustering accuracy.
They consist of a series of basic steps:
1. Divide the space into a finite number of cells.
which the data belong. Probabilistic models allow the representation of subpopula-
tions within a population. The most common component distribution for continuous
data is the multi-variate Gaussian, giving rise to Gaussian mixture models (GMM).
Models based on neural networks include algorithms such as SOM (self-
organizing maps) that consist of a single-layer neural network where the clusters are
obtained by assigning the objects to be grouped to the output neurons (see Fig. 3.6).
This is competitive unsupervised learning and requires as parameters the number of
clusters and the grid of neurons. In a SOM network, the input layer and the output
layer are fully connected. In SOMs, data are assigned to their nearest centroids, and
the objects that are close to a centroid are also updated when that centroid is updated.
It thus presents
a projection of the input space to a two-dimensional neuron map. It has the advantage
that it is easy to visualize, one way to visualize it is through the Sammon projection.
In addition to SOM, neural network-based clustering has also been performed
using Kohonen learning vector quantization (LVQ) and with adaptive resonance
theory (ART) models.
More approaches have been used for clustering, among which is clustering based on
graphs, where the nodes represent the data points and the edges the relationships
between them. Within this approach are algorithms such as CLINK, and spectral clustering, where a similarity graph
is constructed, and then a spectral embedding is performed (applying eigenvectors
In general, in clustering problems the labels in the data are unknown. In this case,
external indexes cannot be used, and instead internal indexes are used to measure the
goodness of the clustering structure. A criterion for comparing clustering algorithms
is based on three aspects: the way the groups are formed, the structure of the data,
and the sensitivity to the parameters of the clustering algorithm
used. The objective is to maximize the similarity within the group (cohesion) and
minimize the similarity between the different groups (separation). Separation can
be measured by calculating the distance between centers or the minimum distance
between pairs of objects of different groups. Therefore, validation metrics are based
on measuring cohesion, separation, or both.
For this, there are mainly two types of validation:
• External indexes can be used when the ground truth is known (i.e., to which cluster the data
belong); the obtained solution is compared with the real one. Some external
indexes are the purity of the group, the Rand index, or the entropy, among others.
• Internal indexes do not use the ground truth to evaluate the result of the clustering
process. These are based on evaluating high similarity between data of the same
group and low similarity between different groups. These indexes include, among
others, the silhouette index and Dunn’s index.
In Table 3.3, the main validation metrics are summarized.
In [11] the performance of the evaluation measures was compared in terms of
various characteristics that the data may present to obtain the ideal number of
the groups (for the comparison they used the K-Means algorithm, except for the
skewed data where the experiment was performed with Chameleon) reaching several
conclusions:
• Monotonicity: refers to how indices behave as the number of groups increases,
indices that only compare one characteristic, separation, or cohesion increase or
decrease steadily as the number of data increases while other indices reach a
maximum or minimum when the correct number of groups is found.
• Noise: indexes that use minimum and maximum distances to calculate cohesion
and separation are more sensitive to noise.
• Density: in general, most indexes work well for different data with different
densities.
• Impact of subgroups: A subgroup is a group that is enclosed in another group
where there is more than one subgroup. Indices that measure separation obtain
maximums when subgroups are considered as a single group, which leads to
incorrect results.
• Skewed distributions: When there are very large groups and very small groups, in
general, most indices work well with skewed data, however, the Calinski-Harabasz
index does not work well with this type of data.
The study revealed that, of the indices it compared, only S_Dbw performed well for all
of these characteristics. For arbitrary shapes, many of these measures do not perform
well when measuring group separation and cohesion through the center of the group
or pairs of points.
These measures can be calculated using the contingency matrix (see Table 3.4), where
the columns of the matrix represent the clusters obtained and the rows are used for
the class labels of the objects; thus, the cells of the matrix n_ij represent the number
of objects in cluster j that belong to class i:
• Purity. It is used to measure the homogeneity of the labels in the clusters obtained,
that is, if the majority of objects in the group belong to the same class. To calculate
it, the purity of each cluster is first calculated using the Eq. (3.4).
P_j = \frac{1}{n_j} \max_i (n_{ij}) \quad (3.4)
That is, the purity of cluster j is determined by the maximum number of objects in the
cluster that belong to the same class i. Once the purity of each cluster has been
calculated, the overall purity of the clustering is obtained by Eq. (3.5).
Purity = \sum_{j=1}^{k} \frac{n_j}{n} \cdot P_j \quad (3.5)
where k is the number of clusters, nj is the number of objects that have been grouped
in cluster j, and n is the number of total objects.
• Entropy (H). Like purity, it is used to measure the homogeneity of the labels in the
clusters obtained; both measures are frequently used to validate K-Means. Similarly
to purity, to calculate the entropy, the entropy associated with each cluster j is first
calculated with Eq. (3.6).
H_j = -\sum_{i=1}^{c} \frac{n_{ij}}{n_j} \log\left(\frac{n_{ij}}{n_j}\right) \quad (3.6)
The overall entropy is then obtained as the weighted sum of the cluster entropies, as in Eq. (3.7).
Entropy = \sum_{j=1}^{k} \frac{n_j}{n} \cdot H_j \quad (3.7)
• F-measure. It combines the precision and recall of each class–cluster pair, as in Eq. (3.8).
F = \sum_{i=1}^{k} \frac{n_i}{n} \max_j \left( \frac{2 \cdot Recall(i,j) \cdot Precision(i,j)}{Recall(i,j) + Precision(i,j)} \right) \quad (3.8)
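As an illustration, the following is a minimal NumPy sketch that computes purity and entropy (Eqs. 3.4–3.7) from a contingency matrix; the example matrix is invented for illustration.

import numpy as np

# Contingency matrix n_ij: rows are classes, columns are clusters (invented values).
N = np.array([[40,  2,  1],
              [ 3, 30,  5],
              [ 1,  4, 20]])

n_j = N.sum(axis=0)                         # objects grouped in each cluster
n = N.sum()                                 # total number of objects

purity_j = N.max(axis=0) / n_j              # Eq. (3.4): per-cluster purity
purity = (n_j / n) @ purity_j               # Eq. (3.5): overall purity

p_ij = N / n_j                              # class proportions within each cluster
log_p = np.log(p_ij, where=p_ij > 0, out=np.zeros_like(p_ij))
H_j = -(p_ij * log_p).sum(axis=0)           # Eq. (3.6): per-cluster entropy
entropy = (n_j / n) @ H_j                   # Eq. (3.7): overall entropy

print(purity, entropy)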
Once the asymmetric dataset is obtained, the model can be connected to the SMOTE
module. An unbalanced dataset can have many causes: perhaps the target category
is rare in the population, or its data is complicated to collect. SMOTE can be applied
to oversample an underrepresented category. The output of the module contains the
original samples plus additional, synthetic minority samples. Before applying the
technique, the number of these synthetic samples must be determined.
When the classes are not equally represented, we speak of unbalanced data, which
causes several problems for the output of a classification model. For example, suppose
a binary classification task has 100 instances, of which 80 are labelled as class 1 and
the remaining 20 as class 2. This is a simple example of an unbalanced dataset, with
a class ratio of 4:1.
In real test data and Kaggle competitions, the problem of class imbalance is very
common; most real classification problems imply some degree of imbalance. It is
therefore important to choose the right evaluation metric for the model: if a model is
trained on a highly asymmetric dataset and evaluated naively, its output can be
meaningless, and when such a model is applied to real problems the results are of
little use. Class imbalance appears in many situations; a good example is distinguishing
fraudulent from non-fraudulent transactions, where fraudulent transactions are far
less frequent, and that is precisely the problem.
Here are some of the benefits of SMOTE:
• Information is kept.
• This technique is simple and can be easily understood and implemented in the model.
• It mitigates the overfitting caused by simple duplication, since it creates new
synthetic instances instead of copying existing ones.
Dup_size and K are two parameters of SMOTE; to understand them, it helps to look at
how SMOTE works. SMOTE starts from existing minority instances and creates new
ones randomly, placing each new instance between an original instance and one of its
neighbors. The parameter K determines which neighbors of each minority instance are
considered:
• With K = 1, the function considers only the nearest neighbor.
• With K = 2, the function considers the nearest and the second-nearest neighbor.
SMOTE iterates over the minority instances; in each loop iteration, it creates a new
instance between the original instance and one of its neighbors. The dup_size parameter
specifies the number of times the SMOTE function will loop over the original instances.
For example, if dup_size = 1, the model will only synthesize four new data points, and
so on. Finally, when building predictive models in ML, you may encounter unbalanced
datasets, and this imbalance affects the output of the model. This problem can be
addressed by oversampling the minority class: instead of generating duplicate data, the
SMOTE algorithm is used to generate synthetic data for oversampling. Here are some
variations of SMOTE:
• Borderline-SMOTE
• SMOTE-NC.
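As an illustration, the following is a minimal sketch of oversampling a minority class with SMOTE using the imbalanced-learn library; the synthetic dataset and the parameter values are illustrative assumptions.

from collections import Counter
from sklearn.datasets import make_classification
from imblearn.over_sampling import SMOTE

# Illustrative unbalanced binary dataset (roughly 9:1).
X, y = make_classification(n_samples=1000, weights=[0.9, 0.1], random_state=0)
print("before:", Counter(y))

# K (k_neighbors) controls which neighbors are used to synthesize new points.
smote = SMOTE(k_neighbors=5, random_state=0)
X_res, y_res = smote.fit_resample(X, y)
print("after:", Counter(y_res))             # minority class is now oversampled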
A study by Naung et al. [13] is an example of the use of ANNs and SMOTE in simple
ANN-based DDoS attack detection using SMOTE for IoT environments. In recent
years, with the rapid development of the IoT era, attackers have mainly targeted the
Internet of Things environment. Attackers use compromised Internet of Things devices
as bots to attack target organizations; because these devices have limited resources for
running effective defense mechanisms, they are easily infected with IoT malware.
Highly dangerous Internet of Things malware such as Mirai conducts DDoS attacks
against targeted organizations using infected Internet of Things devices. Although
many security mechanisms have been implemented in IoT devices, there is still a
need for effective Internet of Things environment sensing systems. This detection
system uses public datasets, machine learning techniques and a simple artificial
neural network (ANN) architecture to detect such attacks. Bot-IoT, a
modern botnet attack dataset, is used to detect DDoS attacks, but the dataset contains
a small amount of benign data and a large amount of attack data, so this class
imbalance needs to be addressed before accurate detection is possible. In their
work, the Synthetic Minority Oversampling Technique (SMOTE) is used to solve the
data imbalance problem and implement a machine learning based DDoS detection
system. The results show that the proposed method can effectively detect DDoS
attacks in Internet of Things environments.
In a study by Joloudari et al. [14] on efficient class-imbalance learning based on SMOTE
and convolutional neural networks, SMOTE is used to handle imbalanced data
sets. Data imbalance (ID) is a problem that prevents machine learning (ML) models
from achieving satisfactory results. ID is a situation where the number of samples
belonging to one class significantly exceeds the number of samples belonging to
another class. In this case, the learning of this state will be biased towards the majority
class. In recent years, several solutions have been proposed to solve this problem,
choosing to generate new synthetic configurations for minority classes or to reduce
the number of majority classes to balance the profiles. Therefore, in this study, the
effectiveness of methods based on a hybrid of deep neural networks (DNNs) and
convolutional neural networks (CNNs) as well as several well-known solutions for
unbalanced data involving oversampling and undersampling are investigated. Then,
together with SMOTE, a CNN-based model for efficient processing of unbalanced
data is presented. For evaluating the method, the KEEL, Breast Cancer, and
Z-Alizadeh Sani datasets are used. To obtain reliable results, 100 experiments using
randomly shuffled data distributions are performed. The classification results show
that the hybrid SMOTE-normalized-CNN approach outperforms various methods and
achieves 99.08% accuracy on 24 unbalanced datasets. Therefore, the proposed hybrid model
can be applied to non-balanced binary classification problems in other real datasets.
The purpose of this section is to detect anomalies in the data flow of a software-defined
network. Initially, an unsupervised dataset is available, with different observations on
the traffic flow of the software-defined network, which is examined and analyzed. For
this purpose, feature engineering is employed on the set, using certain technologies,
applying a transformation on the data, and obtaining as a result a valid set for analysis
in the following phases. Once this phase has been completed, the machine learning
algorithms to be used are studied. Subsequently, the best combination of parameters
to be applied to these algorithms is sought, comparing them with each other and
generating the most optimal models possible, which can group data samples with
similar characteristics and detect anomalies in the flow, thus meeting the established
objectives. Through
the models, we evaluate the results obtained with the scores of the different internal
metrics selected. Finally, a comparison of the algorithms used, based on the results,
execution times, and ease of understanding, highlights the most optimal and efficient
one.
3.4.1 Method
The method to follow as shown in Fig. 3.7 consists of the following steps:
1. Data collection
– Collection, description, and exploration of the data.
– Verification of data quality.
2. Data preparation
– Construction of the final data set encompassing all the necessary activities of
data selection, cleaning, construction, integration, and formatting.
3. Modeling
– Determination of evaluation metrics.
– Determination of hyperparameters.
– Creation of the different models.
– Evaluation of the results of each model.
4. Conclusions
– Consideration of the results obtained against the established objectives.
– Conclusions and lessons learned.
The dataset used for this project is a dataset already generated and downloaded
from the following link [7]. It consists of three files with .csv extension, two of which
contain attack data traffic (OVS.csv and metasploitable-2.csv) and the third normal
data traffic (Normal_data.csv). The first two correspond to attacks on the OVS and
attacks targeting the Metasploitable-2 server, respectively. These three files are put
together to form a uniform data set. For this purpose, the Pandas Python library is
used, which allows the data to be stored in an object called DataFrame and thus form
the data set; it allows working with large volumes of data, providing facilities when
querying any column, row, or specific value. The resulting set contains 343,889 records
of data traffic flow, corresponding to the rows of the DataFrame, of which 138,722
belong to the OVS.csv file, 136,743 to the metasploitable-2.csv file and 68,424 to
the Normal_data.csv file. In addition, it contains 84 features, corresponding
to the columns of the DataFrame. The dataset used is public and attack-specific. It
is intended for the practical evaluation of anomaly detection systems applied in SDN
networks, in order to verify the performance of intrusion detection systems. It contains
benign as well as attack categories, covering different situations that can occur in the
SDN platform scenario.
After scanning the data, this section will check the quality of the data. To do this,
we first check that the data set is complete by examining that it does not have any
null values. In addition, it is also checked that it does not present variables with
values such as “NaN”. For this purpose, different techniques are applied, such as the
functions isnull() and isna() from the Pandas library, or even a heat map from
the seaborn library.
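As an illustration, the following is a minimal Pandas sketch of assembling the three CSV files into a single DataFrame and performing the quality checks described above; apart from the file names given in the text, the details are assumptions.

import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Build the uniform data set from the three files named in the text.
frames = [pd.read_csv(f) for f in ("OVS.csv", "metasploitable-2.csv", "Normal_data.csv")]
df = pd.concat(frames, ignore_index=True)
print(df.shape)                             # expected: (343889, 84)

# Quality checks: missing and NaN values.
print(df.isnull().sum().sum())              # total number of null values
print(df.isna().any().any())                # True if any NaN is present

# Optional visual check with a heat map of missing values.
sns.heatmap(df.isna(), cbar=False)
plt.show()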
The purpose of this task is to generate, from the originally captured data, derived
attributes, new records, or transformed values of existing attributes preparing the
input to the modelling tools according to the requirements. The objective is to trans-
form all variables into numerical. For example, this operation is performed for the
source and destination port variables, ‘Src Port’ and ‘Dst Port’. In this case, although
the port values are divided into three ranges, two variables are generated for the
source port and two for the destination port, applying the same pattern as for the IP
addresses. This optimizes the data set and avoids unnecessary
correlations.
3.4.1.3 Modelling
When starting with the construction of the model, we start from a base in which
the dataset used is completely unlabelled. Different tests and runs of the selected
algorithms are therefore performed, varying the parameters of each one of them.
This phase is called hyperparametrization, since the values used to configure the
model are called hyperparameters. This term is defined as adjustable parameters
that allow control of the training process of a model. They are values that are
generally not obtained from the data; since the optimal value is unknown, it is
necessary to use generic values, values that have worked correctly in similar
problems, or to find the best option based on trial and error. On the other hand,
the parameters are the
variables that are estimated during the training process with the data sets. Therefore,
these values are obtained, and not provided manually.
Selected algorithms are:
1. K-means
2. DBSCAN
3. SOM
The validation techniques selected are:
• Silhouette
• Davies Bouldin
• Calinski-Harabasz
K-means [6]. The function used to run this algorithm is provided by Scikit-Learn,
KMeans. The parameters to be taken into account, considering the most important
ones of this algorithm are:
– n_clusters: represents the number of clusters to be formed, as well as the number
of centroids to be generated. A range of values between 2 and 5 is provided,
through a for loop, to perform several executions varying these values. The values
provided are low since the objective is to obtain a low number of differentiated
data sets.
– init: Represents the initialization method. It admits different values:
k-means++: selects the initial cluster centers for clustering intelligently to
speed up convergence.
random: chooses random observations (rows) from the data for the initial
centroids.
Tests are performed considering both values.
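As an illustration, the following is a minimal scikit-learn sketch of this hyperparameter search, looping n_clusters from 2 to 5 with both initialization methods and scoring each run with the three selected internal metrics; X denotes the pre-processed feature matrix, and the subsampling used for the silhouette score is an assumption made to keep the computation tractable.

from sklearn.cluster import KMeans
from sklearn.metrics import (silhouette_score, davies_bouldin_score,
                             calinski_harabasz_score)

results = []
for n_clusters in range(2, 6):                      # 2 to 5 clusters
    for init in ("k-means++", "random"):            # both initialization methods
        km = KMeans(n_clusters=n_clusters, init=init, n_init=10, random_state=42)
        labels = km.fit_predict(X)
        results.append({
            "n_clusters": n_clusters,
            "init": init,
            # silhouette computed on a subsample for speed
            "silhouette": silhouette_score(X, labels, sample_size=10000, random_state=42),
            "davies_bouldin": davies_bouldin_score(X, labels),
            "calinski_harabasz": calinski_harabasz_score(X, labels),
        })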
To obtain the results of this algorithm, several difficulties have been encoun-
tered. The main one has been the size of the data set used since this implementation
massively calculates all the neighborhood queries, and therefore, it increased the
complexity of the memory, so much so that the execution process could not be
carried out. Several solutions have been tried to obtain good results. The first one
has been an estimated adjustment of the hyperparameters mentioned above, even
applying the well-known “elbow technique” to provide a reasonable value for the
eps parameter. In addition, solutions proposed by Scikit-learn have been tested, such
as pre-calculating the sparse neighbourhoods in fragments and thus using the metric
with a ‘precomputed’ value. The way used to obtain good results, as shown in
Table 3.5, has been to reduce the data set by 40, 30, and 10%, respectively, which is
not an optimal approach, but it demonstrates that the algorithm is very efficient with
smaller data sets. The PCA algorithm has been used for reducing the dimensionality.
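As an illustration, the following is a minimal sketch of this DBSCAN experiment: the data set is reduced to a fraction of its size, its dimensionality is reduced with PCA, and DBSCAN is then applied; the eps and min_samples values and the sampling fraction are illustrative assumptions, not the values used in the study.

from sklearn.utils import resample
from sklearn.decomposition import PCA
from sklearn.cluster import DBSCAN

# Reduce the data set to a fraction of its size (here 10%) without replacement.
X_sample = resample(X, replace=False, n_samples=int(0.10 * len(X)), random_state=42)

# Reduce dimensionality with PCA before clustering.
X_pca = PCA(n_components=10, random_state=42).fit_transform(X_sample)

db = DBSCAN(eps=0.5, min_samples=10)
labels = db.fit_predict(X_pca)
n_noise = (labels == -1).sum()              # DBSCAN marks outliers with the label -1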
SOM. In this case, the function used to execute this algorithm is Minisom provided
by the Minisom library [16]. The parameters to be considered as the most important
of this algorithm are:
– x: Dimension x of the SOM.
– y: Dimension y of the SOM.
– input_len: Number of the elements of the input vectors. The number of features
of the dataset used is provided.
– random_seed: Random seed to be used. Set in the same way as in the previous
algorithms.
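As an illustration, the following is a minimal sketch of training a SOM with the MiniSom function and the parameters listed above; the grid dimensions and the number of training iterations are illustrative assumptions.

from minisom import MiniSom

x_dim, y_dim = 10, 10                       # SOM grid dimensions (illustrative)
som = MiniSom(x=x_dim, y=y_dim, input_len=X.shape[1], random_seed=42)
som.random_weights_init(X)                  # initialize weights from the data
som.train_random(X, num_iteration=10000)    # competitive, unsupervised training

# Map each sample to its best-matching unit (the neuron/cluster it falls into).
winners = [som.winner(row) for row in X]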
First, it is worth mentioning the dimensionality reduction performed by applying the
PCA algorithm previously. The results obtained from this process are identical to
the previous algorithm since it is applied directly to the initial data set. As for the
performance obtained for this algorithm, it is shown in Table 3.7 through the results
for the different metrics. The first column ‘shape’ indicates the size of the algorithm
dimensions. This variable is the most important since at first glance it can be seen
that as the size of the dimensions increases, the scores of the metrics shown in the
following columns improve considerably. That is to say, the Silhouette and
Calinski-Harabasz values increase towards their optimal values, while the
Davies-Bouldin values decrease, likewise approaching its optimum. In view of this,
it is worth noting that as the size of the dimensions increased, the execution times
also increased, making the process inefficient and
very costly to obtain results. On the other hand, the size of the clusters obtained
varies proportionally to the established dimensions. Table 3.7 shows SOM results.
3.4.1.4 Conclusions
In this last phase of the methodology, an evaluation and critical analysis of the models created in the previous phase is carried out. The different algorithms used are compared in terms of execution times, ease of understanding, and the results obtained for the different metrics used. With all the information provided by the previous phase, the k-means algorithm is taken as the reference: it provides the best results and also requires the shortest execution time. In addition, it is the easiest algorithm to understand, owing to its simplicity and the small amount of parameter tuning needed to obtain optimal results.
Continuing with the SOM algorithm, the results are not entirely optimal. It should be noted that increasing the dimensions of the map improves the results, but on the other hand the execution times increase. These have been the worst of all the models used, partly because the scikit-learn library does not provide an implementation of this algorithm and partly because of the dimensions of the data provided. Finally, it should be noted that both its operation and the parameters to set in the function are quite simple to understand.
Finally, DBSCAN, as discussed in the previous phase, has yielded results that fall short of being efficient and optimal in terms of metric scores. The complexity of choosing the parameters should be highlighted: for min_samples, and likewise for eps, prior knowledge of the subject is advisable to facilitate the choice of their values, since adjusting these values on large amounts of data makes the algorithm heavy and quite inefficient.
After a final evaluation of the generated models, the K-Means model is the one that best fits the project objectives, grouping the network traffic of the data set into a compact and homogeneous number of clusters. In addition, it is the best suited to large amounts of data without increasing execution times too much, so it could be used in any field that requires data analysis, and more specifically to detect anomalies in data traffic. It is also easy to understand and implement, which is always welcome. On the other hand, it is interesting to note that other algorithms such as DBSCAN can also be used in domains similar to the one developed in this work, due to their high efficiency in clustering observations into similar groups, although their efficiency improves with smaller amounts of data than those presented in this work.
To work with time series there are currently several libraries and resources that allow
the preparation of data, the use of algorithms such as those indicated in this document,
as well as the elaboration of mathematical models. These include Datetime, Pandas,
Matplotlib, MatrixProfile, Numpy, Ruptures, Plotly, Tslearn and Sklearn.
– Datetime. It is a module that allows the manipulation and management of dates.
– Pandas. It is a Python package that allows you to work with structured data,
creating fast, adaptable, and expressive data structures.
– Matplotlib. It is a library that contains a wide variety of graphics and allows the creation of two-dimensional graphics.
– MatrixProfile. Provides accurate and approximate algorithms for calculating the
matrix profile of a time series, as well as for determining discords and motifs in
the time series from it, and tools for visualizing the results.
– Numpy. It is a package that provides general-purpose array processing, i.e. a high-
performance multidimensional array and methods for handling them that allow
for easy computations.
– Ruptures. It is an offline change point detection library that provides approximate
and accurate detection for parametric and non-parametric models.
– Plotly. A library for interactive visualisation with a wide variety of advanced
graphics.
– Tslearn. It is a Python package for machine learning with time series. Among its many modules are time series metrics, including DTW and variants; a clustering module including K-means; a preprocessing module, including time series representations such as PAA and SAX; and a Shapelet-based algorithm package that requires Keras.
– Sklearn. Classification, regression, clustering, dimensionality reduction, and preprocessing algorithms (such as standardization and normalization) are included in this open-source library. It also includes techniques for comparing, validating, and choosing parameters for models. In addition to internal indices such as the silhouette coefficient, Calinski-Harabasz, and the Davies-Bouldin index, it includes clustering algorithms such as K-Means, affinity propagation, mean shift, spectral clustering, the Ward method, agglomerative clustering, Gaussian mixtures, and BIRCH. A short example of these internal indices in use is sketched below.
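A brief sketch of the three internal validation indices mentioned above, computed on synthetic blob data for illustration only.

from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import (silhouette_score,
                             calinski_harabasz_score,
                             davies_bouldin_score)

X, _ = make_blobs(n_samples=500, centers=4, random_state=0)
labels = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(X)

print("Silhouette:       ", silhouette_score(X, labels))
print("Calinski-Harabasz:", calinski_harabasz_score(X, labels))
print("Davies-Bouldin:   ", davies_bouldin_score(X, labels))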
References
1. Kibish, S.: A note about finding anomalies [Internet]. Medium. (2018). [Visited 23 May 2023].
Available on https://towardsdatascience.com/a-note-about-finding-anomalies-f9cedee38f0b
2. Berzal, F.: Partition based clustering. [Visited 23 May 2023]. Available on https://elvex.ugr.es/
idbis/dm/slides/41%20Clustering%20-%20Partitional.pdf
3. Isaac, J.: Cluster jerarquico. (2021). [Visited 23 May 2023]. Available on https://rpubs.com/jai
meisaacp/760355
4. Bandaru, S., Kalyanmoy, D.: Towards automating the discovery of certain innovative design
principles through a clustering-based optimization technique. Eng. Optim. 43, 911–941 (2011).
https://doi.org/10.1080/0305215X.2010.528410
5. Sancho, F.: Self Organizing Maps (SOM) in NetLogo. (2021). [Visited 23 June 2023]. Available
on https://www.cs.us.es/~fsancho/?e=136
6. K-means.: [Visited 13 November 2023]. Available on https://scikit-learn.org/stable/modules/
generated/sklearn.cluster.KMeans.html
7. DATASET.: [Visited 13 November 2023]. Available on https://aseados.ucd.ie/datasets/SDN/
8. DBSCAN.: [Visited 13 November 2023]. Available on https://www.kaggle.com/code/meetnagadia/dbscan-clustering
9. SOM.: [Visited 13 November 2023]. Available on https://www.kaggle.com/code/asparago/uns
upervised-learning-with-som
10. Masich, I., Rezova, N., Shkaberina, G., Mironov, S., Bartosh, M., Kazakovtsev, L.: Subgroup
discovery in machine learning problems with formal concepts analysis and test theory
algorithms. Algorithms 16, 246 (2023). https://doi.org/10.3390/a16050246
11. Liu, Y., Li, Z., Xiong, H., Gao, X., Wu, J.: Understanding of internal clustering validation
measures. In: 2010 IEEE international conference on data mining, Sydney, NSW, Australia,
pp. 911–916 (2010). https://doi.org/10.1109/ICDM.2010.35
12. Kashef, R.: Scattering-based quality measures. In: 2021 IEEE international IOT, electronics
and mechatronics conference (IEMTRONICS), Toronto, ON, Canada, pp. 1–8 (2021). https://
doi.org/10.1109/IEMTRONICS52119.2021.9422563
13. Soe, Y.N., Santosa, P.I., Hartanto, R.: DDoS attack detection based on simple ANN with
SMOTE for IoT environment. Fourth International Conference on Informatics and Computing
(ICIC) 2019, 1–5 (2019)
14. Joloudari, J.H., Marefat, A., Nematollahi, M.A., Oyelere, S.S., Hussain, S.: Effective class-
imbalance learning based on SMOTE and convolutional neural networks. Appl. Sci. 13, 4006
(2023). https://doi.org/10.3390/app13064
15. DBSCAN.: [Visited 13 November 2023]. Available on https://scikit-learn.org/stable/modules/
generated/sklearn.cluster.DBSCAN.html
16. MiniSOM.: [Visited 13 November 2023]. Available on https://pypi.org/project/MiniSom/
17. Jan, A., Muhammad Khan, G.: Real world anomalous scene detection and classification using
multilayer deep neural networks. Int. J. Interact. Multimed. Artif. Intell. 8(2), 158–167 (2023).
https://doi.org/10.9781/ijimai.2021.10.010
18. Deore, M., Kulkarni, U.: MDFRCNN: Malware detection using faster region proposals convo-
lution neural network. Int. J. Interact. Multimed. Artif. Intell. 7 (4), 146–162 (2022). https://
doi.org/10.9781/ijimai.2021.09.005
Chapter 4
Profiling and Classification of IoT
Devices for Smart Home Environments
Abstract The goal of this study is to create a robust categorization system specifically designed for Internet of Things (IoT) device profiling. The main aim is to supplement current studies that use a wide range of machine learning techniques to identify anomalous behavior in Smart Home IoT devices with an exceptionally high accuracy rate. The intended framework is positioned to play a crucial role in bolstering IoT security in the future because it is designed to cover several types of abnormal activity detection. Our technological motivation stems from IoT smart sensors' high processing power and advanced connectivity capabilities. Notably, these sensors can be manipulated for malicious purposes through a single sensed data point rather than the complete collection of data gathered from sensors, such as temperature, humidity, light, and voltage measurements. Such a threat lowers the detection effectiveness of many machine learning algorithms and has a substantial impact on the accuracy of aberrant behavior detection. To iden-
tify occurrences of alteration in one specific data point among the four potential
data points collected by a single sensor, we compared and used different classi-
fiers in our investigations, including the Decision Tree Classifier, KNeighbors Clas-
sifier, Support Vector Classifier (SVC), Logistic Regression, AdaBoost Classifier,
Random Forest with Extreme Gradient Boost (XGBRF) Classifier, Random Forest
Classifier, Light Gradient Boosting Machine (LGBM) Classifier, Gradient Boosting
Classifier, and XGB Classifier. The results showed that the Gradient Boosting Clas-
sifier algorithm using random search attained an 85.96% detection accuracy, indi-
cating a somewhat lower vulnerability to such changes. As a result, the Gradient
Boosting Classifier algorithm with random search was the foundation for the care-
fully constructed suggested framework, which used four hyperparameter tuning
mechanisms for comparison.
4.1 Introduction
The term "Internet of Things" (IoT) refers to networks of physical objects and products that are integrated with electronics, sensors, actuators, software, and connections to allow for data exchange and communication between them. The IoT is currently the most extensively utilized technology, with projections indicating that by 2030 there will be more than 25.4 billion linked IoT devices globally. Owing to its widespread prevalence, the COVID-19 epidemic also contributed significantly to the rapid development of IoT technology.
The Internet of Things (IoT), which refers to a growing number of technical devices connected to the Internet, brings about new modern conveniences. The way we link IoT devices to our living spaces is expected to undergo a transformation thanks to the constantly expanding variety of smart, high-tech items available today. IoT helps and benefits us in practically every aspect of our lives and is gradually integrating into our daily routines. IoT devices have recently been making their way into a variety of industries, including residential and commercial applications.
To take advantage of the greater capacity to be aware of and control important characteristics of their houses, many individuals are setting up domestic devices and IP-enabled gadgets in their homes. However, there are numerous media stories regarding IoT devices installed in consumer residences and other living spaces that have security flaws that might be exploited by attackers. The best way to deal with IoT device vulnerabilities would be for IoT device suppliers to release timely fixes, but they appear to be unable or unwilling to do so. A large number of IoT users lack the necessary knowledge or motivation to carry out such procedures, or they may forget about unattended IoT devices previously installed in their network, leaving them with outdated software. Upcoming safety measures for IoT technology must take into account the possibility of unpatched IoT devices coexisting alongside other Internet of Things (IoT) devices during their entire lifecycle in the user's network and posing dangers.
A huge number of installed devices in advanced IoT-connected smart city scenarios are widely accessible and, as a result, their physical security is of utmost significance. The main security vulnerabilities are those connected with poor physical protection, such as simple gadget disassembly, unauthorised access to device records and data, and portable storage media [13]. Because of this, despite the various conveniences and adaptability benefits they provide, these devices also pose a number of security risks and issues [14]. Separating the devices and forbidding connectivity to other IoT devices via a gateway is a key factor in attack prevention for IoT devices. Given these security issues, effective identification of devices is likely a preferable strategy for administering networks than device isolation.
The fundamental difficulty with device-side authorization is determining the source of the communications received at the server, which may otherwise lead to identity theft. Using a document called a certificate, which may be forged, is one option. Fingerprinting devices may be the best option for allowing network managers to automatically detect linked IoT devices [12]. The fingerprinting procedure involves identifying a certain type of equipment remotely from its own network data. In order to safeguard and maintain the network, the network administration has to be aware of the devices connected to the system. Network managers need a better knowledge of the linked devices as more Internet of Things (IoT) gadgets are added to a network. Device profiling, which is a strategy to continuously identify or detect a device by taking into account behavioural aspects, is comparable to device fingerprinting.
4.1.1 Motivations
Many more organisations now enable IoT devices to connect to their networks, which might put such networks at security risk. In order to determine which devices are connected to their networks and whether they are deemed legitimate and do not constitute a risk, organisations must be able to identify these devices.
In past studies, it has become more common to use network data to identify devices in general. In particular, there is growing interest in identifying IoT devices, since it is crucial to do so in an organisational setting (particularly in terms of security).
This study aims to address the issue of identifying an IoT device by utilizing machine learning techniques to analyze its high-level network traffic data. We want to provide a mechanism for locating such a device, even if its IP address has been spoofed (which is simple to achieve), and to be able to see any unusual behaviours that would point to the device being misused. We want to analyse the traffic's high-level data (that is, the metadata and traffic statistics rather than the content), as we cannot rely on the IP address to identify the device (since this number might be faked).
The problem we want to tackle in this study is fundamentally a multi-class one. We make use of a dataset gathered from ten different IoT devices. The dataset includes details of these devices' network traffic. The strategy we employ in this study is to identify the device based on a specific traffic session or series of sessions. For each device, we start by developing one-vs-rest classifiers, and we continue until we are able to discriminate between every type of device.
4.1.2 Contribution
To identify devices at the device-type level, the suggested solution uses a cross-layer methodology that includes network, data-link, transport, and application data. To limit IoT devices' access to sophisticated features available to conventional devices such as laptops and smartphones, the fundamental concept is to analyse and recognise the distinctive behaviour patterns of IoT devices. IoT devices nevertheless remain vulnerable as potential network weak spots despite these precautions. It would not stop, for instance, the monitoring and control of IoT devices in a home network, such as the use of a camera to trigger an action like opening a garage door. Additionally, the adopted regulation gives all IoT devices the same degree of capabilities without distinction.
This study proposes a novel method for recognising and categorising devices that incorporates cutting-edge machine learning techniques. In particular, it offers a ground-breaking framework for big-data-based traffic categorization that is extendable, distributed, scalable, and portable. The study also suggests a distributed approach for processing real-time ingress IoT flow streams that makes use of H2O.ai. This technique effectively fulfills crucial requirements such as on-demand scaling, storage capacity, computation dissemination, latency, and privacy. The proposed method categorises IoT devices based on their behavioural traffic characteristics. The input dataset is composed of flow entries extracted from the incoming network traffic for training the model. The learning algorithm set employed consists of a MetaModel, XGBoost, DRF, GBM, and GLM. The study greatly advances device identification and classification methods in the IoT space by utilising this thorough and complex methodology.
An additional .pcap file with 802,582 packets in binary format from 17 distinct devices was used to verify the efficacy of the suggested solution. The framework's examination revealed exceptional performance, obtaining a remarkable accuracy rate of 99.94%. Additionally, the solution showed good performance metrics for F1 score, Precision, and Recall. These findings demonstrate the solution's potential to successfully address concerns about cyberattacks and open the door for the creation of autonomous defence systems. The framework has a lot of promise for battling cybersecurity threats and developing resilient defence systems due to its high accuracy and robust performance.
The System IDentifier (SysID) solution, which specialises in IoT device fingerprinting, is introduced in this research article. With just one packet, SysID can successfully identify the device type. SysID employs machine learning and genetic algorithms to independently acquire knowledge about the unique characteristics of each IoT device, in contrast to conventional techniques that need expert supervision. This method illustrates the strength of rule-based algorithms, which excel at capturing distinctive header traits and precisely analysing the attribute values utilised for classification. SysID stands out as a flexible, network-adaptive, model-based technology. The three stages of this study's development were: defining the research topic; setting up a lab environment with SHIoT (Smart Home IoT) devices; and, finally, preprocessing the data gathered and creating a classification model. The categorization model was built on the traffic-flow characteristics amassed over a period of 10 days for each SHIoT device. The initial dataset had 681,684 feature vectors spread across four classes; however, it was discovered that this distribution was imbalanced. To overcome this, stratification techniques were utilised, producing a dataset with 117,423 feature vectors that was then used to create further models.
The Precision-Recall Curve (PRC) metric was judged superior to the Receiver Operating Characteristic (ROC) measure. The M4 model emerged as the best option after several observed models were analysed, since it performed more consistently than the others. It was discovered that the characteristics of the observed traffic flow with the most influence on the classification model were the packet length, inter-arrival packet timings, segments within the traffic flow, and the amount of data transferred inside the sub-stream. These characteristics were essential for correctly categorising SHIoT devices.
Federated learning (FL) is becoming increasingly important in this field as machine learning (ML) and deep learning (DL) approaches are used to discover cybersecurity vulnerabilities in IoT systems. Realistic splitting tactics, like those reported for the MedBIoT dataset, may be used in FL approaches, which call for datasets that correctly represent cyberattacks directed at IoT devices. However, current FL-based systems must take into account how centralised databases are split. This component's goal is to train a federated ML model for malware detection using four different multilayer perceptron (MLP) and autoencoder architectures.
The aggregation function is considered as a parameter in two FL methods, mini-batch aggregation and multi-epoch aggregation. The supervised solution is used primarily to compare the supervised method with the unsupervised approach and to carry out extensive, in-depth experiments. Three separate challenges are studied and analysed, corresponding to the three different ways in which the dataset is rebalanced.
This study proposes a unique method called HFeDI that uses horizontal federated learning with privacy protection to identify IoT devices. Three publicly accessible datasets are used to assess the effectiveness of HFeDI, with encouraging findings. When using centralised training, the 23 features, along with the 2 additional features discovered by Miettinen et al. [39], were shown to offer the greatest accuracy. The output data of the feature-extractor tool is enhanced through the utilization of SK resampling, a resampling method, to improve the quality of the data. By employing Kaiming weight initialization, group normalisation, a loss function that incorporates weights to calculate the cross-entropy, and a straightforward averaging technique at the server, HFeDI substantially enhances the efficiency of IoT device identification. For cases involving both independent and identically distributed (IID) and non-independent and identically distributed (non-IID) data, the findings show a significant improvement in accuracy, recall, precision, and F1-score. These results demonstrate the efficacy and promise of HFeDI in improving IoT device identification while preserving privacy using federated learning techniques.
We employ machine learning (ML) techniques and encrypted traffic analysis to address the issue of identifying IoT devices based on their unique characteristics. This study utilized the dataset provided by the University of New South Wales and IBM Research. A TP-Link router, which serves as a connection point to the public Internet, is equipped with the OpenWrt operating system and other essential packages, enabling the collection of traffic in pcap files to record pertinent actions. These files are then analysed in order to extract useful characteristics. Exploratory evaluation is used to find the most effective classifier estimators in order to maximise training. This method aids in choosing the best models for the task at hand. Additionally, a comparative evaluation employing a variety of predictive measures is carried out to compare these classifiers to a baseline. These assessments make it possible to evaluate the performance of the classifiers in depth.
The testing set is then used to evaluate and verify how well the chosen classifiers perform in comparison to the metrics and benchmarks that have been created. This guarantees a thorough study and evaluation of our ML-based encrypted traffic analysis technique for accurately mapping encrypted data streams to the appropriate device types.
A robust security solution called IoT Sentinel was created expressly to address the security and privacy issues brought on by unreliable IoT devices. In order to manage and restrict the flow of traffic from susceptible devices, it makes use of software-defined networking and an advanced device-type recognition approach. A Security Gateway and an IoT Security Service provided by an IoTSSP (IoT Security Service Provider) are the two essential parts of the system. IoT Sentinel automatically detects susceptible devices inside an IoT network and implements customised rules to restrict their communication capabilities, with the purpose of minimising any harm resulting from hacked devices. By putting these preventative steps in place, the approach dramatically lessens the potential harm caused by hacked IoT devices. IoT Sentinel guarantees strong security and safety in the quickly developing IoT landscape by combining the benefits of device-type identification and software-defined networking.
This study introduces a machine learning approach for accurate IoT device categorization using network traffic analysis. The suggested method uses a recursive feature selection model to find and choose the most important properties of the IoT-AD-20 dataset. In addition, the characteristics are ranked according to how crucial they are to the classification process using the random forest method. A cross-validation test is carried out to guarantee the model's dependability and prevent overfitting. When using flow-based characteristics, the results show the usefulness of the suggested approach, attaining a phenomenal 100% identification rate for all IoT devices. This precise categorization capacity makes it possible to detect weak IoT devices and makes it easier to implement strict security regulations. The proposed approach demonstrates its potential to improve IoT device security and reduce possible dangers in IoT networks by utilising the strength of machine learning and careful feature selection.
With the use of a machine learning algorithm, this work pioneers the creation of an anomaly-based protection (ABP) system. It investigates how slight changes to sensed data might affect the accuracy of a machine learning algorithm; additionally, it covers the process of constructing an ABP with a specific machine learning approach. The dataset for the experiment consists of 32,000 samples collected from the Intel Berkeley Research Laboratory: 20,000 samples were obtained during routine operations, whereas the remaining 12,000 samples were produced in a way that resembled anomalous behaviour. Of the complete dataset, 24,000 samples were designated for training, while 8,000 samples were reserved for testing. The ABP system was used to find instances of signal injection that were intended to compromise services, such as heating or cooling in an office context, and were directed at specific sensed data. Insights into the behaviour and effectiveness of the machine learning algorithm in spotting abnormalities were gathered through this research, which helped enhance anomaly detection methods for protecting crucial systems.
This paper explores the dangers of IoT traffic analysis by outlining a two-stage classification method for identifying devices and recognising their statuses. Two different datasets (self-collected packet traces and publicly accessible packet traces) are used to assess the suggested approach. It was found that each time the state of an appliance changes, a discrete sequence of packets with different sizes is sent along with it. This discovery was made by careful examination of the network traffic caused by IoT devices in a controlled laboratory environment. The paper thoroughly investigates the effects of traffic profiling attacks on IoT devices. Notably, machine learning (ML) techniques are used to accurately and efficiently learn user actions. The research highlights the hazards and vulnerabilities present in IoT networks by thoroughly analysing IoT traffic data and applying ML approaches. The findings help to build strong security measures in the IoT ecosystem by offering insightful information on device identification, state recognition, and the possibility of hostile traffic analysis attacks. A summary of the various works carried out on IoT device profiling is presented in Table 4.1.
Network management and monitoring face additional issues as the number of IoT devices grows. Statistical analysis can classify IoT devices, and IoT rules must be enforced consistently using a device-type recognition framework. IoT devices may not be reliably identified by MAC address, because skilled hackers can use malware to discover device MAC addresses, and there is no MAC-address-based standard for device identification. This study classifies IoT devices by traffic patterns using composite supervised machine learning algorithms. The machine learning algorithms RF, k-NN, DT, NB, and SVM are capable of identifying IoT devices. The method groups novel, previously unseen IoT devices by network utilization. Network information such as SSID probes, packet destination, MAC protocol fields, and broadcast packet size identifies users, while the device driver and certain hardware features are fingerprinted.
This chapter summarizes IoT device categorization research. Several studies have used application- and device-level packet features to characterise systems. Miettinen et al. [11] tested 31 IoT devices; their fingerprinting approach extracts 23 features. Nineteen of the 23 features were binary, indicating domains or protocols at several levels of the protocol stack, including link (LLC and ARP), network (ICMPv6, IP, and EAPoL), transport (UDP and TCP), application layer (HTTPS and HTTP), payload, and IP options. The destination IP counter, packet size, and source-and-destination port class were integer-type properties. The authors employed Random Forest (RF) to classify 17 IoT devices with 95% accuracy and all of their system's devices with 50% accuracy.
Researchers have created IoT device fingerprinting methods using proactive probes or passive data capture. Nmap can detect devices [1]: manufacturers implement their network stacks differently, and Nmap determines the OS or device from 16 probes. Several passive fingerprinting approaches target network packet characteristics. For OS verification, p0f passively profiles TCP SYN headers and metadata [2]. Gao et al. [3] locate access points using wavelet estimation of packet traffic. Many approaches emphasize timing. Several passive and periodic authentication solutions leverage application-layer protocol timing to identify devices [4]; in addition to SVM classification, RTF represents tree-based finite-state machine signatures. Radhakrishnan et al. [5] categorize devices and model packet inter-arrival times using ANNs. Formby et al. [6] fingerprint commercial control systems using actual running times and data and information response computation durations. Kohno et al. [7] categorize devices by the prevalence of TCP timestamp clock skew. Wireless network properties are also examined: Desmond et al. [8] use 802.11 probe request packet timing analysis to find WLAN devices, and clustering was employed to create the fingerprints. Radiometrics was used by Nguyen et al. [9] to passively profile identity tampering; the measurements include radio signal frequency and volume, and the authors then identified the device using non-parametric Bayesian approaches.
The level of complexity of the IoT market is increasing; therefore, there is still plenty to learn about the many categories of IoT devices [44, 45]. The rising need for IoT technology presents various kinds of challenges for the infrastructure as it tries to sustain network services. This section outlines a method for building a structure to identify devices in IoT networks whenever an additional IoT device is added, an IoT device is compromised, or an IoT device provides erroneous data [48, 49]. New network analysis processes are needed in order to locate the IoT equipment that is attached to the system. This makes it practicable to employ analytical methods for interpreting the information and finding typical setups that might distinguish between various device types. IoT systems are more predictable than conventional desktop computers since they exclusively carry out certain tasks. In order to detect IoT devices with excellent accuracy and minimal errors, communication analysis is advised. The proposed method will guard against different attacks on the IoT systems' activities by tracking and analysing the activity of IoT devices [50]. Figure 4.1 depicts our recommended format for sensor characteristics in an IoT network. A variety of connected devices and platforms for communication make up the testbed. IoT devices have sensors for collecting data from, and transmitting it to, the actual surroundings. Figure 4.2 shows the five stages of the IoT device recognition procedure [44–47].
A network management tool gathers the IoT network traffic. The access point and the intelligent systems are the two points of contact for the monitoring process [51, 52]. This method has the advantage of detecting malicious IoT device activity before the access point is reached. Network traffic is recorded using packet capture software such as Wireshark [53–55]. The source IP, source ports, destination IP, destination ports, and the content of the packets are all included in the Wireshark traffic. Data from IoT devices can be gathered from the different payloads to create the device identity. To determine device behaviour, the information from every system is analysed; a rough sketch of this kind of flow-level feature extraction is given below.
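The following is a rough sketch of extracting flow-level fields (source/destination IPs and ports, packet counts, byte counts) from a capture file. It assumes the scapy package and a hypothetical file name iot_capture.pcap; it is not the authors' tooling.

from scapy.all import rdpcap, IP, TCP, UDP   # requires the scapy package

packets = rdpcap("iot_capture.pcap")         # hypothetical Wireshark capture file
flows = {}

for pkt in packets:
    if IP not in pkt:
        continue
    sport = dport = None
    if TCP in pkt:
        sport, dport = pkt[TCP].sport, pkt[TCP].dport
    elif UDP in pkt:
        sport, dport = pkt[UDP].sport, pkt[UDP].dport
    # Flow key: source IP, source port, destination IP, destination port.
    key = (pkt[IP].src, sport, pkt[IP].dst, dport)
    stats = flows.setdefault(key, {"packets": 0, "bytes": 0})
    stats["packets"] += 1
    stats["bytes"] += len(pkt)

print(len(flows), "flows extracted")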
Fig. 4.2 System for profiling IoT devices in smart home infrastructure
A data profile describing the typical functioning of the IoT sensors is produced in the sensor description stage [56, 57]. The routine operation of the IoT sensors has been described using machine learning techniques, which are well suited to capturing every potential state of typical sensor behaviour with a thorough model. To be able to categorise IoT devices throughout the system, this chapter emphasises the communication analysis of the sensors. Through device recognition, a network manager will be able to identify malicious sensors in the IoT system.
4.4.3 Analysis
To check for any discrepancies in the communications of the monitored IoT system, the sensor profile from the preceding stage is utilised as a baseline [58]. A runtime profile is created for the sensor communications, and any departure from the baseline is regarded as abnormal. The probability distribution is used to confirm how likely it is that natural behaviour occurs outside of the lower or upper bound. If the system's gathered data rate exceeds the specified parameters, it is regarded as irregular. A minimal sketch of such a baseline check follows.
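A minimal sketch of such a baseline check, under the assumption that the runtime readings are compared against lower and upper bounds estimated from the distribution of normal readings; the percentile thresholds and the synthetic baseline are illustrative.

import numpy as np

# Baseline profile built from normal sensor readings (illustrative values only).
baseline = np.random.normal(loc=22.0, scale=0.5, size=10_000)   # e.g. temperature
lower, upper = np.percentile(baseline, [0.5, 99.5])             # probability bounds

def is_anomalous(reading: float) -> bool:
    """Flag any runtime reading that departs from the baseline bounds."""
    return reading < lower or reading > upper

print(is_anomalous(22.1), is_anomalous(30.0))   # normally False, True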
4.4.4 Classification
the adversary lacks access to the authorized equipment required for capturing and
analyzing traffic trends once they have obtained the compromised secrets.
The three categories of machine learning algorithms are supervised learning, unsu-
pervised learning, and semi-supervised learning. Each type has unique traits and uses
in a variety of industries. Let’s look more closely at these categories.
Training a model on labelled data through supervised learning entails associating the
input data with the matching output labels. The objective is to discover a mapping
between the features of the input and the desired output [72]. The model gains
the ability to predict outcomes during training by extrapolating patterns from the
labelled samples. The labelled data helps the model make precise predictions about
fresh, unforeseen data. Linear regression, decision trees, support vector machines,
and neural networks are examples of common supervised learning methods [70].
In unsupervised learning, the model is given unlabeled data without any associated output labels. The aim is to find patterns, structures, or relationships within the data [73]. Unlike supervised learning, there is no specific correct output to direct the learning process. Unsupervised learning algorithms seek out hidden patterns, group related data points, or reduce the dimensionality of the data. Unsupervised learning is frequently used in clustering algorithms such as k-means and hierarchical clustering, as well as dimensionality reduction methods such as principal component analysis (PCA) and t-distributed stochastic neighbour embedding (t-SNE).
This section gives a thorough explanation of supervised learning and discusses five
different supervised machine learning algorithms that are used to categorise the
different kinds of IoT devices that are present in the network. The following are
these algorithms.
Random forest is a variant of decision-tree growing techniques that allows branches to grow randomly inside a chosen subspace, setting it apart from other classifiers, as shown in Fig. 4.3. Constructed from a collection of randomised base regression trees, the random forest approach makes predictions about the result. At each randomised base regression tree, the algorithm chooses a node and splits it to develop additional branches. Since Random Forest integrates many trees, it is an ensemble algorithm, and it is crucial to keep this in mind. Ensemble algorithms ideally integrate a number of classifiers of various kinds. It is possible to think of random forest as a bootstrapping method for enhancing decision-tree outcomes.
The algorithm operates in the sequence shown below. The parameter designating the sample used for bootstrapping, U(i), employs the ith bootstrap. Despite using a modified decision-tree technique, the programme learns a traditional decision tree. As the tree develops, the alteration is carried out methodically: instead of iterating over every conceivable split value on all attributes at each node, the randomised tree learner considers only a condensed subset of the attributes.
1: procedure RF(U, Z)
2:   A ← ∅
3:   for i ∈ 1, ..., Q do
4:     U(i) ← a bootstrap sample drawn from U
5:     ai ← RTL(U(i), Z)    (RTL = RandomizedTreeLearn)
6:     A ← A ∪ {ai}
7:   end for
8:   return A
9: end procedure
10: procedure RTL(U, Z)
11:   at every node:
12:     f ← a condensed (random) subset of Z
13:     split on the most significant attribute in f
14:   return the learned tree (model)
15: end procedure
The key idea it exploits is that bagging lowers the variance of the decision tree method, which is how the algorithm implements the ensemble of decision trees. A minimal scikit-learn sketch is shown below.
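A minimal scikit-learn sketch of the bagged, feature-subsampled tree ensemble described above; the synthetic data are a stand-in for the IoT traffic features, and the hyperparameters are illustrative.

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Illustrative data standing in for the IoT traffic features.
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Each tree is grown on a bootstrap sample and splits on a random feature subset.
rf = RandomForestClassifier(n_estimators=100, max_features="sqrt", random_state=0)
rf.fit(X_tr, y_tr)
print("Accuracy:", rf.score(X_te, y_te))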
$x_i \in \mathbb{R}^p, \quad i = 1, \ldots, n$
where $\mathbb{R}^p$ denotes the p-dimensional data space and the domain of the real-valued prediction vectors, and $x_i$ stands for a training observation. The pseudo-code of a basic Support Vector Machine algorithm is displayed below.
Algorithm 2 (SVM)
FeatureSupportVector (FSV) = {most similar feature pair of differing groups}.
This technique looks for the potential support vectors, designated as S, under the assumption that the support vectors determine the dimensions in which the parameters of the hyperplane's linear attributes are kept. A minimal scikit-learn sketch follows.
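A brief hedged sketch of a Support Vector Machine classifier with scikit-learn; the data, kernel, and C value are illustrative choices rather than the chapter's actual configuration.

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Standardise the features, then fit a maximum-margin classifier.
svm = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))
svm.fit(X_tr, y_tr)
print("Support vectors per class:", svm.named_steps["svc"].n_support_)
print("Accuracy:", svm.score(X_te, y_te))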
kNN categorises data by utilising the same distance-measuring method as Linear Discriminant Analysis and other approaches employing regression. While the technique delivers the value of a characteristic or predictor in a regression application, it produces class memberships in a classification application [16]. The method was chosen for the study because it can pinpoint the most important predictor. Despite being regarded as resistant to outliers and adaptable among many other qualifying criteria, the method has significant memory requirements and is sensitive to irrelevant features. The average distance between individual data points is used by the method to form classes or clusters. The following Eq. (4.1) can be used to get the mean distance [40].
$\varphi(x) = \frac{1}{\kappa} \sum_{(x_i, y_i) \in kNN(x, L, K)} y_i \qquad (4.1)$
In the above Eq. (4.1), kNN(x, L, K) denotes the K nearest neighbours of the input attribute x in the learning set L. The dominant class among the k neighbours determines how the algorithm applies classification and prediction; the prediction formula is given in Eq. (4.2) [40].
$\varphi(x) = \underset{c \in y}{\operatorname{argmax}} \sum_{(x_i, y_i) \in kNN(x, L, K)} \mathbb{1}(y_i = c) \qquad (4.2)$
It is crucial to understand that the resulting class is made up of the members of the desired attribute, and that the Euclidean distance is used to assign the attributes to classes. Six phases make up the algorithm's implementation. The first stage is the calculation of the Euclidean distances. The calculated n distances are sorted in the second stage, and k is a positive integer selected based on the ordered Euclidean distances in the third phase. The fourth stage establishes and assigns the k points that match the k distances closest to the centre of the group. Finally, if ki > kj for every i ≠ j, the feature x is added to group i, for k > 0 and for the number of points in i. The kNN stages are shown in Algorithm 3, and a minimal scikit-learn sketch follows it.
Algorithm 3 (kNN)
Require: a training sample X, the class labels Y, and an unclassified instance x:
1: Classify(X, Y, x)
2: for i = 1 to m do
3:   compute the distance d(Xi, x)
4: end for
5: compute the set I containing the indices of the k smallest distances d(Xi, x)
6: return the dominant label among {Yi ; i ∈ I}
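A minimal scikit-learn sketch of the nearest-neighbour classification described above; the Euclidean metric is obtained as the Minkowski metric with p = 2, and the synthetic data are illustrative.

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Euclidean distance (Minkowski with p=2) and a majority vote over the k neighbours.
knn = KNeighborsClassifier(n_neighbors=5, metric="minkowski", p=2)
knn.fit(X_tr, y_tr)

dist, idx = knn.kneighbors(X_te[:1])    # distances to the k nearest training points
print("Neighbour distances:", dist)
print("Predicted class:", knn.predict(X_te[:1]))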
For J classes, the LR technique models the conditional probability of an observed instance belonging to a specific group, Pr(G = j|X = x), from which it is feasible to identify the classes of unidentified cases using Eqs. (4.4)–(4.6) below; a minimal scikit-learn sketch follows them.
$\log \dfrac{\Pr(G = j \mid X = x)}{\Pr(G = J \mid X = x)} = \beta_j^T x_i; \quad j = 1, \ldots, J-1 \qquad (4.4)$

$\Pr(G = j \mid X = x) = \dfrac{e^{\beta_j^T x_i}}{1 + \sum_{l=1}^{J-1} e^{\beta_l^T x_i}}; \quad j = 1, \ldots, J-1 \qquad (4.5)$

$\Pr(G = J \mid X = x) = \dfrac{1}{1 + \sum_{l=1}^{J-1} e^{\beta_l^T x_i}} \qquad (4.6)$
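A short hedged sketch of multinomial logistic regression with scikit-learn; internally sklearn estimates one coefficient vector per class and applies the softmax, which is equivalent to the log-odds formulation above up to the choice of reference class. The synthetic data are illustrative.

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# J = 3 classes; predict_proba returns Pr(G = j | X = x) for every class j.
X, y = make_classification(n_samples=1500, n_features=10, n_informative=6,
                           n_classes=3, random_state=0)

lr = LogisticRegression(max_iter=1000)
lr.fit(X, y)
print("Class probabilities Pr(G=j|X=x):", lr.predict_proba(X[:1]))
print("Predicted class:", lr.predict(X[:1]))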
4.5.4.5 AdaBoost Classifier
The main idea of this approach is to build models one after another, aiming to decrease the shortcomings of the model prior to it. But how should we approach that? How will the error be reduced? This is achieved by building the next model on the flaws, or residuals, of the previous one.
In gradient boosting, whenever the target column is continuous, the regressor is employed; whenever the problem is one of classification, the gradient boosting classifier is utilised. The only difference between the two is the loss function. The objective is to add weak learners and then decrease this loss function using gradient descent techniques. Because the method relies on a loss function, different loss functions are used, such as the mean squared error for regression challenges and the log-likelihood for classification problems.
Let us consider X and Y as the input and target, respectively, with N samples each. We seek to learn the function f(x) that maps the input characteristics X to the target variable y. The model is the cumulative sum of the trees, including those that have been improved or modified. The discrepancy between the observed and predicted variables is referred to as the loss function, as shown in Eq. (4.7) [50].
$L(f) = \sum_{i=1}^{N} L(y_i, f(x_i)) \qquad (4.7)$
With regard to f, we aim to minimise the loss function L(f) as shown in Eq. (4.8)
[50].
$f_0(x) = \arg\min_f L(f) = \arg\min_f \sum_{i=1}^{N} L(y_i, f(x_i)) \qquad (4.8)$
If our gradient boosting approach has M stages, the algorithm can add a new estimator $h_m$, with $1 \le m \le M$, to enhance $f_m$ as shown in Eq. (4.9):

$f_m(x) = f_{m-1}(x) + h_m(x) \qquad (4.9)$
Steepest descent determines $h_m = -\rho_m g_m$ for stage m of gradient boosting, where $\rho_m$ is a constant known as the step length and $g_m$ is the gradient of the loss function L(f), as shown in Eq. (4.10).
$g_{im} = -\left[ \dfrac{\partial L(y_i, f(x_i))}{\partial f(x_i)} \right]_{f(x_i) = f_{m-1}(x_i)} \qquad (4.10)$
The gradient refers to the rate of change of a function at a certain point. It represents
the slope or steepness of the function at that point. Similarly, the same applies to M
trees as shown in Eq. (4.11).
$f_m(x) = f_{m-1}(x) + \arg\min_{h_m \in H} \sum_{i=1}^{N} L(y_i, f_{m-1}(x_i) + h_m(x_i)) \qquad (4.11)$
$f_m = f_{m-1} - \rho_m g_m \qquad (4.12)$
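A minimal sketch of the staged additive fitting described by Eqs. (4.7)–(4.12), using scikit-learn's GradientBoostingClassifier; the data and hyperparameters are illustrative, not the chapter's configuration.

import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# M = 100 stages; learning_rate plays the role of the step length rho_m.
gb = GradientBoostingClassifier(n_estimators=100, learning_rate=0.1, random_state=0)
gb.fit(X_tr, y_tr)

# staged_predict shows how accuracy evolves as successive estimators h_m are added.
staged_acc = [np.mean(pred == y_te) for pred in gb.staged_predict(X_te)]
print("Accuracy after 1, 50 and 100 stages:",
      staged_acc[0], staged_acc[49], staged_acc[-1])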
Decision tree classifiers are among the algorithms utilised for supervised machine learning. This means that they build a model which can forecast outcomes from pre-labelled data. Decision trees can also be used to address regression problems, and much of the material in this section applies to regression as well. Classifiers using decision trees function in a manner resembling flow diagrams. A node of a decision tree typically represents a point of choice that splits into two nodes. Each of these nodes represents the result of the choice, and each possibility may itself evolve into a decision node. A conclusive classification is produced as the culmination of all the various assessments. The primary node is the root or base node. Each of the decision points is referred to as a decision node. The final decision point is known as a leaf node.
LightGBM is a type of gradient boosting method based on decision trees that enhances model efficiency while using little data or memory. On top of the histogram-based approach widely deployed in standard Gradient Boosted Decision Tree (GBDT) implementations, it uses two novel strategies: Gradient-based One-Side Sampling (GOSS) and Exclusive Feature Bundling (EFB). The characteristics of the LightGBM method come from these two techniques; collectively, they make the model accurate and give it an edge over the plain GBDT method.
Consider a training set of n instances, each of which is represented by a vector of dimension s in the space {x1, x2, …, xn}. The negative gradients of the loss function with respect to the model outputs, {g1, …, gn}, are computed within every iteration of gradient boosting. The GOSS technique ranks the training instances in descending order based on the absolute values of their gradients. Then, an instance subset A is obtained by keeping the top a × 100% of instances with the largest gradients. From the remaining set Ac, which contains the (1−a) × 100% of instances with smaller gradients, a subset B of size b × |Ac| is then sampled at random [50]. The estimated variance gain for splitting feature j at point d is given in Eq. (4.13).
formula is represented in Eq. (4.13).
2 2
1 xi ∈Al gi + 1−a
xi ∈Bl gi xi ∈Ar gi + 1−a
xi ∈Br gi
V j (d) = j
b
+ j
b
n n l (d) n r (d)
(4.13)
where
– $A_l = \{x_i \in A : x_{ij} \le d\}$,
– $A_r = \{x_i \in A : x_{ij} > d\}$,
– $B_l = \{x_i \in B : x_{ij} \le d\}$,
– $B_r = \{x_i \in B : x_{ij} > d\}$,
and the coefficient (1−a)/b is used to normalise the sum of the gradients over B back to the size of $A^c$. A rough usage sketch with the LightGBM package follows.
We divide gadgets into seven categories: smart speakers, smart electricity and lighting, smart cameras, smart sensors, smart home assistants, and non-IoT gadgets. These categories were chosen because of the functionality they provide. While home assistants can carry out tasks, cameras and sensors are largely employed to gather information. By classifying devices in this way, we can effectively implement regulations that forbid data-gathering gadgets from performing actions that would jeopardise privacy. Furthermore, as they both acquire data with varying degrees of privacy violation, we distinguish between cameras and sensors as information-gathering devices. Cameras, in particular, provide more detailed data about user privacy than sensors.
device will not correspond to any fingerprints from known IoT cameras that were
utilized to train the classifiers.
This section outlines our method of classifying the device during each re-
authentication, regardless of its previous authentication status with the server. This
approach differs from the reliance on identifying a new device solely based on its
MAC address, as demonstrated in [24–27]. Such reliance on MAC address can
be easily manipulated by adversaries to authenticate themselves with the hub, as
discussed in [28, 29, 36].
The database only verifies the device's MAC address to see whether it is already present. However, even in the event of a falsified MAC address, the classifier in the database will not be able to categorize the device's traffic with high confidence. This occurs because of a discrepancy between the fingerprint of the device and the fingerprint already stored in the database. Although the MAC address may be the same, the device's unique fingerprint results in a mismatch. Consequently, this also addresses the situation where a device has been compromised, as the patterns of communication and the unique characteristics of the device would have changed from their previous state.
If the MAC addresses match but the fingerprints do not, the device must be fake, and the verifier may immediately report this to the hub. By using this strategy, we tighten the authentication procedure and stop unauthorised devices from connecting to the network, even if they try to impersonate a device that has previously been authenticated.
This concept is founded on the notion that a machine learning model exhibits a
propensity to provide highly precise forecasts when it undergoes training and testing
using either the same or a similar dataset. The motivation behind acquiring traffic
data of a novel device is rooted in this concept. After the device has been properly
authorised, we build a model based purely on its fingerprint. This model is then saved
in the database. By using this strategy, we make use of machine learning to improve
the overall efficacy of the authentication process and assure accurate forecasts.
This section describes the dataset and the various preprocessing and feature selection approaches used in the proposed system.
The baby monitor, lights, motion sensor, security camera, smoke detector, socket, thermostat, TV, watch, and water sensor were among the ten IoT devices from which the dataset for this study was primarily compiled. It had previously been split into
a train set, a validation set, and a test set. The dataset includes details on these
devices’ network activity as it was gathered over an extended period of time. A TCP
connection from the SYN packet to the FIN packet is represented by each instance
(example) in the dataset. The device’s type categorization serves as the dependent
variable. Nearly 300 characteristics and almost 400,000 cases make up the training
set.
It is important to note that not all of the data in the dataset was available for all of the recorded sessions. It is common to come across a dataset where not all of the data is accessible and usable for training. There are several ways to deal with missing data; the one we used was to discard the instances in which it occurred. We found that the number of cases with missing data is rather low and that enough examples remain for successful learning. In the original dataset, missing data are denoted with a question mark. We also had to cope with the "thermostat" device having only one class represented in the test set when we reached the testing stage; therefore, we could determine its accuracy score but not its AUC. A minimal sketch of this missing-data clean-up is given below.
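A minimal sketch of the missing-data clean-up, under the assumption that the data are loaded from a CSV file and that the label column is named device_category; both names are hypothetical.

import pandas as pd

# "?" marks missing values in the original dataset.
train = pd.read_csv("iot_train.csv", na_values="?")   # hypothetical file name

print("Rows before dropping missing data:", len(train))
train = train.dropna()                     # discard instances with missing values
print("Rows after dropping missing data: ", len(train))

X_train = train.drop(columns=["device_category"])     # assumed label column name
y_train = train["device_category"]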
Feature Scaling
We quickly observed from the data that the different attributes have widely varying ranges of values. It is well known that such differences in feature ranges may result in less accurate findings and issues during training. As a result, we chose to employ the MinMaxScaler built into the Python sklearn library. This scaler performs min-max scaling, which brings all of the dataset's values into the range (0, 1). We observed that the test-set findings were substantially more accurate and had a better AUC value after the feature scaling was applied; a short sketch follows.
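A short sketch of the min-max scaling step; the random matrices are stand-ins for the training and test feature matrices.

import numpy as np
from sklearn.preprocessing import MinMaxScaler

X_train = np.random.rand(1000, 300) * 100     # stand-in for the raw training features
X_test = np.random.rand(570, 300) * 100       # stand-in for the raw test features

scaler = MinMaxScaler()                       # maps every feature into the range (0, 1)
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)      # reuse the ranges fitted on the training set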
Feature Selection
Feature selection is one of the factors that should be taken into account in many
machine learning situations. The well-known idea of “the curse of dimensionality”
might make the model overfit or underperform. The feature selection idea was devel-
oped to address it. Some models, such as Decision Trees and Random Forests, do
not often need feature selection. The rationale is that because of how these models
are trained (the “best” feature is chosen at each split of the tree), the feature selection
process is done on the fly. But in order to get better outcomes, some models could
require feature selection. Given the large samples-to-features ratio in this study (the
training set contains over 400,000 instances and has about 300 features), the “curse
of dimensionality” shouldn’t have much of an effect.
4.5.7 Evaluation
We have 10 IoT devices: baby monitor, lights, motion sensor, security camera, smoke detector, socket, thermostat, TV, watch, and water sensor. According to these device categories, we include each device's data in the training set.
The graph in Fig. 4.4 shows the number of appearances of each IoT device in the training data set. Similarly, we also count the device categories of the test set, as shown in Fig. 4.5. The Y axis gives the device count and the X axis shows the device categories.
Considering the training and test set device categories, we obtain the correlation matrix above, and the heat map shows the correlation values between device categories. Figure 4.6 shows the heat map of the correlation of IoT devices.
After obtaining the correlation matrix, we move to the next stage by defining baseline model scores for different machine learning models. The baseline uses different ML models such as the AdaBoost Classifier, GradientBoostingClassifier, LGBM Classifier, XGB Classifier, SVC, DecisionTreeClassifier, KNeighborsClassifier, RandomForestClassifier, LogisticRegression, and XGBRFClassifier. Evaluating the different classifier models, we obtain the following baseline model scores:
Table 4.2 shows the respective values of the classifier models used to determine the baseline scores. From Fig. 4.7 we get the baseline model precision scores and find that one of the top model scores, that of the Gradient Boosting Classifier, is 0.859649. To see whether the model improves, we may try adjusting the hyperparameters. For this we use the random search (RS) method to improve the classifier's performance.
From Table 4.3 we get the values of the different random-search models of the Gradient Boosting Classifier used to improve the performance of the classifier. Since RS 4 yields the best results, we build the model on it, i.e. RS model 4, Gradient Boosting Classifier, with a score of 0.8491228070175438. A rough sketch of such a randomised hyperparameter search is given below.
Having obtained the result of RS model 4, and taking that value forward, we calculate the different confusion matrix parameters such as precision, recall, and F1-score. Table 4.4 shows the precision, recall, F1-score, and support values of the different IoT devices, derived from the confusion matrix graph below (Fig. 4.8). Finally, the table reports the accuracy and the macro and weighted averages of the classification report.
The classification report shows the accuracy of the model corresponding to the RS model 4 value. Figure 4.8 describes the confusion matrix of the 10 IoT devices, plotted on the basis of the predicted label and the true label. Higher values show that some device labels are easy to predict, and lower values show that other device labels are difficult to predict.
Table 4.4 Classification report for different IoT devices corresponding to the various performance
metrics used for evaluation
IoT devices | Precision | Recall | F-1 score | Support
TV | 0.88 | 0.93 | 0.91 | 57
Baby monitor | 0.98 | 1.00 | 0.99 | 48
Lights | 0.46 | 0.54 | 0.50 | 48
Socket | 0.50 | 0.46 | 0.48 | 57
Watch | 0.97 | 0.93 | 0.95 | 69
Water sensor | 0.55 | 0.51 | 0.53 | 35
Thermostat | 0.97 | 0.97 | 0.97 | 67
Smoke detector | 1.00 | 0.98 | 0.99 | 64
Motion sensor | 0.98 | 0.97 | 0.98 | 67
Security camera | 1.00 | 1.00 | 1.00 | 58
Accuracy | | | 0.85 | 570
Macro avg | 0.83 | 0.83 | 0.83 | 570
Weighted avg | 0.85 | 0.85 | 0.85 | 570
After obtaining the confusion matrix, we apply the cross-validation method in order to get a better estimate of the accuracy of the model. Table 4.5 reports the cross-validation accuracy values, and the mean cross-validation score gives a more exact estimate of the accuracy of the model.
With a cross-validation accuracy of 0.85, the model overall does rather well, although it struggles to forecast the outcomes for lights, water sensors, and sockets. A minimal cross-validation sketch is given below.
To profile the devices, the study established a more objective system of categorization
that isolates attackers. The framework brings useful traits to future IoT security and uses
mixed machine learning techniques to detect abnormal behavior from various smart home
devices more reliably. IoT sensing technology was motivated by the improved processing and
communication abilities of smart sensors, but these sensors can also be attacked
deliberately at a single sensed point rather than across the entire data set, a hazard that
greatly undermines the accuracy with which machine learning algorithms detect deviant
behavior. We tested eight classifiers, including DT, KNN, SVC, LR, AdaBoost, XGBRF, and
LGBM, and the framework used XGB with random search and four hyperparameter tuners to
compare them thoroughly, achieving an overall attack detection accuracy of 85.96%.
However, the framework's limitations must be acknowledged. Relying on a single detected
data point for anomaly detection may make the system vulnerable to sophisticated attacks
that target sensor metrics and bypass the detection algorithms. The model's applicability
to different IoT contexts and device types may also be limited, requiring further
validation across different scenarios. The focus of this chapter is the classification and
detection of IoT devices using flow-based assessment of system communication. An attacker
may exploit IoT device categorization to uncover insecure IoT devices by performing an
effective assessment of the network traffic stream, while device characterization and
recognition allow the network operator to recognize rogue sensors in the IoT system. Future
research should examine the proposed model on IoT networks with more types of IoT devices.
The XGBRF
showed promising anomaly detection results for IoT device profiling, but the challenges of
deploying machine learning models in real time must still be addressed. Because it relies
on single observed data points for anomaly identification, the proposed system is
vulnerable to targeted attacks on sensor metrics; adversaries who exploit this
vulnerability may avoid detection and compromise IoT system security. The diversity of IoT
contexts and device types may also reduce the effectiveness of the anomaly detection
method, making the model difficult to adapt and generalize. When translating machine
learning models from controlled experimental settings to real-world applications, this
approach must be validated to ensure its dependability and efficacy in dynamic, live
situations.
Chapter 5
Application of Machine Learning
to Improve Safety in the Wind Industry
Abstract The offshore wind industry has been gaining significant attention in recent
years, as the world looks to transition to more sustainable energy sources. While the
industry has successfully reduced costs and increased efficiency, there is still room for
improvement in terms of safety for workers. Using machine learning (ML) and deep
learning (DL) technologies can significantly improve offshore wind industry safety
by facilitating better accident prediction and failure prevention. The current study
aims to fill a significant gap in the existing literature by developing a useful selection
of machine learning models for simple implementation in the offshore wind industry.
These models will then be used to inform decision-making around safety measures,
such as scheduling maintenance or repairs or changing work practices to reduce
risk. The development of this tool has the potential to significantly contribute to the
long-term viability of the offshore wind industry and the protection of its workers. By
providing accurate predictions of potential accidents and failures, the tool can enable
companies to take proactive measures to prevent incidents from occurring, reducing
the risk of injury or death to workers and reducing the financial cost of accidents and
downtime. The chapter concludes with a summary of the present study’s research
challenge and the literature gaps. It highlights the importance of developing effective
machine learning models and implementing stricter data records to improve safety
in the offshore wind industry and the potential impact of these tools on the long-
term viability of the industry. The chapter also notes that the high performance of
selected models proves the reliability of the expected predictions and demonstrates
the effectiveness of machine learning models for decision-making around safety in
the offshore wind industry.
5.1 Introduction
• Make a comparative study to select the most suitable predictive models to fit
certain types of datasets.
Physical work is the most challenging and demanding in an offshore wind farm.
Wearing safety suits and climbing ladders to the turbine with heavy equipment takes
a toll on workers. The work is physically demanding, with tasks involving work at
heights and inside installations. Transfer and access to facilities are also physically
and psychologically challenging. The risk of accidents and unstable weather leads
to mental stress. The 12-hour work shifts are tedious and lengthy, often involving
overtime. Pressure to complete activities is ever-present because of the losses incurred
with any turbine downtime: every second spent rectifying a wind turbine costs the wind farm
operator. Waiting times are also challenging in offshore wind farms due to bad weather, and
workflow delays caused by modifications, together with the high costs involved, are
problematic.
Similarly, communication between offshore and onshore personnel is difficult.
Also, any medical emergency has limited treatment options and long emergency
routes. This brings risks to people working in offshore wind farms. Due to the complex
nature of work, many technicians and other personnel operating offshore wind farms
follow rotational shift schedules [5]. 12-hour shift rotations are standard at these
remote sites, making working schedules hectic. Another significant challenge with
offshore wind farms is downtime waiting for work during corrective maintenance.
In such a situation, shift rotation time can be prolonged due to the shortage of tech-
nical staff. Furthermore, scheduling conflicts can arise if fault duration is elongated,
contrary to what was set in the work orders. Nevertheless, with appropriate strategies,
all these challenges can be managed efficiently. Operation and maintenance costs are
among offshore wind farms’ most significant cost components.
One way to reduce costs is to make maintenance activities more efficient by
streamlining maintenance schedules and vessel routing. The European Committee
for Standardization categorizes maintenance activities for wind power systems into
corrective maintenance, preventive maintenance, and condition-based maintenance
[11]. The specific operations and maintenance challenges of offshore wind farms can be
broadly listed as follows, according to [23].
• High crew dispatch costs: Assembling and deploying a maintenance team is
quite expensive, since offshore turbines are frequently installed far from shore in
remote locations where wind conditions are best.
• High production losses: As the scale and capacity of offshore turbines increase,
the cost of downtime is growing intolerably high due to the related production
losses of a failing ultra-scale offshore turbine.
• Limited accessibility: Access to turbines can frequently be restricted for extended
periods due to harsh weather and sea conditions, ranging from a few hours to many
days.
Scheduling activities, work orders and personnel rotation are all part of the operation
and maintenance of offshore wind farms. The men and women working in offshore wind farms
face many challenges in coping with their work. According to [8], the
employers' body covering offshore wind, the Total Recordable Injury Rate (TRIR) was 3.28
and the number of lost-workday injuries was 50 in 2021; notably, the number of
high-potential incidents in the category 'civil works onshore, including excavations'
increased by 175% compared to 2020.
Following [20], implementing predictive indicators using predictive analytics would benefit
organizations trying to determine the likelihood of incidents occurring. Although
predictive analytics is already used to some extent in the oil and gas sector, another
high-hazard industry, the wind industry has not yet embraced machine learning and
predictive analytics. Machine learning has been identified as a way to improve process
safety [2], and some authors argue that it will contribute to learning from major accidents
[26]. Industries such as construction [28], automotive [4], and aerospace [19] have already
adopted machine learning for safety. The offshore wind industry is relatively new and could
benefit from the same application of machine learning, specifically to improve the safety
of the personnel working in this industry, from installation to the operation and
maintenance of its assets.
This section reviews the literature on the offshore wind industry and its challenges, the
limitations of traditional safety management practices, and the machine learning and deep
learning approaches used in the offshore wind industry, together with their challenges.
5.2.1 Context
The number of offshore wind energy installations and activities has increased dramat-
ically during the last several years. This sector offers a steady supply of clean energy,
but the construction, installation, operation, and maintenance of such facilities are dangerous.
Therefore, novel and efficient approaches to enhance safety in the sector are urgently
required [3]. It has been suggested that machine learning (ML) and deep learning
(DL) methods might help with this issue. The offshore wind sector may utilize these
techniques to create prediction models to help them avoid accidents and keep workers
safe. This study intends to remedy the wind industry’s sluggish adoption of ML/DL
to ensure worker safety [7].
The created application is a GUI-based model selection tool for machine learning
and deep learning. With this graphical user interface, users may choose the
most appropriate prediction model for their requirements and understand the results.
Weather occurrences, equipment failures, or other possible threats to safety may all
be anticipated with the help of this program. Overall, ML/DL’s use in the offshore
wind business has the potential to improve worker safety significantly. The contin-
uing success of this vital sector of the economy depends on our ability to create
accurate predictive models that can be used to identify and eliminate possible threats
to employees. The rapid expansion of wind turbine installation has introduced new
hazards that must be mitigated for the sake of worker welfare and the long-term health
of the business sector. Machine learning and deep learning technologies provide a
possible answer to these issues by allowing the prediction and prevention of accidents
and equipment breakdowns [13].
This study of the relevant literature introduces the reader to the offshore wind
business, its safety problems, and the conventional safety management practices now
in use. The reader is also given an overview of machine learning and deep learning
and an examination of recent research on the use of ML to enhance safety in various
sectors [12]. Studies on wind turbine failure prediction, structural health monitoring,
and blade icing detection are examples of how ML has been used in the offshore wind
sector and are included in this overview. Challenges and restrictions of using ML
in the offshore wind business are explored, including data and computing resource
scarcity and the difficulty of understanding and explaining ML results. Finally, the
possible advantages of ML in the offshore wind business are highlighted. These
advantages include enhanced safety, decreased maintenance costs, and enhanced
efficiency. The review finishes with a synopsis of the research challenge addressed
by the present study and the gaps in the existing literature.
turbines and transported to land, are not correctly maintained or separated. Trans-
portation and logistics are other areas where the offshore wind business has diffi-
culties. Offshore installations rely on ships or barges to transfer large components such
as blades, nacelles, and towers, and bad weather can delay these transfers. This
increases the potential for accidents and equipment failure due to delayed or disrupted
maintenance schedules [16]. Improving safety planning, identifying possible risks,
and forecasting equipment breakdowns are all areas where machine learning and
other cutting-edge technology might be helpful. Table 5.1 presents the analysis of
various works carried out on the offshore wind industry.
The offshore wind industry’s standard procedures for managing risk have always
included several safeguards designed to protect employees and bystanders. Risk
analyses, safety checks, contingency plans, and employee education and training are
all examples of such procedures. The offshore wind industry relies heavily on risk
assessments for its safety management [18]. Assessing risk entails identifying prospective
threats and weighing their probability and potential impact. It is common practice to carry
out a risk assessment before installing wind turbines and periodically afterwards. In the
offshore wind business, safety checks are also a common
practice. Wind turbines undergo regular inspections by qualified workers to check
for damage, wear, and other potential safety hazards. In addition to routine checks,
occasional deep checks may be performed [18].
Safety management in the offshore wind sector should also include emergency
response planning. Emergency procedures, including those for dealing with fires
and turbine failures, are developed as part of these plans. Worker preparedness and
coordination with local emergency services are essential to any disaster response
strategy. As a last point, safety management in the offshore wind business relies
heavily on employee training programs. Workers are educated on several safety
aspects, such as PPE usage, danger identification, and emergency protocol, as part of
these programs. Traditional safety management practices are crucial for protecting
employees and the general public [18]. However, as the offshore wind sector grows
and changes, so does the need for innovative solutions to new safety issues. Offshore
wind farms might benefit from incorporating machine learning and other cutting-edge
technology into their already established safety management procedures.
Traditional safety management in the offshore wind sector includes the prac-
tices above and careful adherence to industry rules and regulations (Table 5.2).
For example, the International Electrotechnical Commission (IEC) and the Occupa-
tional Safety and Health Administration (OSHA) are two organizations and bodies
responsible for developing such standards and rules. Wind turbines and associated
equipment may be more confidently relied upon if built, installed, and maintained
following these guidelines [21] (Table 5.3). Using safety equipment and protective
clothing is also an important part of the conventional approach to safety management
in the offshore wind sector. During turbine construction and maintenance, workers
must wear personal protection equipment (PPE) such as hard helmets, safety glasses,
and harnesses to reduce the likelihood of harm. Safety features like emergency stop
buttons, fire suppression, and lightning protection systems may also be added to wind
turbines.
The offshore wind sector still faces safety difficulties, notwithstanding the success
of conventional safety management practices (Table 5.4). For instance, equipment
Table 5.4 Challenges in traditional safety management practices and potential solutions
Reference | Description | Potential solutions
Olguin et al. [22] | Difficulty in forecasting safety hazards in the harsh sea environment | Utilize machine learning and cutting-edge technologies to enhance predictive skills, identify potential hazards in advance, and optimize maintenance schedules for reduced accident risk
breakdowns and other safety dangers might be hard to forecast in the harsh and
dynamic sea environment. By enhancing predictive skills, spotting possible hazards
in advance, and optimizing maintenance schedules to reduce accident risk, machine
learning and other cutting-edge technologies may assist in solving these issues
[22]. Conventional safety management practices are crucial to ensure the safety of
employees and the general public in the offshore wind business. However, to guar-
antee the sustained safety and success of offshore wind projects, the industry must
stay watchful and adaptive in the face of increasing safety problems and be open to
incorporating new technology and practices.
Algorithms and models that can learn from data and make predictions or judgments
based on that data are the focus of both machine learning (ML) and deep learning (DL), two
closely connected subfields of artificial intelligence (AI). The
offshore wind sector is only one area where ML and DL technologies are finding
widespread usage. Building algorithms and models that can automatically learn
from data is a critical component of ML. In other words, ML algorithms may learn
from existing data, apply that knowledge to new data, and then make predictions
or choices. Examples of popular ML algorithms include decision trees, SVMs, and
neural networks [36].
DL, a subfield of ML, aims to create algorithms and models that work in a way loosely
inspired by the human brain. Deep learning algorithms use simulated neural networks
to sift through data, draw conclusions, or make predictions. DL algorithms shine
when analyzing multifaceted data types like photos, voice, and natural language.
The capacity to swiftly and effectively analyze vast and complicated datasets is
a significant benefit of ML and DL technology. This may aid with the discovery
of trends and patterns that human analysts would miss, leading to better decisions
across many domains [33]. ML and DL technologies have several potential appli-
cations in offshore wind, including failure prediction, maintenance optimization,
and enhanced safety planning. To forecast when maintenance is needed, ML algo-
rithms may examine data collected by sensors installed on wind turbines to look
for red flags that suggest impending failure. Images and videos captured at offshore
wind farms may be analyzed using DL algorithms to spot possible dangers, such as
personnel doing activities in risky environments. The offshore wind business is just
one area where ML and DL technologies may enhance productivity, security, and
decision-making. It’s safe to assume that as these technologies advance, they’ll gain
significance across various sectors and use cases. In Table 5.5, an overview of ML
and DL approaches has been presented.
Table 5.5 Overview of Machine Learning (ML) and Deep Learning (DL) technologies
Reference | Aspect | Description
Zulu et al. [36] | Definition of ML algorithms | ML focuses on algorithms and models that learn from data to make predictions or judgments. Popular ML algorithms include decision trees, SVMs, and neural networks
Yeter et al. [33] | Definition of DL algorithms | DL, a subfield of ML, aims to create algorithms and models that mimic the human brain. DL algorithms use simulated neural networks to analyze complex data types like photos, voice, and natural language
Zulu et al. [36] | Applications in offshore wind | ML and DL technologies have potential applications in the offshore wind sector, including failure prediction, maintenance optimization, and enhanced safety planning
Zulu et al. [36] | ML in offshore wind | ML algorithms can analyze sensor data from wind turbines to predict maintenance needs and detect red flags indicating potential failures
Yeter et al. [33] | DL in offshore wind | DL algorithms can analyze images and videos from offshore wind farms to identify potential safety risks, such as personnel in hazardous environments
Zulu et al. [36] | Overall impact and future trends | ML and DL technologies are expected to play a significant role in enhancing productivity, security, and decision-making across various sectors as they continue to advance
Several research projects have looked at how machine learning may be used to
improve industry-wide safety. Some of the more notable instances include the following:
• Machine learning algorithms have been used to predict patient outcomes and
spot hidden health hazards in the healthcare sector. In the healthcare industry,
ML algorithms have been used to analyze patient data and predict, for example,
hospital readmission rates and illness risk [32].
• Machine learning algorithms may detect faulty machinery and anticipate service
requirements in the industrial sector. For instance, ML systems may examine
sensor data from machinery to forecast when maintenance is needed to discover
patterns that may suggest possible equipment breakdowns [25].
• Safer mobility is possible using ML algorithms in the transportation sector. For
instance, ML systems may examine vehicle sensor data to spot red flags like
unexpected stops or erratic driving that might threaten passengers [6].
• Safer building sites may be achieved via the application of ML algorithms by the
construction sector. ML systems may analyze images and videos from construction
sites to spot employees in hazardous situations [1].
These studies show that machine learning can potentially enhance safety across many
sectors. Machine learning algorithms can produce more accurate predictions and
inferences by analyzing massive datasets in ways that would be impossible for human
analysts. The offshore wind business is no exception, and there is a rising interest in
using machine learning to increase safety. Studies on enhancing safety in various
industries with ML approaches are summarized in Table 5.6.
Several studies in recent years have looked at the possibility of using machine learning
(ML) in the offshore wind business to boost efficiency, save costs, and promote safety.
In particular, as will be seen below, ML has been used in predicting failures in wind
turbines, checking on structural health, and identifying instances of blade icing.
• Prediction of Wind Turbine Failures: Several researchers have looked at the
possibility of using ML algorithms to foresee breakdowns in wind turbines. For
instance, SCADA data and a long short-term memory (LSTM) neural network
were used to predict gearbox breakdowns in wind turbines [32]. Li et al. [15] also
employed vibration data and a stacked auto-encoder to foresee bearing failures in
wind turbine generators. These results show the promise of ML in predicting and
avoiding problems in offshore wind turbine components.
• Monitoring of Structural Health: Offshore wind turbines’ structural health has
also been tracked using ML. For instance, Zhu and Liu [35] analyzed spectrogram
imagery using a convolutional neural network (CNN) to detect fractures and other
structural flaws in wind turbine blades. Similarly, the results of a study demon-
strated that a proposed framework could identify blade cracks using unmanned
active blade data and artificially generated images [29]. These results show the
potential of ML as a technique for proactively assessing the structural health of
offshore wind turbines and preventing catastrophic failure.
• Detection of Blade Icing: Blade icing detection is another area where ML has
found use in the offshore wind sector. The probability of ice shedding from wind
turbines may be increased by icing, which reduces their performance and poses
safety risks. Several researchers have looked at the possibility of using ML algo-
rithms to identify blade icing as a solution to this problem. Using deep neural networks
and wavelet transforms, the authors of [34] detected icing on wind turbine blades with a
classification-based anomaly detection system. These results show how
ML may be used to identify and prevent blade icing on offshore wind turbines,
increasing their efficiency and safety.
Table 5.7 summarizes various studies on the offshore wind industry using ML techniques.
Overall, these studies show how ML has the potential to enhance safety and efficiency in
the offshore wind sector via the detection of ice on turbine blades, the prediction of
equipment failures, and the monitoring of structural health.
The offshore wind business might gain immensely by implementing ML, but several
obstacles and restrictions must be overcome first. Some of the major barriers to
implementing ML in the offshore wind sector are outlined below (Table 5.8).
• Lack of Data: The paucity of data is a major obstacle to using ML in the offshore
wind business. Due to the novelty of offshore wind turbines, information on their
operation and performance is typically scant. ML algorithms need enormous
volumes of data to discover significant patterns and generate accurate predic-
tions, so it might be challenging to train them properly [30]. Researchers are also
considering using sensors, drones, and other monitoring technologies to solve this
problem.
• Computational Resources: The computing resources needed to analyze vast and
complicated datasets are another difficulty when using ML in the offshore wind
Worker, contractor, and visitor safety is a top priority in the wind energy sector.
Measuring and monitoring safety performance across industries requires the use
of safety metrics. Common applications for these measures include problem area
identification and creating plans to enhance worker safety. The Total Recordable
Incident Rate (TRIR) is one of the most popular safety indicators used in the wind
business. Injuries and illnesses are included in the total recordable incident rate
(TRIR), expressed as a number per 200,000 hours worked. The TRIR is a valuable
tool for comparing and contrasting firms’ and sectors’ safety records, and it is used
by a broad range of businesses, including the wind energy sector.
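For reference, this rate is conventionally normalized to 200,000 hours worked (roughly 100 full-time employees over one year):

TRIR = (number of recordable incidents × 200,000) / total hours worked

The LTIR discussed below takes the same form, counting only incidents that result in lost workdays.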
The LTIR, or Lost Time Incident Rate, is another crucial indicator of wind sector
safety. The number of occurrences leading to missed workdays per 200,000 hours
worked constitutes the LTIR. The LTIR is often used to gauge the seriousness of
accidents regarding lost time and productivity; it is a more nuanced indicator than
the TRIR. The wind industry also uses the severity rate (SR), quantifying the severity
of injuries and illnesses, and the near-miss rate (NMR), estimating the number of
near-misses that did not result in injuries or illnesses. However, these indicators do
not provide a whole picture of safety performance and have shortcomings. These
indicators, for instance, do not assess the value of safety initiatives or the bearing
of safety culture on safety results. In addition, these indicators can only be used
as a health check of any organization or group thereof. They do not provide the
necessary information to specifically assign resources to the areas that would prove
most beneficial. One of the risks of using these kinds of indicators is that minor
incidents (with the potential to become major accidents and fatalities) are not always
recorded, thus giving an inaccurate reflection of the organization’s real performance.
While conventional safety criteria have been employed in the wind sector for a while,
they have drawbacks and cannot guarantee a completely risk-free environment. The
following are examples of restrictions and difficulties.
• Traditional safety metrics are reactive in nature [9], as they seek to analyze histor-
ical events and accidents to determine safety patterns and enhance safety precau-
tions. This method is reactive rather than proactive and therefore does not permit
the early detection of possible security flaws.
• Traditional safety measurements are typically based on partial data, which may
lead to erroneous conclusions about a system’s safety. Some occurrences involving
safety may not be recorded, and not all safety-related tasks may have their data
gathered.
• Due to the lack of a universally accepted set of safety criteria in the wind sector,
it is difficult to establish meaningful comparisons between the safety records of
various projects and businesses.
• Traditional safety measurements, which emphasize trailing indications like injury
rates, fail to capture the whole picture of safety performance [17]. Safety culture,
safety leadership, and safe practices are examples of leading indicators that are
poorly recorded.
• Traditional safety measurements frequently have a narrow emphasis, considering
just the risks associated with one or two areas of a wind farm’s operations. Thus,
there may be blind spots in terms of safety if we stick to this method.
• Traditional safety measures typically fail to engage employees in safety manage-
ment due to a lack of motivation. Employees may not participate in safety initia-
tives because they see safety measures as a tool for management rather than a tool
to enhance safety performance.
• Predicting future safety hazards is challenging since traditional safety indicators
do not consider industry shifts or emerging threats. This means they may not be
reliable indicators of impending danger.
• Technology to enhance safety performance is typically overlooked in conven-
tional safety measurements. Proactive safety management is made possible by
innovations like artificial intelligence and machine learning, which give real-time
data on potential hazards.
In summary, the limits and difficulties of existing safety criteria render them sub-
optimal for assuring the best safety for personnel in the wind industry. Finding
and fixing these weaknesses is crucial for enhancing safety performance. Interest
in using cutting-edge technology like machine learning to strengthen wind sector
safety measures has risen in recent years. Using these innovations, we may be able
to gauge safety performance better, discover new areas of concern, and devise more
efficient methods for enhancing it.
Machine learning (ML) can enhance safety, decrease maintenance costs, and boost
productivity in the offshore wind sector. Some possible gains from using ML in
offshore wind are listed below.
• Improved Operations: ML may increase safety in the offshore wind sector
through predictive maintenance and early warning of possible dangers [22].
Machine learning algorithms can examine data collected by sensors and other
monitoring systems to predict when a piece of machinery may break down. This
may aid in preventing accidents and unscheduled downtime for offshore wind
enterprises.
• Reduced Maintenance Costs: ML may provide more focused and effective main-
tenance actions in the offshore wind business, hence lowering maintenance costs.
Offshore wind businesses may save money by planning for maintenance using ML
algorithms to forecast when their equipment will go down. In addition, ML may
be used to improve maintenance procedures by, for example, pinpointing when
parts should be replaced or investigating the reason for equipment breakdowns
[21].
• Increased Efficiency: ML may positively impact productivity by streamlining
operations and decreasing downtime in the offshore wind sector. By analyzing
variables like wind speed and direction in real-time, ML algorithms may improve
the performance of wind turbines, for instance [22]. This may aid offshore wind
farms in optimizing energy output while minimizing losses due to inclement
weather.
• Incident prevention: With the use of machine learning algorithms, safety parame-
ters can be tracked in real-time, allowing for immediate responses to any emerging
threats. Potential risks may be detected and handled before they become an issue,
leading to more proactive safety management methods [31].
• Better safety culture: Using machine learning to monitor safety parameters helps
businesses foster an environment where security is a top priority and is rigor-
ously administered. The result may be a more productive workforce and a safer
workplace [14].
Table 5.9 Potential benefits of ML and deep learning in the wind industry
Reference | Problem definition | Findings | Advantage | Limitation
Olguin et al. [22] | Improved operations | Predictive maintenance and early warning of potential dangers | Enhanced safety through preventive measures | Dependence on sensor data for accuracy
Mitchell et al. [21] | Reduced maintenance costs | More focused and effective maintenance actions, lower maintenance costs | Cost savings through optimized maintenance planning | Relies on accurate prediction of equipment failures
Olguin et al. [22] | Increased efficiency | Streamlined operations, decreased downtime, improved wind turbine performance | Enhanced productivity through optimized energy output | Sensitivity to real-time data accuracy
Xu and Saleh [31] | Incident prevention | Real-time tracking of safety parameters, detection, and handling of potential risks | Proactive safety management methods with immediate responses | Accuracy and timeliness of data crucial
Le Coze and Antonsen [14] | Better safety culture | Fostering a safety-focused environment, resulting in a more productive and safer workplace | Improved workforce productivity and safer work environment | Requires cultural adaptation and acceptance
Taherdoost [27] | Improved decision-making | Enhanced decision-making, improved safety management strategies | Better-informed decision-makers leading to improved safety practices | Dependence on quality and relevance of data
acquire the skills to properly install and oversee ML algorithms if they want to reap
the advantages of this technology.
Research shows that the offshore wind sector might benefit greatly from using
machine learning and deep learning techniques to enhance safety. There is a shortage
of widespread usage of ML/DL for worker safety in the sector, despite some research
looking at its potential for forecasting wind turbine failures, monitoring structural
health, and identifying blade icing. The literature has also brought to light many diffi-
culties and restrictions connected to using Machine Learning in the offshore wind
business, including a shortage of data and computing resources and questions about
the interpretability and explainability of Machine Learning algorithms (Table 5.10).
In addition, while there has been progress in the wind business in using machine
learning, there has not been as much study of machine learning’s potential impact on
safety metrics. Most prior research has been on the application of machine learning to
the problems of failure prediction, structural health monitoring, and blade icing detec-
tion in wind turbines. However, a more significant investigation into the potential of
machine learning to enhance safety metrics in the wind business is required.
The present study seeks to remedy the knowledge gap caused by the offshore
wind industry’s inconsistent use of ML/DL to ensure worker safety. The researcher
has performed a comparative evaluation of commonly available and used machine
learning models. It then establishes guidelines to select the best model (performance)
for a given data set. The study aims to improve offshore wind sector safety by
facilitating better use of ML and DL technologies in accident prediction and failure
prevention. Because of the enormous stakes in human and environmental safety in
the offshore wind sector, this research topic is all the more pressing [22]. The number
of wind turbines built in offshore areas has increased significantly in recent years,
indicating the industry’s rapid development. However, new hazards and difficulties
have emerged alongside this expansion, and they must be resolved for the sake of
worker welfare and the sector’s long-term health. It is crucial to address the current
lack of widespread adoption of ML/DL, given the potential advantages of doing so for
increasing safety in the offshore wind sector. The present study addresses a significant
gap in the literature, and the creation of a useful ML/DL tool to enhance offshore
wind industry safety has the potential to significantly contribute to the long-term
viability of the industry and the protection of its workers.
In this chapter, after performing detailed analysis and text cleaning, we evaluated the
dataset in three ways. As our dataset is highly imbalanced, (1) we tested our models
on the original dataset that is imbalanced, (2) we tested our models on undersampled
datasets, and (3) we tested our models on oversampled datasets. Details of the sampling
techniques are given below.
The dataset has been provided by an offshore wind company as an extract from its incident
reporting system. (This tool serves to document incidents, near-misses, and observations,
facilitating the analysis of reported occurrences. It helps identify underlying causes and
implement corrective measures to enhance safety performance. However, it should be noted
that the tool does not offer predictive or prescriptive analysis based on the gathered
data.) The dataset is composed of 2892 rows and 12 columns and represents data collected
from January 2020 to December 2021.
the majority class. Undersampling aims to balance the class distribution by randomly
removing instances from the majority class to match the number of instances in the
minority class.
Advantages of Undersampling
• It reduces the size of the dataset, which can help reduce the computational time
and resources needed for training a model.
• It can improve the performance of machine learning algorithms on imbalanced
datasets by providing a more balanced class distribution.
Disadvantages of Undersampling
• It may lead to the loss of potentially important information as instances from the
majority class are removed.
• There is a risk of underfitting, as the reduced dataset may not represent the overall
population.
To balance the dataset, we use the resampling technique to upsample both minority
classes (1 and 2) to match the number of instances in the majority class (0).
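A minimal sketch of this upsampling step, assuming a pandas DataFrame with a hypothetical 'severity' label column holding classes 0, 1, and 2, is shown below; it uses sklearn.utils.resample rather than any project-specific helper.

```python
# Hypothetical upsampling of minority classes to the majority class size.
import pandas as pd
from sklearn.utils import resample

def upsample_minority_classes(df: pd.DataFrame, label_col: str = "severity") -> pd.DataFrame:
    counts = df[label_col].value_counts()
    majority_label = counts.idxmax()
    majority_size = counts.max()
    parts = [df[df[label_col] == majority_label]]
    for label in counts.index:
        if label == majority_label:
            continue
        minority = df[df[label_col] == label]
        # Sample with replacement until the minority class matches the majority size
        parts.append(resample(minority, replace=True, n_samples=majority_size,
                              random_state=42))
    return pd.concat(parts).sample(frac=1, random_state=42)  # shuffle rows
```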
• SMOTE can generate noisy samples if the minority class instances are too close to
the majority class instances in the feature space, which may decrease the model’s
performance.
Figure 5.1 shows the impact of applying SMOTE to the dataset, displaying the class
distribution before and after oversampling; a sketch of the SMOTE step is given below.
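A minimal sketch of applying SMOTE with the imbalanced-learn package is shown below; the synthetic three-class data is a stand-in for the encoded incident features, since SMOTE requires a numerical feature space.

```python
# Hypothetical SMOTE oversampling on an imbalanced three-class problem.
from collections import Counter
from imblearn.over_sampling import SMOTE
from sklearn.datasets import make_classification

# Placeholder imbalanced data standing in for the encoded incident features.
X, y = make_classification(n_samples=2000, n_classes=3, n_informative=6,
                           weights=[0.85, 0.10, 0.05], random_state=42)

smote = SMOTE(random_state=42)
X_res, y_res = smote.fit_resample(X, y)
print("Before:", Counter(y), "After:", Counter(y_res))
```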
5.4 Models
This section describes the various models used to enhance safety in the wind industry.
Figure 5.2 shows the machine learning-based model with SMOTE integration.
Logistic Regression is a linear model used for binary classification tasks. However,
it can be extended to handle multi-class problems using the one-vs-rest (OvR) or
the one-vs-one (OvO) approach. In the OvR approach, a separate logistic regression
model is trained for each class, with the target label being the class itself versus all
other classes combined. In the OvO approach, a model is trained for each pair of
classes. During prediction, the class with the highest probability among all models is
assigned to the instance. Logistic Regression is simple, easy to interpret, and works
well when the features and target relationship is approximately linear.
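As a hedged sketch of these two strategies, scikit-learn's meta-estimators can wrap a logistic regression directly; the synthetic three-class data below merely stands in for the incident features.

```python
# Illustrative only: synthetic data replaces the real incident dataset.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.multiclass import OneVsOneClassifier, OneVsRestClassifier

X, y = make_classification(n_samples=600, n_features=20, n_informative=8,
                           n_classes=3, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

ovr = OneVsRestClassifier(LogisticRegression(max_iter=1000))  # one model per class
ovo = OneVsOneClassifier(LogisticRegression(max_iter=1000))   # one model per class pair

for name, clf in [("OvR", ovr), ("OvO", ovo)]:
    clf.fit(X_train, y_train)
    print(name, "test accuracy:", clf.score(X_test, y_test))
```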
Ridge Classifier is a linear classification model that uses Ridge Regression (L2 regu-
larization) to find the optimal weights for the features. It can handle multi-class
problems using the one-vs-rest approach, similar to Logistic Regression. For each
class, a Ridge Classifier model is trained to separate that class from the rest. The
class with the highest decision function score is then assigned to the instance. Ridge
Classifier can handle multicollinearity in the feature space and is less sensitive to
overfitting than unregularized linear models.
is simple, easy to understand, and works well for problems with complex decision
boundaries.
Support Vector Classifier (SVC) is a powerful classification algorithm that finds the
optimal hyperplane separating the classes in the feature space. For multi-class problems,
SVC typically uses the one-vs-one approach. It trains a separate model for each
pair of classes, resulting in n*(n–1)/2 classifiers for n classes. During prediction, each
classifier votes for the class it was trained to identify, and the class with the most votes
is assigned to the instance. SVC can handle non-linear problems using kernel
functions such as the Radial Basis Function (RBF) kernel. It is robust to overfitting
and works well for high-dimensional data and complex decision boundaries.
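A minimal sketch of such a classifier on standardized synthetic data might look as follows; the kernel and regularization settings are illustrative assumptions.

```python
# Illustrative only: synthetic features replace the real incident data.
from sklearn.datasets import make_classification
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = make_classification(n_samples=600, n_features=20, n_informative=8,
                           n_classes=3, random_state=0)

# SVC fits n*(n-1)/2 pairwise classifiers internally (one-vs-one);
# the RBF kernel handles non-linear class boundaries.
clf = make_pipeline(StandardScaler(),
                    SVC(kernel="rbf", C=1.0, gamma="scale",
                        decision_function_shape="ovo"))
clf.fit(X, y)
print(clf.predict(X[:5]))
```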
Bagging Classifier is an ensemble method that improves the stability and accuracy
of the predictions. For multi-class problems, the Bagging Classifier trains multiple
base models, each on a bootstrapped sample of the dataset,
and combines their predictions using majority voting. The algorithm reduces the vari-
ance of the base models by averaging their predictions, leading to better generaliza-
tion and performance. Bagging classifiers work well with non-linear, high-variance
base models and can effectively handle non-linear relationships between features
and target variables.
Gradient Boosting Classifier builds an ensemble of decision trees sequentially; for
multi-class problems it employs a one-vs-all approach, training binary classifiers for
each class. Each classifier is fitted to the negative gradient of the logarithmic loss
function, focusing on
reducing the misclassification error. The final prediction is made using a weighted
combination of the classifiers’ decisions, with the weights determined by the classi-
fiers’ performance. Gradient Boosting is highly adaptable and can handle non-linear
relationships effectively, often achieving improved performance compared to single
decision trees and other boosting methods.
XGB (Extreme Gradient Boosting) is a gradient boosting-based algorithm that aims to optimize both the model
performance and computational efficiency. For multi-class problems, XGB uses a
one-vs-all approach, training binary classifiers for each class. XGB employs a unique
regularization term in its objective function, which controls the complexity of the
trees and reduces overfitting. It also uses advanced techniques, such as column block
and cache-aware access patterns, to improve the training speed. XGB is highly scal-
able, robust to overfitting, and can handle non-linear relationships effectively, often
achieving excellent predictive performance.
For the first neural network model (Model-1), early stopping and ReduceLROnPlateau
callbacks are employed to prevent overfitting and adjust the learning rate dynamically
based on the validation loss. The model is trained on the dataset using a batch size of
32 and a total of 100 epochs, with the training history recorded to analyze the model's
performance.
Neural Network Model-2
The Sequential neural network model comprises three (3) dense layers, with dropout
and batch normalization layers in between. The first dense layer has ten (10) neurons,
a ReLU activation function, He uniform initializer, L2 regularization, and a unit
norm kernel constraint. Following this layer, there’s a 20% dropout layer and a batch
normalization layer. The second dense layer also has ten (10) neurons, a ReLU acti-
vation function, He uniform initializer, L2 regularization, and a unit norm kernel
constraint, followed by a 50% dropout layer and a batch normalization layer. The
final dense layer has three (3) neurons, a SoftMax activation function for multi-class
classification, L2 regularization, and a unit norm kernel constraint. The model is
compiled using the Stochastic Gradient Descent (SGD) optimizer with a learning rate
of 0.001 and momentum of 0.9. The loss function used is categorical cross-entropy,
and the metric is categorical accuracy. Early stopping and ReduceLROnPlateau call-
backs are employed to prevent overfitting and dynamically adjust the learning rate
based on validation loss. Custom Metrics class is used to record performance during
training. The model is trained on the dataset for 100 epochs with a batch size of 32,
with training history recorded to analyze the model’s performance.
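A hedged Keras reconstruction of Model-2, following the description above, is sketched below; the number of input features, the L2 strength, the callback patience values, and the validation split are assumptions that the text does not specify.

```python
# Sketch of Neural Network Model-2 under stated assumptions.
from tensorflow import keras
from tensorflow.keras import constraints, layers, regularizers


def build_model_2(input_dim: int) -> keras.Model:
    dense_args = dict(activation="relu",
                      kernel_initializer="he_uniform",
                      kernel_regularizer=regularizers.l2(0.01),   # L2 strength assumed
                      kernel_constraint=constraints.UnitNorm())
    model = keras.Sequential([
        layers.Input(shape=(input_dim,)),
        layers.Dense(10, **dense_args),
        layers.Dropout(0.2),
        layers.BatchNormalization(),
        layers.Dense(10, **dense_args),
        layers.Dropout(0.5),
        layers.BatchNormalization(),
        layers.Dense(3, activation="softmax",
                     kernel_regularizer=regularizers.l2(0.01),
                     kernel_constraint=constraints.UnitNorm()),
    ])
    model.compile(optimizer=keras.optimizers.SGD(learning_rate=0.001, momentum=0.9),
                  loss="categorical_crossentropy",
                  metrics=["categorical_accuracy"])
    return model


callbacks = [
    keras.callbacks.EarlyStopping(monitor="val_loss", patience=10,   # patience assumed
                                  restore_best_weights=True),
    keras.callbacks.ReduceLROnPlateau(monitor="val_loss", factor=0.5, patience=5),
]
model = build_model_2(input_dim=12)  # the number of input features is assumed
model.summary()
# history = model.fit(X_train, y_train_onehot, validation_split=0.2,
#                     epochs=100, batch_size=32, callbacks=callbacks)
```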
Figure 5.4 shows the deep learning based model with integration of SMOTE.
We have also evaluated sequential deep neural networks based on Long Short-Term
Memory (LSTM). The models' performance was evaluated on the original dataset
without sampling. We have built three variations of models.
In the case of "Cause", we are not dealing with the majority class (the "accident
severity level"); applying undersampling to the "Cause" attribute, which already
contains many classes, would create even more imbalance in our dataset and warp
the results of our models.
Table 5.13 presents the results of several machine learning models’ performance
on an undersampled dataset. The evaluation metrics include train and test accuracy,
precision, recall, F1-score, and multi-class log loss. The F1-score is particularly
informative here because it balances precision and recall on an imbalanced dataset.
Table 5.11 "Severity level" ML models results with the original dataset

Method | Train accuracy | Test accuracy | Precision | Recall | F1 score | Multi-class log loss
Logistic regression | 0.9956766 | 0.9430052 | 0.9432653 | 0.9430052 | 0.9428073 | 0.1299078
Ridge classifier | 0.997406 | 0.9395509 | 0.9379522 | 0.9395509 | 0.9381799 | 1
K-Neighbors classifier | 0.9213143 | 0.8981002 | 0.8995582 | 0.8981002 | 0.8794643 | 2.285096
SVC | 0.9783831 | 0.9395509 | 0.9415076 | 0.9395509 | 0.9351979 | 0.1425599
Decision tree classifier | 1 | 0.9412781 | 0.9423546 | 0.9412781 | 0.9415844 | 2.0281838
Random forest classifier | 0.997406 | 0.9360967 | 0.9340535 | 0.9360967 | 0.9323823 | 0.5341985
Bagging classifier | 0.998703 | 0.9395509 | 0.9380643 | 0.9395509 | 0.9378462 | 0.2753485
Extra trees classifier | 0.9995677 | 0.9481865 | 0.9502953 | 0.9481865 | 0.9448554 | 0.1759012
AdaBoost classifier | 0.9468223 | 0.9395509 | 0.9391244 | 0.9395509 | 0.9360264 | 0.7060568
Gradient boosting classifier | 0.9580631 | 0.9430052 | 0.9430966 | 0.9430052 | 0.9396935 | 0.1319889
CatBoost classifier | 0.9969736 | 0.9464594 | 0.9453536 | 0.9464594 | 0.944641 | 0.136145
LGBM classifier | 1 | 0.9343696 | 0.9354349 | 0.9343696 | 0.9341319 | 0.2004699
XGB classifier | 0.9753567 | 0.9343696 | 0.933646 | 0.9343696 | 0.932742 | 0.1416465
Table 5.14 presents the performance metrics of various classification models trained
on a dataset. The metrics include train and test accuracy, precision, recall, F1-score,
and multi-class log loss. The higher the F1-score, the better the model’s performance.
From the results, Random Forest Classifier, Extra Trees Classifier, and LGBM Classi-
fier have the highest F1-scores of 0.9319914, 0.944082, and 0.9426346, respectively.
These models perform well in terms of both precision and recall. On the other hand,
Ridge Classifier and SVC have the lowest F1-scores of 0.0032996 and 0.0343382,
respectively, indicating poor performance.
Models such as Ridge Classifier and SVC have very low F1-scores, indicating that
they are not performing well in balancing precision and recall.
When analyzing these results, it is also essential to consider other performance
metrics, such as Test Accuracy and Multi-Class Log loss, to comprehensively under-
stand the classifiers’ performance. Test Accuracy represents the proportion of correct
predictions, while multi-class log loss measures the quality of the classifiers' predicted
probabilities. It becomes apparent here that the much larger number of classes of
the "Cause" attribute leads to poor performance of the classifier models.
Table 5.16 presents the performance metrics of three neural network models on a
dataset. The metrics include test accuracy, precision, recall, and F1-score. Precision
measures the proportion of true positive predictions among all positive predictions
made by a model. A higher precision indicates that the model correctly identifies
more positive instances and minimizes false positives. From the results, Model-2 has
the highest precision of 0.924324, followed by Model-1 with a precision of 0.922280,
while Model-3 has a significantly lower precision of 0.288690. Model-2 is the most
accurate in identifying positive instances without making too many false-positive
predictions. It is, however, essential to consider other performance metrics like recall
and F1-score when evaluating the overall performance of a model. Model-1, with
an F1-score of 0.922280, demonstrates a balanced performance between precision
and recall. In contrast, Model-3 has a low F1-score of 0.212022, indicating poor
performance in terms of both precision and recall.
Table 5.17 shows the performance metrics of three different feed-forward neural
network classification models on the imbalanced dataset. Model-2 has the highest test
accuracy (0.201923) and F1-score (0.010500) among the three models. Although the
F1-score is low, Model-2 outperforms Model-1 and Model-3 in terms of precision
and recall, which suggests it is the best choice among these options. Model-1, on the
other hand, exhibits very low values for all metrics, indicating that it is not a suitable
choice for this problem. Model-3 performs marginally better than Model-1, but its
F1-score is still lower than Model-2's. Again, it becomes apparent that the large
number of classes of the "Cause" attribute leads to poor performance of the neural
network models.
Table 5.18 shows varying performance levels across different evaluation metrics for
the three (3) deep learning models.
Model-1 has a training accuracy of 0.81 and a test accuracy of 0.818653, with
precision, recall, and F1-score being the same value of 0.818653. This indicates a
well-generalized balanced performance in identifying true positives and false posi-
tives but overall lower accuracy than the other models. Model-2 has the highest
training accuracy of 0.94 and a test accuracy of 0.894646, with a precision of
0.916814, recall of 0.894646, and F1-score of 0.905594. These results suggest that
Model-2 is the best-performing model among the three, achieving a good balance
between precision and recall while maintaining high accuracy. However, the differ-
ence between training and test accuracy implies potential overfitting. Model-3 has
a training accuracy of 0.91 and a test accuracy of 0.873921, with a precision of
0.890845, recall of 0.873921, and F1-score of 0.882302. While Model-3’s perfor-
mance is slightly lower than Model-2, it shows less overfitting, indicating a more
generalizable model. In conclusion, Model-2 performs best in accuracy and F1-score,
but Model-3 might be more reliable when considering overfitting concerns.
Table 5.19 shows the performance metrics of three different classification models
on a dataset. Model-3 has the highest test accuracy (0.240385) and F1-score
(0.019339) among the three models. Although the F1-score is relatively low, Model-
3 outperforms Model-1 and Model-2 in terms of precision, recall, and test accuracy,
which suggests it is the best choice among these options. Model-1 and Model-2
exhibit similar performance across all metrics, with only marginal differences in
their F1-scores.
• Original Data
– Train Accuracy: Models such as Decision Tree Classifier, Random Forest
Classifier, and Extra Trees Classifier achieved a perfect 1.0 on training
accuracy, which might suggest overfitting.
– Test Accuracy: Logistic Regression had the highest test accuracy, closely
followed by Ridge Classifier and Random Forest Classifier.
– F1-Score: Random Forest Classifier led with the highest F1-Score, indicating
a good balance between precision and recall.
• Sampling Data
– Varied Train and Test Accuracy: There were discrepancies in the train and
test accuracies across models, with some, like SVC, showing a large drop,
which may indicate overfitting.
– F1-Score: The F1-Scores are generally lower than those observed with the
original data, which could suggest that the sampling technique might not be
improving the model’s ability to generalize.
• SMOTE Data
– Improved Test Accuracy: The use of SMOTE appears to have improved
the test accuracy for models like Logistic Regression and Ridge Classifier
compared to the original data.
– F1-Score Improvement: Models generally showed improved F1-Scores with
SMOTE, suggesting better performance on the minority class.
• Hyperparameter Tuning
– Enhanced Performance: Hyperparameter tuning likely enhanced model
performance metrics across the board, although specific details were not noted.
Machine learning and deep learning models perform well on the imbalanced dataset
but poorly on undersampled and oversampled datasets, which could be attributed to
a few factors. Firstly, when undersampling is applied, important information might
be lost as instances from the majority class are removed, leading to underfitting.
Secondly, oversampling techniques, especially when synthetic instances are gener-
ated, can introduce noise or artificial patterns that do not represent the underlying
relationship between features and the target variable, causing the model to overfit
the synthetic data.
In contrast, models might perform better on the original imbalanced dataset if
they can successfully learn the patterns and relationships in the data, despite the
class imbalance. In such cases, it is essential to consider alternative techniques,
such as cost-sensitive learning or ensemble methods, to handle imbalanced datasets
effectively without compromising model performance.
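As a hedged illustration of the cost-sensitive alternative mentioned above, many scikit-learn classifiers accept class weights inversely proportional to class frequency, so errors on minority classes are penalized more heavily without resampling the data; the imbalanced synthetic data below is purely illustrative.

```python
# Illustrative only: synthetic imbalanced data replaces the incident dataset.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=2000, n_features=20, n_informative=8,
                           n_classes=3, weights=[0.85, 0.10, 0.05],
                           random_state=1)

# class_weight="balanced" re-weights the loss by inverse class frequency.
models = {
    "logistic regression": LogisticRegression(max_iter=1000, class_weight="balanced"),
    "random forest": RandomForestClassifier(n_estimators=200, class_weight="balanced"),
}
for name, model in models.items():
    model.fit(X, y)
    print(name, "trained with balanced class weights")
```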
Overall, when comparing all models across the data variations, the Ridge Classifier
performs better than all other models for "Cause" on the original imbalanced dataset,
with an F1-score of 0.334. In turn, the Extra Trees Classifier performs better than
all other models on the original imbalanced dataset for our majority-class target
("accident severity level"), with an F1-score of 0.9448554. The above results
demonstrate that it is possible to use high-performance machine learning to predict
accident severity levels even with an imbalanced dataset, which is common when the
datasets are obtained from real-life sources. They also strongly highlight the necessity,
in the context of data related to safety and incidents, of implementing strict policies for
recording the information required to apply machine learning for predicting incidents.
Predicting causes will allow organizations to implement such models, as studied
above, to prevent the occurrence of incidents by targeting the causes and removing
the conditions for the incident to happen.
5.6 Conclusion
The results on the original data showed that the models can predict accident severity
levels accurately. The sampling data did not consistently improve model performance, indi-
cating that the technique used may not have been optimal for the dataset in question
or that the models may require more sophisticated sampling strategies. It’s crucial
to consider these findings within the context of the data and problem domain, and
further model validation is recommended to ensure robustness and generalization of
the results.
The study also highlighted which types of models, Extra Trees Classifier and Ridge
Classifier, have the best performance for, respectively, the majority class and a
minority class in the imbalanced dataset. The classical classifiers performed better
than the neural network and deep neural network models in the study context. Given
that those models are reasonably easy to implement in production, this should help
pave the way for wider adoption of machine learning models to improve the safety
of the personnel working in the wind industry. The present study demonstrates that
machine learning model selection and implementation can be carried out widely
in the wind industry. It also shows that the high performance of selected models
supports the reliability of the expected predictions, making them an effective tool for
decision-making when taking measures to improve health and safety.
Few studies look at applying machine learning to safety indicators in the wind
business, which is the key gap in the current literature. Existing research has dealt
chiefly with establishing generic predictive models for wind turbines or predicting
or detecting particular occurrences. As a result, additional study is required to build
individualized machine learning models that may be used to enhance safety metrics in
the wind business. There is also a shortage of studies that combine information from
many sources to enhance safety measures, which is a significant research gap. Most
previous research has concentrated on collecting data from sensors or maintenance
records, but additional information, such as weather data, is needed to produce more
all-encompassing safety metrics. Research on using machine learning models for
safety metrics in the wind sector is also needed. There is a need to examine the
efficacy of these models in real-world contexts since much of the previous research
has concentrated on constructing models in laboratory settings or utilizing simulated
data. In sum, this research intends to fill a need in the existing literature by providing
a plan for using machine learning to improve wind sector safety measures. The
proposed system will use data from a wide variety of sources and will be tested in
real-world scenarios to see how well it performs.
Chapter 6
Malware Attack Detection in Vehicle
Cyber Physical System for Planning
and Control Using Deep Learning
Abstract Cyber-Physical Systems (CPS), which comprise smart health, smart trans-
portation, smart grids, etc., are designed to turn traditionally separated automated
critical infrastructure into modernized linked intelligent systems by interconnecting
human, system, and physical resources. CPS is also expected to have a significant
positive impact on the economy and society. Complexity, dynamic variability, and
heterogeneity are the features of CPS, which are produced as an outcome of rela-
tionships between cyber and physical subsystems. In addition to the established and
crucial safety and reliability criteria for conventional critical systems, these features
create major obstacles. Within these cyber-physical systems and crucial infrastruc-
tures, for instance, connected autonomous vehicles (CAVs) may be considered. By
2025, it is anticipated that 95 per cent of new vehicles will be equipped with vehicle to
vehicle (V2V), vehicle to infrastructure (V2I), and other telecommunications capabil-
ities. To protect CAVs on the road from unintended or harmful intrusion, innovative
and automated procedures are required to ensure public safety. In addition, large-
scale and complicated CPSs make it difficult to monitor and identify cyber physical
threats. Solutions for CPS have included the use of Artificial Intelligence (AI) and
Machine Learning (ML) techniques, which have proven successful in a wide range
of other domains like automation, robotics, prediction, etc. This research suggests a
Deep Learning (DL) -based Convolutional Neural Network (CNN) model for attack
detection and evaluates it using the most recent V2X dataset. According to the simu-
lation results, the proposed CNN exhibits superior performance compared to the most
advanced ML approaches such as Random Forest (RF), Adaptive Boosting
(AdaBoost), Gradient Boosting (GBoost), Bagging and Extreme Gradient Boosting
(XGBoost).
C. R. Kishore (B)
Department of Computer Science and Engineering, Aditya Institute of Technology and
Management (AITAM), Tekkali, Andhra Pradesh 532201, India
e-mail: [email protected]
H. S. Behera
Department of Information Technology, Veer Surendra Sai University of Technology, Burla,
Odisha 768018, India
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2024
J. Nayak et al. (eds.), Machine Learning for Cyber Physical System: Advances
and Challenges, Intelligent Systems Reference Library 60,
https://doi.org/10.1007/978-3-031-54038-7_6
6.1 Introduction
The wireless connectivity between vehicles and RSUs is notably inconsistent [9]. The
use of encryption technology alone is inadequate for ensuring the authenticity of
messages and lacks the ability to protect against many kinds of potential intruders.
Conventional intrusion detection methods that depend on ML and statistical evaluation
have limits in efficiently handling the continuously growing amount of data. DL
methodologies include
essential capabilities that make them appropriate for overcoming these challenges.
6.1.1 Motivation
Emerging technologies in the area of Vehicles Cyber Physical System (VCPS) have
been utilized, which is speeding up the evolution of the IoV. Vehicles reveal a lack
of emphasis on network security: they are subject to restrictions on storage space,
operate in a complex application environment, and depend on numerous dispersed
nodes and sensor networks. As a result, stringent safety regulations are necessary.
These limitations make the VCPS environment more vulnerable to cyber-attacks,
which in turn threaten the whole IoV ecosystem. The primary challenges that must
be resolved are as follows:
• An IDS for the IoV detects attacking behavior by monitoring and analyzing
network data, categorizing normal and abnormal behavior, and identifying unusual
activities such as threats on the network. This technology has emerged as a key
component in the defense of the IoV network.
• Current prominent research is focusing on integrating ML algorithms with more
conventional Intrusion Detection Systems (IDS). The massive amount of time
required for training ML-based intrusion detection algorithms is a key issue. This
is because large amounts of previous network data must be analyzed.
• For analyzing recent complicated VCPS network data, particularly in the complex
vehicle networking environment, DL technology in the VCPS environment is a
promising alternative.
An increasing variety of attacks and various attacker types provide major challenges
for research in the areas of misbehavior and intrusion detection in VCPS network.
The high fluctuations in vehicle network architecture have a significant influence
on network, routing, and security factors. In this study, DL methods are applied to
the classification of malware attacks on the vehicular network. The CPS-based
model is an architectural framework that, when combined with
ubiquitous sensors and technologies for communication offers several advantages
to the ITS and its operations. After receiving signals from a vehicle, the outermost
computing devices on an RSU deploy the DL algorithms for protecting ITS commu-
nications in a significant way. Therefore, this research proposes a CNN model that
can effectively address those challenges. The proposed approach employs the excep-
tional learning capabilities of complex CNNs for analyzing malware samples. This
methodology also demonstrates better effectiveness in terms of both accuracy and
the time required for detecting new types of malware. Additionally, the model has
an outstanding ability to accurately identify various malware types.
This study highlights many significant contributions, including:
1. DL-based CNN technique has been suggested as a smart IDS security system
for VCPS network.
2. The proposed intelligent IDS model employs the averaging strategy for feature
selection in order to enhance the performance of the IDS. This model intends
to investigate the features and attacks inside VCPS network for the purposes of
vehicular network monitoring.
3. The attack detection and accuracy probability of the suggested intelligent IDS
model has been enhanced in relation to the F1-Score for vehicular network traffic
based on VCPS.
4. The evaluation of the suggested intelligent IDS model is performed employing a
variety of performance standards. The effectiveness of proposed intelligent IDS
approach is evaluated by comparing it with many cutting-edge ensemble ML
algorithms, including RF, AdaBoost, GBoost, Bagging, and XGBoost, specifi-
cally in the context of VCPS. The suggested approach demonstrates
superior performance compared to conventional methodologies when evaluated
on the VCPS-based vehicle V2X_train dataset.
This article is structured in the following approach. Section 6.2 outlines the research
investigations that have been conducted on the analysis of network traffic in IoVs
using ML and DL techniques for the purpose of detecting malware. Additionally, it
discusses the many contemporary models of AI-based IDSs that have been devel-
oped specifically for the IoV. Sections 6.3 and 6.4 include a mathematical modeling
of DL-based CNN approaches as well as an overview of the experimental setup.
Furthermore, these sections provide details on the dataset used including precise
information pertaining to the classes and instances. The results of the suggested
methodology’s analysis in comparison with other models are discussed in Sect. 6.5.
The critical discussion is further discussed in Sect. 6.6, while the conclusion of the
study is discussed in Sect. 6.7.
a dataset concerning Android applications and their permission access. The highest
level of accuracy is achieved with the use of the KNN algorithm with an accuracy
rate of 96%. Additionally, the SVM algorithm achieved an accuracy rate of 93%.
Kan et al. [14] introduced a novel approach for detecting lightweight PC
malware, intended to address the computational time complexity associated with DL
approaches. The fundamental design of this model is based on the CNN approach.
This method has the capability to autonomously acquire features from the provided
input, which is presented as a series of grouped instructions. An accuracy rate of 95% has
been achieved on a dataset consisting of 70,000 data points. Alzaylaee et al. [15] intro-
duced DL-Droid, a malware detection model that employs DL techniques. Malicious
Android apps are detected by the use of input generations in the process of dynamic
analysis. The collection consists of 30,000 instances, including both malware and
benign activities. Furthermore, experimental procedures comprise the implementa-
tion of hybrid attributes, which are formed by combining dynamic and static features.
With respect to the dynamic attributes, the model exhibited a maximum accuracy of
97.8%. In contrast, the hybrid attributes exhibit an impressive accuracy rate of 99.6%.
Xu et al. [17] introduced a malware detection framework that employs a Graph
Neural Network (GNN). This research involves the conversion of the graph structure
of the Android application into vectors, followed by the classification of malware
families using a model. A malware detection accuracy of 99.6% has been achieved,
while a classification accuracy of 98.7% has been reached. Gao et al. [16] established
a model called GDroid, which utilizes a Graph Convolutional Network (GCN) for the
purpose of classifying malware. This study intended to provide a graphical illustration
of the interconnections between the various components of the Android platform by
way of a heterogeneous graph. Fewer than one percent of the predictions were false
positives, and the accuracy was 98.99%. Table 6.1 highlights further research on IoV systems that
use IDS for the detection of malware attacks.
Based on the Table 6.1 shown in the literature review section, it can be concluded
that the efficacy of an IoV-IDS enabled by AI is mostly dependent upon the utilization
of a suitable dataset for training purposes. ML models can be developed with only
a limited amount of data and still obtain improved results. When dealing
with bigger datasets, ML models may not be appropriate unless the data is auto-
matically labeled. Due to the high costs and time requirements associated with the
process of labeling, DL algorithms are seen as more advantageous for handling bigger
datasets. These methodologies aim to identify and extract valuable patterns from raw
datasets. To enhance the effectiveness of VCPS-IDS in anomaly detection, it is essen-
tial to consistently update it with newly acquired data derived from the monitoring
of network traffic. The utilization of extensive datasets and the complex architecture
of DL algorithms will result in a more demanding learning process in terms of time
frame and computational resources. There seems to be a tradeoff between model
complexity and the level of structure achieved by DL methods. The more in-depth
the approach, the more complex the model, and the more time and resources will be
needed to solve the problem. As a result, this drawback will be eventually resolved
by the intelligent selection of important characteristics for model training.
Table 6.1 Summary on related studies

Primary AI approach | Smart model | Comparison methods | Dataset used | Attack type detection | Performance measures | Year | References
ML | FS-Stacking | KNN, SVM, DT, RF, ET, XGBoost, Stacking | CICIDS2017 | DoS, Bruteforce, Portscan, Botnet, Web attack | Acc: 99.82%, F1-Score: 0.997 | 2019 | [10]
ML | RF | NB, LR, SVM, GBDT, XGBoost | NSL-KDD, UNSW-NB15 | Infiltration attack, DDoS | Acc: 99.95% | 2019 | [18]
ML | K-mean | SVM | HCRL | RPM, Fuzzy, DoS | Acc: 99.98% | 2021 | [19]
DL | CNN | None | Synthetic dataset (experimental) | No specific type mentioned | Acc: 99.87% | 2019 | [20]
DL | DCNN | CNN, NN, SVM | Synthetic dataset (experimental) | DDoS | Acc: 100% | 2020 | [21]
DL | DCNN | DT, NB, KNN, ANN, SVM, LSTM | Synthetic dataset (experimental) | Fuzzy, DoS, RPM, GEAR | F1-Score: 99.95% | 2020 | [22]
DL | LSTM | NB, SVM, ANN, DT | UNSW-NB15 car | RPM, GEAR, Fuzzy | Acc: 98.00% | 2020 | [23]
The review of the existing literature reveals that many researchers have developed
various methodologies, encompassing statistical and ML techniques, to enhance the
effectiveness of malware detection strategies in the IoV. However, these approaches
exhibit certain limitations. For instance, statistical methods struggle to adapt to
the dynamic nature of IoV, posing challenges in defining appropriate evaluation
threshold values. Moreover, non-parametric statistical techniques are not well suited
for real-time applications due to their computational complexity. ML algorithms
including DT, LR, ANN, KNN, and RF have been considered for malware detection. In
sensitive domains demanding high accuracy and performance, such as IoV, alter-
native solutions may be more promising than ML deployment. These algorithms
encounter difficulties when dealing with complex and extensive datasets, resulting
in processing slowdowns and limitations in effectively anticipating novel anomalous
patterns. Therefore, it is necessary to create a DL-based CNN model capable of
handling substantial datasets. The objective of this study is to provide a potential
solution for detecting malware in the context of IoV. CNNs offer the capability to
identify anomalies in sensor data, enabling the detection of deviations from antic-
ipated patterns. This ability holds significant value for applications related to fault
diagnosis, security, and safety within the IoV domain.
6.3 Methodologies
This section outlines an in-depth explanation of the basic concepts behind ensemble
learning approaches and explains the architectural development of the proposed CNN
model.
6.3.1 RF
Random Forest (RF) is an ensemble of decision trees (DTs), each trained on a bootstrapped
sample of the data. Once the training process for all DTs is completed, these trees are then used to provide
predictions. In the context of classification problems, each tree within the ensemble
contributes a “vote” towards a certain class, and the class that receives the highest
number of votes is then designated as the predicted class. In regression tasks, the
final prediction is obtained by averaging the predictions of all trees. RF functionality
is described by Eq. (6.1).
\hat{Y} = \operatorname{mode}\bigl(f_1(x), f_2(x), \ldots, f_n(x)\bigr) \qquad (6.1)

where \hat{Y} is the final prediction of the RF and f_n(x) is the prediction of the nth DT.
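A small sketch of Eq. (6.1) with scikit-learn is given below; note that RandomForestClassifier actually averages per-tree class probabilities, which usually coincides with the hard majority vote computed here, and the data is synthetic.

```python
# Majority vote over the individual trees, mirroring Eq. (6.1); data is synthetic.
from collections import Counter

import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
rf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)

# Collect f_1(x), ..., f_n(x) from every tree and take the mode per sample.
per_tree = np.stack([tree.predict(X[:5]) for tree in rf.estimators_])
y_hat = [Counter(col).most_common(1)[0][0] for col in per_tree.T]
print(y_hat)
print(rf.predict(X[:5]))  # soft-voting result, usually identical
```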
6.3.2 AdaBoost
\hat{Y} = \operatorname{sign}\!\left(\sum_{k=1}^{K} \alpha_k \, h_k(x)\right) \qquad (6.2)

where \alpha_k = \frac{1}{2}\ln\frac{1-\varepsilon_k}{\varepsilon_k} is the weight (importance) of the k-th weak learner, h_k(x) is the prediction of the k-th weak learner for input x, and \varepsilon_k is the weighted error of that learner.
6.3.3 GBoost
Gradient Boosting is a very effective ML methodology used for the purposes of both
classification and regression assignments. The algorithm in concern is an example
of the ensemble learning category, similar to AdaBoost. However, it constructs a
robust predictive model by using a distinct approach to aggregating the predictions
of weak learners, often DT. The fundamental concept behind Gradient Boosting is the
sequential construction of a robust model by repeated emphasis on the errors caused
by preceding models. Every subsequent weak learner is taught with the objective
of addressing the errors made by the ensemble up to that particular point in time.
\hat{Y} = \sum_{k=1}^{K} \eta \, h_k(x) \qquad (6.3)

where h_k(x) is the prediction of the k-th weak learner for input x and \eta is the learning-rate hyperparameter controlling the step size of each update.
6.3.4 Bagging
Bagging (bootstrap aggregating) trains each base classifier on a bootstrapped sample of the training data and combines the individual outputs by majority voting:

\hat{Y}(x) = \operatorname{mode}\bigl(C_1(x), C_2(x), \ldots, C_n(x)\bigr) \qquad (6.4)

where \hat{Y}(x) is the final prediction of the bagging ensemble and C_n(x) is the prediction of the nth base classifier (typically a DT).
6.3.5 XGBoost
XGBoost, also known as Extreme Gradient Boosting is a ML method that has excel-
lent efficiency and scalability. It is often used for problems involving classification
and regression. XGBoost is a variant of the GBoost technique that is renowned for
its computational efficiency, high predictive accuracy, and adeptness in managing
complex structured datasets. XGBoost was invented by Tianqi Chen and has gained
significant popularity in both ML contests and practical domains. XGBoost mostly
employs DT as its weak learners, while it is also capable of supporting several kinds
of base models. The depth of the trees is limited and regulated by hyper parameters
such as maximum depth, minimum weight, and minimum leaf weight. The XGBoost
algorithm incorporates L1 (Lasso) and L2 (Ridge) regularization terms in order to
manage the complexity of the model and reduce the risk of overfitting. The XGBoost
objective function is given in Eq. (6.5).
\operatorname{Obj}(\theta) = \sum_{i=1}^{N} L(y_i, p_i) + \sum_{k=1}^{T} \Omega(f_k) \qquad (6.5)

where L(y_i, p_i) is the loss function, with y_i and p_i denoting the actual target value and the predicted value from the weak learner respectively, and \Omega(f_k) is the regularization term for the k-th tree.
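A hedged sketch of an XGBoost classifier exposing the depth limit and the L1/L2 regularization terms discussed above is shown below; it assumes the xgboost Python package, and all hyperparameter values and the synthetic data are illustrative.

```python
# Illustrative only: hyperparameters and data are not those of the chapter.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier

X, y = make_classification(n_samples=1000, n_features=20, n_informative=8,
                           random_state=7)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=7)

clf = XGBClassifier(
    n_estimators=300,
    max_depth=6,            # limits tree depth (model complexity)
    learning_rate=0.1,      # shrinkage step size
    reg_alpha=0.1,          # L1 (Lasso) regularization term
    reg_lambda=1.0,         # L2 (Ridge) regularization term
    min_child_weight=1,     # minimum sum of instance weights in a leaf
    eval_metric="logloss",
)
clf.fit(X_tr, y_tr)
print("test accuracy:", clf.score(X_te, y_te))
```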
6.3.6 CNN
The CNN model, additionally referred to as the convolutional neural network model,
was introduced by LeCun in 1998. This particular model belongs to the category
of feed-forward neural networks, which has shown superior performance in the
domains of Natural Language Processing (NLP), larger complex dataset and image
processing. The use of local perception and CNN weight sharing has the potential
to significantly reduce the number of parameters, allowing for the projection of a
diverse range of characteristics via the DL process, which in turn enhances the
accuracy of the learning model. Convolution-layer, pooling-layer, and at last fully-
connection-layer constitute the main component of this CNN model. The compu-
tations at each convolutional layer are made up of a unique convolutional kernel.
The data characteristics are extracted by the convolutional operation performed
by each convolutional layer; however, the extracted features can have very
large dimensions. A max pooling layer is attached after each convolutional layer
to deal with this complexity and reduce the network’s training cost. Therefore, the
features’ dimensions are constrained by this layer. The last component of the neural
network architecture is the fully connected layer, which plays a crucial role in linking
the features obtained and determining the classification outcomes using the neural
network classifier. The framework of the CNN model is shown in Fig. 6.2.
Table 6.2 explains that the traditional ML techniques have some limits due to chal-
lenges in extracting accurate features, such as the curse of dimensionality, computing
constraints, and the need for domain expertise. Deep neural networks are a specific
kind of machine learning technique that uses several layers of networks. In addi-
tion, deep learning addresses the issue of representation by constructing several
elementary features to capture a sophisticated idea. As the quantity of training data
increases, the effectiveness of the deep learning classifier intensifies. Deep learning
models address intricate issues by using sophisticated and expansive models, aided
by hardware acceleration to save computational time.
The primary goal of this study is to create a network IDS to detect attacks involving
malware in vehicle networks. Several malware attacks can be launched on
automotive networks by cyber-assailants using wireless interfaces. Therefore, it is
important to implement the suggested IDS in both private and public transit networks.
The proposed IDS has the potential to effectively recognize abnormal signals inside
internal vehicle networks, thus generating warnings. This is achieved by the IDS
being integrated into the Controller Area Network (CAN-bus) system. The gateways
on external networks could be equipped with the suggested IDS to detect and discard
any malicious packets intended for the vehicles. This research introduces unique
IDS based on CNN for the purpose of detecting different forms of malware infec-
tions in VCPS systems. Figure 6.3 depicts the layer architecture of proposed model.
Figure 6.4 represents a detailed overview of the proposed IDS structure.
The deep architecture of CNN is used for intrusion detection is composed of four
important layers (2 convolution layers and 2 pooling layers). The network consists of
two convolutional layers that train 128 and 256 convolution kernels respectively with
a size of 5 × 5. The deep design incorporates a fully connected layer which includes
the use of two individual neurons for the purpose of classification. Two pooling
layers are used to implement average pooling with a factor of 2. The challenge
of intrusion detection could possibly be seen as a classification task; therefore the
sigmoid function has been integrated into the deep architecture. Table 6.3 represents
the parameter setup for the proposed model. The algorithm of the proposed framework
is presented in Table 6.4.
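A hedged Keras reconstruction of this architecture (two convolutional layers with 128 and 256 kernels of size 5 × 5, average pooling by a factor of 2, leaky ReLU activations, and a two-neuron sigmoid output) is sketched below; the input shape, padding, optimizer, and loss function are assumptions, since the full parameter table is not reproduced here.

```python
# Sketch of the proposed CNN IDS under stated assumptions.
from tensorflow import keras
from tensorflow.keras import layers


def build_ids_cnn(input_shape=(8, 8, 1)) -> keras.Model:
    """Tabular V2X features are assumed to be reshaped into a small 2-D grid."""
    model = keras.Sequential([
        layers.Input(shape=input_shape),
        layers.Conv2D(128, kernel_size=5, padding="same"),
        layers.LeakyReLU(),
        layers.AveragePooling2D(pool_size=2),
        layers.Conv2D(256, kernel_size=5, padding="same"),
        layers.LeakyReLU(),
        layers.AveragePooling2D(pool_size=2),
        layers.Flatten(),
        layers.Dense(2, activation="sigmoid"),  # attack vs. normal, as in the text
    ])
    model.compile(optimizer="adam",             # optimizer assumed
                  loss="binary_crossentropy",
                  metrics=["accuracy"])
    return model


model = build_ids_cnn()
model.summary()
```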
The suggested architecture is shown in Fig. 6.4, and its specific steps are as
follows:
1. The current research employed the real-time IoV V2X dataset. The dataset is
used for the purpose of investigating many characteristics, including the source
vehicle address, destination vehicle address, types of network services, network
connection status, message type, and duration of connections.
2. Based on the specified data processing techniques, the data undergo a series of
procedures including preprocessing, handling missing values, numericalization,
normalization, and oversampling.
3. After the preprocessing stage, the data is divided into training and validation
sets, with a ratio of 80% for training and 20% for validation, relative to the whole
dataset.
4. During this training phase, all ensemble techniques such as RF, AdaBoost,
GBoost, XGBoost and Bagging are learned using training data.
5. For proposed CNN, the processed training data is sent to the convolution layer in
order to extract features, which are then outputted by a two-dimensional convo-
lution operation. In order to decrease feature dimensions, expedite convergence,
and mitigate the risk of network overfitting, a pooling layer is used alongside
each convolution layer. This pooling layer serves to eliminate redundant features.
Subsequently, all of the local features are combined via a fully connected layer
to provide an extensive feature representation. Ultimately, the leaky rectified linear unit (ReLU)
activation function is used in the hidden layer. The sigmoid activation function
is often used in the output layer for classification purposes.
6. Following the completion of training on all of the models that are being consid-
ered, the test samples are used in order to assess the effectiveness of each model.
Accuracy, precision, recall, F1-score, and ROC-AUC were some of the perfor-
mance measures that were used in evaluating the performance of each of the
models.
This section provides a comprehensive description of the dataset, features, data prepa-
ration technique, simulation settings, and performance metrics for both the proposed
ML algorithms and other associated algorithms.
Imputing missing values in IoV data requires careful consideration, as this data often
includes a variety of data types, such as numerical and categorical variables, and
may have specific characteristics related to vehicular and sensor data. In this study,
the missing values are handled using mean, median, or mode imputation.
For the analysis of IoV network data, employing a label encoding technique is imper-
ative to convert categorical variables into numeric formats. This is essential due to the
heterogeneous nature of IoV network traffic, which encompasses both numeric and
categorical attributes requiring conversion for analysis and processing. The reason
for this is that the suggested detection method is most efficient when
handling numerical characteristics.
6.4.2.3 Normalization
The IoV network is equipped with a range of electronic sensors, which operate both
autonomously and in conjunction with human actions. These sensors play a critical
role in collecting and transmitting real-time data within the IoV system. However, the
data generated by these sensors vary significantly in magnitude. To facilitate pattern
analysis, enhance convergence, and reduce training time, the proposed detection
system utilizes the Min–Max normalization technique.
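A minimal sketch of these preprocessing steps (mean/mode imputation, label encoding, and Min-Max scaling) is shown below; the column names are illustrative and do not reflect the actual V2X schema.

```python
# Illustrative preprocessing pipeline on a tiny dummy frame.
import pandas as pd
from sklearn.preprocessing import LabelEncoder, MinMaxScaler

df = pd.DataFrame({
    "duration": [0.4, 1.2, None, 3.3],           # numeric attribute
    "service": ["dns", "http", "http", None],    # categorical attribute
    "label": ["normal", "attack", "attack", "normal"],
})

# 1. Missing values: mean for numeric columns, mode for categorical ones.
df["duration"] = df["duration"].fillna(df["duration"].mean())
df["service"] = df["service"].fillna(df["service"].mode()[0])

# 2. Label encoding of categorical attributes.
for col in ["service", "label"]:
    df[col] = LabelEncoder().fit_transform(df[col])

# 3. Min-Max normalization of the feature columns to [0, 1].
features = ["duration", "service"]
df[features] = MinMaxScaler().fit_transform(df[features])
print(df)
```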
6.4.2.4 Oversampling
The research has been carried out by using the Python notebook on GPU servers
provided by Google Colab, utilizing the Keras and TensorFlow frameworks. In this
experimental study, the hardware configuration consisted of an Intel Core i7 CPU
operating at a frequency of 2.20 GHz, 16 GB of random-access memory (RAM), the
Windows 10 operating system (64-bit), and an NVIDIA GeForce GTX 1050 graphics
processing unit (GPU). Several software packages in the Python programming
language, including imblearn, pandas, and NumPy, are used for the purpose
of conducting additional data analysis. Furthermore, the visualization of data is
facilitated by including Matplotlib and Mlxtend. Additionally, the analysis of the
data is conducted using the Sklearn framework. The Keras and TensorFlow libraries
were used in this study, with Keras being a library specifically designed for neural
networks. In comparison, TensorFlow is an open-source framework for ML that can
be used for a wide range of applications.
The evaluation metrics are computed from the confusion matrix as Accuracy = (TP +
TN)/(TP + TN + FP + FN), Precision = TP/(TP + FP), Recall = TP/(TP + FN), and
F1-score = 2 × Precision × Recall/(Precision + Recall), where "true positive" (TP)
refers to the number of requests accurately identified as having harmful behaviors and
"false positive" (FP) refers to the number of normal applications incorrectly identified
as malicious. By contrast, "true negative" (TN) refers to the number of applications that
are correctly labeled as normal, while "false negative" (FN) refers to the number of
malicious applications that are incorrectly labeled as normal. Generally, a greater level
of precision, accuracy, and recall corresponds to an enhanced identification outcome.
The effectiveness of the identification strategy may be better explained by a higher
F1-score, which combines the outcomes of precision and recall.
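These quantities map directly onto scikit-learn's metric functions, as in the short sketch below; the label arrays are dummy values.

```python
# Dummy labels only; replace with the model's test-set outputs.
from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score,
                             precision_score, recall_score, roc_auc_score)

y_true = [0, 0, 1, 1, 1, 0, 1, 0]
y_pred = [0, 1, 1, 1, 0, 0, 1, 0]
y_score = [0.1, 0.7, 0.9, 0.8, 0.4, 0.2, 0.6, 0.3]  # predicted attack probability

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print("TP, FP, TN, FN:", tp, fp, tn, fn)
print("accuracy :", accuracy_score(y_true, y_pred))
print("precision:", precision_score(y_true, y_pred))
print("recall   :", recall_score(y_true, y_pred))
print("F1-score :", f1_score(y_true, y_pred))
print("ROC-AUC  :", roc_auc_score(y_true, y_score))
```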
This study aims to evaluate the entire effectiveness of the specified models. The anal-
ysis originates with examining advanced ML metrics and concludes by explaining
the performance of the DL based CNN model. A wide variety of evaluation measures,
such as recall, accuracy, precision, ROC-AUC and F1-score are used to illustrate the
results. The accuracy obtained represents a metric for the measurement of the overall
performance of the suggested approach. In addition, more emphasis has been given to the
use of the F1-Score metric across all methodologies due to its ability to facilitate
the attainment of harmonized precision-recall equilibrium. The research exhibits a
non-uniform and highly uneven distribution of class labels. The F1-Score is a rele-
vant metric to appropriately evaluate performance. Table 6.5 provides a detailed
description of the parameters used during the training of the other ML models.
Table 6.6 presents a comparative analysis of advanced ensemble learning
methodologies, including RF, AdaBoost, GBoost, XGBoost, Bagging, and the
suggested CNN. Both GBoost and XGBoost algorithms provide superior perfor-
mance compared to other advanced ensemble learning algorithms, as demonstrated
by their outstanding F1-Score of 98.95%. In contrast, RF technique has a suboptimal
F1-Score. The CNN model under consideration achieves outstanding metrics,
including an accuracy rate of 99.64%, precision of 100%, recall of 99.30%, F1-score
of 99.64%, and ROC AUC of 99.65%.
The ROC graph is frequently used to evaluate classification performance. The ROC
curve plots the sensitivity (true positive rate) on the y-axis against the false positive
rate (1 − specificity) on the x-axis, and it is regarded as an efficient way to compare
classifiers. In
general, when the area under the receiver operating characteristic (ROC) curve is
0.5, it indicates a lack of classification ability. This suggests that the classifier's capacity to
accurately identify intrusions based on the detection of attack existence or absence is
dependent on the specific circumstances employed. The range including values from
0.7 to 0.8 is often referred to as the acceptable range, while the range spanning from
0.8 to 0.9 is typically labelled as the good range. Performance over 0.9 is generally
regarded as outstanding. Figure 6.5 illustrates the area under the receiver operating
characteristic curve (AUC-ROC) study for several advanced ensemble learning and
DL based CNN model. In this work, it can be observed that DL approach exhibits
significant dominance over advanced ensemble learning techniques.
This study presents the confusion matrix of several classifiers, including XGBoost,
RF, Bagging, AdaBoost, GBoost, and CNN. A confusion matrix was used to
evaluate the effectiveness of the classification algorithms employed. The confusion
matrix for binary classification is shown in Fig. 6.6.
Table 6.6 Comparison analysis of proposed method with other considered approaches

Evaluation metrics | RF | AdaBoost | GBoost | XGBoost | Bagging | Proposed CNN
Accuracy (%) | 84.34 | 96.08 | 98.93 | 98.93 | 98.92 | 99.64
Precision (%) | 82.35 | 92.85 | 98.61 | 98.61 | 98.60 | 100
Recall (%) | 88.11 | 100 | 99.30 | 99.30 | 99.28 | 99.30
F1-measure (%) | 85.13 | 96.29 | 98.95 | 98.95 | 99.93 | 99.64
ROC-AUC (%) | 84.27 | 96.01 | 98.92 | 98.92 | 98.90 | 99.65
The CNN classifier, when applied to the V2X dataset, correctly categorized 142 instances
as attacks. Similarly, a total of 138 labels categorized as normal were classified
correctly, whereas one attack instance was incorrectly classed as normal. The
experimental results indicate that the CNN demonstrated effective classification
ability.
Figure 6.7 presents a comparative analysis of the advanced ensemble learning
and DL-based CNN technique, with the accuracy measure being used for evalua-
tion. Figure 6.8 illustrates the metrics of precision, recall, F1-score, and ROC-AUC.
The proposed CNN technique exhibits superior performance compared to existing
techniques, proving itself as a very effective classifier.
Fig. 6.6 Analysis of confusion matrix for a RF, b AdaBoost, c GBoost, d XGBoost, e Bagging,
f Proposed CNN
also be discussed, with a focus on the adaptation of model hyperparameters for
optimization in further research.
References
1. Chen, Z., Boyi, W., Lichen, Z.: Research on cyber-physical systems based on software defini-
tion. In: Proceedings of the IEEE 12th International Conference on Software Engineering and
Service Science (ICSESS) (2021)
2. Alam, K.M., Saini, M., Saddik, A.E.: Toward social internet of vehicles: concept, architecture,
and applications. IEEE Access 3, 343–357 (2015)
3. Piran, M.J., Murthy, G.R., Babu, G.P.: Vehicular ad hoc and sensor networks; principles and
challenges. Int. J Ad hoc Sensor Ubiquit. Comput. 2(2), 38–49
4. Prakash, R., Malviya, H., Naudiyal, A., Singh, R., Gehlot, A.: An approach to inter-vehicle
and vehicle-to-roadside communication for safety measures. In: Intelligent Communication,
Control and Devices, 624. Advances in Intelligent Systems and Computing (2018)
5. Kumar, S., Dohare, U., Kumar, K., Dora, D.P., Qureshi, K.N., Kharel, R.: Cybersecurity
measures for geocasting in vehicular cyber physical system environments. IEEE Internet Things
J. 6(4), 5916–5926 (2018)
6. https://www.av-test.org/en/statistics/malware/. Accessed 11 Nov 2023
7. Lv, Z., Lloret, J., Song, H.: Guest editorial software defined Internet of vehicles. IEEE Trans.
Intell. Transp. Syst. 22, 3504–3510 (2021)
8. Maleh, Y., Ezzati, A., Qasmaoui, Y., Mbida, M.: A global hybrid intrusion detection system
for wireless sensor networks. Proc. Comput. Sci. 52(1), 1047–1052 (2015)
9. Kaiwartya, O., Abdullah, A.H., Cao, Y., Altameem, A., Prasad, M., Lin, C.-T., Liu, X.: Internet
of vehicles: motivation, layered architecture, network model, challenges, and future aspects.
IEEE Access 4, 5356–5373 (2016)
10. Yang, L., Moubayed, A., Hamieh, I., Shami, A.: Tree-based intelligent intrusion detec-
tion system in internet of vehicles. In: 2019 IEEE Global Communications Conference
(GLOBECOM), pp. 1–6 (2019)
11. Ullah, S., Khan, M., Ahmad, J., Jamal, S., Huma, Z., Hassan, M., Pitropakis, N., Buchanan,
W.: HDL-IDS: a hybrid deep learning architecture for intrusion detection in the Internet of
Vehicles. Sensors 22(4), 1340 (2022)
12. Firdausi, I., Lim, C., Erwin, A., Nugroho, A.: Analysis of machine learning techniques used
in behavior-based malware detection. In: Proceedings of the International Conference on
Advances in Computing, Control and Telecommunication Technologies, Jakarta, Indonesia,
2–3 December 2010
13. Rana, J.S., Gudla, C., Sung, A.H.: Evaluating machine learning models for android malware
detection: a comparison study. In: Proceedings of the 2018 VII International Conference on
Network, Communication and Computing, New York, NY, USA, 14–16 December 2018
14. Kan, Z., Wang, H., Xu, G., Guo, Y., Chen, X.: Towards light-weight deep learning based
malware detection. In: Proceedings of the IEEE 42nd Annual Computer Software and
Applications Conference (COMPSAC), Tokyo, Japan, 23–27 July 2018
15. Alzaylaee, M.K., Yerima, S.Y., Sezer, S.: DL-droid: deep learning based android malware
detection using real devices. Comput. Secur. 89, 101663 (2020)
16. Gao, H., Cheng, S., Zhang, W.: GDroid: android malware detection and classification with
graph convolutional network. Comput. Secur. 106, 102264 (2021)
17. Xu, P., Eckert, C., Zarras, A.: Detecting and categorizing Android malware with graph neural
networks. In: Proceedings of the 36th Annual ACM Symposium on Applied Computing (SAC
’21), New York, NY, USA, 22–26 March 2021, pp. 409–412
18. Gao, Y., Wu, H., Song, B., Jin, Y., Luo, X., Zeng, X.: A distributed network intrusion detection
system for distributed denial of service attacks in vehicular ad hoc network. IEEE Access 7,
154560–154571 (2019)
19. D’Angelo, G., Castiglione, A., Palmieri, F.: A cluster-based multidimensional approach for
detecting attacks on connected vehicles. IEEE Internet Things J. 8(16), 12518–12527 (2021)
20. Peng, R., Li, W., Yang, T., Huafeng, K.: An internet of vehicles intrusion detection system
based on a convolutional neural network. In: 2019 IEEE International Conference on
Parallel & Distributed Processing with Applications, Big Data & Cloud Computing, Sustain-
able Computing & Communications, Social Computing & Networking (ISPA/BDCloud/
SocialCom/SustainCom), pp. 1595–1599. IEEE (2019)
21. Nie, L., Ning, Z., Wang, X., Hu, X., Cheng, J., Li, Y.: Data-driven intrusion detection for
intelligent internet of vehicles: a deep convolutional neural network-based method. IEEE Trans.
Netw. Sci. Eng. 7(4), 2219–2230 (2020)
22. Song, H.M., Woo, J., Kim, H.K.: In-vehicle network intrusion detection using deep convolu-
tional neural network. Vehicul. Commun. 21, 100198 (2020)
23. Ashraf, J., Bakhshi, A.D., Moustafa, N., Khurshid, H., Javed, A., Beheshti, A.: Novel deep
learning-enabled LSTM autoencoder architecture for discovering anomalous events from
intelligent transportation systems. IEEE Trans. Intell. Transp. Syst. 22(7), 4507–4518 (2020)
24. Liang, J., Chen, J., Zhu, Y., Yu, R.: A novel intrusion detection system for vehicular ad hoc
networks (VANETs) based on differences of traffic flow and position. Appl. Soft Comput. 75,
712–727 (2019)
25. Breiman, L.: Random forests. Mach. Learn. 45, 5–32 (2001)
26. Ying, C., et al.: Advance and prospects of AdaBoost algorithm. Acta Automat. Sin. 39(6),
745–758 (2013)
27. Shastri, S., et al.: GBoost: a novel grading-AdaBoost ensemble approach for automatic
identification of erythemato-squamous disease. Int. J. Inf. Technol. 13, 959–971 (2021)
28. Alzubi, J.A.: Diversity based improved bagging algorithm. In: Proceedings of the International
Conference on Engineering & MIS 2015 (2015)
29. Ramraj, S., et al.: Experimenting XGBoost algorithm for prediction and classification of
different datasets. Int. J. Control Theory Appl. 9(40), 651–662 (2016)
30. Jogin, M., et al.: Feature extraction using convolution neural networks (CNN) and deep learning.
In: 2018 3rd IEEE International Conference on Recent Trends in Electronics, Information &
Communication Technology (RTEICT). IEEE (2018)
31. https://ieee-dataport.org/documents/v2x-message-classification-prioritization-and-spam-detection-dataset
32. Kumar, R., Zhang, X., Wang, W., Khan, R.U., Kumar, J., Sharif, A.: A multimodal malware
detection technique for android IoT devices using various features. IEEE Access 7, 64411–
64430 (2019)
33. Yu, W., Ge, L., Xu, G., Fu, Z.: Towards neural network based malware detection on android
mobile devices. In: Cybersecurity Systems for Human Cognition Augmentation, pp. 99–117.
Springer (2014)
34. McLaughlin, N., Doupé, A., Ahn, G.J., del Rincon, J.M., Kang, B.J., Yerima, S., Miller, P.,
Sezer, S., Safaei, Y., Trickel, E., Zhao, Z.: Deep android malware detection. In: Proceedings of
the Seventh ACM on Conference on Data and Application Security and Privacy—CODASPY
’17, pp. 301–308 (2017)
35. Fereidooni, H., Conti, M., Yao, D., Sperduti, A.: ANASTASIA: android malware detection
using static analysis of applications. In: 2016 8th IFIP International Conference on New
Technologies, Mobility and Security (NTMS), pp. 1–5. IEEE (2016)
36. Go, J.H., Jan, T., Mohanty, M., Patel, O.P., Puthal, D., Prasad, M.: Visualization approach for
Malware classification with ResNeXt. In: 2020 IEEE Congress on Evolutionary Computation
(CEC). IEEE, pp. 1–7 (2020)
37. Sudhakar, Kumar, S.: MCFT-CNN: Malware classification with fine-tune convolution neural
networks using traditional and transfer learning in Internet of Things. Future Gener. Comput.
Syst. 125, 334–351 (2021). https://doi.org/10.1016/j.future.2021.06.029
Chapter 7
Unraveling What is at Stake
in the Intelligence of Autonomous Cars
Abstract The integration of physical and cybernetic systems introduces new func-
tionalities that modify the configuration of autonomous driving vehicles. The
vehicle’s driving behavior may respond differently than the driver expects,
causing accidents. Innovation in cybernetic systems is still based on immature infor-
mation. To achieve socially responsible innovation, it is necessary to dispel the uncer-
tainties of the black box of new technologies. We use an argumentative method to
show that there is a pattern, a unique structure, that appears repeatedly in the cogni-
tive linguistic process of both human beings and intelligent systems. From this, we
aim not only to show that this pattern guarantees coherence to the decision-making
performed by cognitive computing, but also that it reveals what is at stake in the
intelligence of autonomous cars and in the biases of the black box of AI. Therefore,
by clarifying the dynamic of the unique cognitive linguistic process, as a common
process for individuals and machines, it is possible to manage the interpretive activity
of cyber-physical systems and the way they decide, providing safe and sustainable
autonomous cars.
7.1 Introduction
The intelligence, or rather the intelligent decisions, of autonomous cars unite
computational and physical resources, reconfiguring them to acquire autonomy, effi-
ciency, and functionality. There are still major challenges to be overcome in relation
D. M. Monte-Serrat
Computing and Mathematics Department, Law Department, USP, Unaerp, Brazil
C. Cattani (B)
Engineering School (DEIM), Tuscia University, Viterbo, Italy
e-mail: [email protected]
to the safety of the automotive sector, revealing that scientific and engineering prin-
ciples need to be deepened in terms of integrating cybernetic and physical elements.
This chapter examines autonomous systems innovations by confronting them with
the fundamentals of the human cognitive linguistic process to inspire the formulation
of algorithms and models of cyber-physical systems. To argue about the existence
of a unique structure present in the foundations of the human cognitive-linguistic
process that can be applied to intelligent systems, Chaim Perelman’s argumentative
method is used [1], which, instead of logical reasoning, makes use of a regressive
reasoning that considers variability of situations and special values. We clarify to
cyber systems’ researchers and developers, who deal with language and cognition,
that one cannot ignore the dynamic process through which human language and
cognition are expressed. This dynamic process is unique, integrating cybernetic and
physical elements.
In computational intelligence, the mechanisms of control and detection of context
elements are intertwined to reconfigure the machine’s cognition. The interconnec-
tion of these elements is still precarious because it does not imitate the human
cognitive linguistic process satisfactorily. This chapter breaks new ground
by suggesting that, in addition to designing tasks that guide decision-making in
autonomous systems, it is necessary to consider the fundamentals of the human
cognitive linguistic process. Cognitive ability, when considered a ‘process’ encom-
passes the ‘dynamic’ aspect, which is subject to reconfiguration at different spatial
and temporal scales. Overcoming this spatial and temporal difficulty means opti-
mizing the autonomous system, preventing the degradation of its performance and
the robustness of its design. It is important to highlight that the dynamic cognitive
process is not limited to the influence of the logical sequence of tasks previously
established in the system’s cognitive core, but also responds to the unpredictability
of the environment. The temporal extension of cognition, both in humans and in
intelligent systems [2], has the role of making the system overcome the recurrent
limited capacity to manage uncertainties arising from accidental events during its
operation [3].
Point solutions do not solve the endemic problems of autonomous systems.
There is a need to intervene in the core of the machine’s cognitive system, providing
it with fundamental elements and information for the generation of its cognitive
activity. Under an argumentative method, we discuss the foundations of the dynamics
of the human cognitive linguistic process, in order to abstract basic principles that
can guide the autonomous system’s core design. In this way, all technicians and
researchers become aware of how they must act to improve the performance of
autonomous systems, so that synchronous computational and physical processes are
integrated with asynchronous computational processes. The fundamental principles
demonstrated in this chapter, therefore, not only have the potential to encourage the
development of tools and architectures that improve the functioning of autonomous
cars, but also raise awareness among technicians and researchers about how they
should act to ameliorate the performance of these systems.
In the quest to establish new principles for intelligent systems technicians to
design and implement the algorithmic core of autonomous systems, we resorted to
Highly automated systems can cause harm to people and property due to their misuse
or their design. Gillespie [4] suggests that the reliability of the automated system is
achieved through intervention in the Autonomous Human-Machine Team (A-HMT-
S), reallocating tasks and resources. The author states that this helps in approaching
autonomy development problems for teams and systems with a human-machine inter-
face. The difficulties encountered in human-autonomous system teams
(A-HMT-S) are due to the frequent reconfiguration of systems, which are not always
understandable or reproducible. To circumvent uncertainties in the interpretation of
input information in artificial intelligence, the author suggests the use of a hierar-
chical architecture to improve the effectiveness of the design and development of
the A-HMT-S through the use of specific machine learning (ML) tools, through design
decisions that ensure actions are taken based on authorization from the human team
leader, and through the adoption of values for tasks when setting priorities.
When it comes to automation of intelligent systems, there is a tendency, among
scientists, to develop a robotic consciousness that learns to adapt to changes, although
it is admitted that this subject is complex. [5] defines robotic consciousness as the
machine’s ability to recognize itself, imagine itself in the future and learn to react
to the world around it. This was the goal of Kedar et al. [6, p. 7] when they created
Spyndra, a quadruped robot, with an open-source machine learning platform to study
machine self-awareness. The authors bet on the robot’s self-simulation to predict the
sensations of its actions. They compare the simulated gear and the actual gear to probe
the limits of the machine’s ability to reshape its own actions. The authors’ hypothesis
is that the system is self-aware and can record its own orientation and accelera-
tion. Visual camera information is combined with deep learning networks for path
planning. The experiment demonstrates that neither linear nor ridge regression accu-
rately predicted the global measurement. The explanation found is that direction
and orientation are related to yaw, and yaw is the least repeatable feature and diffi-
cult to predict. It was observed that the neural networks failed to make meaningful
predictions, leading the authors to assume that the robot state depends on the robot’s
previous state [6, p. 8]. An extra perturbation was also identified in the simulated
data due to the interference of the robot’s contextual reality, since the simulation
model assumed that the material is homogeneous [6, p. 9]. The authors promise to
improve their machine learning model. They make available open-source control
software and a data set for future researchers who want to develop a self-modeling
platform through augmentation of feedback sensors and resources extracted from
their simulation.
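As background for the regression baselines mentioned above, the sketch below is a hypothetical reconstruction (not the Spyndra authors’ code): it compares plain linear regression with ridge regression for predicting yaw, using invented synthetic features and scikit-learn.

```python
# Hypothetical sketch, not the Spyndra authors' code: comparing linear and
# ridge regression for predicting yaw from proprioceptive features.
# The features and target below are synthetic placeholders.
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 6))                        # stand-in for orientation/acceleration features
true_w = rng.normal(size=6)
yaw = X @ true_w + rng.normal(scale=2.0, size=500)   # noisy target

X_tr, X_te, y_tr, y_te = train_test_split(X, yaw, test_size=0.3, random_state=0)

for name, model in [("linear", LinearRegression()), ("ridge", Ridge(alpha=1.0))]:
    model.fit(X_tr, y_tr)
    r2 = r2_score(y_te, model.predict(X_te))
    print(f"{name:7s} R^2 on held-out data: {r2:.3f}")
# On the real robot data, the authors report that neither linear nor ridge
# regression accurately predicted the global measurement, yaw being the least
# repeatable and hardest feature to predict.
```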
Autonomous systems deployed in cars, in turn, equipped with advanced driver
assistance systems (ADAS) [3], perform the task of driving. It has been noted that
while automation helps drivers, they must always be on the alert in case the computer
does not know what to do or intervenes incorrectly. These risks are still not sufficiently
recognized or managed. There are reports of situations where accidents occur because
drivers cannot understand why the vehicle responds or does not respond in a specific
way.
Artificial Intelligence, intelligent systems, machine learning, algorithms and
neural networks share a common challenge: the development
of a system that has its own consciousness. When it comes to the correlations that the
system or robot uses to identify the context and promote its adaptation to it, it is
not enough to look at the superficial structures of cognition. No matter how many
tasks and resources are reallocated, the resulting reconfiguration will not always be
understandable or reproducible: neural networks end up failing to make predictions.
We propose a look into the depths of cognition, at its origins, to teach cyber systems
to intervene correctly to better manage risks.
The question is how to know which correlations indicate a causal
connection with the system’s behavior. This is an important basis for machine
learning not to be vulnerable to human and algorithmic errors and biases. Errors
and spurious correlations confuse the results obtained by the intelligent system. The
challenge persists due to the complexity of neural network algorithms, known as the
black box model. This exposes people to danger, since it is not known exactly how and
why an algorithm arrived at a certain conclusion or decision. Much has been done
to manage risk, eliminate errors, adopt best practice protocols, but we know that this
is not enough. To better understand the reasons for this deadlock, we chose ADAS
system [3] to discuss possible solutions so that it avoids errors and failures.
Section 7.1 was dedicated to discussing the scenario of some autonomous systems,
mentioning some of their flaws. This Section and this chapter in general aim to explain
the foundations of cognition, whether human or machine. This is abstract knowledge
because it addresses dynamic structures. For this reason, quantification, represen-
tation or performance techniques do not occupy a prominent place. The theoretical
foundations of language and cognition, shared by humans and intelligent systems,
make up much-needed knowledge for developers and technicians who design the
algorithmic core of intelligent systems. It is these universal bases of language and
cognition that construct information or that determine the relationship between the
elements necessary for a system to carry out a certain task or decision. When seeking
to build an AI tool that has the intuitiveness of human cognition, the elements exposed
here are crucial.
Answering the complex question of what is at stake in the performance of self-
driving cars requires pooling knowledge from multiple disciplines. AI imitates human
behavior, and the use of neuroscience can help overcome some difficulties and find
new alternatives to impasses. We unite neuroscience with the branch of autonomous
AI systems to unravel the workings and weaknesses of ADAS. The increase in knowl-
edge promoted by the exchange between human cognition and cognitive computing
allows researchers and developers to optimize the self-experimentation of cyber-
systems. This exchange takes place through a unique architecture: the cognitive
linguistic dynamic process.
There is still no concise and clear concept of what language/cognition is. Language
is a system of conventional spoken or written symbols by which human beings
communicate. This system groups or combines things or elements forming a complex
or unitary whole, which, under a dynamic, involves value and gains the status of a
process. We focus the content of this chapter on this dynamic face of language,
understood as a form and not a substance [7]. We assume that, through Perelman and
Olbrechts-Tyteca’s [1] argument, human language and the language of intelligent
systems are similar because they share the same and unique cognitive linguistic
process (whether human or machine).
To establish the bridge that joins the human cognitive linguistic process to the
cognitive process of intelligent systems, we take advantage of the approach of
[1], to direct attention to relationships. Cognition and decision-making, as they are
processes and have dynamic relationships between various elements, fit perfectly into
their approach, which uses parameters of value and context with an appeal to reality.
The qualitative knowledge of Neurolinguistics and Artificial Intelligence (Cognitive
Computing) allowed us to identify a common element in the cognitive linguistic
process that both incorporate. With the investigative focus on the similarity of rela-
tionships between these disciplines, we were able to identify a repetitive pattern in
cognitive-linguistic functioning (Chap. 2 of [8]).
Recalling the examples in Sect. 7.1, we could observe that the reconfiguration
of the tasks (their logical sequence) of the A-HMT-S project is not always compre-
hensible or reproducible. Spyndra, though self-aware, does not accurately predict the
measurement linked to its own orientation. And, finally, the ADAS system made clear
the need for some element to manage events in cases where the computer does not
know how to intervene correctly. The methodological approach of [1], by proposing
attention to relationships, brought to this research on the cognitive linguistic process
the opportunity to observe the existence of a standardized dynamic, present both in
human cognition and in cognitive computing.
The argumentative method used in this chapter about what is at stake in the
cognition of autonomous cars is not the analogy (which goes from the special to
the generic), nor the hierarchy between elements. The focus of the method is the
real observation of a repeated pattern in two branches of science (Neurolinguistics
and Artificial Intelligence). This pattern plays the role of a bridge, which organizes,
coordinates, unifies, and favors the exchange of information between disciplines
and even between cognitive systems (whether human or machine). Autonomous car
systems are designed to arrive at decision-making, which is, par excellence, the result
of the cognitive linguistic process.
According to Neurolinguistics, the cognitive linguistic system of human beings
encompasses the interconnection of neurons, glial cells, spinal cord, brainstem, cere-
bellum, and cerebrum [9]. This system receives and processes electromagnetic
stimuli such as light; mechanical stimuli such as waves, pressure, vibration, or touch;
chemical stimuli such as smell or taste; and heat or cold [10, p. 26]. Its totality has not yet been
reproduced in AI, which leads us to explore new avenues of investigation oriented
towards the structural dynamics of the cognitive linguistic process, common to
humans and AI. The immutable structural dynamics of cognitive linguistic behavior
(Fig. 7.1) is put under the spotlight to show how it can be reproduced in its entirety
in the behavior of intelligent systems.
In humans, stimuli enter the sensory organs and are taken to the central nervous
system (brain at the center), where they are organized in a logical sequence so that they
make sense. In AI equipped with a perceptual model or a multisensory combination
design, the same process takes place. Environmental stimuli are captured by deep
neural networks (vehicle location, pedestrian detection, traffic sign detection, etc.)
and are taken to the algorithmic core (center) to be transformed into intelligible
information for the AI, which may or may not activate a behavior to perform
a task. (Figure created by the first author, art by Paulo Motta Monte Serrat. Icons
retrieved from https://www.flaticon.com).
We clarify that we are analyzing cognitive linguistic dynamics (which is
immutable), different from analyses of specific models of autonomous vehicles,
Fig. 7.1 Shows the unchanging structural dynamics of cognitive linguistic behavior in the
individual (left side) and AI (right side)
which vary according to the tasks for which they were designed. In this Chapter we
show that all of them are constituted by a uniform cognitive linguistic process, yet
to be further explored. The analyses of the dynamics of each of the autonomous
systems can be done individually to elaborate ways of improvement in search of a
model, or an organization of elements or even a network of connections that mimics
human behavior.
Everything that is done to optimize an autonomous system needs to conform to the
universal structure of the cognitive linguistic process present in AI and in humans. If
the design of a given system (to perform a given task) meets that universal architec-
ture, it will be successful. The positive result is achieved even though this system does
not reproduce the completeness of human cognition with all its elements (neurons,
glial cells, spinal cord, brainstem, cerebellum, and cerebrum). In other words, the
autonomous vehicle model that conforms to the universality of the cognitive linguistic
process acquires a universal coherence under the integration of the environment and
algorithmic design in its core.
The universality of the cognitive linguistic process, described in the book The
natural language for Artificial Intelligence [8], is represented by an algorithm that
guarantees dynamism to language and cognition [8, 11], Chap. 10. This algorithm not
only deals with events recorded in a chronological framework, but also with events
located within an order and meaning provided by the context. We seek to make the
machine learning operator aware that meaning and function come from a relationship
between elements within the cognitive linguistic process. It is in this way that the
intelligent system will be better adapted to its instrumentation.
Fig. 7.2 Shows the AI’s linguistic cognitive relationships: lower-range relationship and broader
relationship
dynamic and universal flow that goes from the data/stimulus collection (from the
context) to the cognitive center, where this stimulus/data is transformed into infor-
mation or behavior. If the design of a system does not observe the universal feature
of this cognitive linguistic flow, the purpose for which the autonomous system was
designed may be jeopardized. Therefore, the specific criteria used for modeling each
of the different autonomous systems should not be confused with the fundamental
unit of the cognitive linguistic process embedded in all intelligent systems.
In this Section we explore a new avenue of investigation oriented towards the dynamic
characteristic of the cognitive linguistic process. We emphasize that knowledge of
how cognitive dynamics is carried out will help technicians to optimize intelligent
systems. In the case of an autonomous system, it needs to comply with this dynamic,
as it is a structure present both in the cognitive linguistic process of human beings and
in the cognitive linguistic process of intelligent systems. It is, therefore, the universal
structure of the cognitive linguistic process.
The concern with the dynamics under which information is constructed, or with the
algorithmic representation of the performance of a certain task by the intelligent
system, necessarily requires planning coherence in the integration of the environment
and design in the algorithmic core. This coherence in the algorithmic core guarantees the
status of similarity with human cognition, with its dynamic sequences that imply rela-
tionships. The dynamic relationships carried out by the cognitive linguistic process
involve parameters of value and context which are supported by the reality of the
environment.
In short, the cognitive linguistic structure, common to humans and machines,
has an essential function in the design of autonomous systems: the bridge func-
tion, which organizes, coordinates, unifies and favors the exchange of information
between human cognition and machine cognition, which is why it is so important.
Furthermore, we highlight in Figs. 7.1 and 7.2 that cognitive linguistic relations can
be discerned into two types: superficial and lower-range relations, and deep cogni-
tive linguistic relations with a broader scope. Lower-range cognitive relations act
on specific elements of the task sequence of a given system. Broad-ranging cogni-
tive relationships have to do with a hierarchy of elements that build information
or a sequence of tasks. This is the deep layer of cognition, shared by humans and
machines.
Advanced driver assistance systems, ADAS, are equipped with tools whose objec-
tives are defined by a sequence of tasks determined by algorithms (see lower-range
relationships at Sect. 2.2). This is the algorithmic core of the autonomous system.
Although determined to fulfill tasks, when it is linked to the individual’s use, it
becomes exposed to a wide range of contextual stimuli (see broader relationships
at Sect. 2.2.) with which it will have to deal due to its deep learning algorithms.
This occurs because the cognitive linguistic structure of the system has the same
human cognitive linguistic structure, that is, it integrates two fronts: the contextual
one, resulting from the collection of stimuli/data from the environment, and the logic
one, which organizes these stimuli in a logical sequence, making them intelligible
(turning stimuli into information) [8].
The fundamentals of human cognition as a dynamic process mix the stimuli
arising from the context with the logical sequence of the central cognitive system,
giving them meaning. This fluid and changing composition of human cognition
should inspire ADAS design so that it is able to adapt to different contexts while
performing the main task for which it was designed. The mix of physical and compu-
tational resources in ADAS goes beyond the specific elements of its design. This can
be observed when, at the time of an error or accident, the system’s reaction to different
contexts can be deficient and result in weaknesses in the performance of its final task.
The fundamentals and principles of cognition serve as a guide to a more comprehen-
sive imitation of human behavior, so that highly automated systems are successful
when facing challenges. By bringing the two fronts of the cognitive linguistic process
together at the core of autonomous systems, there will be less likelihood of damage
to people and property and of misuse of system design.
The human cognitive linguistic structure to be imitated by ADAS must be guided
not only by the algorithmic core (lower-range relationship, see Sect. 2.2), but also by
receiving stimuli from its context (broader relationship, see Sect. 2.2). The union of
these two fronts mimics human cognition encompassing environmental parameters,
which makes the ADAS system dynamic, and its interpretive activity optimized. For
this, the juxtaposition of both fronts is not enough. There is a need for an organized
combination of tools that balance structural aspects of the algorithmic core with
contextual aspects collected by the system. If ADAS only deals with behavior patterns
determined by algorithms, the results of the system will not be satisfactory, since the
data is static. When dealing with the competition between contextual stimuli and the
sequence of tasks foreseen in the algorithmic core, the autonomous system works
more intuitively, but this has not yet proved to be enough. Designers report that they
cannot predict the results as it is a black box. The state of the art will be achieved
when the unification of the fronts, contextual and logical, occurs in a hierarchically
organized manner, ensuring sustainability in ADAS innovation, as the system starts
to focus on understanding the contextual dynamics, reflecting results with fewer
errors.
Autonomous systems integrate, as a rule, machine learning and deep learning algo-
rithms for different tasks such as movement planning, vehicle localization, pedes-
trian detection, traffic sign detection, road sign detection, automated parking, vehicle
cybersecurity, and vehicle failure diagnostics [12]. The logical and executive func-
tions of intelligent systems are linked to the activity of interpretation, that is, to the
processing of semantically evaluable information. Monte-Serrat and Cattani [11]
explain that information processing by AI should imitate the human dynamic cogni-
tive linguistic process to result in the expected integration of the interpretive activity
of the intelligent system.
The integration of algorithms to the foundations of the cognitive linguistic process
allows computer scientists to optimize their cyber system. The overlapping of biolog-
ical and intelligent systems reveals a universal hierarchical structure in charge of
carrying out the interpretive activity [8, 11]. It is from this structure that we extract
strategies that offer good instrumentation and guarantee safe performance for AI
cognition. Describing in more detail, as a rule, the cyber system, to circumvent the
situations of the environment and carry out its tasks, interprets data. The level of data
interpretation by intelligent systems, despite the innovations in integrated detection
and control, results in point solutions, reaching only specific applications.
There is a need to rethink the unifying context of cyber-physical systems with regard to
their interpretive activity. For now, what scientists have achieved is the use of open,
flexible, and extensible architectures for cyber-physical systems; the use of principle-
based compositions or integrations; activity in run-time operations to improve system
performance and reliability. In short, what has been sought is that the sensitivity of
the cyber-physical system to the context is combined with the ability to modify
its behavior, accommodating variable configurations. However, the autonomous
systems leave something to be desired, presenting defects and interpretive biases.
And yet, for these systems to perform these accommodations, new approaches and
human curation are needed to validate them. What has been noticed so far is that
the accommodation of new fundamentals, methods and tools has been insufficient
to mitigate errors in the interpretation of the autonomous system. In this chapter we
take a step forward: instead of accommodation we propose integration.
This third Section shows what is missing for ADAS to imitate human cognition.
We clarify that the construction of information or a sequence of tasks originates
from the mix of stimuli arising from the context with the logical sequence of the
central cognitive system. This is the fundamental structure of cognition, whose nature
is a fluid, dynamic process, capable of adapting to different contexts. We show
that the performance of ADAS when reacting to different contexts is still deficient
and weak. What would ADAS be missing to reach the state of the art and imitate
human behavior in the face of challenges? The computer does not know what to
decide or how to intervene specifically because it does not faithfully reproduce this
fundamental structure of the human cognitive linguistic process, articulating stimuli
from the environment to the logical sequence. While ADAS juxtaposes tasks at its
core, humans perform a hierarchically superior operation of combining stimuli or
data in order to achieve balance in the operation that encompasses continuous changes
over time.
Regarding deep neural networks, it has been claimed that the malfunction of intel-
ligent systems is due to black box AI, which is related to the lack of knowledge
of the algorithm’s intended behavior. To deal with the complexity and mystique of
the black box AI, it is necessary to understand that language and cognition form a
structure that is related to the semantic dimension. Semantics comes not only from
the linguistic system (logical functions of the system), but also from the context in
which information is produced (such as movement planning, vehicle localization,
pedestrian detection, etc.) [12]. Knowledge of the fundamental cognitive linguistic
structure as a single process for humans and machines ensures consistency in system
behavior and mitigates biases in system interpretation [8, 11]. The key to acceptable
ADAS performance, therefore, lies in the dynamic aspect under which it interprets
the information, making the system invariant to many input transformations and
preventing it from misinterpreting the events to which it is exposed.
The interpretive activity of AI has focused on the use of multilayer neural
networks designed to process, analyze, and interpret the vast amount of collected
data. Cybertechniques expect intelligent systems to produce responses similar to
human ones, but the results are subject to random interpretation and are often incon-
sistent with reality. To overcome this difficulty, Reinforcement Learning with Human
Feedback (RLHF) techniques are used. Another interpretation technique that makes
use of neural networks is knowledge graphs, but they also require exhaustive human
curation for the system to interpret the relationships between entities in accordance
with the real world. At the beginning of this Chapter, we cite the work of [4] who
also suggests human intervention in what the author calls the Autonomous Human-
Machine Team (A-HMT-S) to circumvent the defects presented by the intelligent
system.
On the other hand, we have dynamic programming [13, 14] as an example of
success in the mathematical optimization of cyber systems. It meets what we expose
Black box AI affects the interpretability of ADAS because this system makes use of
deep neural networks. It has been observed that the critical stages of systems with
autonomous driving are in the features related to the interpretive activity, such as,
for example, in perception, information processing and modeling [15]. The inputs
and operations of the algorithms are not visible to the user due to the complexity of
the cyber-physical system. Impenetrability in the system stems from deep learning
modeling that takes millions of collected data points as inputs and correlates that
data to specific features to produce an output.
By constituting a cyber-physical system and dependent on interpretive activity,
ADAS integrates the universal cognitive linguistic structure. The system makes use
of the linguistic process on two fronts: via logical reasoning (as a sequence previously
established by the algorithm designed by its technical developer) and via reception
of stimuli (the repetition of the input of stimuli in the circuits of the neural networks,
which, even being combined with reinforcement is still insufficient). This dual front
of the ADAS cognitive linguistic process is self-directed and difficult to interpret by
data scientists and users. Because it is not visualized or understood, the interpretive
activity of the autonomous system is led to errors, from inconspicuous errors to errors
that cause major problems, or even those that are impossible to repair. Before these
problems arise, the developers of the autonomous system could also identify AI bias
(in the training data, for example), which could lead to potentially offensive
results for those affected. How to act so that the self-directed activity of ADAS ceases
to be a black box and interprets it in accordance with the human mind, to prevent
problems and losses?
ADAS that adequately performs its tasks must have its universal linguistic cogni-
tive structure organized according to a hierarchy of values. Values arising from
context inputs (broader relationship, see Sect. 2.2) and values arising from the inter-
pretive activity according to the algorithmic core model (lower range relationship,
see Sect. 2.2), must come into play in a targeted manner, in order to organize the
interpretive activity of the system before it accomplishes its ultimate goals (executive
function). As the executive functions of ADAS are connected to deep neural networks
responsible for collecting data from the environment, the collection of millions of
data points may prevail over the interpretive activity of the algorithmic core, biasing
it [11]. Although ADAS has a planned behavior (logical functions linked to the algo-
rithmic core), if there is no organization of the cognitive linguistic activity of the
system involving the broader and the lower-range relationships, it will not be adapt-
able to the changes that occur in the environment. On the other hand, if it is regulated
and organized, the executive functions of the system will be flexible when errors and
risks are detected.
The unification of the autonomous system (which is different from Reinforce-
ment Learning from Human Feedback or human curation) is what will allow the
monitoring of its decision making. The synchronized cognitive flexibility arising
from the dynamic linguistic process (broader relationship unified with the lower-
range relationship, see Sect. 2.2) allows it to adjust to unforeseen demands, over-
coming sudden obstacles. Cognitive flexibility allows ADAS to face a variety of
challenges, making the autonomous system more intuitive, which brings its mathe-
matical modeling closer to the dynamic structure of human cognition. Both human
cognition and AI cognition can translate, interpreting the real world, because they
reflect the fundamental structure of the dynamic cognitive linguistic process, which is
able to operate values to establish meanings, correlating logical pattern and contextual
pattern [16].
ADAS modeling deals with relationships within a dynamic process that generates
interpretation. Aware of this, it is assumed that the consistent interpretation of the
events to which ADAS is exposed results from the processing of these relationships.
There is imitation of the performance of human cognition to unify the operation
of values (of the context) with the sequence of tasks (of the algorithmic core that
determines the logical sequence of tasks to be executed). In this way, the supposed
black box of autonomous systems has its functioning revealed by unifying math-
ematical relations (logic/previously categorized elements/frozen context) to non-
mathematical relations (contextual/dynamic) [16]. The universal structure of the
linguistic cognitive process makes it clear how the autonomous system makes use of
the interpretive activity and how it can provide guidance consistently with the context
to which the system is exposed. The cybersystems’ developer needs to consider a
hierarchy of values in the dynamic processing of the (interpretive) behavior of the
system. The organization of this AI interpretive activity results in the valuation of
categorized elements of the algorithmic core that are unified with the fluid values
of the context of the environment to which the intelligent system is exposed. This
hierarchy and unification optimize executive functions and make the system more
intuitive.
In this Sect. 7.4, we show the need to understand that the cognitive linguistic struc-
ture, shared by humans and machines, is related to the semantic dimension. ADAS’
semantic dimension deals with logical functions of the system and also with the
dynamic context of the environment (movement planning, vehicle location, pedes-
trian detection etc.). The interpretative activity of ADAS covers dynamic aspects
that cause input transformations, which can lead to erroneous interpretations of the
environment. This cognitive functioning helps to unveil the AI black box. The ADAS
algorithmic core’s collection of stimuli places them within a logical sequence of tasks.
However, ADAS still does not present the superior operation of cognition, which hier-
archically relates the elements it is dealing with. This lack of hierarchical dynamic
organization of stimuli and data leads the intelligent system to present imperceptible
errors and even errors that cause major problems. For ADAS to perform executive
functions properly, mimicking human cognition, it must have its cognitive core orga-
nized according to a dynamic hierarchy of values. These values arise from context
inputs (broader relation, see Sect. 2.2) and interpretive activity according to the
central algorithmic model (lower range relation, see Sect. 2.2).
Learning carried out by the autonomous driving system, when receiving stimuli
from new interactions not foreseen in the algorithm, undergoes the reorganization of
its neural circuits, similar to what happens in human learning. This process occurs
according to the fundamentals of the human cognitive linguistic process [9]. Because
it is a single structure, it overlaps in ADAS learning, which leads us to think that it is
not regulated only by the algorithm (core), but also by complex aspects arising from
the interaction of the system with the environment. How to perform the integration of
both (algorithmic core and context) to reach the state of the art in intelligent systems?
The expected result of autonomous systems is that there is a structural and
functional organization exactly as the result of the nervous system of individuals
when reacting to contextual factors. Reports of failures pointed out in [3] show that
autonomous systems still do not imitate human cognition satisfactorily. To resolve
this impasse, we point out, as an example, Bellman’s theory [17], which provides
means to bring cognitive computation closer to the human cognitive linguistic
dynamic process. Reproducing the biological mechanism of reorganizing neural
circuits based on environmental stimuli in self-driving cars is not an easy task. For
these systems to establish a memory and reorganize neural circuits to perform new
tasks, the juxtaposition of different tools or mechanisms is not enough. It is necessary
What is at stake in the behavioral ability of the intelligent system to carry out the
tasks that have been assigned to it is the process of learning. This reflection, when
carried over to ADAS learning, takes us beyond the dependence on its deep neural
networks, and leads us to consider the stimuli that have their origin in the environment
that surrounds the autonomous driving vehicle. The deep neural algorithms (core of
the intelligent system) are responsible for only a part of the cognitive process of
ADAS. The other part of its cognition is based on experiences in the environment,
which interfere, often in ways not foreseen by its designers, in the activity of deep
neural networks, resulting in AI black box biases.
Knowledge of how human learning works, which, upon receiving stimuli, reorga-
nizes its neural circuits [9] makes it clear that ADAS will be defective if its learning
is regulated only by the algorithmic core of logical sequence. The state of the art
will be found when the ADAS algorithmic core is chronologically integrated with
complex aspects arising from environment stimuli. How to perform this integration?
The way of building physical-cybernetic systems is as important as the tools used in them.
This chapter brings a warning to researchers and developers of intelligent systems
that expressing mathematical propositions in logical sequences is not enough. The
system needs to understand its context to respond appropriately with a behavior. The
algorithmic core does not accurately describe the environment. There is a need for
the system to be powered by another type of information to receive stimuli from
real events. How can the (logical) algorithmic core integrate these real events into
their contextual structure, whose order of meaning cannot be summarized as a mere
logical sequence of tasks? Taking these questions into account and Carl Sagan’s
assertion that science is not just a body of knowledge, but also a way of thinking
[20], we list some principles that organize the way of doing the design of an intelligent
autonomous driving system:
1. the cognitive linguistic system of AI must be understood not as a substance, but
as a form, that is, a dynamic process;
2. the cognitive linguistic process of cyber-physical systems must have its compo-
nents inspired by the human cognitive linguistic process, which has two fronts:
a contextual one and a logical one;
3. The contextual front must align the design of the autonomous system’s cognition
in different spatial and temporal scales to respond to dynamic events;
4. the logical front must configure the sequence of tasks that may or may not result
in decision-making;
5. All the above organizing principles make up the interpretive activity of cyber-
physical systems. They must, therefore, be designed in a unifying way, like an
umbrella, to ensure consistency in the behavior of autonomous driving systems,
integrating context stimuli into the algorithmic sequence; a minimal illustrative
sketch of such a unified design is given below.
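A hypothetical Python skeleton of such a unified, two-front design is sketched here; every class and function name is an illustrative assumption and does not refer to any existing ADAS API or to a specific implementation proposed in this chapter.

```python
# Hypothetical skeleton of the two-front design listed above; all names are
# illustrative. The contextual front collects and values stimuli from the
# environment, the logical front holds the predefined task sequence, and a
# unifying core integrates both under a hierarchy of values before acting.
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Stimulus:
    source: str      # e.g. "camera", "lidar"
    payload: dict
    value: float     # contextual importance assigned to the stimulus

class ContextualFront:
    """Broader relationship: collect and value stimuli from the environment."""
    def sense(self, environment: List[dict]) -> List[Stimulus]:
        return [Stimulus(e.get("source", "unknown"), e, e.get("urgency", 0.0))
                for e in environment]

class LogicalFront:
    """Lower-range relationship: the predefined sequence of tasks."""
    def __init__(self, tasks: List[Callable[[Stimulus], str]]):
        self.tasks = tasks

class UnifiedCore:
    """Umbrella that integrates both fronts under a hierarchy of values."""
    def __init__(self, contextual: ContextualFront, logical: LogicalFront):
        self.contextual, self.logical = contextual, logical

    def decide(self, environment: List[dict]) -> List[str]:
        # Order stimuli by value (the hierarchy), then let the logical task
        # sequence interpret the most important stimulus.
        stimuli = sorted(self.contextual.sense(environment),
                         key=lambda s: s.value, reverse=True)
        return [task(stimuli[0]) for task in self.logical.tasks if stimuli]

# Illustrative use: a single braking task reacting to the most urgent stimulus.
core = UnifiedCore(ContextualFront(),
                   LogicalFront([lambda s: f"brake ({s.source}, urgency={s.value})"]))
print(core.decide([{"source": "camera", "urgency": 0.9},
                   {"source": "lidar", "urgency": 0.4}]))
```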
Section 7.5 highlights that the learning carried out by the autonomous driving system
has not yet reached the way in which the universal structure of cognition merges
logical sequence of tasks with the dynamics of stimuli received from the environment.
ADAS, when receiving stimuli from new unforeseen interactions, does not organize
its neural circuits satisfactorily, which prevents it from reaching the state of the art
of imitating human cognition.
ADAS learning, in addition to being regulated by the task sequence core, needs to
reach complex aspects arising from the system’s interaction with the environment.
The juxtaposition of different tools is still not enough. A memory capable of orga-
nizing neural circuits to perform new tasks under a specific chronological pattern to
be imitated is necessary. It is the synchronization of the logical sequence ‘if P
then Q’ with the chronological pattern resulting from the facts of the environment that
will avoid errors and defects of ADAS. Our proposal is that ADAS learning synchro-
nization is carried out in the form of the unification of the system’s algorithmic core
with its neural circuits that react to the environment. Our contribution, therefore,
lies in suggesting how to make physical-cybernetic systems. For this reason, this
Chapter does not focus on intelligent systems tools. We do not bring tools, but rather
a body of knowledge to overcome the difficulties of integrating the algorithmic core
of autonomous systems with real events in their contextual structure. Within this
purpose, we have brought in this section some principles that organize the way of
designing an intelligent autonomous driving system.
7.6 Conclusion
The integration between context and algorithmic sequence developed by the cognitive
linguistic dynamic process serves as an umbrella to encompass the various activities
related to cybersystems. We cite Richard Bellman’s solution process as an example
for cybernetic projects involving dynamic programming, i.e., to find the best deci-
sions in a problem-solving process, one must seek one solution after another, nesting
smaller decision problems within major decisions [17]. In contrast with Bellman’s
solution, which we adopt, we can observe that the application of different algorithms for
different autonomous driving tasks is a complex undertaking. [12] claim that the complexity
of autonomous vehicles implies the use of more than a single algorithm, since the
vehicle’s activity provides information from different perspectives. They suggest
for faster execution the tree model as a learning model, for motion planning, they
suggest the dynamic model to reduce the planner execution time; Reinforcement
Learning (RL) for speed control; for pedestrian detection, they propose an algorithm
that combines a five-layer convolution neural network and a classifier; for lane recog-
nition, a steerable fusion sensor capable of remaining unchanged on structured and
unstructured roads.
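To make Bellman’s idea of nesting smaller decision problems within larger ones concrete, the sketch below applies value iteration to a toy driving scenario; the states, actions, and rewards are invented for illustration only and are not taken from [12] or [17].

```python
# Toy illustration of Bellman's principle (invented states, actions, rewards):
# the value of a decision problem is defined recursively from the values of the
# smaller decision problems nested inside it.
gamma = 0.9                                   # discount factor
states = ["approach", "intersection", "clear"]

# transition[state][action] = (next_state, immediate_reward)
transition = {
    "approach":     {"slow_down": ("intersection", -1.0), "proceed": ("intersection", -0.2)},
    "intersection": {"slow_down": ("clear", 1.0),         "proceed": ("clear", -5.0)},
    "clear":        {"slow_down": ("clear", 0.0),         "proceed": ("clear", 0.0)},
}

V = {s: 0.0 for s in states}                  # value estimates

# Value iteration: repeatedly apply the Bellman update
#   V(s) = max_a [ r(s, a) + gamma * V(s') ]
for _ in range(50):
    V = {s: max(r + gamma * V[nxt] for nxt, r in transition[s].values())
         for s in states}

# Greedy policy extracted from the converged values.
policy = {s: max(transition[s],
                 key=lambda a: transition[s][a][1] + gamma * V[transition[s][a][0]])
          for s in states}
print(V)       # state values
print(policy)  # e.g. the policy slows down at the intersection rather than proceeding
```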
We seek, in understanding the basic functioning of the human cognitive linguistic
process, a way to simplify this task. We show that the interaction of the human being
with the world is essential for the development and learning processes. This inter-
action deserves to be highlighted in the development of autonomous cars, no matter
how diverse the tools used are. What is at stake in the intelligence of autonomous
cars is not just the tool used, but how it works, how the human-machine-environment
interaction is carried out. The expected result of autonomous systems is that there is
a structural and functional organization similar to that of the nervous system of indi-
viduals that can be altered by contextual factors. We suggest that this integration be
done recursively according to Bellman’s theory [17], synchronizing the algorithmic
core to the collection of stimuli from the context in which ADAS is operating.
The credibility of ADAS will increase as it develops its capacity for self-regulation
within the proposed hierarchy, optimizing its capacity for self-guidance. This skill
includes developing strategies, seeking information on its own, solving problems,
making decisions, and making choices - regardless of human support. This learning
is slow and requires constant adjustments. The interpretation performed by ADAS
goes beyond the information [21] in its algorithm, as there is cognitive overload
generated by the contexts to which it is exposed, and this also shapes its cognition.
ADAS needs protection for a proper interpretation of the context combined with
concentration on the sequence of tasks given by the algorithm. In this way, the
autonomous system is not reduced to identifying something, but to thinking about
something. In other words, it is not about what to learn, but how to behave considering
different perspectives [22].
The fragility of ADAS lies in its cognitive activity: in not being able to distinguish
information, in verifying the sequence of tasks against its environment, and in aligning
its decision-making with the context in which it finds itself. The fundamentals of the
linguistic cognitive structure described in this chapter say less in terms of perfor-
mance or quantification techniques and more in terms of cognitive process. Thinking
about the how rather than the what legitimizes universality and makes the cognitive-
linguistic process a less obscure notion. Where is the characteristic of universality
of the cognitive linguistic process capable of unraveling the AI black box? In the
structure, in the dynamic process carried out both by the cognitive faculties of indi-
viduals and by the cognition of artificial intelligence. When ADAS does not reach
this universality, it does not acquire the necessary cognitive legitimacy to keep up to
date, which leaves it susceptible to weaknesses.
What is really at stake is how to pass instructions to the ADAS design, rather than
what instructions to pass to ADAS. The perspective of the universality of ADAS
cognition, therefore, does not lie in the statistical data it collects, nor in the combina-
tion of different algorithms, but in its ability to properly process different contextual
situations. The universal structure of the cognitive linguistic process reveals the
way in which human cognition processes information. Inspiring the design of cyber
systems in this universal structure means finding solutions to the security issues that
exist in the cyber-physical systems of autonomous driving vehicles. In addition to
mentioning dynamic programming [13, 14] as an example of success in the mathe-
matical optimization of cybernetic systems, we disclose that the challenges in imple-
menting the approach proposed in this Chapter by applying real-time cognition to
cybernetic systems represent the new directions of our research, which is moving towards
publishing new studies that teach intelligent systems not only to identify something,
but also to think about something. The universality of the cognitive-linguistic process
is leading the way for us to resort to new mathematical techniques that, as far as we
know, have not yet been related to language and cognition. These new techniques will
convey the embryonic aspect of cognition, preventing the researcher or technician
from getting lost in the complex aspects of the superficial layers of the cognitive
linguistic process. In this new approach, aspects of memory and representation that
organize neural circuits to perform new tasks are being considered. We believe that
this new point of view will be able to meet the real chronological pattern of the
human cognitive linguistic process. In this way, it will be possible to design cyber
systems that are able to synchronize learning in order to unify their algorithmic core
with neural circuits that react to the environment.
References
19. Monte-Serrat, D., Belgacem, F.: Subject and time movement in the virtual reality. Int. J. Res.
Methodol. Soc. Sci. 3(3), 19 (2017)
20. Sagan, C.: The Demon-Haunted World: Science as a Candle in the Dark. Ballantine Books
(2011)
21. Cormen, E., Inc.: Language. Library of Congress, USA (1986)
22. Monte-Serrat, D., Cattani, C.: Applicability of emotion to intelligent systems. In: Information
Sciences Letters, vol. 11, pp. 1121–1129. Natural Sciences Publishing, New York (2022)
Chapter 8
Intelligent Under Sampling Based
Ensemble Techniques for Cyber-Physical
Systems in Smart Cities
D. K. K. Reddy (B)
Department of Computer Science Engineering, Vignan’s Institute of Engineering for Women
(Autonomous), Visakhapatnam, Andhra Pradesh 530046, India
e-mail: [email protected]
B. K. Rao
Department of Computer Science and Engineering, GITAM (Deemed to be University)
Visakhapatnam Campus, Visakhapatnam, Andhra Pradesh 530045, India
e-mail: [email protected]
T. A. Rashid
Erbil, Kurdistan Region, Iraq
e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2024 219
J. Nayak et al. (eds.), Machine Learning for Cyber Physical System: Advances
and Challenges, Intelligent Systems Reference Library 60,
https://doi.org/10.1007/978-3-031-54038-7_8
affirms the promise of this approach. Moreover, the suggested method surpasses
conventional accuracy metrics, striking a favourable balance between efficacy and
efficiency.
Abbreviations
8.1 Introduction
At present, more than 50% of the global population lives in cities, and this trend is
projected to continue as urban areas grow in both population and size. As per the
UN World Urbanization Prospects (WUP), it is projected that by 2050, approximately 66% of the global popula-
tion will reside in cities [1]. To address the escalating complexity of modern urban
landscapes, several projects have been initiated to amalgamate advanced technolog-
ical solutions, thereby elevating the sophistication of urban design and management.
Prominent examples of these intelligent urban solutions include the implementa-
tion of ICT technologies in areas such as enhanced power grids for reduced energy
loss, progressive transportation systems along with connected vehicle innovations to
boost city mobility, and optimized infrastructures aimed at diminishing hazards and
bolstering operational effectiveness [2, 3]. The development of novel information
and communication technologies, such as cloud computing, CPS, big data, and
IoT, has made these advancements possible. The incorporation of the CPS concept into
the realm of smart cities has garnered increasing attention recently.
CPS represent the fusion of ICT with physical infrastructure and systems,
empowering cities to meet the growing demand for greater sustainability, efficiency,
and improved quality of life for their inhabitants, thereby advancing their smartness
[4]. This concept of smartness is closely tied to awareness, which involves the capa-
bility to identify, perceive, or be aware of objects, events, or physical arrangements.
The significant advancements in sensor and wireless technologies have led to the
capability to accurately monitor and capture physical phenomena in the environ-
ment. This data can then be preprocessed using embedded devices and seamlessly
transmitted wirelessly to networked applications capable of performing sophisti-
cated data analysis and processing [5]. CPS have deeply integrated critical infrastructures
(CIs) with human life, so it becomes imperative to prioritize the security considerations of these
systems. Model-based design and analysis, including the use of attack and coun-
termeasure models, offer significant potential in tackling the security challenges
associated with CPS [6]. IML-based CPSs play a crucial role in the development and
sustainability of smart cities. These systems seamlessly integrate physical infras-
tructure with advanced ML capabilities, enabling cities to enhance efficiency, safety,
and quality of life for their residents. IML-CPS facilitates real-time monitoring and
data-driven decision-making, enabling city authorities to optimize traffic manage-
ment, energy consumption, waste disposal, and emergency response systems. More-
over, these systems can predict and mitigate potential issues, contributing to more
resilient and sustainable urban environments. By harnessing the power of ML and
data analytics, IML-CPS empowers smart cities to not only address current chal-
lenges but also anticipate and adapt to future urban complexities, making them more
liveable, sustainable, and responsive to the needs of their citizens.
The objective of the chapter is
i. To develop an anomaly-based detection system tailored for CPS environments
characterized by resource constraints, capable of assessing the categorization of
network traffic into normal or anomalous events.
A CPS is a system that intricately combines both physical and computational elements
to function cohesively. CPS potentially consists of ICS (Cyber System) and SBS
(Physical System) [7]. SBS, as demonstrated through technologies like wireless
sensor networks and intelligent building management systems, utilize a network
of distributed sensors to gather data about the environment and system operations.
This information is then sent to a centralized system for analysis and processing. CPS
act as a conduit linking the tangible, physical world with the digital domain, a place
where data undergoes storage, processing, and transformation. CPS, which amal-
gamate computing, communication, and control functionalities, have emerged as a
pioneering frontier in the advancement of physical device systems. CPS is character-
ized as an interconnected assembly of loosely integrated distributed cyber systems
and physical systems, managed and regulated through user-defined semantic rules.
The network serves as the conduit bridging the cyber and physical domains, creating
a sprawling, heterogeneous, real-time distributed system [8]. A CPS comprises
four fundamental components: physical elements, a sensing network, control node
(computing device), and a communication network. Figure 8.1 illustrates the CPS
system model. The physical components represent the systems of interest that require
monitoring and safeguarding. The sensing network consists of interconnected sensors
distributed to observe the physical environment. As an integral component of the
CPS, the sensing network actively engages in a closed-loop process encompassing
sensing, computing, decision-making, and execution [6, 7]. These sensor-generated
data are then transmitted to the control node for processing and analysis. Computational
intelligence methods are applied to make informed decisions and control actuators,
ultimately influencing the behaviour of the physical components. The control nodes
Most of the data generated by CPS devices is not inherently biased. CPS devices
collect huge amounts of data based on their design and sensors. If the sensors are not calibrated
properly or if they have limitations, the data collected may be inaccurate or biased.
In some IoT applications, data may be selectively collected from certain locations
or devices while omitting others, and human decisions and actions in the design, deployment,
and maintenance of CPS can introduce bias. This selection bias can lead to
an incomplete or skewed view of the overall system. Due to a significant number
of false alarms, a high false positive rate (FPR), and a low detection rate (DR), researchers and practitioners often rely
on feature selection and hyperparameter tuning in the context of CPS. However, using
these techniques in the smart cities landscape leads to an unintentional loss of data and an
increase in computational time while adhering to resource constraints. Furthermore,
many CPSs have failed in practice because it is difficult to design a quick, light, and
accurate IML model due to the quickly expanding number of devices and the large
variety of traffic patterns.
Feature selection for CPS faces several limitations. Firstly, the multidimensional
nature of CPS data often involves a high volume of features, making it challenging
to identify the most relevant ones efficiently. Additionally, CPS data can exhibit
dynamic and nonlinear relationships, and feature selection methods may struggle to
capture complex patterns adequately. Furthermore, some CPS applications demand
real-time processing, limiting the time available for exhaustive feature selection
procedures. Data quality issues, including noise and missing values, can also hinder
the accuracy of feature selection outcomes. Lastly, the diversity of CPS domains,
from healthcare to industrial automation, poses unique challenges, as feature selec-
tion techniques may need to be tailored to specific application contexts, making it
crucial to consider these limitations when implementing feature selection strategies
for CPS.
Sampling techniques can be highly valuable in the context of CPS for protecting
CI. CPS involves the integration of physical processes with digital systems, and
protecting these systems is paramount in safeguarding CI. Sampling allows for the
efficient collection of data from various sensors and components within the CPS
Input:
− Data: original dataset
− Class A: instances from the minority class
− Class B: instances from the majority class
− IR: class imbalance ratio
− UR: desired undersampling ratio
Output:
− Balanced Dataset
Compute the undersampling factor UF = IR / UR
If UR < 1:
   Number of Instances to Select = floor(UF ∗ Number of Class B instances)
   Randomly select that number of instances from Class B
   Balanced Dataset = Concatenate(Class A, Selected Instances from Class B)
Else:
   Balanced Dataset = Concatenate(Class A, Class B)
Return Balanced Dataset
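For readers who prefer code, a minimal Python sketch of the procedure above is given below. It mirrors the pseudocode as stated (IR, UR, UF and the selection count), while the function name, the use of pandas DataFrames, and the clamp of the selection count to the available majority size are our own illustrative choices rather than part of the original chapter.

```python
# Minimal sketch of the random under-sampling procedure described above.
# class_a holds the minority instances, class_b the majority instances;
# the clamp via min() is our own safeguard so the sample call never
# requests more rows than class_b actually contains.
import math
import pandas as pd

def random_undersample(class_a: pd.DataFrame, class_b: pd.DataFrame,
                       ur: float, seed: int = 42) -> pd.DataFrame:
    ir = len(class_b) / len(class_a)        # class imbalance ratio (IR)
    uf = ir / ur                            # under-sampling factor (UF)
    if ur < 1:
        n_select = min(math.floor(uf * len(class_b)), len(class_b))
        selected_b = class_b.sample(n=n_select, random_state=seed)
        balanced = pd.concat([class_a, selected_b])
    else:
        balanced = pd.concat([class_a, class_b])
    return balanced.sample(frac=1, random_state=seed)   # shuffle the rows
```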
Self-paced ensemble (SPE) [12]:
Input:
− D: training set
− N: majority set in D
− P: minority set in D
− f: base classifier
− n: number of base classifiers
− H: hardness function
− k: number of bins
− i: initialized to zero
Output: final ensemble F(x) = (1/n) Σ_{i=1}^{n} f_i(x)
Step 2: i = i + 1
Step 3: form the current ensemble F(x) = (1/(i − 1)) Σ_j f_j(x) over the base classifiers trained so far
Step 4: cut the majority set into k bins with respect to the hardness H(x, y, F): B_1, B_2, …, B_k; the average hardness contribution in the l-th bin is h_l
Step 5: h_l = Σ_{(x,y)∈B_l} H(x, y, F) / |B_l|, ∀ l = 1, …, k
BalanceCascade [13]:
Input:
− D: training set
− N: majority set in D
− P: minority set in D, where |P| < |N|
− s_i: number of iterations used to train each AdaBoost ensemble
− T: number of subsets to sample from N
− f: required false positive rate (FPR)
− i: initialized to zero
Output: final ensemble H(x) = Σ_{i=1}^{T} Σ_{j=1}^{s_i} α_{i,j} h_{i,j}(x) − Σ_{i=1}^{T} θ_i
Step 1: i = i + 1
Step 2: randomly sample a subset N_i from N, with |N_i| = |P|
Step 3: learn H_i using P and N_i; H_i is an AdaBoost ensemble with s_i weak classifiers h_{i,j} and corresponding weights α_{i,j}, and the ensemble threshold is θ_i, i.e., H_i(x) = Σ_{j=1}^{s_i} α_{i,j} h_{i,j}(x) − θ_i
Step 4: adjust θ_i such that H_i's false positive rate is f
Step 5: remove from N all examples that are correctly classified by H_i
Repeat Steps 1–5 until i = T
Balanced random forest [14]:
Step 1: In each iteration of the random forest procedure, select a bootstrap sample from the smaller (minority) class. Then choose, with replacement, an equivalent number of cases from the larger (majority) class.
Step 2: Develop an unpruned classification tree to its full extent using these data. This tree should be constructed using the CART methodology, with one key variation: at every decision point, rather than examining all variables for the best division, limit the search to a randomly chosen subset of variables.
Step 3: Execute Steps 1 and 2 repeatedly as many times as necessary. Compile the outcomes from the collective ensemble and derive the final decision based on this aggregation.
EasyEnsemble [13]:
Input:
− D: training set
− P: minority set in D
− N: majority set in D, where |P| < |N|
− T: number of subsets to sample from N
− s_i: number of iterations used to train each AdaBoost ensemble
− i: initialized to zero
Step 1: i = i + 1
Step 2: randomly sample a subset N_i from N, with |N_i| = |P|
Step 3: learn H_i using P and N_i; H_i is an AdaBoost ensemble with s_i weak classifiers h_{i,j} and corresponding weights α_{i,j}, and the ensemble threshold is θ_i, i.e., H_i(x) = Σ_{j=1}^{s_i} α_{i,j} h_{i,j}(x) − θ_i
Repeat Steps 1–3 until i = T
RUSBoost [15]:
Input:
− S: training set of m examples (x_i, y_i)
− X: feature space, with each x_i a point in X
− Y: class labels, with each y_i a point in Y
− WeakLearner: the weak learning algorithm producing hypotheses h_t
− T: number of iterations, with t initialized to zero
Output: final hypothesis H(x) = argmax_{y∈Y} Σ_{t=1}^{T} h_t(x, y) · log(1/α_t)
Step 1: initialize D_1(i) = 1/m for all i
Step 2: t = t + 1
Step 3: create a temporary training dataset S′_t with distribution D′_t using random under-sampling
Step 4: call WeakLearner, providing it with the examples S′_t and their weights D′_t
Step 5: obtain the hypothesis h_t : X × Y → [0, 1]
Step 6: compute the pseudo-loss ε_t = Σ_{(i,y): y ≠ y_i} D_t(i) · (1 − h_t(x_i, y_i) + h_t(x_i, y))
Step 7: compute the weight update parameter α_t = ε_t / (1 − ε_t)
Step 8: update D_t: D_{t+1}(i) = D_t(i) · α_t^{(1/2)(1 + h_t(x_i, y_i) − h_t(x_i, y: y ≠ y_i))}
Step 9: normalize D_{t+1}: let Z_t = Σ_i D_{t+1}(i) and set D_{t+1}(i) = D_{t+1}(i) / Z_t
Repeat Steps 2–9 until t = T
This section provides a concise overview of both the system environment and
the dataset employed in the study. The procedures for collecting the dataset and
conducting experiments are outlined here, encompassing the materials and methods employed.
The testing platform utilized was the Google Colab Notebook. The imbens.ensemble
framework is open-source and designed to harness the capabilities of ensemble
learning for tackling the challenge of class imbalance.
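As an illustration of how such a framework is typically driven, the sketch below follows the scikit-learn fit/predict convention; the import path and the class name SelfPacedEnsembleClassifier are assumptions based on our reading of the imbalanced-ensemble (imbens) documentation, the file and column names are hypothetical, and n_estimators = 100 matches the setting reported for the experiments below.

```python
# Hedged sketch: training one under-sampling ensemble on a labelled traffic
# table with the imbens framework. The import path/class name are assumed;
# any of the other under-sampling ensembles could be swapped in.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report
from imbens.ensemble import SelfPacedEnsembleClassifier  # assumed API

df = pd.read_csv("msca_features.csv")                 # hypothetical file name
X, y = df.drop(columns=["label"]), df["label"]        # hypothetical label column
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)

clf = SelfPacedEnsembleClassifier(n_estimators=100, random_state=42)
clf.fit(X_tr, y_tr)
print(classification_report(y_te, clf.predict(X_te)))
```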
As the MSCA (multi-step cyber-attack) environment experiences rapid growth in tandem with the increasing
prevalence of networks and applications, there is a rising need for a dependable IDS
to safeguard networks and devices. To effectively address the unique features of
emerging threats, particularly in the context of MSCA, the availability of a current
and dependable dataset becomes imperative for robust IDS implementation. This
research introduces a novel benchmark MSCA dataset for analysing cyberattacks,
encompassing two distinct attack scenarios [20]. The primary setup focuses on pass-
word cracking attacks, while the next setup centers on volume-based DDoS attacks.
The dataset has been meticulously annotated, comprising six PCAP-processed files
and 77 network feature files acquired through Wireshark analysis. It is organized into
normal and anomalous network traffic categories, and the distribution of the MSCA
dataset is illustrated in Figs. 8.3, 8.4 and 8.5.
The experimental results indicate that the under-sampling classifiers SPEC, UBC,
and BCC accurately detect network anomalies. Figures 8.6, 8.7, 8.8, 8.9, 8.10 and
8.11 show the training distributions of the under-sampling classifiers with respect to the
estimators. For a fair comparison of the proposed work, all the models were configured with n_estimators = 100.
Precision, recall, and F1-score metrics were used to assess the
performance of each algorithm. Tables 8.2, 8.3, 8.4, 8.5, 8.6 and 8.7 present the evaluation
metrics TPR, FPR, precision, recall, AUC, F1-score, error rate, and accuracy
for the under-sampling ensemble techniques. In the experiment conducted on
the MSCA dataset to address class imbalance problems, SPEC achieved the highest
accuracy among all six classifiers. SPEC consistently achieved an average accuracy
of approximately 0.9613 in all cases, indicating its outstanding classification
correctness. UBC and BCC also exhibited promising results with slightly lower accuracy,
demonstrating commendable predictive correctness. Despite the significantly
lower number of cases related to ICMP_Flood, Web_Crwling, and HTTP_
DDoS compared to Port_Scan and Brute_Force, all the classifiers achieved decent
accuracy. UBC, BRFC, RUSBC, and BCC showed accuracy in the range of 0.97
to 0.99. However, the EEC classifier exhibited a lower accuracy of 0.88. It appears
Fig. 8.5 Attack distribution with total data and attack data
that the EEC classifier struggled to effectively address class imbalance using under-
sampling techniques, suggesting the need for further research using over-sampling
in the case of the EEC classifier. The precision, recall, F1-score, and accuracy values
were the lowest for cases associated with ICMP_Flood and Web_Crwling anomalies
due to their small number of instances. Nonetheless, the precision and recall metrics
exhibited consistently high values across different anomalies, particularly notable in
the case of Brute_Force and Normal. The evaluation metrics of the proposed work
are visually depicted in Figs. 8.12 and 8.13. Table 8.8 illustrates the weighted average
of the under-sampling-based ensemble techniques. Table 8.9 shows a comparative
study of various researchers' work on the MSCA dataset. It is worth noting that relying
solely on a single rule to detect intrusions based on typical traffic patterns often leads
to false positive results. Anomaly-based CPS models consider any traffic deviating
from the normal pattern as abnormal. The utilization of under-sampling techniques
helps address this issue. While under-sampling can be effective in balancing imbal-
anced data, there are some challenges to consider when deploying it in real-time
applications. As the data is constantly changing in real-time applications, it may be
difficult to maintain a balanced dataset. It is important to carefully monitor and adjust
the sampling technique to ensure accurate results.
8.8 Conclusion
References
1. Ghaemi, A.A.: A cyber-physical system approach to smart city development. In: 2017 IEEE
International Conference on Smart Grid and Smart Cities (ICSGSC), IEEE, pp. 257–262.
https://doi.org/10.1109/ICSGSC.2017.8038587
2. Wang, C., et al.: Dynamic Road Lane Management Study: A Smart City Application. HAL Id:
hal-01259796 (2019)
3. Reddy, D.K.K., Behera, H.S., Naik, B.: An intelligent security framework for cyber-physical
systems in smart city. In: Big Data Analytics and Intelligent Techniques for Smart Cities, vol.
10, no. 16, pp. 167–186. CRC Press, Boca Raton (2021). https://doi.org/10.1201/978100318
7356-9
4. Nam, T., Pardo, T.A.: Conceptualizing smart city with dimensions of technology, people, and
institutions. In: Proceedings of the 12th Annual International Digital Government Research
Conference: Digital Government Innovation in Challenging Times, pp. 282–291. ACM, New
York, NY, USA (2011). https://doi.org/10.1145/2037556.2037602
5. Neirotti, P., De Marco, A., Cagliano, A.C., Mangano, G., Scorrano, F.: Current Trends in
Smart City Initiatives: Some Stylised Facts, vol. 38 (2014). https://doi.org/10.1016/j.cities.
2013.12.010
6. Sallhammar, K., Helvik, B.E., Knapskog, S.J.: Incorporating attacker behavior in stochastic
models of security (2005)
7. Nayak, J., Kumar, P.S., Reddy, D.K.K., Naik, B., Pelusi, D.: An intelligent security framework
for cyber-physical systems in smart city. In: Big Data Analytics and Intelligent Techniques for
Smart Cities, pp. 167–186. Wiley, Boca Raton (2021)
8. Tang, B.: Toward Intelligent Cyber-Physical Systems: Algorithms, Architectures, and
Applications (2016)
9. Reddy, D.K.K., Nayak, J., Behera, H.S.: A hybrid semi-supervised learning with nature-inspired
optimization for intrusion detection system in IoT environment. In: Lecture Notes in Networks
and Systems, vol. 480 LNNS, pp. 580–591 (2022). https://doi.org/10.1007/978-981-19-3089-
8_55
10. Reddy, D.K.K., Behera, H.S.: CatBoosting Approach for Anomaly Detection in IoT-Based
Smart Home Environment, pp. 753–764 (2022). https://doi.org/10.1007/978-981-16-9447-9_
56
11. Reddy, D.K.K., Behera, H.S., Pratyusha, G.M.S., Karri, R.: Ensemble Bagging Approach for
IoT Sensor Based Anomaly Detection, pp. 647–665 (2021). https://doi.org/10.1007/978-981-
15-8439-8_52
12. Liu, Z., et al.: Self-paced Ensemble for Highly Imbalanced Massive Data Classification. In:
2020 IEEE 36th International Conference on Data Engineering (ICDE), IEEE, Apr. 2020,
pp. 841–852. https://doi.org/10.1109/ICDE48307.2020.00078
13. Liu, X.Y., Wu, J., Zhou, Z.H.: Exploratory undersampling for class-imbalance learning. IEEE
Trans. Syst. Man Cybern. B Cybern. 39(2), 539–550 (2009). https://doi.org/
10.1109/TSMCB.2008.2007853
14. Chen, C., Liaw, A.: Using Random Forest to Learn Imbalanced Data
15. Seiffert, C., Khoshgoftaar, T.M., Van Hulse, J., Napolitano, A.: RUSBoost: a hybrid approach
to alleviating class imbalance. IEEE Trans. Syst. Man Cybern. Part A Syst. Humans 40(1),
185–197 (2010). https://doi.org/10.1109/TSMCA.2009.2029559
16. Jamal, M.H., et al.: Multi-step attack detection in industrial networks using a hybrid deep
learning architecture. Math. Biosci. Eng. 20(8), 13824–13848 (2023). https://doi.
org/10.3934/mbe.2023615
17. Dalal, S., et al.: Extremely boosted neural network for more accurate multi-stage Cyber-attack
prediction in cloud computing environment. J. Cloud Comput. 12(1) (2023). https://doi.org/
10.1186/s13677-022-00356-9
18. Udas, P.B., Roy, K.S., Karim, M.E., Azmat Ullah, S.M.: Attention-based RNN architecture for
detecting multi-step cyber-attack using PSO metaheuristic. In: 3rd International Conference
on Electrical, Computer and Communication Engineering, ECCE 2023 (2023). https://doi.org/
10.1109/ECCE57851.2023.10101590
19. Alheeti, K.M.A., Alzahrani, A., Jasim, O.H., Al-Dosary, D., Ahmed, H.M., Al-Ani, M.S.:
Intelligent detection system for multi-step cyber-attack based on machine learning. In:
Proceedings—International Conference on Developments in eSystems Engineering, DeSE,
pp. 510–514 (2023). https://doi.org/10.1109/DeSE58274.2023.10100226
20. Almseidin, M., Al-Sawwa, J., Alkasassbeh, M.: Generating a benchmark cyber multi-step
attacks dataset for intrusion detection. J. Intell. Fuzzy Syst. 43(3), 3679–3694 (2022). https://
doi.org/10.3233/JIFS-213247
Chapter 9
Application of Deep Learning in Medical
Cyber-Physical Systems
Abstract The integration of IoT devices to healthcare sector has enabled remote
monitoring of patient data and delivery of suitable diagnostics whenever required.
Because of the rapid advancement in embedded software and network connectivity,
cyber physical systems (CPS) have been widely used in the medical industry to
provide top-notch patient care in a variety of clinical scenarios. Due to the hetero-
geneity of the medical devices used in these systems, there is a requirement for
providing efficient security solutions for these intricate environments. Any alter-
ation to the data could have an effect on the patient’s care, which may lead to acci-
dental deaths in an emergency. Deep learning has the potential to offer an efficient
solution for intrusion detection because of the high dimensionality and conspicuous
dynamicity of the data involved in such systems. Therefore, in this study, a deep
learning-assisted Attack Detection Framework has been suggested for safely trans-
ferring healthcare data in medical cyber physical systems. Additionally, the efficacy
of the suggested framework in comparison to various cutting-edge machine and
ensemble learning techniques has been assessed on healthcare dataset consisting of
sixteen thousand records of normal and attack data and the experimental findings
indicate that the suggested framework offers promising outcomes when compared
with the state-of-the-art machine learning and ensemble learning approaches.
H. Swapnarekha (B)
Department of Information Technology, Aditya Institute of Technology and Management,
Tekkali 532201, India
e-mail: [email protected]
Y. Manchala
Department of Information Technology, Vardhaman College of Engineering, (Autonomous),
Hyderabad, Telangana 501218, India
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2024 245
J. Nayak et al. (eds.), Machine Learning for Cyber Physical System: Advances
and Challenges, Intelligent Systems Reference Library 60,
https://doi.org/10.1007/978-3-031-54038-7_9
9.1 Introduction
and classify network attacks with an accuracy of 99.99% and 94.75% respectively.
A deep neural network (DNN) model consisting of one input layer, three hidden
layers and one output layer has been suggested by Tang et al. [20] for accomplishing
flow-based anomaly detection. The NSL-KDD dataset was used to validate the DNN
model, and the results show that this ML technique is superior to other ones at accu-
rately detecting zero-day assaults. Li et al. [21] have proposed an enhanced DNN
model known as HashTran-DNN for the classification of Android malware. In order
to preserve locality features, the input samples are transformed using hash functions.
To enhance the performance of the system, a denoising task has been carried out by
HashTran-DNN by utilizing an autoencoder that attains locality information in the poten-
tial space. From the empirical outcomes, it is observed that HashTran-DNN can
detect four distinct attacks more effectively when compared with the standard DNN.
For efficient and reliable online monitoring of AGVs (automated guided vehicles)
against cyber-attacks, an integrated IoT framework that makes use of DNN with ReLu
was suggested by Elsisi et al. [22]. The developed framework along with distinct
deep learning and machine learning approaches namely 1D-CNN (one dimensional
convolutional neural network), SVM, decision tree, XGBoost and random forest were
trained and validated on real AGV dataset and various types of cyber-attacks such
as pulse attack, ramp attack, sinusoidal attack and random attack. From the empir-
ical findings, it is clear that the suggested integrated IoT framework attained better
detection accuracy of 96.77% when compared with other standard deep learning and
machine learning approaches.
Presently, deep neural networks are the basis for many contemporary artificial
intelligence applications because of their superior performance in various applica-
tion domains over the traditional machine learning approaches. The DNN model is
capable of learning a series of hidden patterns hierarchically, as it comprises a set of
stacked layers. Moreover, DNNs offer superior performance over various machine
learning approaches as they are capable of extracting high-level features with fewer
parameters. Keeping in view all these aspects, a deep neural network approach has
been developed in this study for the classification of cyber-attacks in medical cyber
physical systems. The following are the major contributions of this study.
1. An intelligent security framework based on a deep neural network has been
developed for the detection of cyber-attacks in the healthcare sector.
2. The suggested framework has been validated using the WUSTL-EHMS-2020 dataset,
which consists of network flow indicators together with the patients' biometric
data.
3. Further, the performance of the suggested framework along with various tradi-
tional machine learning and ensemble learning approaches has been validated
using various performance metrics to show the efficacy of the suggested approach.
The remaining sections of the chapter are structured as follows. Section 9.2 outlines the
study of the literature on machine learning techniques for detecting cyberattacks in
the healthcare industry, as well as their shortcomings. Methodology of the proposed
approach has been represented in Sect. 9.3. The environmental setup and dataset
description are presented in Sect. 9.4, and the evaluation metrics and compara-
tive analysis of the proposed DNN model along with the other considered models are
described in Sect. 9.5. Finally, the conclusion and future scope of work are
presented in Sect. 9.6.
The recent advances in the field of machine learning have attracted several researchers
to carry out their research work in the detection of attacks in medical cyber physical
systems. This section describes some of the recent research endeavors undertaken
for the detection of cyber attacks in the MCPS.
To protect patient data in healthcare networks, AlZubi et al. [5] have presented
the CML-AD framework (cognitive machine learning attack detection). The patient-
centric design-based plan that has been suggested minimises the local load resulting
from the numerical outcomes while simultaneously guaranteeing the security of
patient data in MCPS. Further, the empirical outcomes also indicate that the suggested
approach has attained 96.5%, 98.2%, 97.8% of prediction ratio, accuracy ratio and
efficiency ratio respectively when compared with other existing approaches.
Schneble et al. [23] have proposed a unique paradigm based on ML technique
for intrusion detection in healthcare cyber physical system. To reduce the computa-
tion and communication associated with solutions based on conventional machine
learning approaches, authors have explored the conception of federated learning in
the suggested framework. Then the suggested framework has been evaluated on a real-
time patient dataset for determining the security attacks, and the empirical outcomes
indicate that the suggested framework not only detects security attacks with an
accuracy of 99% but also minimizes the communication overhead.
A novel real-time healthcare system based on ensemble classifier for detection of
cyber-attacks has been suggested by Kumar and Bharathi [24]. Initially, the authors
have utilized greedy routing technique for the creation and placement of sensor node
and an agglomerative mean shift maximization clustering approach for the normal-
ization and grouping of transmitted data. A feature extraction process that makes
use of multi-heuristic cyber ant optimization approach is used for the extraction of
abnormal features from health data. Then, the suggested framework makes use of
XGboost classifier for the detection of security attacks. Ultimately, the findings of the
experiment demonstrate that the suggested framework performs better in identifying
cyberattacks within the healthcare system.
An inventive security framework based on machine learning approach has been
developed by Sundas et al. [25], for the identification of harmful attacks in smart
healthcare system. The suggested system observes and compares the vitals of various
devices connected to the smart health system in order to differentiate the normal
activity from abnormal activity. Moreover, the framework utilizes distinct machine
learning approaches such as Random Forest, Artificial Neural Network, K-nearest
neighbor and decision tree for the identification of harmful attacks in healthcare
systems. Further, the suggested framework has been trained on twelve harmless
occurrences collected from eight distinct smart medical devices and the empirical
results indicate that the suggested framework is reliable, with a success rate and F1-score
of 91% and 90%, respectively.
Tauqeer et al. [26] have developed a unique method for the identification of cyber-
attacks in an IoMT environment that combines three machine learning techniques:
SVM, Random Forest, and GBoost. The network and biometric feature-rich WUSTL
EHMS 2020 dataset has been used to assess the proposed methodology. To improve
the system’s performance, preprocessing methods including feature selection and
cleaning were first applied to the dataset. With an accuracy of 95.85%, 96.9%,
and 96.5%, respectively, the suggested techniques GBoost, SVM and random forest
achieved greater performance, according to the empirical results. Table 9.1 lists the
numerous studies that have been done on applying machine learning techniques to
identify cyberattacks in the healthcare system.
This section describes the mathematical background and structure of the
proposed deep neural network model.
Deep neural networks are built from feedforward neural networks
that do not contain feedback connections. Three significant layers, namely the
input, hidden, and output layers, are the basic components of a feedforward neural
network. The architectural layout of the deep neural network is illustrated in Fig. 9.1.
The preprocessed data is fed into the network through the input layer. The amount
of input features that the network receives is equal to the number of neurons in the
input layer. Equation (9.1) illustrates how the input layer with “N” input features is
represented.
X = [x1 , x2 , . . . , x N ] (9.1)
DNNs can have more than one hidden layer. Each of the hidden layers contains
units with weights that are used for performing activation processes of the units
obtained from the previous layer. The mathematical expression described in Eq. (9.2)
represents the mapping function of the neuron in the hidden layer.
h(x) = f(x^T w + b) (9.2)
Table 9.1 Various works on the detection of cyber attacks using ML approaches
− Gupta et al. (2022) [27]. Objective: intrusion detection in an IoMT network. Dataset: WUSTL-EHMS-2020. Approach: Random Forest with grid search. Results: accuracy = 94.23%, F1-score = 93.8%. Observations: the dataset considers only two types of attacks, data alteration and data spoofing.
− Kumar et al. (2021) [28]. Objective: detection of cyber attacks in an IoMT network. Dataset: ToN-IoT dataset. Approach: an ensemble approach that makes use of decision tree, random forest and naïve Bayes at the first level and XGBoost at the next level. Results: attained an accuracy of 96.35%. Observations: the false alarm rate is very high.
− Zachos et al. (2021) [29]. Objective: to detect malicious attacks in an IoMT network. Dataset: TON_IoT. Approach: naïve Bayes, random forest, decision tree, linear regression, SVM and KNN. Results: decision tree, KNN and random forest performed better when compared with the other approaches. Observations: computational overhead on the gateway and sensors was not considered.
− Hady et al. (2020) [30]. Objective: detection of intrusions in healthcare systems. Dataset: real-time dataset. Approach: Random Forest, ANN, SVM. Results: ANN attained better performance. Observations: the dataset is imbalanced.
In the above Eq. (9.2), h, f, x, w and b represent the hidden layer output,
activation function, input vector, weight vector and bias, respectively. Generally, the sigmoid, rectified
linear unit (ReLU) and hyperbolic tangent functions are the typical activation functions used in
neural networks. As the ReLU activation function mitigates the vanishing gradient
problem, it offers better results, despite its non-differentiability at
zero, when compared with other activation functions. Therefore, in the proposed
architecture, ReLU is the activation function used at the hidden layers for obtaining a smooth
approximation, as shown in Eq. (9.3).

ReLU(x) = max(0, x) (9.3)
The sigmoid activation function, as demonstrated in Eq. (9.4), is used at the output
layer to assign an estimated label to the input data that flows through the network.
sigmoid(x) = 1 / (1 + e^(−x)) (9.4)
The inputs from the hidden layer are processed at the output layer through the
activation function to produce the outputs of the deep neural network, which are
represented as shown in Eq. (9.5).
sigmoid(X)_j = e^(X_j) / Σ_k e^(X_k) (9.5)
where the vector of inputs transferred to the output layer is represented as ‘X’, the
output units are indexed by ‘j’, and j = 1, 2, . . . , k.
Network training on the large dataset is carried out with the above-mentioned
DNN setup, using the inputs at the input layer to produce the respective class output.
Further, the weight of each neuron is iteratively modified in order to reduce the
errors that occur during the training phase.
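To make the above concrete, a minimal Keras-style sketch of such a network is shown below. The ReLU hidden activations, the sigmoid output unit, and the use of the Adam optimizer follow the description in this section, whereas the number of hidden layers, the layer widths, and the training settings are illustrative assumptions (the actual settings used in this chapter are those listed in Tables 9.2 and 9.3).

```python
# Illustrative sketch of the described feedforward DNN: ReLU hidden layers,
# a sigmoid output for the binary attack/normal label, trained with Adam.
# Layer sizes, epochs and batch size are assumptions, not the chapter's values.
import tensorflow as tf

def build_dnn(n_features: int) -> tf.keras.Model:
    model = tf.keras.Sequential([
        tf.keras.layers.Input(shape=(n_features,)),     # Eq. (9.1): x1..xN
        tf.keras.layers.Dense(64, activation="relu"),    # Eqs. (9.2)-(9.3)
        tf.keras.layers.Dense(32, activation="relu"),
        tf.keras.layers.Dense(1, activation="sigmoid"),  # Eq. (9.4)
    ])
    model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3),
                  loss="binary_crossentropy", metrics=["accuracy"])
    return model

# model = build_dnn(n_features=X_train.shape[1])
# model.fit(X_train, y_train, epochs=50, batch_size=64, validation_split=0.1)
```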
One hyperparameter that impacts the training of the deep neural network is the learning
rate. Hence, there is a need to adopt an efficient neural network architecture and parame-
ters to curtail the errors that occur during the training phase. These hyperparameters
have a direct impact on the performance of the network architecture. In this study,
the Adam optimizer has been chosen, which adapts the parameter updates using the first
and second moment estimates of the gradients [32]. The primitive functionality of
the Adam optimizer is depicted in Fig. 9.2.
In the above Fig. 9.2, f(θ), α, β1, β2, θt, and λ represent the objective function, step
size, exponential decay rates, convergence parameter, and tolerance parameter, respec-
tively. The equations for updating and calculating the time step, the gradient, the first and
second moment estimates, the unbiased (bias-corrected) first and second moment estimates, and the objective
function parameters of the Adam optimizer are represented in Eqs. (9.6)–(9.12):

t ← t + 1 (9.6)

g_t ← ∇_θ f_t(θ_{t−1}) (9.7)

m_t ← β1 · m_{t−1} + (1 − β1) · g_t (9.8)

v_t ← β2 · v_{t−1} + (1 − β2) · g_t² (9.9)

m̂_t ← m_t / (1 − β1^t) (9.10)

v̂_t ← v_t / (1 − β2^t) (9.11)

θ_t ← θ_{t−1} − α · m̂_t / (√v̂_t + λ) (9.12)
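A direct NumPy transcription of Eqs. (9.6)–(9.12) is sketched below for illustration only; the variable names and the default hyperparameter values are ours and are not claimed to be the settings used in this chapter.

```python
# Illustrative transcription of one Adam update, Eqs. (9.6)-(9.12).
# grad corresponds to g_t in Eq. (9.7), computed by the caller.
import numpy as np

def adam_step(theta, grad, m, v, t,
              alpha=1e-3, beta1=0.9, beta2=0.999, lam=1e-8):
    t += 1                                                    # Eq. (9.6)
    m = beta1 * m + (1 - beta1) * grad                        # Eq. (9.8)
    v = beta2 * v + (1 - beta2) * grad ** 2                   # Eq. (9.9)
    m_hat = m / (1 - beta1 ** t)                              # Eq. (9.10)
    v_hat = v / (1 - beta2 ** t)                              # Eq. (9.11)
    theta = theta - alpha * m_hat / (np.sqrt(v_hat) + lam)    # Eq. (9.12)
    return theta, m, v, t
```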
This section covers the dataset and environmental setup that were utilized in the
experiments utilizing the recommended methodology as well as different machine
learning and ensemble learning techniques.
The suggested approach as well as other machine learning and ensemble learning
techniques have been simulated using the following system requirements. The envi-
ronmental setup consists of an HP Pavilion x360 system with a Windows 10 64-bit
operating system, an Intel(R) Core(TM) i7-10510U CPU running at 2.30 GHz,
and 16 GB RAM. Further, the experiments are carried out using Python soft-
ware. For better analysis of the data, it makes use of Python libraries such as Pandas,
Numpy and Imblearn. The visualization of data has been done using matplotlib
framework. Additionally, it applies ensemble learning and machine learning tech-
niques by utilizing the sklearn and Mlxtend libraries. Figure 9.3 shows the general
framework of the suggested methodology.
In this work, the WUSTL-EHMS-2020 dataset has been used to train and evaluate the
suggested DNN strategy in conjunction with other machine learning and ensemble
learning techniques. The dataset includes biometric information about patients as
well as network flow indicators that were gathered from the real-time Enhanced
Health Monitoring System (EHMS) testbed. Medical sensors, a network, a gateway, and a control
unit with visualization constitute the basic components of the EHMS testbed. The
data collected from the medical sensors connected to a patient's body is transferred to the
gateway. The gateway then transfers the data to the server through a router or gateway
for visualization purposes. Both the network traffic data and the sensor data generated
in the testbed are utilized for the detection of threats. In addition, an attack dataset
was produced by injecting three attacks in the dataset such as spoofing attack, man-
in-the-middle attack and data injection. The ARGUS (Audit Record Generation and
Utilization System) tool was used to gather both network traffic and biometric data
of patients in the form of a csv file [33]. The dataset comprises 16,318 samples in total,
of which 14,272 are samples pertaining to regular network records and 2046 samples
are samples of network attacks. A total of 44 features were included in the dataset:
35 of these had to do with network traffic, 8 had to do with the biometric data of
the patients, and 1 was used as a label feature. The parameters such as temperature,
heart rate, pulse rate, systolic blood pressure, diastolic blood pressure, respiration
rate, ECG ST segment data and peripheral oxygen saturation are related to biometric
data and the remaining thirty-five features are related to network traffic data. The
entire dataset is categorized into two distinct classes namely attack data represented
with “0” and normal data represented with “1”.
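A minimal sketch of loading the dataset and inspecting the class split described above is given below; the file name and label column name are assumptions about how the local copy of the csv exported by ARGUS is stored.

```python
# Hedged sketch: load the WUSTL-EHMS-2020 csv and check the class balance
# (roughly 14,272 normal vs 2,046 attack records, as described above).
# File name and label column name are assumptions.
import pandas as pd

df = pd.read_csv("wustl-ehms-2020.csv")   # assumed file name
label_col = "Label"                        # assumed label column
print(df[label_col].value_counts())        # inspect the imbalance
X = df.drop(columns=[label_col])
y = df[label_col]
```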
The assessment metrics that were utilized to validate the model are described
in this section. Additionally, this section offers an examination of the outcomes
produced with the suggested DNN strategy in addition to other ML techniques such
as Decision tree (DT), Random Forest (RF) and ensemble learning approaches such
as Adaptive Boost (AdaBoost), Gradient Boost (GBoost) and Categorical Boost
(CatBoost).
Accuracy = (True Positive + True Negative) / (Total no. of samples) (9.13)

F1-score = 2 × (recall × precision) / (recall + precision) (9.14)

Precision = True Positive / (Total predicted Positive) (9.15)

Recall = True Positive / (Total actual Positive) (9.16)

In Eq. (9.15), Total predicted Positive is the total number of True Positive + False Positive
samples, whereas in Eq. (9.16) Total actual Positive is the total number of True Positive +
False Negative samples.
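These quantities map directly onto scikit-learn's metric functions; the following hedged sketch assumes y_true and y_pred are 1-D arrays of binary class labels produced by any of the compared classifiers.

```python
# Computing Eqs. (9.13)-(9.16) with scikit-learn from true/predicted labels.
from sklearn.metrics import (accuracy_score, precision_score,
                             recall_score, f1_score, confusion_matrix)

def report(y_true, y_pred):
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
    print("TP, TN, FP, FN:", tp, tn, fp, fn)
    print("Accuracy :", accuracy_score(y_true, y_pred))   # Eq. (9.13)
    print("F1-score :", f1_score(y_true, y_pred))         # Eq. (9.14)
    print("Precision:", precision_score(y_true, y_pred))  # Eq. (9.15)
    print("Recall   :", recall_score(y_true, y_pred))     # Eq. (9.16)
```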
This section compares the proposed DNN model’s performance against that of
existing conventional machine learning and ensemble learning algorithms such as
DT, RF, AdaBoost, GBoost, and CatBoost. Tables 9.2 and 9.3 show the parameters
used in training the suggested DNN technique as well as other approaches.
Furthermore, K-fold cross-validation has been used to validate the suggested DNN
model in conjunction with the other ML and ensemble techniques. In K-fold validation, K−1 folds are used
for model training, while the remaining fold is used for model
testing. The procedure is repeated K times, with the averaged result
serving as the cross-validation result. The WUSTL-EHMS-2020 dataset is split into
ten folds for this investigation. Table 9.4 shows the outcomes of the tenfold cross-validation
of the DNN model as well as the other methods, and it indicates that the
suggested DNN model outperformed the other approaches in terms of cross-validation
accuracy.
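For reference, the K-fold procedure described above corresponds to a standard cross-validation loop; the sketch below (scikit-learn's KFold with a generic build_model placeholder, and NumPy arrays for X and y) is illustrative rather than the exact code behind Table 9.4.

```python
# Illustrative K-fold cross-validation (K = 10), as described above.
# build_model is a placeholder returning a fresh classifier with fit/predict.
import numpy as np
from sklearn.model_selection import KFold
from sklearn.metrics import accuracy_score

def cross_validate(build_model, X, y, k=10, seed=42):
    kf = KFold(n_splits=k, shuffle=True, random_state=seed)
    scores = []
    for train_idx, test_idx in kf.split(X):
        model = build_model()
        model.fit(X[train_idx], y[train_idx])
        scores.append(accuracy_score(y[test_idx], model.predict(X[test_idx])))
    return float(np.mean(scores)), float(np.std(scores))
```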
Table 9.5 depicts a comparison of several evaluation metrics such as precision,
recall, F1-score, AUC-ROC, and accuracy utilised in assessing the suggested DNN
and other established techniques. From Table 9.5, it is noticed that the proposed DNN
model surpassed other considered approaches with a precision of 0.9999, recall
of 1.0, F1-score of 1.0, AUC-ROC of 0.9999 and accuracy of 100% respectively.
From all the models, AdaBoost model obtained lowest accuracy of 98.38%. The
other models DT, RF, GBoost, CatBoost obtained an accuracy of 99.42%, 99.35%,
99.07%, 99.98% respectively. Table 9.5 further demonstrates that, in comparison to
machine learning approaches, ensemble approaches performed better. This is mainly
because of the combination of multiple models in ensemble approaches, which results in
minimizing the variance and bias of the model. Moreover, the proposed DNN model
attained superior performance over the ensemble approaches because of its capability
of optimizing features during extraction.
The confusion matrices of the suggested DNN model and the other models under consid-
eration are shown in Fig. 9.4a–f. From Fig. 9.4a, it is observed that, in the decision
tree out of 4276 network attack samples, 4227 samples were correctly classified as
network attack samples, while the remaining 49 attack samples were incorrectly classified as
normal data samples. All 4288 samples of normal data are correctly classified. It is
observed from Fig. 9.4b, random forest correctly classified 4223 and 4286 samples of
network attack and normal data, respectively. 53 samples of network attack data and 2 samples of normal
data were incorrectly classified by the random forest model. From the confusion matrix
of AdaBoost in Fig. 9.4c it is observed that all samples of network attack data are
Table 9.5 Evaluation metric of the suggested DNN and other considered models
Classification model Precision Recall F1 score ROC-AUC Accuracy (%)
DT 0.9887 1.0 0.9943 0.9942 99.42
RF 0.9877 0.9995 0.9936 0.9935 99.35
AdaBoost 1.0 0.9678 0.9836 0.9839 98.38
GBoost 1.0 0.9815 0.9907 0.9907 99.07
CatBoost 0.9997 1.0 0.9998 0.9998 99.98
Proposed DNN 0.9999 1.0 1.0 0.9999 100.0
correctly classified. Out of 4288 normal samples, only 4150 samples were correctly
classified as normal data samples; the remaining 138 normal data samples were incorrectly
classified as network attack samples. From Fig. 9.4d, it is noticed that GBoost was able to
correctly classify all samples of network attack data, whereas only 4209 samples of
normal data out of 4288 samples were correctly classified and 79 samples of normal
data were incorrectly classified as network attack samples. From Fig. 9.4e, it is
noticed that 4275 samples of network attack data and 4288 samples of normal data
are correctly classified by CatBoost. Only one sample of network attack data was incorrectly
classified as a normal data sample. Finally, the confusion matrix of the proposed DNN
model in Fig. 9.4f shows that all 4276 and 4288 samples of network attack data
and normal samples were correctly classified as network attack and normal data
samples.
The AUC-ROC curve results for the DT, RF, AdaBoost, GBoost, CatBoost, and
suggested DNN model are shown in Figs. 9.5, 9.6, 9.7, 9.8, 9.9 and 9.10. Figure 9.10
shows that, in comparison to other traditional methods, the suggested DNN model
achieved an AUC-ROC of 1.00 for both network attack and normal data class labels.
Additionally, the suggested DNN approach’s macro- and micro-average ROC curve
values were both identical to 1.00, indicating that every occurrence was correctly
identified. This suggests that, in comparison to other approaches, the recommended
method can examine every case in the data.
Fig. 9.4 Confusion matrix of a DT, b RF, c AdaBoost, d GBoost, e CatBoost, f proposed DNN
9.6 Conclusion
Over the past few decades, the cost of healthcare services has increased tremendously
due to the advancement in technology as well as the growing population all
over the world. In addition, the rapid advancement in IoT technology has led to the
monitoring and diagnosing of patients from remote locations. The integration of IoT
References
1. Murguia, C., van de Wouw, N., Ruths, J.: Reachable sets of hidden cps sensor attacks: analysis
and synthesis tools. In: IFAC-PapersOnLine 50.1, pp. 2088–2094 (2017)
2. Jha, A.V., et al.: Smart grid cyber-physical systems: Communication technologies, standards
and challenges. Wirel. Netw. 27, 2595–2613 (2021)
3. Habibzadeh, H., et al.: A survey on cybersecurity, data privacy, and policy issues in cyber-
physical system deployments in smart cities. Sustain. Cities Soc. 50, 101660 (2019)
4. Atat, R., et al.: Enabling cyber-physical communication in 5G cellular networks: challenges,
spatial spectrum sensing, and cyber-security. IET Cyber Phys. Syst. Theory Appl. 2(1), 49–54
(2017)
5. AlZubi, A.A., Al-Maitah, M., Alarifi, A.: Cyber-attack detection in healthcare using cyber-
physical system and machine learning techniques. Soft Comput. 25(18), 12319–12332 (2021)
6. Ahmed, A.A., Nazzal, M.A., Darras, B.M.: Cyber-physical systems as an enabler of circular
economy to achieve sustainable development goals: a comprehensive review. Int. J. Precis.
Eng. Manuf. Green Technol. 1–21 (2021)
7. Rajawat, A.S., et al.: Cyber physical system fraud analysis by mobile robot. Machine Learning
for Robotics Applications, pp. 47–61 (2021)
8. Haque, S.A., Aziz, S.M., Rahman, M.: Review of cyber-physical system in healthcare. Int. J.
Distr. Sensor Netw. 10(4), 217415 (2014)
9. Dey, N., et al.: Medical cyber-physical systems: a survey. J. Med. Syst. 42, 1–13 (2018)
10. Sliwa, J.: Assessing complex evolving cyber-physical systems (case study: Smart medical
devices). Int. J. High Perform. Comput. Netw. 13(3), 294–303 (2019)
11. Nagarhalli, T.P., Vaze, V., Rana, N.K.: Impact of machine learning in natural language processing:
a review. In: 2021 Third International Conference on Intelligent Communication Technologies
and Virtual Mobile Networks (ICICV). IEEE (2021)
12. Nahid, A.Al, Kong, Y.: Involvement of machine learning for breast cancer image classification:
a survey. Comput. Math. Meth. Med. (2017)
13. Vashisht, V., Pandey, A.K., Yadav, S.P.: Speech recognition using machine learning. IEIE Trans.
Smart Process. Comput. 10(3), 233–239 (2021)
14. Singh, J., Singh, J.: A survey on machine learning-based malware detection in executable files.
J. Syst. Architect. 112, 101861 (2021)
15. Alzahrani, A., et al.: Improved wireless medical cyber-physical system (IWMCPS) based on
machine learning. Healthcare 11(3). MDPI (2023)
16. Kilincer, I.F., et al.: Automated detection of cybersecurity attacks in healthcare systems with
recursive feature elimination and multilayer perceptron optimization. Biocybernet. Biomed.
Eng. 43(1), 30–41 (2023)
17. Halman, L.M., Alenazi, M.J.F.: MCAD: a machine learning based cyberattacks detector in
software-defined networking (SDN) for healthcare systems. IEEE Access (2023)
18. Maithem, M., Al-Sultany, G.A.: Network intrusion detection system using deep neural
networks. J. Phys. Conf. Ser. 1804(1). IOP Publishing (2021)
19. Cil, A.E., Yildiz, K., Buldu, A.: Detection of DDoS attacks with feed forward based deep neural
network model. Expert Syst. Appl. 169, 114520 (2021)
20. Tang, T.A., et al.: Deep learning approach for network intrusion detection in software
defined networking. In: 2016 International Conference on Wireless Networks and Mobile
Communications (WINCOM). IEEE (2016)
21. Li, D., et al.: Hashtran-dnn: a framework for enhancing robustness of deep neural networks
against adversarial malware samples (2018). arXiv:1809.06498
22. Elsisi, M., Tran, M.-Q.: Development of an IoT architecture based on a deep neural network
against cyber attacks for automated guided vehicles. Sensors 21(24), 8467 (2021)
23. Schneble, W., Thamilarasu, G.: Attack detection using federated learning in medical cyber-
physical systems. In: Proceedings of 28th International Conference on Computing Communi-
cation Networks (ICCCN), vol. 29 (2019)
24. Kumar, C.N.S.V.: A real time health care cyber attack detection using ensemble classifier.
Comput. Electr. Eng. 101, 108043 (2022)
25. Sundas, A., et al.: HealthGuard: an intelligent healthcare system security framework based on
machine learning. Sustainability 14(19), 11934 (2022)
26. Tauqeer, H., et al.: Cyberattacks detection in IoMT using machine learning techniques. J.
Comput. Biomed. Informatics 4(01), 13–20 (2022)
27. Gupta, K., et al.: A tree classifier based network intrusion detection model for Internet of
Medical Things. Comput. Electr. Eng. 102, 108158 (2022)
28. Kumar, P., Gupta, G.P., Tripathi, R.: An ensemble learning and fog-cloud architecture-driven
cyber-attack detection framework for IoMT networks. Comput. Commun. 166, 110–124 (2021)
29. Zachos, G., et al.: An anomaly-based intrusion detection system for internet of medical things
networks. Electronics 10(21), 2562 (2021)
30. Hady, A.A., et al.: Intrusion detection system for healthcare systems using medical and network
data: a comparison study. IEEE Access 8, 106576–106584 (2020)
31. Saba, T.: Intrusion detection in smart city hospitals using ensemble classifiers. In: 2020 13th
International Conference on Developments in eSystems Engineering (DeSE). IEEE (2020)
32. Yazan, E., Fatih Talu, M.: Comparison of the stochastic gradient descent based optimiza-
tion techniques. In: 2017 International Artificial Intelligence and Data Processing Symposium
(IDAP). IEEE (2017)
33. Argus. https://openargus.org. Accessed 14 Nov 2023
34. Priddy, K.L., Keller, P.E.: Artificial Neural Networks: An Introduction, vol. 68. SPIE Press
(2005). https://doi.org/10.1117/3.633187
Chapter 10
Risk Assessment and Security
of Industrial Internet of Things Network
Using Advance Machine Learning
Geetanjali Bhoi, Rajat Kumar Sahu, Etuari Oram, and Noor Zaman Jhanjhi
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2024 267
J. Nayak et al. (eds.), Machine Learning for Cyber Physical System: Advances
and Challenges, Intelligent Systems Reference Library 60,
https://doi.org/10.1007/978-3-031-54038-7_10
10.1 Introduction
The Industrial Internet of Things (IIoT) is often described as a revolution that is funda-
mentally changing how business is conducted. However, it is actually a progression
that began more than 15 years ago with technology and features created by forward-
thinking automation vendors. The full potential of the IIoT may not be realized for
another 15 years as global standards continue to develop. The changes to the industry
during this time will be significant, but the good news is that machine builders can
now maximize their returns by combining new IIoT technologies with their current
investments in people, end-users, and technology. As part of the Internet of Things
(IoT), the IIoT is also known as Industry 4.0. According to current estimates, indus-
trial IoT will continue to rise exponentially. As we approach a world with more than
75 billion connected devices by 2025, about a third will be used in manufacturing-
related industrial applications. By connecting industrial machines and devices, manu-
facturing and industrial processes can be improved using the IIoT. Data analytics can
be achieved by monitoring, collecting, exchanging, and analyzing large amounts of
data using IIoT applications. In turn, companies will be able to make more informed,
data-driven decisions in their business operations. While IoT and IIoT share similar
basic principles, their purposes differ. IoT is about connecting physical objects to the
internet, such as smart devices, vehicles, home appliances, and more. Agriculture,
transportation, manufacturing, gas and oil, and other businesses are using the IIoT to
connect devices and machines. Among the IIoT devices in this network are sensors,
controllers, industrial control systems, and other connected devices used for moni-
toring productivity and assessing machine performance. The combination of edge
computing and actionable insights from analytics allows machines to perform autonomous
or semi-autonomous activities without the need for human intervention, at speeds
far beyond human capability.
Today, industry is experiencing a number of technology trends driven by the IIoT.
When this technology gains momentum, a whole new industry will be created. As a
result, industries worldwide will be able to benefit from a data-driven, digital world
in the future. The widespread embrace of the IIoT is anticipated to surge considerably
with the expanding count of interconnected devices. A major goal of the IIoT is to
provide real-time information about processes and efficiency through the connection
of devices and machines. IIoT devices connected to sensors collect and store a large
amount of data. A business can then make data-driven decisions with the help of this
data, which is then transformed into actionable insights.
Industrial IoT includes sensor-driven computing, data analytics, and intelligent
machine applications with the goals of scalability, efficiency, and interoperability.
The integration of this technology allows for automation of critical infrastructure,
which increases business efficiency [1]. Even with improvements in productivity,
there are still issues that need to be resolved, chief among them being the critical
security of industrial infrastructure and its elements. Cyberattacks on vital industries
and infrastructure are becoming more frequent, which presents a serious risk and can
result in large losses. As such, it is critical to learn from these events and recognize
that industries are becoming easy targets for cybercriminals. It becomes imperative
that IIoT security issues be resolved. The confluence of information technology
(IT) with operational technology (OT) is a common definition of IIoT. Whereas OT
deals with the plant network where production takes place, IT handles the enterprise
network. To avoid security breaches in IIoT infrastructure, these two components
have different security requirements that need to be carefully taken into account.
A common paradigm in the field of information technology security is the
client-server model, in which protocols such as Transmission Control Protocol
(TCP), Internet Protocol (IP), Hypertext Transfer Protocol (HTTP), or User Data-
gram Protocol (UDP) are used to facilitate communication between these entities.
Successful attacks in this domain typically result in financial or reputational damage,
with safety threats being a rare occurrence [2]. On the contrary, OT systems were
initially designed to ensure the safe and reliable operation of industrial processes. In
contrast to IT systems, security considerations were not initially integrated into the
conception of OT components and subsystems. To counter this, security measures
for OT include isolating OT networks and implementing physical security measures.
However, these security controls exhibit unreliability due to inherent loopholes
that can be exploited for attacks. While isolating OT networks may serve to thwart
external network-based attacks, it proves inadequate in preventing threats originating
within the network itself. In an isolated network, the deployment of malware becomes
a potent strategy for compromising the system. Consequently, there is a pressing
need to delve into the examination of potential attacks at different levels of the IIoT
architecture.
Cyber attackers engage in the theft or destruction of information within computers
and network structures, fuelling the landscape of cyberwarfare. Various attack types,
such as data theft, botnet, Man-in-the-Middle (MitM), network scanning, port-sweep,
port-scan attacks, and address-sweep contribute to the vulnerability of systems. The
diverse array of IoT devices introduces the risk of unsecured connections, both
Machine-to-People (M2P) and Machine-to-Machine (M2M), providing hackers with
easy access to crucial information. This not only infringes upon privacy and network
space usage but also leads to operational disruptions and financial losses [3]. Impor-
tantly, there have been cyberattacks against industrial IoT systems, as evidenced
by the well-known attack on the Ukrainian power grid in 2015. In one instance,
nearly 230,000 subscribers’ electricity was interrupted due to cybercriminals gaining
remote access to the control unit [4]. Similarly, a 2018 attack that targeted a Taiwanese
chip factory with an IIoT network caused losses estimated at over
USD 170 million [5, 6]. Even with the inevitable
trend toward greater reliance on automation and digitalization, industrial enterprises
are actively looking for improved ways to fortify their IIoT networks. The financial
consequences are significant: IIoT firms who do not put in place appropriate mitiga-
tion techniques against cyber-attacks on their networks could end up spending up to
USD 90 trillion by 2030 [7].
An Intrusion Detection System (IDS) is essential for protecting the privacy and
information integrity of transmitted data and for strengthening the security of the
IIoT network. Its main goal is to automatically identify, record, address, and stop
any malevolent or intrusive activity that could compromise the security of an IIoT
network [8]. An intrusion detection system’s efficacy is determined by how well it
can identify attacks with a high degree of precision and a low false-positive rate [3].
Additionally, an IDS should excel in identifying the initiation of probing activities
by hackers, a vital step in establishing a secure IIoT environment [9].
Smart data generation is a new factor in the industrial IoT landscape, as most
industries try to automate the development and production of their products.
Using Machine Learning (ML) in Industry 4.0 is a crucial factor in taking full advantage
of the IIoT. Owing to the growing scale of deployed industrial IoT devices, the IIoT
becomes heterogeneous, distinctive, and dynamically changing. An IIoT device typically
consists of an information processing unit, an intelligent control stage,
and a network communication module. ML provides intelligent techniques to connect
such devices to the real world. Many businesses use machine learning methods and
algorithms to reduce operating and production costs. The use of ML to identify and detect
malicious activity within target networks has gained increasing attention in recent times.
This technology is particularly appealing for next-generation IoT networks due to its
ability to strike a balance between detection performance and computational cost.
Researchers have made significant strides in developing advanced IDS methods lever-
aging ML techniques, resulting in promising outcomes in terms of attack detection
performance [10]. However, a significant challenge linked to current IDS datasets
lies in their considerable size, encompassing both the quantity of network traces
and the dimensions of the feature space. Additionally, an uneven distribution of
sample numbers for each sort of assault plagues the IDS dataset. This imbalance
has posed a barrier for previous ML or deep learning (DL) models, hindering their
ability to attain high performance in detecting specific attack types. In this chapter,
an ensemble learning (EL) based model is designed to detect anomalies in IIoT networks. Here,
a gradient boosted decision tree (GBDT) is used, with its hyperparameters optimized using
the gravitational search algorithm (GSA). The remaining contents are organized into the following sections:
the literature survey is presented in Sect. 10.2, Sect. 10.3 presents the methodology used in the
proposed model, followed by the result analysis and conclusion in Sects. 10.4 and 10.5,
respectively.
The study by Gao et al. [11] delves into noncoherent maximum likelihood detection in large-
scale SIMO systems for industrial IoT communication. Their proposed scheme
focuses on optimizing power consumption and reducing latency, resulting in an
energy-efficient and low-latency method for massive SIMO using noncoherent ML
detection. Through simulations, the authors prove that their proposal surpasses
existing methods in terms of energy efficiency and latency. This study is a crucial
networks. By combining MOO with ML, the authors optimize the network slicing
process and enhance performance. Experimental evaluations of the method on real-
world scenarios demonstrate its superiority over traditional techniques. The study
underscores the potential of their approach to improve IIoT network slicing, boost
network efficiency, and provide insights for system designers and developers. The
research presents a significant contribution to the field of IIoT network slicing. In
their research, Marino and his team [18] propose a distributed system for detecting
faults in IIoT using machine learning techniques. The system uses data-driven models
generated from sensor readings to achieve scalable fault detection quality. The team
demonstrated the effectiveness of their approach in detecting faults in an industrial
pump system, achieving high detection rates while minimizing false positives. The
study emphasizes the potential of machine learning-based fault diagnosis systems
for the industrial IoT industry.
In their study, Taheri and colleagues [19] present a federated malware detection
architecture, FED-IIoT, designed for Industrial IoT (IIoT) systems. The proposed
architecture operates on a collaborative model that permits sharing and processing of
data across multiple IIoT networks while guaranteeing data privacy and security. The
authors assess the effectiveness of their approach using real-world datasets, show-
casing its superiority over existing centralized and distributed detection approaches
in terms of accuracy and detection rates. The research emphasizes the significance
of a robust and secure federated approach for malware detection in IIoT systems.
In their work, Yazdinejad et al. [20] put forward an ensemble deep learning model
designed to detect cyber threats within the IIoT framework. The authors empha-
sized that IIoT systems are highly vulnerable to malicious attacks, with poten-
tially catastrophic consequences. In tackling this challenge, the suggested model
employs a blend of a CNN (Convolutional Neural Network), a RNN (Recurrent
Neural Network), and a LSTM (Long Short-Term Memory) network to detect and
highlight suspicious activities. The model underwent evaluation using a publicly
accessible dataset, demonstrating superior performance compared to conventional
machine learning methods in both accuracy and speed. Le et al. [21] explored the
application of the XGBoost algorithm to enhance the accuracy of IDS in the context
of IIoT, specifically in scenarios involving imbalanced multiclass classification. The
authors argued that detecting cyber-attacks on IIoT systems is crucial for maintaining
sustainability and avoiding environmental damage. The XGBoost model was eval-
uated on a publicly available dataset, and it outperformed other ML techniques in
terms of F1-score and overall accuracy. Mohy-Eddine et al. [22] presented an intru-
sion detection model for IIoT systems that leverages ensemble learning techniques.
The authors highlighted the need for robust threat detection mechanisms to protect
IIoT systems from cyber-attacks. Their proposed model combines multiple machine
learning algorithms to detect unusual patterns of behaviour. The model was tested on a
real-world IIoT dataset, and its performance surpassed that of other machine learning
approaches in terms of detecting malicious activity while minimizing false-positive
alerts. Rashid et al. [23] introduced an innovative approach to enhance intrusion
detection in IIoT networks by employing federated learning. The authors argued that
10.3 Methodology
i. Construction of an initial model (often a single decision tree) from the training
data, which is referred to as a weak learner.
ii. Making predictions by using the initial model on the training data.
iii. Identifying variations between predicted values and actual targets, referred to
as residuals, through the utilization of a differentiable loss function. The residuals
are calculated by subtracting the predicted values from the true values. This
mass. The position signifies a potential solution to the problem, and the calculation of
gravitational and inertial masses is achieved through a fitness function. Essentially,
each mass encapsulates a solution, directing the algorithm in the iterative process of
refining gravitational and inertial masses. Over time, the expectation is that masses
will converge toward the heaviest mass, representing an optimal solution within the
search space. The fundamental steps of the GSA, can be outlined as follows:
i. Randomly generate a set of initial solutions, where each solution is represented
as a mass in the search space.
ii. Evaluate each solution's fitness to determine its mass based on a fitness function.
Calculate the gravitational acceleration acting on each mass using the fitness
values.
iii. Utilize the gravitational force between masses to update their positions in the
search space. Adjust the positions of masses according to their masses and the
gravitational forces.
iv. Implement boundary checking to ensure that the updated positions of masses
remain within the defined search space.
v. Recalculate the masses of solutions based on their updated positions and
recompute the gravitational accelerations.
vi. Iteratively perform the steps of updating positions, checking boundaries, and
calculating mass-acceleration until a specified stopping criterion is met, whether
it be reaching a maximum number of iterations or attaining a satisfactory
solution.
vii. The algorithm outputs the best solution found during the iterations as the
optimized solution to the given problem.
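For concreteness, a minimal Python sketch of the core GSA loop is given below. The gravitational-constant decay schedule, the random weighting of forces and velocities, and the function and variable names are illustrative assumptions rather than part of the algorithm specification above; in practice, the fitness function would be the prediction score of the GBDT model described later in this section.

```python
import numpy as np

def gsa(fitness_fn, bounds, n_agents=20, n_iter=100, g0=100.0, seed=0):
    """Minimal Gravitational Search Algorithm sketch (maximizes fitness_fn)."""
    rng = np.random.default_rng(seed)
    low, high = np.array(bounds).T                      # per-dimension bounds
    dim = low.size
    pos = rng.uniform(low, high, size=(n_agents, dim))  # step i: random initial masses
    vel = np.zeros_like(pos)

    best_pos, best_fit = None, -np.inf
    for t in range(n_iter):
        fit = np.array([fitness_fn(p) for p in pos])    # step ii: evaluate fitness
        if fit.max() > best_fit:
            best_fit, best_pos = fit.max(), pos[fit.argmax()].copy()

        # mass from fitness, normalized between worst and best (Eq. 10.1 style)
        m = (fit - fit.min()) / (fit.max() - fit.min() + 1e-12)
        M = m / (m.sum() + 1e-12)

        G = g0 * (1.0 - t / n_iter)                     # decaying gravitational constant
        acc = np.zeros_like(pos)
        for i in range(n_agents):                       # step iii: forces -> acceleration
            diff = pos - pos[i]
            dist = np.linalg.norm(diff, axis=1, keepdims=True) + 1e-12
            # acceleration a_i = F_i / M_i; the agent's own mass cancels, leaving G*M_j terms
            force = (G * M[:, None] * diff / dist) * rng.random((n_agents, 1))
            acc[i] = force.sum(axis=0)

        vel = rng.random(pos.shape) * vel + acc         # velocity and position update
        pos = np.clip(pos + vel, low, high)             # step iv: boundary checking
    return best_pos, best_fit                           # step vii: best solution found
```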
In this work, a GBDT based model for malicious access detection, with its hyperparameters
optimized using the gravitational search algorithm, has been designed (Algorithm
1 to Algorithm 3). The GBDT model used for malicious attack detection is affected
by various hyperparameters such as the learning rate, maximum depth, number of estimators,
and bin sub-sample size. In this section, GSA is used to find the
optimal hyperparameter combination that produces better prediction performance of the
GBDT. In this study, the following hyperparameters are considered: maximum depth
(ρ1), learning rate (ρ2), number of estimators (ρ3), and bin sub-sample size (ρ4).
The GSA (Algorithm 1) starts with an initial population θ of n hyperparameter
sets, θ = {θ1, θ2, ..., θn}, drawn from the hyperparameter space with the following
ranges: ρ1i ∈ (1, 16), ρ2i ∈ (0, 1), ρ3i ∈ (1, 31), and ρ4i ∈ (0, 1). The goal is to
explore the optimal hyperparameter set θi* = {ρ1i, ρ2i, ρ3i, ρ4i} in the search space, as sketched in the snippet below.
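As a schematic illustration of this initialization step, the snippet below draws an initial population of hyperparameter sets uniformly from the stated ranges; the population size of 20 and the dictionary representation are assumptions made only for readability.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hyperparameter ranges taken from the text:
#   rho1: maximum depth in (1, 16), rho2: learning rate in (0, 1),
#   rho3: number of estimators in (1, 31), rho4: bin sub-sample fraction in (0, 1)
def sample_population(n):
    """Draw n random hyperparameter sets theta_i = {rho1, rho2, rho3, rho4}."""
    return [{
        "rho1": int(rng.integers(1, 16)),      # maximum depth
        "rho2": float(rng.uniform(0.0, 1.0)),  # learning rate
        "rho3": int(rng.integers(1, 31)),      # number of estimators
        "rho4": float(rng.uniform(0.0, 1.0)),  # sub-sample fraction
    } for _ in range(n)]

population = sample_population(20)  # population size is an illustrative choice
```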
m_{\theta_i} = \frac{f_{\theta_i} - f_{\theta_{worst}}}{f_{\theta_{best}} - f_{\theta_{worst}}} \quad (10.1)
Algorithm-2: GBDT(D, θ)
Trains the gradient boosted decision tree on the training data D with the hyperparameter set θ.
INPUTS: θ = {ρ1, ρ2, ρ3, ρ4}: hyperparameter set
Step-1: Initialize the base model F0(x) with a constant prediction that minimizes the loss over the training targets
Step-2: Create a subsample D′ = {(xi, yi)} of size governed by ρ4 from D
For each boosting round m in 1, …, ρ3:
    Compute the residuals r ← y − Fm−1(x) (computed as in Eq. 6)
    Store (x, r) as instances and create a dataset D′
    T[m] ← BuildTree(D′, ρ1) (Algorithm-3)
    Update the model with the new tree scaled by the learning rate ρ2
Return the boosted ensemble F = ∪m T[m]

Algorithm-3: BuildTree(D′, ρ1)
INPUTS: D′ = {(xi, gi)}: data samples with their associated gradients; xi: the i-th input data sample with its feature dimensions; d: current depth; L = {lj}: leaf space, where nj is the number of samples in leaf j of the tree
Step-1: Set the current depth d = 1
For each candidate split in D′:
    Find the split that minimizes the MSE
    If d < ρ1:
        Split D′ into D′1 and D′2
        d = d + 1
        BuildTree(D′1, ρ1)
        BuildTree(D′2, ρ1)
    Else, make D′ a leaf node
Step-4: Return the tree and its leaf space L
a_{\theta_i} = \frac{F_{\theta_i}}{M_{\theta_i}} \quad (10.5)
In Eq. (10.4), Fθi is the total force applied on the mass of θi by all the other masses θj, and r
is a random number generated between 0 and 1. In Eq. (10.5), aθi is the acceleration
of θi.
The hyperparameter optimization process (Fig. 10.2) begins by generating a
random population of hyperparameter sets. These sets represent different configura-
tions of hyperparameters for the GSA. Once the population is obtained, GSA param-
eters, such as gravitational constant and population size, are defined to guide the
optimization process. Subsequently, the fitness of each hyperparameter set is calcu-
lated. This involves configuring GBT model based on the values of hyperparameters
within the selected set and training the model using the IIoT dataset. The prediction
score obtained from the trained model serves as the fitness measure. To identify the
best and worst hyperparameter sets in terms of fitness, the algorithm compares the
prediction scores across the entire population. The mass of each hyperparameter set is
then computed, taking into account its individual fitness, as well as the best and worst
fitness values within the population. The gravitational forces between hyperparam-
eter sets are determined by their respective masses. These forces, in turn, influence
the acceleration of each hyperparameter set. Acceleration values are employed to
calculate velocities, leading to the determination of the next updated hyperparameter
set. This iterative process continues until a termination condition is satisfied. If the
condition is met, the optimal hyperparameter set is returned. If not, the algorithm
recalculates fitness, updates masses, and repeats the entire sequence, dynamically
adjusting hyperparameters based on gravitational principles. This comprehensive
approach ensures that the algorithm converges towards an optimal configuration,
refining hyperparameter sets iteratively until the termination condition is met, and
the most effective set is identified.
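For illustration only, the fitness of one candidate hyperparameter set could be computed as sketched below, using scikit-learn's GradientBoostingClassifier as a stand-in for the GBDT described in this chapter; the library choice, the accuracy-based fitness score, and the 80/20 validation split are assumptions.

```python
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

def fitness(theta, X, y):
    """Fitness of one hyperparameter set theta = {rho1..rho4}: validation accuracy."""
    X_tr, X_val, y_tr, y_val = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=0)
    model = GradientBoostingClassifier(
        max_depth=theta["rho1"],        # rho1: maximum depth
        learning_rate=theta["rho2"],    # rho2: learning rate
        n_estimators=theta["rho3"],     # rho3: number of estimators
        subsample=theta["rho4"],        # rho4: sub-sample fraction
        random_state=0)
    model.fit(X_tr, y_tr)
    # Higher prediction score means a heavier mass in the GSA population.
    return accuracy_score(y_val, model.predict(X_val))
```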
modifications sourced from protocols and devices within the IIoT network. Addi-
tionally, it includes resources, log, and alert features from various connections and
devices. The simulation encompassed a diverse array of IoT devices, including
controllers, sensors, mobile devices, actuators, edge components, and cloud traffic.
Additionally, the dataset encapsulated the intricate dynamics of connectivity proto-
cols such as WebSocket, CoAP, and MQTT. Notably, various communication patterns
like Machine-to-Machine (M2M), Human-to-Machine (H2M), and Machine-to-
Human (M2H) were integrated, incorporating substantial network traffic and event
scenarios. The model under consideration undergoes testing with the simulated X-IIoTID
dataset comprising 820,834 instances, categorized into Normal type (421,417
instances) and Anomalous type (399,417 instances). Each instance in this dataset is
characterized by 66 attributes, with the class label represented by the final attribute.
To address limitations related to memory and running time, we opted for stratified
sampling to resample the dataset, resulting in a total of 82,000 instances, before
constructing the model.
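A stratified subsample that preserves the Normal/Anomalous proportions can be drawn, for example, with scikit-learn, as in the sketch below; the variable names X and y are placeholders for the loaded feature matrix and class labels.

```python
from sklearn.model_selection import train_test_split

# X: feature matrix of the full 820,834-instance dataset, y: class labels
# (Normal / Anomalous); keep roughly 82,000 instances, stratified by class.
_, X_small, _, y_small = train_test_split(
    X, y,
    test_size=82_000 / 820_834,  # fraction that yields about 82,000 instances
    stratify=y,                  # preserve the class ratio in the subsample
    random_state=42)
```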
models. While comparing all the base EL models, the Bagging approach is found to perform better than
all the other compared EL models.
10.5 Conclusion
The increasing use of IoT in industries has brought new efficiency and connectivity
to industrial processes. This revolution in IoT technology has streamlined
operations and improved productivity. At the same time, security has become
a vital issue because of its potential consequences. A security attack on the IIoT may
interrupt significant industrial processes, leading to financial losses and operational
disruption. IIoT infrastructure requires a network of interconnected devices to control
and monitor industrial processes. Therefore, ensuring the security of the IIoT network is
essential to avoid unauthorized access. IoT devices interact with each other, and they
generate and share sensitive data on the network. So, security threats are always a big
concern when it comes to safeguarding data confidentiality and proprietary information
related to industrial processes. In this work, an ensemble learning based model
is designed to detect anomalies in IIoT networks. Here, a gradient boosted decision tree
is used with its hyperparameters optimized using the gravitational search algorithm. The
real-time deployment of a machine learning security solution demands a low-latency
model with fast data processing and analysis without compromising performance.
This requires high computational resources, whereas IoT devices are usually resource-
constrained and energy-constrained. Another challenge in implementing a machine
learning based security solution is adapting the model to fast-changing industrial settings
and operational patterns. The real-time implementation of machine learning solutions
for IIoT security also becomes challenging due to the growing number of interconnected IIoT
devices. Further, data privacy is an important issue while analyzing sensitive
IIoT data and processing it in real time.
Acknowledgements This research is funded by the Department of Science and Technology (DST),
Ministry of Science and Technology, New Delhi, Government of India, under Grant No. DST/
INSPIREFellowship/2019/IF190611.
References
1. Hassanzadeh, A., Modi, S., Mulchandani, S.: Towards effective security control assignment
in the industrial internet of things. In: Internet of Things (WF-IoT), IEEE 2nd World Forum
(2015)
2. Industrial Internet of Things Volume G4: Security Framework,
IIC:PUB:G4:V1.0:PB:20160926
3. Muna, A.H., Moustafa, N., Sitnikova, E.: Identification of malicious activities in Industrial
Internet of Things based on deep learning models. J. Inf. Secur. Appl. 41, 1–11 (2018)
4. Defense Use Case. Analysis of the Cyber Attack on the Ukrainian Power Grid. Electricity
Information Sharing and Analysis Center (E-ISAC) 388 (2015). https://africautc.org/wp-con
tent/uploads/2018/05/E-ISAC_SANS_Ukraine_DUC_5.pdf. Accessed 7 May 2022
5. Alladi, T., Chamola, V., Zeadally, S.: Industrial control systems: cyberattack trends and
countermeasures. Comput. Commun. 155, 1–8 (2020)
6. Sitnikova, E., Foo, E., Vaughn, R.B.: The power of hands-on exercises in SCADA cybersecurity
education. In: Information Assurance and Security Education and Training. Springer, Berlin/
Heidelberg, Germany, pp. 83–94 (2013)
7. Dash, S., Chakraborty, C., Giri, S.K., Pani, S.K., Frnda, J.: BIFM: big-data driven intelligent
forecasting model for COVID-19. IEEE Access 9, 97505–97517 (2021)
8. Koroniotis, N., Moustafa, N., Sitnikova, E.: A new network forensic framework based on deep
learning for Internet of Things networks: a particle deep framework. Fut. Gener. Comput. Syst.
110, 91–106 (2020)
9. Vaiyapuri, T., Binbusayyis, A.: Application of deep autoencoder as an one-class classifier for
unsupervised network intrusion detection: a comparative evaluation. PeerJ Comput. Sci. 6, e327
(2020)
10. Garcia-Teodoro, P., Diaz-Verdejo, J., Maciá-Fernández, G., Vázquez, E.: Anomaly-based
network intrusion detection: techniques, systems and challenges. Comput. Secur. 28, 18–28
(2009)
11. Gao, X.-C., et al.: Energy-efficient and low-latency massive SIMO using noncoherent ML
detection for industrial IoT communications. IEEE IoT J 6(4), 6247–6261 (2018)
12. Zolanvari, M., Teixeira, M.A., Jain, R.: Effect of imbalanced datasets on security of indus-
trial IoT using machine learning. In: 2018 IEEE International Conference on Intelligence and
Security Informatics (ISI). IEEE (2018)
13. Zolanvari, M., et al.: Machine learning-based network vulnerability analysis of industrial
Internet of Things. IEEE IoT J 6(4), 6822–6834 (2019)
14. Latif, S., et al.: A novel attack detection scheme for the industrial internet of things using a
lightweight random neural network. IEEE Access 8, 89337–89350 (2020)
15. Mudassir, M., et al.: Detection of botnet attacks against industrial IoT systems by multilayer
deep learning approaches. Wirel. Commun. Mobile Comput. (2022)
16. Qolomany, B., et al.: Particle swarm optimized federated learning for industrial IoT and smart
city services. In: GLOBECOM 2020–2020 IEEE Global Communications Conference. IEEE
(2020)
17. Ksentini, A., Jebalia, M., Tabbane, S.: Fog-enabled industrial IoT network slicing model based
on ML-enabled multi-objective optimization. In: 2020 IEEE 29th International Conference on
Enabling Technologies: Infrastructure for Collaborative Enterprises (WETICE). IEEE (2020)
18. Marino, R., et al.: A machine-learning-based distributed system for fault diagnosis with scalable
detection quality in industrial IoT. IEEE IoT J 8(6), 4339–4352 (2020)
19. Taheri, R., et al.: FED-IIoT: A robust federated malware detection architecture in industrial
IoT. IEEE Trans. Ind. Informatics 17(12), 8442–8452 (2020)
20. Yazdinejad, A., et al.: An ensemble deep learning model for cyber threat hunting in industrial
internet of things. Digital Commun. Netw. 9(1), 101–110 (2023)
21. Le, T.-T.-H., Oktian, Y.E., Kim, H.: XGBoost for imbalanced multiclass classification-based
industrial internet of things intrusion detection systems. Sustainability 14(14), 8707 (2022)
22. Mohy-Eddine, M., et al.: An ensemble learning based intrusion detection model for industrial
IoT security. Big Data Min. Anal. 6(3), 273–287 (2023)
23. Rashid, Md.M., et al.: A federated learning-based approach for improving intrusion detection
in industrial internet of things networks. Network 3(1), 158–179 (2023)
24. Rafiq, H., Aslam, N., Ahmed, U., Lin, J.C.-W.: Mitigating malicious adversaries evasion attacks
in industrial internet of things. IEEE Trans. Industr. Inf. 19(1), 960–968 (2023). https://doi.org/
10.1109/TII.2022.3189046
25. Friedman, J.H.: Greedy function approximation: a gradient boosting machine. Ann. Stat. 1189–
1232 (2001)
26. Rashedi, E., Nezamabadi-Pour, H., Saryazdi, S.: GSA: a gravitational search algorithm. Inf.
Sci. 179(13), 2232–2248 (2009)
27. Al-Hawawreh, M., Sitnikova, E., Aboutorab, N.: X-IIoTID: a connectivity-agnostic and device-
agnostic intrusion data set for industrial internet of things. IEEE Internet Things J. 9, 3962–3977
(2022)
Chapter 11
Machine Learning Based Intelligent
Diagnosis of Brain Tumor: Advances and
Challenges
Abstract One of the fatal diseases that kills a large number of people across the
globe is brain tumor. If brain tumor detection is delayed, the patient has
to spend a large amount of money as well as face severe suffering. Therefore,
there is an essential need to detect the brain tumor early so that money and lives can be
saved. The conventional examination of brain images by doctors does not reveal the
presence of a tumor in a reliable and accurate manner. To overcome these issues,
early and accurate brain tumor identification is of prime importance. Recently,
methods employing machine learning (ML) and artificial intelligence (AI) have been
utilized to properly diagnose other diseases using test attributes, electrocardiogram
(ECG), electromyography (EMG), heart sounds, and other types of signals obtained
from the human body. This chapter presents a complete overview of the detection
and classification of brain tumors from patient-provided brain MR images using
AI and ML approaches. For this purpose, brain images obtained from the kaggle.com
website have been employed for developing various AI and ML classifiers. Through
simulation-based experiments conducted on the AI and ML classifiers, performance
metrics have been obtained and compared. From the analysis of results reported in
the different articles, it is observed that Random Forest exhibits superior detection of
brain tumor. There is still further scope for improving the performance as well as
developing affordable, reliable, and robust AI-based brain tumor classifiers.
11.1 Introduction
With thousands of instances discovered each year, brain tumors are a major health
problem on a global scale. Conventional methods to identify brain tumors involve the
use of Computed Tomography (CT) scans, MR images, etc. Medical professionals
use MR images to diagnose patients and analyze brain tumors. Thus, MR images
continue to play an important role in brain tumor detection [1]. The accuracy and
time requirements of conventional diagnostic techniques are constrained. For successful
treatment and better patient outcomes, brain tumors must be identified as early
as possible. The use of computer-aided techniques (CAT) might lead to more accu-
rate brain tumor detection [2]. The internet of medical things (IoMT), ML, and DL
among other technological breakthroughs, provide potential options to improve brain
tumor detection and diagnosis. As a result of their impressive performance in image
analysis and pattern recognition tasks, ML and DL algorithms are excellent options
for detecting brain tumor from MR images and CT scans. Various ML based tech-
niques are constantly advancing to enhance the precision of detection. Identifying
the features is very important in the process of brain tumor detection. For instance,
Ghassemi et al. [3] have proposed a DL technique in which a generative adversarial
network (GAN) is trained on multiple datasets so as to make it capable of extracting
strong and required features from the MR images to make brain tumor detection
easier. Along with features extraction, segmentation also plays a very essential role
in brain tumor detection. Many methods have been introduced to improve the seg-
mentation process. In [4], a grab cut method is introduced which helps in accurate
segmentation. It also uses VGG-19 for tuning to extract features. Duan et al. [5] have
discussed the importance of deformable registration. They proposed a tuning-free
3D image registration model which provided very accurate registration. In [6], the
automated segmentation of MR images has been proposed. The approach has also
taken the noisy and inconsistent data into consideration. Such data might lead to
unexpected and inaccurate results. Hence, they must be handled to achieve accurate
results. Brain tumors are categorized as benign and malignant. Malignant tumors are
more harmful than benign tumors. Brain tissue is differentiated using a
hybrid technique [7] according to whether it is normal, has a benign tumor, or has
a malignant tumor. Furthermore, brain tumors have different variants such as glioma,
meningioma, and pituitary tumors, based on their position. Anaraki et al. [14] performed
multiple case studies to identify the type of brain tumour present. In order
to identify the kind of brain tumour that is present, and to do so extremely early on,
they have used a hybrid genetic algorithm (GA) combined with convolutional neural
networks (CNNs). A brain tumor can further have different stages or grades, such as
grade I, II, III, and IV. The rate of recovery greatly depends on the grade of the
grade I,II,III and IV etc. The rate of recovery greatly depends on the grade of the
tumor. Sultan et al. [16] processed two distinct datasets using DL methods. Their
objectives were to identify the type of brain tumor in one dataset and the grade of
glioma in another dataset. They achieved very promising results for the two datasets.
Consequently, the employment of DL and ML technologies has a notable positive
effect on the identification and diagnosis of brain tumor. Brain tumor detection and
treatment might be revolutionized by their capacity to accurately and quickly analyze
complicated medical imaging data, which would ultimately improve patient care and
results.
Contribution of the chapter:
• This chapter provides ML based methods for early and accurate detection of brain
tumor using standard brain images obtained from kaggle.com website.
• The chapter presents a generalized ML based method for brain tumor detection.
• Seven different ML methods have been proposed, and performance metrics have
been obtained and compared.
• It is demonstrated that, among the ML categories, standard ML based classifiers show
improved performance compared to other methods.
The organization of this chapter is represented as below:
The Sect. 11.2 consists of system under study, while the Sect. 11.3 explains the
material and methodology used in the chapter. The Sect. 11.4 provides a detailed
analysis of the results obtained, and the Sects. 11.5 and 11.6 consist of the discussion
and conclusion, respectively.
Numerous methodologies have been put forth to increase the effectiveness and pre-
cision of brain tumor detection. El-Melegy et al. [6] have formulated a new fuzzy
method based on the traditional fuzzy algorithm. It helps in the automatic segmenta-
tion of MR images, by taking into consideration the noisy data as well. As a result, the
performance of the Fuzzy C-means (FCM) method is notably improved. An amal-
gam of GA and the support vector machine (SVM) has been introduced by Kharrat
et al. [7], which is used to classify tumor in brain MR images. The GA model is used
to classify the wavelet’s texture feature, which is provided as an input to the SVM.
In this instance, the accuracy percentage ranges from 94.44 to 98.14. The work reported
in [8] analyzed ML-based back propagation neural networks (MLBPNN) using an infrared sensor
imaging technique. The authors used the fractal dimension algorithm (FDA) to extract
features and multi-fractal detection (MFD) to select the most essential features. The
data is then transferred to a clinician via a wireless infrared imaging sensor.
The average specificity was 99.8%, while the average sensitivity
was 95.103%. Kanmani et al. [9] proposed an approach for classifying brain tumor
using threshold-based region optimization (TBRO). It helps to overcome the limita-
tions of traditional CAT. It achieved 96.57% accuracy. An attentive residual U-Net
(AResU-Net) is being used to segment the ROI area from two well-known 2-D image
11.3.1 Dataset
A total of 3757 MR images are utilized in this chapter. Eighty percent of the dataset
is utilized for training, while twenty percent is used for testing. Figure 11.1 shows
brain images with no tumor, and Fig. 11.2 shows brain images having a pituitary
tumor. A total of 3004 MR images are included in the training set, while the testing
set holds 751 MR images. Table 11.1 describes the dataset. The link to the dataset is
given below:
https://www.kaggle.com/datasets/masoudnickparvar/brain-tumor-mri-dataset.
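As a rough sketch of how the downloaded images might be organized for the 80/20 split described above, the snippet below walks class-named folders and performs a stratified split; the folder layout, the file extension, and the use of scikit-learn are assumptions that may differ from the actual Kaggle archive.

```python
from pathlib import Path
from sklearn.model_selection import train_test_split

DATA_DIR = Path("brain-tumor-mri-dataset")   # assumed local copy of the Kaggle data

# Collect (image path, class label) pairs from class-named sub-folders.
paths, labels = [], []
for class_dir in sorted(DATA_DIR.iterdir()):
    if class_dir.is_dir():
        for img in class_dir.glob("*.jpg"):
            paths.append(img)
            labels.append(class_dir.name)

# 80% training / 20% testing split, stratified so class ratios are preserved.
train_paths, test_paths, train_labels, test_labels = train_test_split(
    paths, labels, test_size=0.20, stratify=labels, random_state=42)
```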
11.3.3 Preprocessing
Feature extraction entails the conversion of unprocessed image data into a set of
representative features that effectively record essential details about the underlying
structures. To distinguish between areas of tumor and normal brain tissue, several
characteristics are retrieved from MR images. These features are designed to identify
the distinctive traits of tumors and facilitate precise identification. Instead of directly
providing the MR image data to the model, the extracted features act as
feature vectors that are provided to the model. Transform domain, statistical, and
technical characteristics are retrieved as three different categories of features.
The average intensity of the image pixels is known as the mean intensity. It is
mathematically represented as given in Eq. (11.1):
\mu = \frac{\sum I}{np} \quad (11.1)

where μ represents the mean intensity, I denotes the intensity value of each pixel,
and np indicates the total number of pixels.
The standard deviation measures the spread of the pixel intensities around the mean
intensity. It is mathematically represented as given in Eq. (11.2):

\sigma = \sqrt{\frac{\sum (I - \mu)^2}{np}} \quad (11.2)

where σ depicts the standard deviation, μ is the mean intensity, I indicates the pixel intensity
value, and np represents the total number of pixels.
The energy of an image is used to determine its homogeneity. It is mathematically
represented as given in Eq. (11.3):
En = \sum_{x,y} M(x, y)^2 \quad (11.3)

where En represents the energy, x and y express the intensity values, and M(x, y)
is the normalized co-occurrence matrix element.
Contrast is a measurement of the intensity difference between a pixel and its
neighbour in an image. It is shown mathematically as given in Eq. (11.4):
Cn = \sum_{x,y} |x - y|^2 M(x, y) \quad (11.4)
where Cn represents the contrast, the intensity values are x and y, and M(x, y) is the
normalized co-occurrence matrix element.
The measure of how a pixel is correlated to its neighbor is known as correlation.
It is mathematically represented as given in Eq. (11.5):
Cr = \sum_{x,y} \frac{(x - \mu_x)(y - \mu_y) M(x, y)}{\sigma_x \sigma_y} \quad (11.5)

where Cr represents the correlation, x and y exhibit the intensity values, M(x, y)
denotes the normalized co-occurrence matrix element, μ_x and μ_y represent the mean intensities,
and σ_x and σ_y indicate the corresponding x and y standard deviations.
Homogeneity offers details on the regional variation or coarseness of the texture
of an image area. It is mathematically represented as given in Eq. (11.6):
Hm = \sum_{x,y} \frac{M(x, y)}{1 + |x - y|} \quad (11.6)
where Hm represents the homogeneity, x, and y depict the intensity values, and
M(x, y) represents the normalized co-occurrence matrix element.
The measure of complexity of an image texture is known as entropy. It is mathe-
matically represented as given in Eq. (11.7):
Et = -\sum_{x,y} M(x, y) \log(M(x, y)) \quad (11.7)

where Et represents the entropy, x and y act as the intensity values, and M(x, y)
denotes the normalized co-occurrence matrix element.
The measure of the asymmetry of intensity distribution is known as skewness. It
is mathematically represented as given in Eq. (11.8):
\gamma = \frac{\sum (I - \mu)^3 / np}{\sigma^3} \quad (11.8)

where γ represents the skewness, μ represents the mean intensity, I indicates the intensity
value of each pixel, np illustrates the total number of pixels, and the standard deviation is
denoted by σ.
The measure of the peak point of the intensity distribution is known as kurtosis.
It is mathematically represented as given in Eq. (11.9):
kt = \frac{\sum (I - \mu)^4 / np}{\sigma^4} \quad (11.9)

where kt represents the kurtosis, μ represents the mean intensity, I indicates the intensity
value of each pixel, np represents the full count of pixels, and σ indicates
the standard deviation.
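One possible implementation of these statistical and texture descriptors, using NumPy, SciPy, and the grey-level co-occurrence matrix utilities of scikit-image, is sketched below; the single distance and angle used for the co-occurrence matrix and the exact library calls are assumptions, since the chapter does not specify them.

```python
import numpy as np
from scipy.stats import skew, kurtosis
from skimage.feature import graycomatrix, graycoprops

def extract_features(img):
    """Statistical and GLCM texture features for one grayscale MR image (uint8 array)."""
    pixels = img.astype(float).ravel()
    glcm = graycomatrix(img, distances=[1], angles=[0], levels=256,
                        symmetric=True, normed=True)
    M = glcm[:, :, 0, 0]                                     # normalized co-occurrence matrix
    eps = 1e-12
    return {
        "mean": pixels.mean(),                               # Eq. (11.1)
        "std": pixels.std(),                                 # Eq. (11.2)
        "energy": graycoprops(glcm, "ASM")[0, 0],            # Eq. (11.3): sum of M(x,y)^2
        "contrast": graycoprops(glcm, "contrast")[0, 0],     # Eq. (11.4)
        "correlation": graycoprops(glcm, "correlation")[0, 0],  # Eq. (11.5)
        "homogeneity": graycoprops(glcm, "homogeneity")[0, 0],  # Eq. (11.6)
        "entropy": -np.sum(M * np.log(M + eps)),             # Eq. (11.7)
        "skewness": skew(pixels),                            # Eq. (11.8)
        "kurtosis": kurtosis(pixels, fisher=False),          # Eq. (11.9)
    }
```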
A total of seven algorithms are used for the simulation study. They are SVM, RF,
AdaBoost, Decision Tree (DT), LDA, ANN, and RBF. A detailed description of the
algorithms along with their limitations is described in Table 11.1.
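A schematic instantiation of the seven classifiers with scikit-learn is shown below; the hyperparameter values are illustrative defaults, the data variables are placeholders, and an RBF-kernel SVM is used only as a stand-in because scikit-learn has no built-in RBF network.

```python
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier, AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.neural_network import MLPClassifier

# The seven classifiers studied in the chapter; hyperparameters are illustrative defaults.
classifiers = {
    "SVM": SVC(kernel="linear"),
    "RF": RandomForestClassifier(n_estimators=100),
    "AdaBoost": AdaBoostClassifier(n_estimators=50),
    "DT": DecisionTreeClassifier(max_depth=10),
    "LDA": LinearDiscriminantAnalysis(),
    "ANN": MLPClassifier(hidden_layer_sizes=(64,), max_iter=500),
    "RBF": SVC(kernel="rbf", gamma="scale"),   # stand-in for an RBF network
}

# X_train, y_train, X_test, y_test are the feature vectors and labels described above.
for name, clf in classifiers.items():
    clf.fit(X_train, y_train)
    print(name, "test accuracy:", clf.score(X_test, y_test))
```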
c_0 + (c_1 \times d_1) + (c_2 \times d_2) = 0 \quad (11.10)
where c_1 and c_2 determine the slope of the line, c_0 represents the intercept, and d_1
and d_2 represent the input variables.
Random Forest is a refined combination of decision trees that is also focused on
classification and regression based on labeled data.
Rather than relying on the prediction of a single decision tree, it predicts the result
from a series of decision trees as an ensemble learning approach.
The architecture of RF is shown in Fig. 11.5. The equation is given in (11.11). The
architecture of RF given in Fig. 11.5 was made by referring to the architecture given
in [19].
GI = 1 - [(pr_+)^2 + (pr_-)^2] \quad (11.11)

where GI represents the Gini Index, and pr_+ and pr_- represent the probabilities of the positive
and negative classes, respectively.
11.3.5.3 AdaBoost
Boosting a series of weak learners step by step to generate a strong learner is the policy
of the AdaBoost algorithm. The value of the alpha parameter is inversely proportional
to the mistakes of the weak learners. The architecture of AdaBoost is shown in
Fig. 11.6. The equation is given in (11.12).
h(x) = sg\left(\sum_{y=1}^{Y} \alpha_y o_y(i)\right) \quad (11.12)

where h(x) represents the hypothesis function for a value x, sg represents the sign function,
α_y represents the weight given to classifier y, and o_y(i) is the output of weak
classifier y for input i.
A decision tree is a conventional technique used for regression and classification that
focuses on supervised, labeled data. Every leaf or terminal node preserves the labeled
class whereas every branch defines the results of the test. Internal nodes signify
attribute-based tests. The architecture of DT is shown in Fig. 11.7. The equation is
given in (11.13).
E_v = (F_{po} \times L_o) + (S_{po} \times L_o) - Cost \quad (11.13)
where E_v represents the expected value, F_{po} and S_{po} represent the first and second
possible outcomes, respectively, and L_o represents the likelihood of the outcome.
One supervised learning approach for classification tasks in machine learning is linear
discriminant analysis (LDA). A linear feature combination that best distinguishes the
classes in a dataset is found using this method. LDA works by projecting the data into
a smaller-dimensional space where the distance between the classes is maximized.
The architecture of LDA is shown in Fig. 11.8. The equation is given in (11.14).
\beta^T \left(m - \frac{\mu_1 + \mu_2}{2}\right) > -\log \frac{pr(cl_1)}{pr(cl_2)} \quad (11.14)
where β^T represents the coefficient vector, m represents the data vector, μ_1 and μ_2
represent the mean vectors, and pr(cl_1) and pr(cl_2) represent the class probabilities.
ANN are created using a model of a human brain's neuronal network. Units, sometimes
known as artificial neurons, are found in artificial neural networks. These
components are stacked in a number of layers. The input, hidden, and output layers
comprise the three tiers of this layout. The input layer is where the neural network
gets data from the outside world that it needs to evaluate or learn about. Thereafter,
the inputs are processed by one or more hidden layers into information that
may be utilised by the output layer. The architecture of ANN is shown in Fig. 11.9.
The equation is given in (11.15).
Z = Bias + y_1 i_1 + y_2 i_2 + \ldots + y_n i_n \quad (11.15)
where Z represents the sum of the bias and the products of each input node and its
weight, y represents the weights (beta coefficients), the intercept is represented
by Bias, and the independent variables are denoted by i.
It is composed of three distinct layers: an input layer, a hidden layer, and an output
layer. Radial basis function networks are a unique class of feed-forward neural networks. This
is fundamentally distinct from the majority of neural network topologies, which have
several layers and produce non-linearity by repeatedly using nonlinear activation
functions. The architecture of RBF is shown in Fig. 11.10. The equation is given in
(11.16).
h(obs) = \sum_{n=1}^{N} wt_n \cdot \exp(-\gamma \, \|obs - obs_n\|^2) \quad (11.16)
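A minimal NumPy sketch of the forward pass in Eq. (11.16) is given below; the numbers of hidden units and input dimensions, as well as the random centers and weights, are illustrative assumptions.

```python
import numpy as np

def rbf_forward(obs, centers, weights, gamma):
    """Forward pass of a simple RBF network: sum_n wt_n * exp(-gamma * ||obs - obs_n||^2)."""
    # Squared Euclidean distance between the input and every hidden-unit center.
    sq_dist = np.sum((centers - obs) ** 2, axis=1)
    # Gaussian radial basis activations of the hidden layer.
    activations = np.exp(-gamma * sq_dist)
    # Weighted sum at the output node, as in Eq. (11.16).
    return weights @ activations

# Illustrative usage with random centers and weights (assumed, not from the chapter).
rng = np.random.default_rng(0)
centers = rng.normal(size=(10, 4))   # 10 hidden units, 4-dimensional inputs
weights = rng.normal(size=10)
print(rbf_forward(rng.normal(size=4), centers, weights, gamma=0.5))
```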
11.3.7.1 Linear
Often referred to as the linear function, the linear activation function is one of the
most basic activation functions utilised in neural networks and other computational
models. The function’s result is the same as its input. The linear activation function
does not cause the network to become non-linear. Figure 11.11. shows the graphical
representation of the linear activation function. It is mathematically represented as
given in Eq. (11.17):
\phi(z) = z \quad (11.17)
where φ(z) denotes the output of the activation function for an input z.
Example: Adaline Linear Regression.
The Heaviside step function is a binary activation function that is used in various
applications. It can incorporate a threshold behavior into a neural network when
applied to the output of a neuron or layer, where outputs below a given value are set
to one value and outputs beyond that value are assigned to another. The graphical
depiction of the unit step heaviside term activation function is displayed in Fig. 11.12.
It is mathematically represented as given in Eq. (11.18):
\phi(z) = \begin{cases} 0 & \text{if } z < 0 \\ 0.5 & \text{if } z = 0 \\ 1 & \text{if } z > 0 \end{cases} \quad (11.18)

where φ(z) is equal to 0 if the input z is less than 0, 0.5 if z is equal to 0, and 1 if z is
greater than 0.
11.3.7.3 Sign(Signum)
The sign activation function generates a binary output that encodes the polarity of the
input (positive or negative). In situations where the magnitude of the input doesn’t
matter and the value’s direction is more concerned than its magnitude, this function
may be helpful. Figure 11.13 shows the graphical representation of the sign activation
function. It is mathematically represented as given in Eq. (11.19):
\phi(z) = \begin{cases} -1 & \text{if } z < 0 \\ 0 & \text{if } z = 0 \\ 1 & \text{if } z > 0 \end{cases} \quad (11.19)
11.3.7.5 Logistic (Sigmoid)

\phi(z) = \frac{1}{1 + e^{-z}} \quad (11.21)
The hyperbolic tangent activation, often known as the tanh activation function, is a
mathematical operation that is frequently employed in neural networks and a variety
of machine learning methods. Figure 11.16. shows the graphical representation of
the tanh activation function. It is mathematically represented as given in Eq. (11.22):
\phi(z) = \frac{e^{z} - e^{-z}}{e^{z} + e^{-z}} \quad (11.22)
Among the most often utilized activation functions in contemporary neural networks
is the ReLU activation function. DL models have been successful in large part because
of ReLU. An illustration of the ReLU activation function is presented in Fig. 11.17.
It is mathematically represented as given in Eq. (11.23):
\phi(z) = \begin{cases} 0 & \text{if } z < 0 \\ z & \text{if } z \geq 0 \end{cases} \quad (11.23)
where φ(z) denotes the output of the activation function for an input z.
Example: Multilayer Neural Network, Convolutional Neural Network.
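The short NumPy sketch below collects the listed activation functions in one place so that their behavior can be compared numerically on sample inputs; the vectorized formulation and the example inputs are illustrative.

```python
import numpy as np

# Vectorized versions of the activation functions from Eqs. (11.17)-(11.23).
def linear(z):    return z                                        # Eq. (11.17)
def heaviside(z): return np.where(z < 0, 0.0,
                                  np.where(z == 0, 0.5, 1.0))      # Eq. (11.18)
def sign_fn(z):   return np.sign(z)                                # Eq. (11.19)
def sigmoid(z):   return 1.0 / (1.0 + np.exp(-z))                  # Eq. (11.21)
def tanh_fn(z):   return np.tanh(z)                                # Eq. (11.22)
def relu(z):      return np.maximum(0.0, z)                        # Eq. (11.23)

z = np.array([-2.0, 0.0, 3.0])
for f in (linear, heaviside, sign_fn, sigmoid, tanh_fn, relu):
    print(f.__name__, f(z))
```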
In order to assess a prediction model’s performance and solve overfitting and bias
issues, k-fold cross-validation is a technique that is widely used in machine learning
and statistics. A dataset is divided into K folds, each of a comparable size. Following
that, K iterations of the training and evaluation procedure are carried out, each time
utilizing a new fold as the validation set and the remaining folds as the training set.
Table 11.4 lists the benefits and drawbacks of each kind.
A reliable and popular method for evaluating the effectiveness of models is 10-fold
cross-validation. The dataset is split into ten subsets or folds, and the model goes
through ten iterations of training and testing.
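As an illustration, 10-fold cross-validation of one of the studied classifiers can be run with scikit-learn as sketched below; the Random Forest configuration and the placeholder names X_features and y are assumptions.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# 10-fold cross-validation of one of the studied classifiers on the extracted
# feature vectors (X_features) and labels (y); both names are placeholders.
clf = RandomForestClassifier(n_estimators=100, random_state=42)
scores = cross_val_score(clf, X_features, y, cv=10, scoring="accuracy")
print("Per-fold accuracy:", scores)
print("Mean accuracy: %.4f (+/- %.4f)" % (scores.mean(), scores.std()))
```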
Table 11.5 presents the classifiers’ accuracy, sensitivity, specificity, f1 score, and
precision utilized in the article. Table 11.6 compares the suggested work with other
methods already in use.
The variation of the F1 score across the classification models is shown graphically
in Fig. 11.18, and the comparison with other methods already in use is shown
graphically in Fig. 11.19.
11.5 Discussion
The proposed model involves three steps in total. First, pre-processing is applied to
the MR images to enhance their quality. After the completion of
pre-processing, important features such as statistical, transform domain, and technical
features are extracted to obtain the desired feature vector. Subsequently, each
feature vector is fed to each of the proposed models for training and validation purposes.
Each model is trained to a reasonable degree, and then its performance is assessed
and compared. Subsequently, the best two models have been identified as RBF and
RF, respectively. To demonstrate the robustness of the developed model, performance
needs to be evaluated using other standard datasets. Further, the potential of each
of the proposed models needs to be assessed using imbalanced feature data. The suggested
approach and models may be applied to different types of disease recognition
and classification tasks which require an image dataset as input.
11.6 Conclusion
This chapter presents a set of ML-based classifiers that may be used to identify brain
tumor using standard MR-based image input. Subsequently, requisite features have
been extracted from these raw images and fed to each of these models for achieving
satisfactory training.
In the second stage, each of the developed models has undergone different validation
schemes. The performance of every model that was generated has been evaluated and
compared in the third stage. It is demonstrated that, among the seven study models,
the suggested RF model has the greatest accuracy (99.20%) and F1 score (99.14%).
References
1. Li, H., Li, A., Wang, M.: A novel end-to-end brain tumor segmentation method using improved
fully convolutional networks. Comput. Biol. Med. 108, 150–160 (2019)
2. Zacharaki, E.I., Wang, S., Chawla, S., Soo Yoo, D., Wolf, R., Melhem, E.R., Davatzikos, C.:
Classification of brain tumor type and grade using MRI texture and shape in a machine learning
scheme. Magn. Reson. Med. Off. J. Int. Soc. Magn. Reson. Med. 62(6), 1609–1618 (2009)
3. Ghassemi, N., Shoeibi, A., Rouhani, M.: Deep neural network with generative adversarial
networks pre-training for brain tumor classification based on MR images. Biomed. Signal
Process. Control 57, 101678 (2020)
4. Saba, T., Mohamed, A.S., El-Affendi, M., Amin, J., Sharif, M.: Brain tumor detection using
fusion of hand crafted and deep learning features. Cogn. Syst. Res. 59, 221–230 (2020)
5. Duan, L., Yuan, G., Gong, L., Fu, T., Yang, X., Chen, X., Zheng, J.: Adversarial learning for
deformable registration of brain MR image using a multi-scale fully convolutional network.
Biomed. Signal Process. Control 53, 101562 (2019)
6. El-Melegy, M.T., Mokhtar, H.M.: Tumor segmentation in brain MRI using a fuzzy approach
with class center priors. EURASIP J. Image and Video Process. 2014(1), 1–14 (2014)
7. Kharrat, A., Gasmi, K., Messaoud, M.B., Benamrane, N., Abid, M.: A hybrid approach for
automatic classification of brain MRI using genetic algorithm and support vector machine.
Leonardo J. Sci. 17(1), 71–82 (2010)
8. Shakeel, P.M., Tobely, T.E.E., Al-Feel, H., Manogaran, G., Baskar, S.: Neural network based
brain tumor detection using wireless infrared imaging sensor. IEEE Access 7, 5577–5588
(2019)
9. Kanmani, P., Marikkannu, P.: MRI brain images classification: a multi-level threshold based
region optimization technique. J. Med. Syst. 42, 1–12 (2018)
10. Zhang, J., Lv, X., Zhang, H., Liu, B.: AResU-Net: attention residual U-Net for brain tumor
segmentation. Symmetry 12(5), 721 (2020)
11. Kumar, S., Mankame, D.P.: Optimization driven deep convolution neural network for brain
tumor classification. Biocybern. Biomed. Eng. 40(3), 1190–1204 (2020)
12. Alam, M.S., Rahman, M.M., Hossain, M.A., Islam, M.K., Ahmed, K.M., Ahmed, K.T., Singh,
B.K., Miah, M.S.: Automatic human brain tumor detection in MRI image using template-based
K means and improved fuzzy C means clustering algorithm. Big Data Cogn. Comput. 3(2), 27
(2019)
13. Islam, M.K., Ali, M.S., Miah, M.S., Rahman, M.M., Alam, M.S., Hossain, M.A.: Brain tumor
detection in MR image using superpixels, principal component analysis and template based
K-means clustering algorithm. Mach. Learn. Appl. 5, 100044 (2021)
14. Anaraki, A.K., Ayati, M., Kazemi, F.: Magnetic resonance imaging-based brain tumor grades
classification and grading via convolutional neural networks and genetic algorithms. Biocybern.
Biomed. Eng. 39(1), 63–74 (2019)
15. Bahadure, N.B., Ray, A.K., Thethi, H.P.: Image analysis for MRI based brain tumor detection
and features extraction using biologically inspired BWT and SVM. Int. J. Biomed. Imaging
(2017)
16. Sultan, H.H., Salem, N.M., Al-Atabany, W.: Multi-classification of brain tumor images using
deep neural network. IEEE Access 7, 69215–69225 (2019)
17. Nanda, A., Barik, R.C., Bakshi, S.: SSO-RBNN driven brain tumor classification with Saliency-
K-means segmentation technique. Biomed. Signal Process. Control 81, 104356 (2023)
18. Li, M., Kuang, L., Xu, S., Sha, Z.: Brain tumor detection based on multimodal information
fusion and convolutional neural network. IEEE Access 7, 180134–180146 (2019)
19. Panda, S.K., Barik, R.C.: MR Brain 2D image tumor and cyst classification approach: an
empirical analogy. In 2023 IEEE International Students’ Conference on Electrical, Electronics
and Computer Science (SCEECS), pp. 1–6. IEEE (2023)
20. Ahmad, S., Choudhury, P.K.: On the performance of deep transfer learning networks for brain
tumor detection using MR images. IEEE Access 10, 59099–59114 (2022)
Chapter 12
Cyber-Physical Security in Smart Grids:
A Holistic View with Machine Learning
Integration
Abstract Cyber-physical attacks are becoming more challenging with each passing day
owing to the continuous advancement of smart-grid systems. In the present industrial
revolution, the smart grid is integrated with a wide range of technologies, equipment/
devices, and tools/software to make the system more trustworthy, reliable, efficient,
and cost-effective. Despite achieving these objectives, the attack surface for
critical attacks has also been stretched owing to the add-on cyber layers. In order to
detect and mitigate these attacks, machine learning (ML) tools are being reliably
and massively used. In this chapter, the authors have comprehensively reviewed several
state-of-the-art related research works. The advantages and disadvantages of each ML
based scheme are identified and reported in this chapter. Finally, the authors have
presented the shortcomings of the existing research and possible future research
directions based on their investigation.
B. Patnaik
Nalla Malla Reddy Engineering College, Hyderabad, Telangana, India
M. Mishra (B)
Department of Electrical and Electronics Engineering, Siksha O Anusandhan University,
Bhubaneswar, India
e-mail: [email protected]
S. Hasan
Department of Electrical and Electronics Engineering, Birla Institute of Technology & Science,
Dubai Campus, Dubai, United Arab Emirates
e-mail: [email protected]
12.1 Introduction
The International Energy Agency (IEA), established in 1974 and collaborating with
governments and industry to forge a secure and sustainable energy future for all,
characterizes a smart grid as:
Smart grids are electrical networks incorporating digital technologies, sensors, and software
to efficiently synchronize electricity supply and demand in real-time. This is achieved by
minimizing costs and ensuring the stability and reliability of the grid [1].
A fairly more explicit definition of smart grid can be found as furnished by the
“National Smart Grid Mission, Ministry of Power, Government of India” [2], which
is:
A Smart Grid refers to an electrical grid equipped with automation, communication, and
IT systems that oversee power distribution from generation points to consumption points,
including individual appliances. It can regulate power flow and adjust loads in real-time
or near-real-time to align with current generation levels. Realizing Smart Grids involves
implementing efficient transmission and distribution systems, improving system operations,
integrating consumers effectively, and seamlessly incorporating renewable energy sources.
Smart grid solutions play a crucial role in monitoring, measuring, and control-
ling power flows in real-time, enabling the identification of losses. This function-
ality allows for the implementation of suitable technical and managerial measures
to mitigate these losses. The deployment of smart grid solutions can significantly
contribute to reducing transmission and distribution (T&D) losses, managing peak
loads, enhancing service quality, improving reliability, optimizing asset manage-
ment, integrating renewable energy sources, and increasing electricity accessibility.
Furthermore, smart grids have the potential to create self-healing grids. In essence,
smart grid solutions provide a comprehensive approach to addressing various chal-
lenges within the electrical grid, fostering more efficient and sustainable energy
management. A smart grid is a futuristic electrical power grid that is expected to
evolve so as to address the varying needs of global consumers and global
concerns. A general architecture of a smart grid in block diagram form is represented
in Fig. 12.1. With the increasing population, there has been a tremendous increase in the
demand for power. With changing lifestyle needs and awareness, consumers have
become more discerning about the quality of power. While the fast depletion of natural
sources of energy is a growing concern, arresting environmental degradation by
limiting carbon emissions has also become a global concern. All these factors
have hastened the search for solutions that should help generate more electrical
power by sustainable means, should be environmentally friendly, should facilitate
cost-effective quality power with highly reliable, stable, and resilient service, and,
last but not least, should ensure data privacy and security in the event of increasing
consumer participation in the process. Although the above-mentioned advantages
of smart grids are substantial, it is crucial to acknowledge the existence of signif-
icant challenges, including advanced system complexities, monitoring and control
intricacies, and the paramount issue of cybersecurity.
• This review will critically assess the effectiveness of machine learning approaches
in detecting and mitigating cyber-physical threats. It will explore various machine
learning algorithms and methodologies employed in research and practical
applications.
• By analyzing the existing literature, we seek to identify gaps, limitations, and
areas requiring further investigation in the current state of machine learning-based
solutions for smart grid security.
The rest of the sections are summarised as follows: Sect. 12.2 presents the Back-
ground and Fundamentals component of Smart-grid. Section 12.3 enumerates the
basics of cyber security and cyber-physical systems. Section 12.4 presents a brief
introduction to Machine Learning (ML) and Deep Learning (DL). Section 12.5 states
the cybersecurity concerns in Smart Grid and its protective measures. Section 12.6
deals with the associated challenges and future directions. Section 12.7 concludes
with overall concluding remarks.
Figure 12.2 illustrates the structure of the Advanced Metering Infrastructure (AMI).
At the core of the AMI are smart meters installed at both small- and large-scale
consumer locations. These smart meters, distinguished from traditional energy
meters, are fully digital devices equipped with a range of additional features and
functionalities, as detailed in Table 12.1.
The AMI functions as a wireless network comprising smart meters, enabling
various smart services such as remote billing, monitoring of supply–demand manage-
ment, integration and oversight of distributed energy sources, consumer engage-
ment, and energy conservation, among others. Essentially, the AMI structure forms
a communication network that facilitates interaction among the smart grid central
control server, aggregators, and power consumers. In a smart grid environment, a
smart home connects all its appliances through a Home Area Network (HAN),
transmitting data to the smart meter via Wi-Fi, ZigBee, or Wide Area Network
(WAN). Smart meters, strategically placed in homes and diverse consumer locations
(e.g., factories, offices, social infrastructures), convey crucial information, including
power consumption and related data, to the aggregators through a Neighborhood
Area Network (NAN). The collected data is then forwarded to the central control
server.
The smart grid leverages this data to make informed decisions and implement
necessary measures to ensure a stable power supply, considering the fluctuating
power demand from consumers. Specific Smart Grid (SG) enabling devices, such as
Electric Vehicles (EVs) in a Vehicle-to-Grid (V2G) network, utilize the AMI network
based on technologies such as WiMAX, LTE, Wi-Fi, or WAN. Furthermore, power
plants and generators communicate their status data to the Smart Grid through Power
Line Communication (PLC).
processes the gathered data and sends action messages back to the PLCs, which in turn make the devices connected to them carry out the pre-defined procedures. The DCS is the controlling mechanism that operates machines under the ambit of the SCADA infrastructure.
It may be inferred from the above descriptions that operational technology (OT) is the term referring to a large system that encompasses the monitoring and control of the whole gamut of activities executed by its sub-component, the ICS.
In order to understand the operational methodology of the ICS, it may be visualized as a composition of several layers. Systems responsible for infrastructure operations make up the supervisory layer, while the physical components in a given facility make up the physical layer [4]. As these systems and physical components are designed and manufactured by different companies and invariably use different protocols of their choice, each layer tends to rely on different network types, resulting in data and signal incompatibility among them. In this situation, an Open Platform Communication (OPC) server is engaged to provide a common platform for interfacing these layers with the management server of the enterprise management layer. Servers engaged in the field layer record the historical and real-time data pertaining to their connected devices, which are used to enable the system to bounce back from an abnormal state to the normal one. The field layer also uses the services of the data acquisition layer, which manages multiple RTUs, PLCs, MTUs, and IEDs and ensures synchronization of communications among them. The IED is another important device which helps protect the smart grid through
blocking procedures before any critical system failure occurs. The OT is also equipped with authentication servers and application servers, which are part of the SCADA infrastructure, facilitate authenticated user access, and enable system-device compatibility. A Human Machine Interface (HMI) is also part of the OT, facilitating the operator's interface with the linked apparatus.
“A cyber threat, or cybersecurity threat, refers to a malevolent action with the intent
to either pilfer or harm data, or disrupt the digital well-being and stability of an
enterprise.” Cyberattacks can encompass both unintentional and deliberate activities,
posing potential dangers that may result in significant harm to the computational
systems, networks, or other digital assets of an organization. These threats or attacks
manifest in various forms, including but not limited to data breaches, viruses, and
denial of service. The spectrum of cyber threats extends from trojans, viruses, and
hackers to the exploitation of back doors.
Cyberattacks typically target the unauthorized acquisition of access, with the
intention to disrupt, steal, or inflict damage upon IT assets, intellectual property,
computer networks, or any other form of sensitive data. They exploit the vulnerabilities in a system to launch an invasion of the targeted system or network. A "blended cyber threat", which is usually the case, refers to a single hacking attempt that leads to multiple exploits. Threats can originate from within the organization through trusted users or from remote locations through unknown external parties. While a cyberattack such as adware may have an inconsequential effect, a denial-of-service attack can have a catastrophic effect on an organization. The impact of cyberattacks can be as severe as electrical blackouts, malfunctions in military equipment, or the compromise of national security secrets. In short, they affect every aspect of our lives. Table 12.1 provides an exhaustive list of cyber threats, how they act, and plausible countermeasures in some of these cases.
The significance of cybersecurity in this context can be succinctly outlined as
follows [7]:
• Guards sensitive information against unauthorized access or theft.
• Provides protection from cyber threats like malware, viruses, and ransomware.
• Ensures the integrity and confidentiality of digital systems and networks.
• Averts disruptions to critical services and operations.
• Mitigates financial losses and preserves the reputation of businesses.
• Assists in compliance with legal and regulatory standards.
• Builds trust and confidence among customers and users.
• Enhances secure communication and collaboration within organizations.
• Facilitates the safe integration of emerging technologies such as cloud computing
and the Internet of Things (IoT).
The goals of cybersecurity can be succinctly outlined as follows [7]:
• Confidentiality of Data: Ensuring protection against unauthorized access.
• Integrity of Data and Information: Safeguarding against unauthorized alter-
ations.
Overall, a smart grid relies heavily upon an enormous digital network of fast communication channels carrying a huge flow of data, which is processed, filtered, and subjected to intelligent computational methods in order to produce near-instantaneous solutions. These solutions help not only to operate, maintain, and protect the physical devices in the smart grid but also to run the smart grid enterprise and support increased consumer participation. In fact, this digital layer over the physical entities of the grid system, intricately connected and exchanging data and information, constitutes what is called a Cyber-Physical System (CPS).
The smart grid is recognized as a quintessential cyber-physical system (CPS), embodying
an integration of physical power systems with cyber components. This fusion encompasses
elements such as sensing, monitoring, communication, computation, and control within the
smart grid framework. [9, 10]
Needless to say, such a vast network of networks, the cyber-physical system that a smart grid is, is vulnerable to the threats that any cyber system is susceptible to, in addition to the usual protection issues generally associated with the physical components of a smart grid [11, 12]. Any infringement of the cyber layer of the smart grid can have a colossal damaging effect, and a smart grid needs to be smart enough to shield itself from such cyber infringements or cyber threats; this is an additional and most important technical challenge that can be ascribed to the evolution of the smart grid.
Before delving into the deliberation on the significance of machine learning (ML) in
enhancing cybersecurity, it is important to understand what machine learning is all
about and its context in view of Artificial Intelligence (AI), as too often it is observed
that both AI and ML are used to refer to the same activity. While ML is considered
a subset of AI, the subtle difference between them in terms of their deployment is
generally misunderstood.
ML can be viewed as a class of statistical tools that identify relationships and patterns in a given set of data; this process builds up to an ML model which represents the event or phenomenon that the data pertains to. By the same token, AI can be viewed as software that pairs such a tool (ML in this case) with a controller that takes action based on the tool's output. The tool can also be any other suitable algorithm, such as a logic or an expert system, used to implement the AI [13]. To put it more simply, the ML tool initiates a training phase wherein the ML model learns automatically by analyzing the available data set (the training data set). A model developed through training on existing data implements a function to make decisions on future data.
The performance of the ML model is assessed before deploying it into the intended
operational environment, an exercise known as validation. In pursuit of this objective,
the machine learning (ML) model processes designated “validation” data, and the
resulting predictions undergo analysis by humans or are compared against established
ground truth. Consequently, a machine learning method is delineated as “the process
of constructing a machine learning model through the application of ML algorithms
on specific training data” [14].
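As a minimal illustration (not drawn from the chapter's own experiments), the train/validate workflow described above can be sketched in Python with scikit-learn on synthetic tabular data; the features, labels, and chosen classifier are placeholders.

```python
# Minimal sketch of the ML train/validate workflow described above,
# using scikit-learn on synthetic tabular data (placeholder features).
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(42)
X = rng.normal(size=(1000, 8))                      # 1000 samples, 8 numeric features
y = (X[:, 0] + 0.5 * X[:, 3] > 0).astype(int)       # synthetic ground-truth labels

# Split into training data (model learning) and validation data (assessment).
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y)

model = LogisticRegression().fit(X_train, y_train)  # training phase
preds = model.predict(X_val)                        # validation phase
print(f"Validation accuracy: {accuracy_score(y_val, preds):.3f}")
```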
Based on the data type, labelled or unlabelled, training of ML methods can be either supervised or unsupervised, respectively. Labelled training data is usually available naturally; if not, labels can be attributed to the training data through manual verification. In contrast, unsupervised training does not require labelled data and may involve a feedback process, acquiring the labels automatically as the ML model develops. A reinforcement-learning-based ML model is one such instance of learning without pre-assigned labels.
ML methods can further be classified as shallow or deep learning types. Deep learning methods rely on neural networks and require greater computational power and larger training datasets than shallow ML methods (based on structures, algorithms, or logic other than neural networks). Deep learning performs much better than shallow methods when large, highly complex datasets must be handled, whereas shallow methods perform equally well when the available data has a small number of features. Deep learning methods particularly stand out when dealing with large datasets of varied complexity involving images, unstructured text, temporal dependencies, and so on, and can be trained in both supervised and unsupervised manners [15–17]. Figure 12.4 enumerates some of the popular ML algorithms under the categories discussed above. A brief description of the ML algorithms depicted in Fig. 12.4 is as follows:
Several popular shallow machine learning algorithms that rely on supervised learning
are described below. Naïve Bayes (NB) is a probabilistic classifier that assumes a
priori independence among input features, making it efficient for small datasets.
Logistic Regression (LR), a categorical classifier, shares a similar a priori assumption
as NB but is increasingly reliant on larger datasets for effective training. Support
Vector Machines (SVM) are highly effective binary classifiers but face challenges
with scalability and extended processing times. Random Forest (RF) comprises a
collection of decision trees, each acting as a conditional classifier. The final RF output
integrates the results of individual trees, making it beneficial for large datasets and
multiclass problems but susceptible to overfitting.
Hidden Markov Models (HMM) represent a set of states producing outputs with
distinct probabilities, aiming to determine the sequence of states that can produce
observed outputs. HMMs can be trained on both labeled and unlabeled datasets.
K-Nearest Neighbor (KNN), like RF, is useful for solving multiclass problems, but
the computational intensity of training and testing poses challenges. Shallow Neural
Network (SNN) belongs to a class of algorithms based on neural networks.
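A hedged sketch of how several of the shallow supervised classifiers named above can be compared under cross-validation with scikit-learn; the synthetic dataset and hyper-parameter values are illustrative assumptions, not settings used in the surveyed works.

```python
# Sketch comparing a few of the shallow supervised classifiers named above
# (NB, LR, SVM, RF, KNN) on a synthetic binary-classification task.
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=2000, n_features=20, n_informative=8,
                           random_state=0)

classifiers = {
    "Naive Bayes": GaussianNB(),
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "SVM": SVC(),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "KNN": KNeighborsClassifier(n_neighbors=5),
}

for name, clf in classifiers.items():
    scores = cross_val_score(clf, X, y, cv=5)     # 5-fold cross-validation
    print(f"{name:20s} mean accuracy = {scores.mean():.3f}")
```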
Moving to unsupervised learning, some popular shallow machine learning algo-
rithms are highlighted below. Clustering involves grouping data with similar charac-
teristics, with k-means and hierarchical clustering being prominent examples. Asso-
ciation, another unsupervised learning method, aims to identify patterns between
data, making it particularly suitable for predictive purposes.
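For illustration only, a minimal k-means clustering sketch in Python; the synthetic data and the choice of three clusters are assumptions.

```python
# Sketch of unsupervised grouping with k-means, as described above,
# on synthetic two-dimensional measurements (placeholder data).
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=500, centers=3, cluster_std=1.0, random_state=7)

kmeans = KMeans(n_clusters=3, n_init=10, random_state=7).fit(X)
print("Cluster sizes:", np.bincount(kmeans.labels_))
print("Cluster centres:\n", kmeans.cluster_centers_)
```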
Deep Learning (DL) algorithms are fundamentally rooted in Deep Neural Networks
(DNN), extensive networks organized into layers capable of autonomous represen-
tation learning.
The vulnerability of the devices in the AMI component of the smart grid infrastructure to different types of cyberthreats, and the security objectives compromised thereby, is listed in Table 12.2.
Table 12.3 similarly enumerates the devices belonging to OT component of the
smart grid infrastructure that are susceptible to cyberattacks and the impacted cyber
security objectives.
Table 12.4, along similar lines, shows the devices in the IT component of the smart grid architecture that are prone to cyberattacks and the related cybersecurity objectives compromised.
The various cyberattacks that affect the devices in the AMI, OT, and IT components of a smart grid, and the cybersecurity objectives that remain unattained or compromised, have been enumerated in Tables 12.2, 12.3, and 12.4; this section dwells upon the methods, processes, tools, and practices that can be adopted to detect, prevent, and throttle impending cyberattacks. While the tabular enumeration of cyber threats lists specific countermeasures, certain cybersecurity techniques or approaches are generic and applicable to devices across the smart grid depending upon the threat type and threat perception. A smart grid provides a very large attack surface through which attackers can make an entry, and it is not feasible to deploy the same level of security measures throughout the infrastructure. A minor loophole in the security setup could jeopardize the entire power grid infrastructure, and unfortunately information related
to grid devices and systems is already commonplace in online search engines like
Shodan [30]. These engines possess the capability to gather data from Internet of
Things (IoT) devices, including Industrial Control Systems (ICSs) associated with
electrical grids. For example, as of April 2019, they have indexed over 1,200 devices
supporting the IEC 60870-5-104 protocol and nearly 500 devices supporting the
DNP3 protocol. Both of these protocols are widely used for the remote control and
monitoring of modernized power grid systems. Moreover, considering additional
protocols utilized in broader Industrial Control Systems, such as Modbus, the total
number of indexed devices is even more extensive [31]. In such an environment, it is essential that the security protocol of an organization operating infrastructure on the scale of a smart grid includes, as a first line of defence, the ability to learn who is accessing and scanning which exposed devices, rather than waiting for a threat to develop.
The honeypot is one such generic cybersecurity approach that is also deployed for cybersecurity in the smart grid. Honeypots appear as targets likely to be attacked by hackers, such as a vulnerable device and its associated networks in a smart grid. Hackers are lured to these honeypots, assuming them to be legitimate targets, and in the process security analysts get the chance to detect cyber criminals and deflect them from the intended targets.
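The idea can be illustrated with a toy, low-interaction honeypot sketch in Python that merely listens on an arbitrary TCP port and logs connection attempts; the port number and log format are assumptions, and production smart-grid honeypots would instead emulate ICS protocols and device behaviour.

```python
# Toy low-interaction honeypot: listens on an arbitrary TCP port and logs
# every connection attempt (source address and first bytes received).
# Real smart-grid honeypots emulate ICS protocols; this only records scans.
import socket
import datetime

LISTEN_PORT = 2502          # placeholder port (unprivileged, near Modbus's 502)

def run_honeypot(port: int = LISTEN_PORT) -> None:
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as srv:
        srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        srv.bind(("0.0.0.0", port))
        srv.listen()
        print(f"Honeypot listening on port {port}")
        while True:
            conn, addr = srv.accept()
            with conn:
                conn.settimeout(2.0)
                try:
                    probe = conn.recv(256)          # capture the scanner's probe
                except socket.timeout:
                    probe = b""
                stamp = datetime.datetime.utcnow().isoformat()
                print(f"[{stamp}] connection from {addr[0]}:{addr[1]} "
                      f"probe={probe[:32]!r}")

if __name__ == "__main__":
    run_honeypot()
```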
Given the complexity of a smart grid and the cost involved in its implementation and operation, effective or optimal utilization of its resources, including the defence mechanism against cyber threats, is of paramount importance. In this context, game theory can be highly helpful, as it is widely used to predict the attack method most likely to take place. Game-theoretic models are deployed to work out the process of a specific attack scheme, and a tree-structure-based game-theoretic defence mechanism is well suited to a smart grid scenario, as it analyses the paths generated from the tree-structure model to predict the line of attack or its procedures [25].
The AMI essentially involves communication protocols for sharing data between smart meters and the SG control centre, between EVs and the grid (V2G), between EVs, and between EVs and EV charging stations, and so forth. All these communication channels can be targets for hackers, and securing them against cyberattacks is the primary aspect of cybersecurity in the AMI component of a smart grid infrastructure.
One of the distinct features of smart meters is the embedded encryption algorithm or encryption key, which is vital for secure smart meter communication; for proper coordination among the meters, an encryption key management system is also essential [32]. Efficient key management and frequent automatic updating of these keys can be adopted as an intrusion detection measure against data injection attacks [33]. Similarly, the authors in [34] have suggested hash-based encryption with bidirectional authentication for secure V2G communication. In a smart grid scenario, it is essential to have robust key agreement and subscriber authentication for protected communication; the authors in [35, 36] point this out, highlighting how a feeble authentication and key agreement algorithm gives adversaries the leeway to tamper with smart meters. The authors in [36–38] have proposed countermeasures that may include location and data stamp information. The above has been reiterated for EV-to-EV communication in [39]. Table 12.5 enumerates some of the countermeasures against cyber threats to the AMI component of a smart grid infrastructure.
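As a generic illustration of hash-based mutual (bidirectional) authentication with a pre-shared key, and not the specific protocol of [34], a minimal HMAC challenge-response sketch is shown below; the key provisioning and message framing are assumptions.

```python
# Generic sketch of hash-based mutual (bidirectional) authentication with a
# pre-shared key, in the spirit of the schemes cited above; this is an
# illustration only, not the protocol proposed in [34].
import hashlib
import hmac
import os

SHARED_KEY = os.urandom(32)        # provisioned key, e.g. inside a smart meter

def respond(key: bytes, challenge: bytes) -> bytes:
    """Prove knowledge of the key by returning HMAC-SHA256(key, challenge)."""
    return hmac.new(key, challenge, hashlib.sha256).digest()

# Meter -> server direction: the server challenges the meter.
server_challenge = os.urandom(16)
meter_response = respond(SHARED_KEY, server_challenge)
assert hmac.compare_digest(meter_response, respond(SHARED_KEY, server_challenge))

# Server -> meter direction (bidirectional: the meter also challenges the server).
meter_challenge = os.urandom(16)
server_response = respond(SHARED_KEY, meter_challenge)
assert hmac.compare_digest(server_response, respond(SHARED_KEY, meter_challenge))
print("Mutual authentication succeeded")
```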
Countermeasures concerning DoS, FDI, message replay, and TSA attacks are similar to those used in the AMI and IT components of the smart grid. Quantifying the intensity of a cyberattack through its impact on the smart grid is a necessary countermeasure for discerning traffic flooded by an attacker. As suggested in [47], the effect of a cyberattack can be measured by using channel bandwidth. A DoS attack on DNP3 (Distributed Network Protocol 3) can be ascertained by analysing the attack intensity: high intensity indicates network flooding, and the attacker then stands a good chance of being exposed.
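A hedged sketch of this intensity-based discrimination: given per-second packet counts on a monitored DNP3 link, seconds whose rate exceeds an assumed multiple of the baseline are flagged as flooding; the threshold factor and the synthetic counts are illustrative, not values from [47].

```python
# Sketch of quantifying attack intensity from per-second packet counts on a
# monitored link (e.g., DNP3 traffic); the threshold is an assumed multiple of
# the baseline rate, not a value taken from the cited work.
import statistics

def flag_flooding(packets_per_second, factor=5.0):
    """Flag seconds whose packet rate exceeds `factor` x the median baseline."""
    baseline = statistics.median(packets_per_second)
    return [i for i, rate in enumerate(packets_per_second)
            if rate > factor * baseline]

traffic = [120, 130, 125, 118, 900, 950, 880, 127, 122]   # synthetic samples
print("Suspected flooding at seconds:", flag_flooding(traffic))
```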
It is observed that an attacker usually adopts an attack method involving less cost, and the cost involved in unleashing a cyberattack depends on the strength of the defence mechanism in force at the targeted SG. For this reason it is good to calculate the
An ML-based IDS model with random forest (RF) as the classifier is proposed in [51], where the data collected by PMUs across the smart grid is used to detect data injection threats with very high accuracy and detection rate. Another such intelligent IDS, modelled on a multi-layer deep algorithm for the detection of cyber threats in the smart meter communication network, is proposed in [52]. The accuracy and speed of detecting several classes of traffic, such as benign, DoS, PortScan, Web Attack, Bot, FTP-Patator, and SSH-Patator, in a cyber-physical system like the smart grid are claimed to be very high. A deep neural network (DNN) model has been proposed in [53] which proves to be highly accurate in classifying smart grid cyberattacks into types, namely Probe, DoS, U2R, and R2L. The False Data Injection (FDI) type of cyberattack in a smart grid can be mitigated by an ML model designed on a Convolutional Neural Network (CNN) and a Long Short-Term Memory (LSTM) network [54]. The model performs a time-series anomaly search encompassing all the evolved features of an FDI attack [55]. A two-stage DL-based threat detection and localization approach is proposed in [56], wherein 2D images of the encoded correlations of the measured variables in the smart grid are used to develop a deep Convolutional Neural Network (CNN)-based classifier that detects FDI attacks with high accuracy. Also, the authors in [56] have proposed a two-layered sequential auto-detector of cyber threats in a smart grid: the first layer indicates the presence of a cyberattack, and the second layer classifies the cyberattacks. For both layers, the RF algorithm is chosen for the intended purpose. A pattern recognition approach for the detection of cyber threats in the physical layer of the smart grid is proposed in [57], which relies on an ensemble of ML classifiers and neural networks for enhanced pattern recognition.
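In the spirit of the RF-based IDS of [51], though not its actual pipeline or data, a minimal sketch with synthetic features standing in for PMU measurements is given below; the class balance, feature count, and hyper-parameters are assumptions.

```python
# Sketch of a random-forest IDS: synthetic features stand in for PMU
# measurements, with binary labels (0 = normal, 1 = data injection).
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

rng = np.random.default_rng(1)
n = 5000
X_normal = rng.normal(0.0, 1.0, size=(n, 12))           # benign measurements
X_attack = rng.normal(0.8, 1.5, size=(n // 4, 12))       # injected measurements
X = np.vstack([X_normal, X_attack])
y = np.concatenate([np.zeros(n), np.ones(n // 4)])

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2,
                                           stratify=y, random_state=1)
clf = RandomForestClassifier(n_estimators=200, random_state=1).fit(X_tr, y_tr)
print(classification_report(y_te, clf.predict(X_te), digits=3))
```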
The study [58] proposes viewing the task of anomaly detection in the data traffic of a digital network as a partially observable Markov decision process (POMDP) problem. To handle this problem, it suggests a universal robust online detection algorithm based on the framework of model-free reinforcement learning (RL) for POMDPs. The anomaly detection scheme aims to identify attacks of the jamming, FDI, and DoS types. The article [59] proposes an anomaly detection and classification
FDI, and DoS. The article [59] proposes an anomaly detection and classification
model for smart grid architectures using Modbus/Transmission Control Protocol
(TCP) and Distributed Network Protocol 3 (DNP3) protocols. The model adopts an
Autoencoder-Generative Adversarial Network (GAN) architecture for (a) detecting
operational anomalies and (b) classifying Modbus/TCP and DNP3 cyberattacks.
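A much-simplified sketch of reconstruction-based anomaly detection: a plain autoencoder (here an MLP trained to reproduce its input) flags samples whose reconstruction error exceeds a percentile threshold. This is only loosely related to the Autoencoder-GAN of [59]; the synthetic flow features and the 99th-percentile threshold are assumptions.

```python
# Simplified anomaly-detection sketch: an autoencoder-style MLP learns to
# reconstruct "normal" traffic; samples with a high reconstruction error are
# flagged as anomalous. Uses synthetic flow features, not real ICS captures.
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(3)
X_train = rng.normal(0, 1, size=(2000, 10))        # normal operational traffic
X_test = np.vstack([rng.normal(0, 1, size=(50, 10)),     # normal
                    rng.normal(4, 1, size=(50, 10))])    # anomalous

ae = MLPRegressor(hidden_layer_sizes=(4,), max_iter=2000, random_state=3)
ae.fit(X_train, X_train)                            # learn to reconstruct normal data

train_error = np.mean((ae.predict(X_train) - X_train) ** 2, axis=1)
test_error = np.mean((ae.predict(X_test) - X_test) ** 2, axis=1)
threshold = np.percentile(train_error, 99)          # assumed detection threshold
print("Flagged as anomalous:", int(np.sum(test_error > threshold)),
      "of", len(X_test))
```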
The increased adoption of the Internet of Things (IoT) and digital communication networks for monitoring and controlling industrial control systems (ICS) in a smart grid exposes the CPS to many cyber threats with devastating consequences. While traditional IDSs are found inadequate, the intelligent IDSs proposed in the literature often do not take into account the imbalance observed in ICS datasets. The model proposed in [60] is based on Deep Neural Network (DNN) and Decision Tree (DT) classifiers, taking advantage of the inherent capabilities of these intelligent algorithms in handling imbalanced datasets and providing high classification accuracy.
The authors in [61] have proposed a DL-based IDS specifically to address the FDI attack on the Supervisory Control and Data Acquisition (SCADA) system of a smart grid, in order to ensure the integrity of the data collected by the system. A stacked autoencoder (SAE)-based deep learning framework for the mitigation of threats against transmission SCADA is proposed in [62], which also relies on the inherent capacity of DL for unsupervised feature learning in complex security scenarios. A two-stage IDS model, with each stage deployed with an agent-based model, is proposed in [63] for preserving data integrity in the physical layers of a smart grid. The first stage produces an attack exposure metric, while the second stage explores the decentralization of security in the system. The study [64] takes into account the varied attack strategies likely to be adopted by hackers based on factors such as cost, time, availability of information, and the level of vulnerability of the system chosen to be attacked. In this context, scenario-based two-stage sparse cyberattack models for the smart grid with complete and incomplete network information are proposed, which work on DL-based interval state estimation (ISE).
With the means afforded by advanced technology and the vulnerabilities of smart grids arising from heavy reliance on IT, the rudimentary act of electricity theft has also gone digital and is very much a cyber threat concern. The authors in [65] have highlighted this aspect in the context of electricity theft in a distributed generation (DG) scenario, wherein consumers with malicious intent hack into the smart meters deployed with their own grid-tied DG units to claim a higher than actual supply of energy. The authors
12.8 Conclusion
In conclusion, this comprehensive review has delved into various facets of smart
grid cybersecurity, providing a nuanced understanding of the challenges and poten-
tial solutions in this critical domain. Here, the authors scrutinized the intricacies of
smart grid infrastructure, highlighting challenges that range from data availability
and quality to the integration complexities of associated devices such as AMI, IT,
and OT. Our exploration extended to the diverse landscape of cyber threats, encom-
passing the types of attacks and the specific devices susceptible to these threats within
a smart grid framework. By elucidating effective countermeasures, we underscored
the importance of securing smart grid components against potential vulnerabili-
ties. Moreover, the study explored the integration of artificial intelligence, encom-
passing both machine learning (ML) and deep learning (DL), as a transformative
approach to fortify smart grid cybersecurity. This work discussed the application of
ML and DL techniques, recognizing their potential to automate threat detection and
response. In acknowledging the evolving nature of cyber threats, the work outlined
challenges associated with AI adoption in this context. Looking ahead, we proposed
future directions, including federated learning, explainable AI, generative adver-
sarial networks, multi-agent systems, and homomorphic encryption, as promising
avenues to enhance the resilience of smart grids against cyber threats. This holistic
examination contributes to the collective knowledge base, offering insights that can
inform future research, policy development, and practical implementations in the
ever-evolving landscape of smart grid cybersecurity.
References
14. Apruzzese, G., Laskov, P., Montes de Oca, E., Mallouli, W., Brdalo Rapa, L., Grammatopoulos,
A.V., Di Franco, F.: The role of machine learning in cybersecurity. Digital Threats: Research
and Practice 4(1), 1–38 (2023)
15. Apruzzese, G., Colajanni, M., Ferretti, L., Guido, A., Marchetti, M.: On the effectiveness of machine and deep learning for cybersecurity. In: Proceedings of the IEEE International Conference on Cyber Conflicts, pp. 371–390 (2018)
16. Apruzzese, G., Colajanni, M., Ferretti, L., Marchetti, M.: Addressing adversarial attacks against security systems based on machine learning. In: Proceedings of the IEEE International Conference on Cyber Conflicts, pp. 1–18 (2019)
17. Amarasinghe, K., Kenney, K., Manic, M.: Toward explainable deep neural network based anomaly detection. In: Proceedings of the IEEE International Conference on Human System Interaction, pp. 311–317 (2018)
18. Baig, Z.A., Amoudi, A.R.: An analysis of smart grid attacks and countermeasures. J. Commun.
8(8), 473–479 (2013). https://doi.org/10.12720/jcm.8.8.473-479
19. Bou-Harb, E., et al.: Communication security for smart grid distribution networks. IEEE
Commun. Mag. 51(1), 42–49 (2013). https://doi.org/10.1109/mcom.2013.6400437
20. Hansen, A., Staggs, J., Shenoi, S.: Security analysis of an advanced metering infrastructure.
Int. J. Crit. Infrastruct. Protect. 18, 3–19 (2017). https://doi.org/10.1016/j.ijcip.2017.03.004
21. Wang, K., et al.: Strategic honeypot game model for distributed denial of service attacks in
smart grid. IEEE Trans. Smart Grid. 8(5), 2474–2482 (2017). https://doi.org/10.1109/tsg.2017.
2670144
22. Farraj, A., Hammad, E., Kundur, D.: A distributed control paradigm for smart grid to address
attacks on data integrity and availability. IEEE Trans. Signal Inf. Process. Netw. 4(1), 70–81
(2017). https://doi.org/10.1109/tsipn.2017.2723762
23. Chen, P.Y., Cheng, S.M., Chen, K.C.: Smart attacks in smart grid communication networks.
IEEE Commun. Mag. 50(8), 24–29 (2012). https://doi.org/10.1109/mcom.2012.6257523
24. Sanjab, A., et al.: Smart grid security: threats, challenges, and solutions. arXiv preprint arXiv:
1606.06992
25. Liu, S.Z., Li, Y.F., Yang, Z.: Modeling of cyber-attacks and defenses in local metering system.
Energy Proc. 145, 421–426 (2018). https://doi.org/10.1016/j.egypro.2018.04.069
26. Sun, C.C., et al.: Intrusion detection for cybersecurity of smart meters. IEEE Trans. Smart Grid.
12(1), 612–622 (2020). https://doi.org/10.1109/tsg.2020.3010230
27. Bansal, G., Naren, N., Chamola, V.: RAMA: real-time automobile mutual authentication
protocol using PUF. In: Proc. Int. Conf. Cloud Computing Environment Based on Game Theory,
Barcelona, Spain, January 2020, pp. 265–270
28. Bhattacharjee, S., et al.: Statistical security incident forensics against data falsification in smart
grid advanced metering infrastructure. In: Proc. Int. Conf. Data and Application Security and
Privacy, Scottsdale, USA, March 2017, pp. 35–45
29. Wei, L., et al.: Stochastic games for power grid protection against co-ordinated cyber-physical
attacks. IEEE Trans. Smart Grid. 9(2), 684–694 (2018). https://doi.org/10.1109/tsg.2016.256
1266
30. “Shodan,” https://www.shodan.io/. [Accessed on August 8, 2023]
31. Mashima, D., Li, Y., Chen, B.: Who's scanning our smart grid? Empirical study on honeypot data. In: Proc. IEEE Global Communications Conference (GLOBECOM), pp. 1–6 (2019)
32. Liu, N., et al.: A key management scheme for secure communications of advanced metering
infrastructure in smart grid. IEEE Trans. Ind. Electron. 60(10), 4746–4756 (2012). https://doi.
org/10.1109/tie.2012.2216237
33. Liu, X., et al.: A collaborative intrusion detection mechanism against false data injection attack
in advanced metering infrastructure. IEEE Trans. Smart Grid. 6(5), 2435–2443 (2015). https://
doi.org/10.1109/tsg.2015.2418280
34. Lee, S.: Security and privacy protection of vehicle-to-grid technology for electric vehicle in
smart grid environment. J. Convergence Culture Technol. 6(1), 441–448 (2020)
35. Park, K.S., Yoon, D.G., Noh, S.: A secure authentication and key agreement scheme for smart
grid environments without tamper-resistant devices. J. Korea Inst. Inf. Secur. Cryptol. 30(3),
313–323 (2020)
36. Kaveh, M., Martín, D., Mosavi, M.R.: A lightweight authentication scheme for V2G communi-
cations: a PUF-based approach ensuring cyber/physical security and identity/location privacy.
Electronics 9(9), 1479 (2020). https://doi.org/10.3390/electronics9091479
37. Zhang, L., et al.: A lightweight authentication scheme with privacy protection for Smart Grid
communications. Future Generat. Comput. Syst. 100, 770–778 (2019). https://doi.org/10.1016/
j.future.2019.05.069
38. Go, Y.M., Kwon, K.H.: Countermeasure of SIP impersonation attack using a location server.
J. Korea Contents Assoc. 13(4), 17–22 (2013). https://doi.org/10.5392/jkca.2013.13.04.017
39. Roberts, B., et al.: An authentication framework for electric vehicle-to- electric vehicle charging
applications. In: Proc. Int. Conf. Mobile Ad Hoc and Sensor Systems, Orlando, USA, November
2017, pp. 565–569
40. Guo, Z., et al.: Time synchronization attack and countermeasure for multisystem scheduling
in remote estimation. IEEE Trans. Automat. Control. 66(2), 916–923 (2020). https://doi.org/
10.1109/tac.2020.2997318
41. Chan, A.C.F., Zhou, J.: A secure, intelligent electric vehicle ecosystem for safe integration
with smart grid. IEEE Trans. Intell. Transport. Syst. 16(6), 3367–3376 (2015). https://doi.org/
10.1109/tits.2015.2449307
42. Kakei, S., et al.: Cross-certification towards distributed authentication infrastructure: a case of
hyperledger fabric. IEEE Access. 8, 135742–135757 (2020). https://doi.org/10.1109/access.
2020.3011137
43. Li, Q., et al.: A risk assessment method of smart grid in cloud computing environment based on
game theory. In: Proc. Int. Conf. Cloud Computing and Big Data Analytics, Chengdu, China,
April 2020, pp. 67–72
44. Shen, S., Tang, S.: Cross-domain grid authentication and authorization scheme based on trust
management and delegation. In: Proc. Int. Conf. Computational Intelligence and Security,
Suzhou, China, December 2008, pp. 399–404
45. Chu, Z., et al.: Game theory based secure wireless powered D2D communications with
cooperative jamming. In: Proc. Int. Conf. Wireless Days, Porto, Portugal, March 2017,
pp. 95–98
46. Pawlick, J., Zhu, Q.: Proactive defense against physical denial of service attacks using Poisson
signaling games. In: International Conference on Decision and Game Theory for Security,
October 2017, pp. 336–356. Springer, Cham
47. Lu, Z., et al.: Review and evaluation of security threats on the communication networks in
smart grid. In: Proc. Int. Conf. Military Communications, San Jose, USA
48. Hewett, R., Rudrapattana, S., Kijsanayothin, P.: Cyber-security analysis of smart grid SCADA
systems with game models. In: Proc. Int. Conf. Cyber and Information Security Research, New
York, USA, April 2014, pp. 109–112
49. Pan, K., et al.: Combined data integrity and availability attacks on state estimation in cyber-
physical power grids. In: Proc. Int. Conf. Smart Grid Communications, Sydney, Australia,
November 2016, pp. 271–277
50. Jeong, Y.S.: Probability-based IoT management model using blockchain to expand multilayered
networks. J. Korea Convergence Soc. 11(4), 33–39 (2020)
51. Wang, D., Wang, X., Zhang, Y., Jin, L.: Detection of power grid disturbances and cyber-attacks based on machine learning. J. Inf. Secur. Appl. 46, 42–52 (2019)
52. Vijayanand, R., Devaraj, D., Kannapiran, B.: A novel deep learning based intrusion detection system for smart meter communication network. In: Proc. IEEE International Conference on Intelligent Techniques in Control, Optimization and Signal Processing (INCOS), pp. 1–3 (2019)
53. Zhou, L., Ouyang, X., Ying, H., Han, L., Cheng, Y., Zhang, T.: Cyber-attack classification in smart grid via deep neural network. In: Proceedings of the 2nd International Conference on Computer Science and Application Engineering, pp. 1–5 (2018)
54. Niu, X., Li, J., Sun, J., Tomsovic, K.: Dynamic detection of false data injection attack in smart grid using deep learning. In: Proc. IEEE Power & Energy Society Innovative Smart Grid Technologies Conference (ISGT), pp. 1–6 (2019)
55. Mohammadpourfard, M., Genc, I., Lakshminarayana, S., Konstantinou, C.: Attack detection and localization in smart grid with image-based deep learning. In: Proc. IEEE International Conference on Communications, Control, and Computing Technologies for Smart Grids (SmartGridComm), pp. 121–126 (2021)
56. Farrukh, Y.A., Ahmad, Z., Khan, I., Elavarasan, R.M.: A sequential supervised machine learning approach for cyber attack detection in a smart grid system. In: Proc. North American Power Symposium (NAPS), pp. 1–6 (2021)
57. Sakhnini, J., Karimipour, H., Dehghantanha, A., Parizi, R.M.: Physical layer attack identifi-
cation and localization in cyber–physical grid: An ensemble deep learning based approach.
Physical Communication 47, 101394 (2021)
58. Kurt, M.N., Ogundijo, O., Li, C., Wang, X.: Online cyber-attack detection in smart grid: A
reinforcement learning approach. IEEE Transactions on Smart Grid 10(5), 5174–5185 (2018)
59. Siniosoglou, I., Radoglou-Grammatikis, P., Efstathopoulos, G., Fouliras, P., Sarigiannidis,
P.: A unified deep learning anomaly detection and classification approach for smart grid
environments. IEEE Trans. Netw. Serv. Manage.Netw. Serv. Manage. 18(2), 1137–1151 (2021)
60. Al-Abassi, A., Karimipour, H., Dehghantanha, A., Parizi, R.M.: An ensemble deep learning-
based cyber-attack detection in industrial control system. IEEE Access 8, 83965–83973 (2020)
61. He, Y., Mendis, G.J., Wei, J.: Real-time detection of false data injection attacks in smart grid: A
deep learning-based intelligent mechanism. IEEE Transactions on Smart Grid 8(5), 2505–2516
(2017)
62. Wilson, D., Tang, Y., Yan, J., Lu, Z.: Deep learning-aided cyber-attack detection in power transmission systems. In: Proc. IEEE Power & Energy Society General Meeting (PESGM), pp. 1–5 (2018)
63. Sengan, S., Subramaniyaswamy, V., Indragandhi, V., Velayutham, P., Ravi, L.: Detection of false
data cyber-attacks for the assessment of security in smart grid using deep learning. Comput.
Electr. Eng. 93, 107211 (2021)
64. Wang, H., Ruan, J., Wang, G., Zhou, B., Liu, Y., Fu, X., Peng, J.: Deep learning-based interval
state estimation of AC smart grids against sparse cyber attacks. IEEE Trans. Industr. Inf. 14(11), 4766–4778 (2018)
65. Ismail, M., Shaaban, M.F., Naidu, M., Serpedin, E.: Deep learning detection of electricity theft
cyber-attacks in renewable distributed generation. IEEE Transactions on Smart Grid 11(4),
3428–3437 (2020)
Chapter 13
Intelligent Biometric
Authentication-Based Intrusion
Detection in Medical Cyber Physical
System Using Deep Learning
P. B. Dash (B)
Department of Information Technology, Aditya Institute of Technology and Management
(AITAM), Tekkali, Andhra Pradesh 532201, India
e-mail: [email protected]
P. P. Priyadarshani
Department of Computer Science, Maharaja Sriram Chandra Bhanja Deo University (MSCBU),
Baripada, Odisha 757003, India
M. K. Pehlivanoğlu
Department of Computer Engineering, Kocaeli University, Kocaeli, Türkiye
e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2024
J. Nayak et al. (eds.), Machine Learning for Cyber Physical System: Advances
and Challenges, Intelligent Systems Reference Library 60,
https://doi.org/10.1007/978-3-031-54038-7_13
13.1 Introduction
The phrase “cyber-physical system” was first used by Helen Gill at the National
Science Foundation (NSF) in the United States in 2006. Cyber-Physical Systems (CPS) serve as platforms that facilitate the coordination of computing devices, internet connectivity, and physical activities. This integration enables smooth inter-
action between web activities and real-world components [1]. A CPS refers to a
device that employs computer-based algorithms to manage and supervise a particular
process. The interconnection between physical and software components is a promi-
nent feature of cyber-physical systems, which operate across multiple spatial and
temporal dimensions. These systems exhibit diverse and unique modes of behavior
and engage in communication within varied environments. The increasing popularity
of CPS is attributed to the fast-growing nature of the internet. The CPS
paradigm has been increasingly used in the development of intelligent applications
including smart robots, smart transportation, smart healthcare, smart agriculture,
smart manufacturing, smart distribution of water, and smart homes, including several
technical domains and associated services. Figure 13.1 represents some applications
of CPS. Water supply networks are dynamic due to climate change and customer
demand uncertainty. The rapid advancement of technology makes water supply
system improvements possible. Thus, communication and networking, sensing and instrumentation, computation, and control technologies are linked with water delivery system infrastructures to improve operations [2]. Cyber-physical manufacturing systems provide distinct advantages over conventional production methods. Five capabilities, including production line monitoring, smart supply chains, predictive analysis, asset monitoring, and personalized goods, demonstrate the superiority of cyber manufacturing systems over conventional approaches [3]. Trans-
port networks affect national productivity, environment, and energy consumption.
Developing innovative, efficient transport systems requires overcoming technolog-
ical hurdles related to the cyber-physical characteristics of current systems [4]. A
significant proportion of the global population is relocating to metropolitan areas.
Countries are actively pursuing smart city initiatives to enhance the overall welfare of
their residents. CPS is the fundamental basis of smart city infrastructures. Practically
every part of a smart city’s infrastructure makes use of CPS [5].
MCPS is designed to combine a variety of intelligent health care sensor gadgets
and effectively gather signal information. The data is securely stored inside cloud
storage architecture. MCPS conducts monitoring and surveillance activities that
manipulate the dosage of drugs and may transform MCPS sensors into networks of
compromised devices, which can be used for carrying out Denial-of-Service (DoS)
assaults. Vulnerabilities in cyber security significantly compromise the security of
software and its components including their authenticity, privacy, and availability
[11].
Based on existing studies, it has been concluded that there are many security-related
difficulties and challenges present in applications of the MCPS. The increasing
frequency of cyber-attacks in the MCPS environment poses a significant threat to the whole healthcare ecosystem, leaving it susceptible to hackers. The following are
the primary difficulties that need to be addressed:
1. The fluctuating nature of MCPS networks (IoT devices, fog, and the cloud) makes
it difficult to design distributed security architecture for distributed MCPS appli-
cations. Furthermore, with MCPS, the transmission network may be interrupted
by the change of attacker behavior.
2. The decentralized framework for analyzing the huge amount of data generated
by MCPS devices presents a significant challenge to the security mechanisms
designed to protect devices.
3. It is difficult to build an intrusion detection system (IDS) that can discriminate
between an attack and ordinary observations in an MCPS environment. Thou-
sands of devices and sensors are linked together in such a network environment,
which suffers from poor architecture and inadequate authentication procedures.
We have presented a DL based IDS for addressing these issues. The detection
system employs complex CNN architecture to minimize the impact of cyber-attacks
in an MCPS environment.
The rest of the chapter is organised as follows. Section 13.2 provides an overview of the vulnerabilities present in healthcare security, as well as the current solutions available to address cyber-attacks; additionally, it examines a specific attack scenario involving medical sensors inside the IoT healthcare ecosystem. Section 13.3 delves into a study of several ML methodologies, with a specific focus on the CNN implemented in this research. The design and architecture of the suggested model are described in Section 13.4. Section 13.5 presents a detailed analysis of the dataset and the setup of the experimental environment. The evaluation of the proposed model's experimental findings and performance is presented in Section 13.6. Finally, Section 13.7 concludes the chapter.
The increasing popularity of MCPS devices has created new security concerns,
such as increased network traffic across the MCPS environment. Attacks have been
employed against the MCPS environment including network spoofing, DoS and
DDoS attacks. Several studies have shown the enhancement of security and protec-
tion in MCPS by using ML and DL approaches. These approaches have been effec-
tive in improving the accuracy and efficiency of security threat detection in MCPS,
enabling early prevention measures to be implemented before any potential damage
occurs. This section provides an overview of some research studies that have used IDSs based on ML and DL approaches in the MCPS.
Begli et al. [13] have suggested a secure support vector machine (SVM) based IDS specifically for remote healthcare environments. This approach is employed to address and prevent Denial of Service (DoS) assaults and User to Root (U2R) assaults. The anomaly identification system has been evaluated on the NSL-KDD data samples and achieved a detection accuracy of 95.01% for identifying abnormalities. Newaz et al. [14] introduced Health Guard, a safety
for identifying abnormalities. Newaz et al. [14] introduced Health Guard, a safety
framework designed specifically for smart healthcare systems (SHS). The present
approach examines the vital signals of diverse sensors inside a SHS and establishes
shown that SNN with a 9% higher sensitivity exhibited more robustness compared
to FNN when it came to feature normalization in the context of adversarial attacks.
Hizal et al. [20] have implemented a CNN model-based IDS executed in a Graphics Processing Unit (GPU) runtime environment. They have achieved a
classification accuracy of 99.86% for a 5-class classification task using the NSL-KDD
data samples. Gopalakrishnan et al. [21] introduced a system called DLTPDO-CD,
which integrates DL for traffic prediction, data offloading, and cyber-attack detection.
The model incorporates three primary operations, namely traffic prediction, data
unloading, and attack detection. The detection of attacks in mobile edge computing
involves the use of a deep belief network (DBN) that has been optimized using
the barnacles mating optimizer (BMO) method, referred to as BMO-DBN. They have
achieved an accuracy of 97.65% using BMO-DBN. In contrast, it was found that
the use of DBN resulted in a slightly decreased accuracy rate of 96.17%. Xun et al.
[22] performed tests using CNN and LSTM networks to build several models for
evaluating driving behavior in the context of edge network aided vehicle driving.
The training data in CNN exhibits an accuracy rate of 96.7% and a loss value of
0.189. Conversely, the LSTM model achieves an accuracy rate of 98.5% and a loss
value of 0.029. The accuracy rates for the CNN and LSTM models on the test
dataset are 90.2% and 95.1% respectively. Table 13.1 presents more studies that
specifically address the identification of attacks within the context of the IoT in
healthcare environments.
Based on the above-mentioned studies, several issues have been observed. Firstly,
it has been shown that anomaly detection by statistical approaches requires an
adequate amount of repetitions to effectively train the model. Additionally, the
threshold used for detecting complex attacks may not be appropriate for real-world
scenarios. Another limitation of different proposed approaches pertains to the decline
in performance observed in IDS when the network experiences high levels of traffic
congestion. Several frameworks demonstrate suboptimal performance when it comes to identifying complex attacks. One of the primary limitations in the field of the MCPS is the scarcity of publicly available data that can accurately represent cyber-attacks targeting this domain. To address the existing research gap, this study introduces a cyber-attack detection system that utilizes a DL approach. The system is designed to identify a variety of cyber-attacks, including DoS attacks, ARP Spoofing, Smurf attacks, and Nmap Port Scans, within the context of the MCPS. Additionally, the
system incorporates the capability to perform multi-class classification, enabling
it to determine the specific type of attack associated with a given malicious event.
The decision tree (DT) approach is a frequently used technique in the field of data
mining, utilized for the creation of classification systems that rely on multiple vari-
ables. Furthermore, it is used for the development of prediction algorithms that seek
to anticipate results for a certain target variable. The suggested approach entails
categorizing a given population into segments that resemble branches, leading to the
creation of an inverted tree structure. This structure has a primary node, interme-
diary nodes, and terminal nodes. The method used in this study is non-parametric
in nature, allowing it to effectively handle extensive and complex datasets without
requiring a complex parametric framework. When the sample size reaches a
certain size, it becomes possible to partition research data into distinct training and
validation datasets. The training dataset is used for constructing a DT model, whereas
the validation dataset is used to find the right tree size for the best possible model.
The first step in the DT classifier involves the calculation of the entropy of the
given database. The metric provides an indication of the level of uncertainty present in
the database. A decrease in the magnitude of the uncertainty value corresponds to an
improvement in the quality of the categorization outcomes. Next, the information gain of each feature is computed; this quantifies how much the uncertainty diminishes after the database is partitioned on that feature. The database is then partitioned based on the features that exhibit the most significant information gain. The procedure mentioned above
is iteratively executed until all nodes have been successfully organized. Equation
(13.1) provides a mathematical representation of the DT.
$$f(X) = \sum_{k=1}^{N_{leaf}} Y_k \cdot I_{leaf}(X, k) \qquad (13.1)$$

where $N_{leaf}$ = number of leaf nodes in the DT, $Y_k$ = outcome associated with the $k$-th leaf, and $I_{leaf}(X, k)$ = indicator function.
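A brief sketch of the entropy and information-gain computation described above, followed by fitting a decision tree with scikit-learn; the synthetic data, the candidate split threshold, and the depth limit are illustrative assumptions.

```python
# Sketch of the entropy and information-gain computation used by DT learning,
# followed by fitting a decision tree on a small synthetic dataset.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def entropy(labels):
    """Shannon entropy of a label vector."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def information_gain(labels, feature_values, threshold):
    """Entropy reduction obtained by splitting on feature <= threshold."""
    left = labels[feature_values <= threshold]
    right = labels[feature_values > threshold]
    weighted = (len(left) * entropy(left) + len(right) * entropy(right)) / len(labels)
    return entropy(labels) - weighted

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = (X[:, 0] > 0.2).astype(int)

print("Gain of splitting feature 0 at 0.2:",
      round(information_gain(y, X[:, 0], 0.2), 3))
tree = DecisionTreeClassifier(criterion="entropy", max_depth=3).fit(X, y)
print("Training accuracy:", tree.score(X, y))
```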
Random forests (RF) are an ensemble learning approach that combines several tree
predictors. Every tree inside the forest is created using a random vector that is generated independently, with the same distribution, for all trees. The generalization error of a random forest converges as the total number of trees in the forest approaches infinity. The generalization error of a forest of tree classifiers is influenced by the performance of each individual tree within the forest and the degree of correlation among the trees. The stochastic selection of features used for partitioning each node determines the resulting error rates. The RF model is represented in Equation (13.2).
$$\hat{Z} = \operatorname{mode}\big(f_1(x), f_2(x), \ldots, f_n(x)\big) \qquad (13.2)$$

where $\hat{Z}$ = final prediction of the RF and $f_n(x)$ = prediction of the $n$-th decision tree.
$$\hat{Z} = \operatorname{sign}\left(\sum_{k=1}^{K} \alpha_k \cdot h_k(x)\right) \qquad (13.3)$$

where $\alpha_k = \frac{1}{2}\ln\frac{1-\varepsilon_k}{\varepsilon_k}$ = weight importance of the $k$-th weak learner, $h_k(x)$ = prediction of the $k$-th weak learner for input $x$, and $\varepsilon_k$ = weighted error of the weak learner.
13.3.4 GBoost
$$\hat{Z} = \sum_{k=1}^{K} \eta \cdot h_k(x) \qquad (13.4)$$

where $h_k(x)$ = prediction of the $k$-th weak learner for input $x$, and $\eta$ = learning-rate hyper-parameter controlling the step size of each update.
13.3.5 XGBoost
It has been proven that XGBoost is a superior ML method due to its effective imple-
mentation of gradient-boosted decision trees. It has been implemented to make
the most efficient use of memory and the available processing power. XGBoost
reduces execution time while improving performance when compared to other ML
approaches and even DL approaches. The primary goal of boosting is to construct
sub-trees from a parent tree in such a way that the error rate of each successive tree is less than that of the parent tree. In this method, the new sub-trees revise the previous residuals to lower the cost function's error. Equation (13.5) shows the working
principle of XGBoost.
$$Xgb(\theta) = \sum_{i=1}^{N} L(y_i, p_i) + \sum_{k=1}^{T} \Omega(f_k) \qquad (13.5)$$

where $L(y_i, p_i)$ = loss function, with $y_i$ and $p_i$ denoting the actual target value and the value predicted by the weak learner respectively, and $\Omega(f_k)$ = regularization term for the $k$-th tree.
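A hedged sketch of gradient-boosted trees with the xgboost library (assuming it is installed); the dataset is synthetic and the hyper-parameter values are illustrative rather than those used in this chapter.

```python
# Sketch of gradient-boosted decision trees with the xgboost library;
# the data is synthetic and the hyper-parameters are illustrative only.
from xgboost import XGBClassifier
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=3000, n_features=25, n_informative=10,
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

model = XGBClassifier(n_estimators=300,     # number of boosted trees T
                      learning_rate=0.1,    # shrinkage applied to each tree
                      max_depth=4,
                      reg_lambda=1.0)       # L2 penalty feeding the regularization term
model.fit(X_tr, y_tr)
print("Test accuracy:", model.score(X_te, y_te))
```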
13.3.6 CatBoost
The Catboost method is a notable example of a gradient boosting technique that has
gained popularity in recent years. Catboost is a versatile ML algorithm that addresses
both regression and classification tasks. It has gained attention due to its inclusion in
a recently developed open-source gradient boosting library, which is freely available
and compatible with several platforms. The Catboost algorithm and gradient boosting
use DTs as a primary weak learner, using a sequential fitting approach. The use of
random permutations of the training data during gradient learning has been proposed as a means to improve the performance of the Catboost model and mitigate the issue of overfitting. Equation (13.6) shows the working principle of CatBoost.
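Similarly, a minimal CatBoost sketch using the open-source library mentioned above (assuming it is installed); the dataset and parameter values are illustrative only.

```python
# Sketch using the open-source CatBoost library; data and parameter values
# are illustrative placeholders, not settings from this chapter.
from catboost import CatBoostClassifier
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=3000, n_features=20, random_state=1)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=1)

model = CatBoostClassifier(iterations=300,   # number of sequential trees
                           learning_rate=0.1,
                           depth=4,
                           verbose=0)        # silence per-iteration output
model.fit(X_tr, y_tr)
print("Test accuracy:", model.score(X_te, y_te))
```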
In recent years, CNN models have been widely used in the field of computer vision,
image classification, segmentation, detection, and natural language processing. This
is primarily due to the higher performance shown by these models, which may be
attributed to the effective utilization of multiple DL methodologies. CNN has been
extensively used in the healthcare domain. A CNN, often known as a ConvNet, is a specific type of neural network that may possess shared parameters. A CNN is composed of several layers, each of which converts one volume into another using differentiable functions. The CNN design consists of consecutive layers of convolution and pooling, with at least one fully connected layer at the final stage.
The input layer stores the original image data. To compute the output value, the
convolution layer performs a dot product calculation between each filter and each
patch of the image. The activation function can be implemented in convolution layer
as well as in other layers of CNN. Various activation functions may be used, such
as Rectified Linear Unit (ReLU), Sigmoid, Softmax, Leaky ReLU, and Hyperbolic
Tangent (Tanh), among others. The pooling layer is responsible for decreasing the volume size and enhancing computational efficiency; its insertion into the CNN also serves the primary purpose of minimizing overfitting.
Pooling layers may be implemented using either max pooling or average pooling
techniques. The fully connected layer, also known as a normal neural network layer, receives input from the preceding layer. Its primary aim is to calculate the result for each class, producing a one-dimensional array with a size equivalent to
the number of classes. The overall architecture of the CNN is represented in Fig. 13.2.
The architectural framework of the CNN is illustrated as follows. The output of a convolution layer takes the general form

$$Cov_{p,q,r} = \sum_{i=1}^{F_h}\sum_{j=1}^{F_w}\sum_{d=1}^{D_p} W_{i,j,d} \cdot I_{p+i-1,\,q+j-1,\,d} + B_k$$

where $Cov_{p,q,r}$ = value at position $(p, q)$ in the $r$-th feature map of the output convolution, $F_h$ = height of the filter, $F_w$ = width of the filter, $D_p$ = depth of the input, $W$ = weight of the filter at a specific position, $I$ = input, and $B_k$ = bias term.
This layer constitutes the last layer inside the network and is employed for data classification. The outputs of the pooling and convolutional layers are flattened and then fed into the fully connected layer. The activation function is situated either at the output or within the interconnections of the neural network. The selection of a suitable activation function has the potential
to enhance the efficiency of the learning process. There are several forms of activa-
tion functions, including ReLU, Logistic (Sigmoid), Tanh and softmax. The ReLU
function is often used in hidden layers because of its ease in implementation and
effectiveness in mitigating the limitations of other activation functions, such as Tanh
and Sigmoid. It is essential to recognize that the model then exhibits considerably less sensitivity to vanishing gradients, effectively minimizing potential training issues.
Many distinct forms of cyber-assaults, such as Denial of Service (DoS) attacks, Nmap attacks, ARP Spoofing, and Smurf assaults, have been observed in the MCPS environment. The two stages included in this study are the data prepro-
cessing phase and the CNN-based assault detection phase. The following sections
describe the sequential steps involved in the implementation and evaluation of the
CNN model: (i) The ECU IoHT dataset is employed for the analysis of different cyber-
attacks. (ii) Preprocessing techniques such as missing value imputation, elimination
of duplicate records, normalization, and managing imbalanced data have been imple-
mented. (iii) The dataset has been labeled with categories including Normal, DoS
attack, ARP Spoofing, Smurf attack and Nmap attack in order to prepare for multi-
class classification. (iv) The dataset is partitioned into two subsets as the training
dataset and the testing dataset with proportions of 80% and 20% respectively. (v)
The CNN is evaluated on the training samples by selecting these labels as target
features using multiclass classification which produces a good examined model. (vi)
The CNN model that has been trained is evaluated using a separate testing dataset in
order to provide predictions about normal or other forms of assaults.
The proposed CNN has a deep architecture consisting of four hidden layers: two
convolutional layers and two pooling layers. The two convolutional layers are trained
with 64 and 128 convolution kernels, respectively, each of size 3 × 3. Two pooling
layers perform average pooling with a factor of 2. The architecture also includes a
fully connected layer of five neurons that carries out the categorization. Because
intrusion detection in MCPS can be viewed as a classification problem, the deep
architecture incorporates the softmax activation function at its output. Table 13.2
lists the hyperparameter configuration used in the proposed model, and Fig. 13.3
depicts the proposed model framework.
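Under the stated configuration, a comparable model can be sketched with Keras as follows. The input shape (a small reshaped 2D feature map), the padding, and the optimizer are assumptions, since Table 13.2 is not reproduced here; this is a minimal illustration rather than the authors' exact network.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def build_cnn(input_shape=(8, 8, 1), num_classes=5):
    """Sketch of a CNN with the layer counts described in the text."""
    model = models.Sequential([
        layers.Input(shape=input_shape),                       # assumed reshaped feature map
        layers.Conv2D(64, (3, 3), padding="same", activation="relu"),
        layers.AveragePooling2D(pool_size=2),                  # average pooling, factor 2
        layers.Conv2D(128, (3, 3), padding="same", activation="relu"),
        layers.AveragePooling2D(pool_size=2),
        layers.Flatten(),
        layers.Dense(num_classes, activation="softmax"),       # 5 classes: Normal + 4 attacks
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model

model = build_cnn()
model.summary()
```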
This section presents a comprehensive explanation of the dataset used for the
proposed study, as well as the experimental setup employed for developing the model.
The ECU-IoHT dataset was created in an Internet of Healthcare Things (IoHT) setting
with the objective of supporting the healthcare security community in detecting and
preventing cyber-attacks against IoHT systems. The dataset was generated using a
simulated IoHT network that included several medical sensors, and many different kinds
of attacks were executed against this network. The dataset stores the collected network
activities as attributes that characterize each network flow. Tables 13.3 and 13.4
present an overall summary of the features and the various attack types in the dataset.
The dataset comprises 23,453 instances of normal activity and 87,754 instances of
attacks, which are further classified into four distinct categories: Smurf attacks,
Nmap Port Scans, ARP Spoofing, and DoS attacks. In total, the dataset contains 111,207
observations covering both normal traffic and the different attack types.
This work extensively used several data preparation approaches, including missing value
imputation, oversampling, label encoding, and normalization. IoT sensors in the network
environment may produce missing values or erroneous data for short spans of time due to
sensor failure, so a missing value imputation strategy has been used to improve the
data's dependability for the model. Moreover, the dataset has a textual attribute that
describes the network activity linked to each record; because of the presence of such
string values in the feature vectors, a label encoding approach has been applied. The
dataset contains many categories of attacks, but several attack categories occur far
less frequently than the prevailing Smurf attack, so the dataset exhibits an imbalance
in class distribution. To address this issue, the random oversampling approach has been
used in this study. Finally, the min-max scaler approach has been applied to
standardize the dataset by ensuring that all attribute values lie on the same scale.
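A minimal preprocessing pipeline along these lines could be written as follows. The file name, the label column name ("Type"), and the mode-based imputation are hypothetical choices for illustration, not the authors' exact code.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder, MinMaxScaler
from sklearn.model_selection import train_test_split
from imblearn.over_sampling import RandomOverSampler

df = pd.read_csv("ECU_IoHT.csv")                 # hypothetical file name

# Missing value imputation (column-wise mode) and duplicate removal
df = df.fillna(df.mode().iloc[0]).drop_duplicates()

# Label-encode the attack label and any string-valued feature columns
y = LabelEncoder().fit_transform(df["Type"])     # hypothetical label column
X = df.drop(columns=["Type"]).apply(
    lambda col: LabelEncoder().fit_transform(col) if col.dtype == object else col)

# Random oversampling to balance the minority attack classes
X, y = RandomOverSampler(random_state=42).fit_resample(X, y)

# Min-max scaling so that all attributes share the same scale
X = MinMaxScaler().fit_transform(X)

# 80/20 split into training and testing subsets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)
```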
The current research used a Python notebook on the Google Colab platform, which
provided the computational capabilities of GPU-based servers. Furthermore, the
experiments used the Keras and TensorFlow libraries. The experimental setup consisted
of an Intel Core i7 central processing unit (CPU) operating at a clock speed of 2.20
gigahertz (GHz), 16 gigabytes (GB) of random access memory (RAM), and a 64-bit
Windows 10 operating system. Data analysis was performed using the Python packages
Pandas, Imblearn, and NumPy, while data visualisation was conducted with Matplotlib
and Mlxtend.
The suggested model's detection accuracy has been evaluated using several metrics.
Common metrics used for evaluation are accuracy, recall, precision, ROC-AUC, and
F1-score. These metrics are derived from the following four quantities:
• True Positive (Tp): the number of malicious network traffic observations in MCPS
that were successfully identified as attacks by the methodology.
• True Negative (Tn): the number of regular network traffic observations in MCPS
that the model accurately categorized as normal.
• False Positive (Fp): the number of apparently normal occurrences in MCPS network
traffic that were wrongly identified as harmful by the methodology.
• False Negative (Fn): the number of harmful observations in MCPS network traffic
that the model incorrectly categorized as normal.
Using the above quantities, the following metrics are derived for evaluation (a short
computation sketch follows this list):
(a) Accuracy: the fraction of cases that were correctly categorized by the model
relative to the total number of observations in the testing set. It incorporates both
Tp and Tn into the calculation, as given in Eq. (13.9).
(b) Precision: the ratio of correctly detected attacks to the total number of
observations that the model has labeled as attacks, as given in Eq. (13.10).
(c) Recall: the proportion of anomalous events that are correctly predicted as such,
as given in Eq. (13.11).
(d) F1-Score: primarily used in the case of an imbalanced class distribution, since it
is more informative than accuracy because it factors in Fp and Fn, as written in
Eq. (13.12).
(e) ROC-AUC: the probability that a randomly chosen positive test point is ranked
higher than a randomly chosen negative test point.
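Continuing from the earlier sketches, these metrics can be computed with scikit-learn once predictions are available; y_test and y_prob (class probabilities returned by model.predict(X_test)) are assumed to come from the preceding steps.

```python
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, roc_auc_score)

# y_test: true labels, y_prob: per-class probabilities from model.predict(X_test)
y_pred = y_prob.argmax(axis=1)

print("Accuracy :", accuracy_score(y_test, y_pred))
print("Precision:", precision_score(y_test, y_pred, average="macro"))
print("Recall   :", recall_score(y_test, y_pred, average="macro"))
print("F1-score :", f1_score(y_test, y_pred, average="macro"))
# One-vs-rest ROC-AUC for the multiclass case
print("ROC-AUC  :", roc_auc_score(y_test, y_prob, multi_class="ovr"))
```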
very high accuracy rate of 99.60%, which was the second highest among the many
conventional approaches used in the study. The AdaBoost method exhibited the
lowest accuracy of 74.89%, while the DT achieved an accuracy of 89.16%.
Fig. 13.6 a–e Evaluation of the suggested method’s metrics in relation to competing models
Table 13.5 Evaluation of the suggested model’s metrics in comparison to the results of other
competing models
Model Precision Recall F1-score ROC-AUC Accuracy
DT 69.01 71.13 70.03 85.33 89.16
RF 98.72 98.94 98.83 99.35 99.60
AdaBoost 83.05 86.19 79.07 73.54 74.89
GBoost 98.63 95.74 97.07 98.23 97.98
XGBoost 98.89 98.16 98.51 98.97 99.19
CatBoost 99.04 98.09 98.55 99.12 99.30
Proposed CNN 99.55 99.59 99.57 99.91 99.74
The area under the receiver operating characteristic (AUC-ROC) curves of the proposed
model and the other standard methods are shown in Fig. 13.7a–g. From the figure it can
be noted that the CNN model presented in this study achieved a perfect AUC-ROC score of
1.00 for class labels 0, 1, 2, and 4, and 0.99 for class 3. Although the RF model also
shows strong AUC-ROC performance, it misclassifies more data than the proposed
framework. The performance of the proposed CNN surpasses that of the other standard
techniques, suggesting that the recommended strategy is very successful in accurately
classifying all occurrences in the dataset. The suggested technique achieved essentially
perfect classification, as shown by micro- and macro-average ROC curve values of 1.00,
indicating that all occurrences were properly categorized.
The confusion matrix (CM) has been used as a further evaluation criterion. The CM, also
known as the error matrix, is produced and utilized to examine the performance of
advanced ML and DL methodologies. Figure 13.8a–g displays the classification outcomes,
in the form of confusion matrices, for the DT, RF, AdaBoost, GBoost, XGBoost, and
CatBoost learning techniques. The rows of the confusion matrix correspond to the
predicted labels, while the columns correspond to the actual labels. From the confusion
matrix findings it can be concluded that the CNN learning approach exhibits superior
classification performance in comparison to the other ML approaches.
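A confusion matrix for the trained model can likewise be generated with scikit-learn, again continuing from the variables of the earlier sketches; note that scikit-learn places actual labels on the rows and predicted labels on the columns (the transpose of the convention described above), and the class order shown here is an assumption.

```python
import matplotlib.pyplot as plt
from sklearn.metrics import confusion_matrix, ConfusionMatrixDisplay

labels = ["Normal", "DoS", "ARP Spoofing", "Smurf", "Nmap"]   # assumed class order
cm = confusion_matrix(y_test, y_pred)

# scikit-learn convention: rows = actual labels, columns = predicted labels
ConfusionMatrixDisplay(cm, display_labels=labels).plot(xticks_rotation=45)
plt.tight_layout()
plt.show()
```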
Furthermore, the study also included an analysis of the outcomes achieved through
the application of the suggested approach along with the findings obtained from
previous research works relevant to the detection of attacks in the MCPS environment,
as shown in Table 13.6. Based on the data shown in the table, it can be concluded
that our suggested technique has demonstrated superior accuracy in comparison to
other intelligent methods used in previous research for the categorization of advanced
attacks in MCPS.
13.7 Conclusions
This study introduces a CNN methodology that improves the identification and mitigation
of cyber-attacks on MCPS devices. The proposed strategy aims to enhance the security of
healthcare devices that use IoT technology. The proposed system has been developed with
a specific emphasis on multi-class classification for identifying DoS attacks, ARP
Spoofing, Smurf attacks, and Nmap attacks, in contrast to the existing system, which
relies on binary classification to detect a range of attack types. Finally, the proposed
system has been evaluated on a healthcare-domain dataset (ECU-IoHT), which sets it apart
from previous approaches. The experimental findings demonstrate that the suggested CNN
approach achieves a significantly higher correct identification rate and a lower false
detection rate than the existing methods. The suggested model achieved accuracy above
99% after training for 100 epochs. The recommended method achieved accuracy, recall, and
F1-score values of 99.35%, 99.19%, and 99.27%, respectively. This observation highlights
the superiority of the suggested system in comparison to existing work. As this study
was evaluated on data with relatively few features and samples, the proposed model might
offer lower accuracy on higher-dimensional datasets, since the added network complexity
can lead to overfitting when handling more complex features. In the future, the
suggested system can be implemented and evaluated in a real-time MCPS environment with
high-dimensional datasets. Additionally, efforts will be made to enhance the scalability
of this research in order to identify further forms of attacks on MCPS devices.
References
1. Qiu, H., Qiu, M., Liu, M., Memmi, G.: Secure health data sharing for medical cyber-physical
systems for the healthcare 4.0. IEEE J Biomed Health Inform 24(9):2499–2505 (2020)
2. Adedeji, K.B., Hamam, Y.: Cyber-physical systems for water supply network management:
Basics, challenges, and roadmap. Sustainability 12(22), 9555 (2020)
3. Jamwal, A., Agrawal, R., Manupati, V. K., Sharma, M., Varela, L., Machado, J.: Development
of cyber physical system based manufacturing system design for process optimization. In IOP
Conference Series: Materials Science and Engineering (Vol. 997, No. 1, p. 012048). IOP
Publishing (2020)
4. Cartwright, R., Cheng, A., Hudak, P., O'Malley, M., Taha, W.: Cyber-physical challenges in
transportation system design. In: National Workshop for Research on High Confidence
Transportation Cyber-Physical Systems: Automotive, Aviation & Rail (2008)
5. Ahmad, M.O., Ahad, M.A., Alam, M.A., Siddiqui, F., Casalino, G.: Cyber-physical systems
and smart cities in India: Opportunities, issues, and challenges. Sensors 21(22), 7714 (2021)
6. Wang, Eric Ke, et al.: A deep learning based medical image segmentation technique in Internet-
of-Medical-Things domain. Future Generation Computer Systems 108 (2020): 135–144
7. Shuwandy, M.L. et al.: mHealth authentication approach based 3D touchscreen and microphone
sensors for real-time remote healthcare monitoring system: comprehensive review, open issues
and methodological aspects. Comput. Sci. Rev. 38 (2020): 100300
8. Kim, J., Campbell, A.S., de Ávila, B.E.-F., Wang, J.: Wearable biosensors for healthcare
monitoring. Nature Biotechnol. 37(4), 389–406 (2019)
9. Choudhuri, A., Chatterjee, J.M., Garg, S.: Internet of things in healthcare: A brief overview.
In: Internet of Things in Biomedical Engineering, Elsevier, pp. 131–160 (2019)
10. Priyadarshini, R., Panda, M.R., Mishra, B.K.: Security in healthcare applications based on fog
and cloud computing, Cyber Secur. Parallel Distributed Comput. 231–243 (2019)
11. Yaacoub, J.-P.A., Noura, M., Noura, H.N., Salman, O., Yaacoub, E., Couturier, R., Chehab, A.:
Securing internet of medical things systems: Limitations, issues and recommendations, Future
Gener. Comput. Syst. 105, 581–606 (2020)
12. Ahmed, M., Byreddy, S., Nutakki, A., Sikos, L., Haskell-Dowland, P.: ECU-IoHT (2020)
https://doi.org/10.25958/5f1f97b837aca
13. M. Begli, F. Derakhshan, H. Karimipour, A layered intrusion detection system for critical
infrastructure using machine learning, in: 2019 IEEE 7th International Conference on Smart
Energy Grid Engineering, SEGE, IEEE, 2019, pp. 120–124.
14. A.I. Newaz, A.K. Sikder, M.A. Rahman, A.S. Uluagac, Healthguard: A machine learning-based
security framework for smart healthcare systems, in: 2019 Sixth International Conference on
Social Networks Analysis, Management and Security, SNAMS, IEEE, 2019, pp. 389–396.
15. He, D., Qiao, Q., Gao, Y., Zheng, J., Chan, S., Li, J., Guizani, N.: Intrusion detection based on
stacked autoencoder for connected healthcare systems. IEEE Netw. 33(6), 64–69 (2019)
16. Alrashdi, I., Alqazzaz, A., Alharthi, R., Aloufi, E., Zohdy, M.A., Ming, H.: FBAD: Fog-based
attack detection for IoT healthcare in smart cities. In: 2019 IEEE 10th Annual Ubiquitous
Computing, Electronics & Mobile Communication Conference (UEMCON), IEEE, pp. 0515–0522 (2019)
17. Hady, A.A., Ghubaish, A., Salman, T., Unal, D., Jain, R.: Intrusion detection system for
healthcare systems using medical and network data: a comparison study. IEEE Access. 8,
106576–106584 (2020). https://doi.org/10.1109/ACCESS.2020.3000421
18. Susilo, B., Sari, R.F.: Intrusion Detection in IoT Networks Using Deep Learning Algorithm.
Information 11, 279 (2020)
19. Ibitoye, O., Shafiq, O.; Matrawy, A. Analyzing Adversarial Attacks against Deep Learning for
Intrusion Detection in IoT Networks. In Proceedings of the 2019 IEEE Global Communications
Conference (GLOBECOM),Waikoloa, HI, USA, 9–13 December 2019; pp. 1–6
20. Hizal, S., Çavuşoğlu, Ü., Akgün, D.: A new Deep Learning Based Intrusion Detection
System for Cloud Security. In: 3rd International Congress on Human-Computer Interaction,
Optimization and Robotic Applications (2021)
21. Gopalakrishnan, T. et al.: Deep Learning Enabled Data Offloading With Cyber Attack Detection
Model in Mobile Edge Computing Systems. IEEE Access (2020)
22. Xun Y, Qin J, Liu J.: Deep Learning Enhanced Driving Behavior Evaluation Based on Vehicle-
Edge-Cloud Architecture. IEEE Transactions on Vehicular Technology (2021)
23. Alkadi, O., Moustafa, N., Turnbull, B., Choo, K.-K.R.: A Deep Blockchain Framework-enabled
Collaborative Intrusion Detection for Protecting IoT and Cloud Networks. IEEE Internet Things
J. 8, 1 (2020)
24. Ge, M.; Fu, X.; Syed, N.; Baig, Z.; Teo, G.; Robles-Kelly, A. Deep Learning-Based Intrusion
Detection for IoT Networks. In Proceedings of the 2019 IEEE 24th Pacific Rim International
Symposium on Dependable Computing (PRDC), Kyoto, Japan, 1–3 December 2019; pp. 256–265.
25. Samy, A., Yu, H., Zhang, H.: Fog-Based Attack Detection Framework for Internet of Things
Using Deep Learning. IEEE Access 8, 74571–74585 (2020)
26. Parra, G.D.L.T., Rad, P., Choo, K.-K.R., Beebe, N.: Detecting Internet of Things attacks using
distributed deep learning. J. Netw. Comput. Appl. 163, 102662 (2020)
27. Farsi, M.: Application of ensemble RNN deep neural network to the fall detection through IoT
environment. Alex. Eng. J. 60, 199–211 (2021)
28. Shobana, M., Poonkuzhali, S.: A novel approach to detect IoT malware by system calls using
Deep learning techniques. In Proceedings of the 2020 International Conference on Innovative
Trends in Information Technology (ICITIIT), Kottayam, India, pp. 1–5 (2020)
29. Manimurugan, S., Al-Mutairi, S., Aborokbah, M.M., Chilamkurti, N., Ganesan, S., Patan, R.:
Effective attack detection in internet of medical things smart environment using a deep belief
neural network. IEEE Access 8, 77396–77404 (2020)
30. Hussain, Faisal, et al. A framework for malicious traffic detection in IoT healthcare
environment. Sensors 21.9 (2021): 3025
31. Saheed, Yakub Kayode, and Micheal Olaolu Arowolo. Efficient cyber-attack detection on the
internet of medical things-smart environment based on deep recurrent neural network and
machine learning algorithms. IEEE Access 9 (2021): 161546–161554
32. Zachos, G., et al. An Anomaly-Based Intrusion Detection System for Internet of Medical
Things Networks. Electronics 10, 2562 (2021)
33. Vijayakumar, Kedalu Poornachary, et al.: Enhanced Cyber Attack Detection Process for Internet
of Health Things (IoHT) Devices Using Deep Neural Network. Processes 11(4), 1072 (2023)
Chapter 14
Current Datasets and Their Inherent
Challenges for Automatic Vehicle
Classification
Sourajit Maity, Pawan Kumar Singh, Dmitrii Kaplun, and Ram Sarkar
S. Maity · R. Sarkar
Department of Computer Science and Engineering, Jadavpur University, Kolkata 700032, India
e-mail: [email protected]
R. Sarkar
e-mail: [email protected]
P. K. Singh (B)
Department of Information Technology, Jadavpur University, Kolkata 700106, India
e-mail: [email protected]
D. Kaplun
Department of Automation and Control Processes, Saint Petersburg Electrotechnical University
“LETI”, Saint Petersburg, Russian Federation 197022
e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2024
J. Nayak et al. (eds.), Machine Learning for Cyber Physical System: Advances
and Challenges, Intelligent Systems Reference Library 60,
https://doi.org/10.1007/978-3-031-54038-7_14
14.1 Introduction
Although the field of automatic vehicle classification (AVC) has recently attracted the
attention of several researchers, significant improvements are yet to be made in
designing systems that are resilient to real-world situations. Taking all this into
account, it is necessary to ensure sound performance of such systems in real-life
scenarios. Only a few AVC-related research attempts have been made, such as Siddiqui
et al. [9], Kanistras et al. [12], Yuan et al. [13], Sochor et al. [14], and Bharadwaj
et al. [15]. However, these studies hardly provide any specific guidance for selecting
a dataset for a given model, nor do they report the foremost state-of-the-art results
for each specific dataset.
Datasets used for AVC can be categorized, in terms of the type of captured images and
videos, into the following groups: (a) aerial image-based vehicle datasets, (b) frontal
image-based vehicle datasets, and (c) video-based vehicle datasets. The datasets covered
here are built from aerial images as well as images and videos of cars, buses, vans,
motorbikes, and many other vehicles taken by front/rear cameras on public roads.
Figure 14.1 shows the distribution of the datasets available in the AVC domain.
(Pie chart: distribution of the surveyed datasets among the aerial image-based, frontal
image-based, and video-based categories, with segments of 3 (10%), 5 (17%), and 21 (73%).)
Fig. 14.1 Illustrating the distribution of datasets available in the AVC domain
Sochor et al. [14] developed a dataset, called BoxCars, consisting of 63,750 images
(21,250 vehicles of 27 different makes) collected from surveillance cameras. The dataset
contains images captured from the front side of the vehicle (similar to the images
presented in Fig. 14.2) as well as images of passing vehicles, collected from
surveillance cameras mounted near streets. Three images were collected for each
correctly detected vehicle as it passed the surveillance camera. The vehicles were
annotated at three levels of granularity: (a) 102 make and model classes, (b) 126 make
and model + sub-model classes, and (c) 148 make and model + sub-model + model year
classes. A few sample images from this dataset are shown in Fig. 14.2. Elkerdawy et al.
[16] achieved a classification accuracy of 86.57% on this dataset in 2019 using a
ResNet152 + co-occurrence layer (COOC) model.
Table 14.1 List of aerial image-based datasets available for developing AVC systems
Dataset | #Vehicle classes | #Images | Released | Availability | Research work | Download link
BoxCars [14] | 27 | 63,750 | 2019 | Free | Elkerdawy et al. [16] | https://github.com/JakubSochor/BoxCars
Bharadwaj et al. [15] | 4 | 66,591 | 2016 | Available on request | Bharadwaj et al. [15] | https://dl.acm.org/doi/10.1145/3009977.3010040
MIO TCD [17] | 11 | 648,959 | 2017 | Free | Jung et al. [18], Kim et al. [19], Lee et al. [20] | https://tcd.miovision.com/
BIT vehicle [21] | 6 | 9850 | 2015 | Free | Dong et al. [21, 22] | http://iitlab.bit.edu.cn/mcislab/vehicledb
Fig. 14.3 Sample RGB images under the 4 vehicle classes—Clockwise from top—‘Auto Rick
shaws’, ‘Heavy Vehicles’, ‘Two Wheelers’, and ‘Light Vehicles’ (taken from [15])
Bharadwaj et al. [15] compiled a dataset of surveillance-quality images collected from
video clips of a traffic junction in an Indian city. They used a widely accepted
classification scheme in which the vehicles were classified as 'Auto Rickshaw', 'Heavy',
'Light', and 'Two-wheeler'. However, there was no clear distinction between the vehicles
of the 'Light' and 'Heavy' classes because a vehicle can move between them after
customization. Vehicles of the 'Three-wheeler' class with minor modifications were
classified as 'freight' vehicles, which should fall under the 'Heavy' category.
Figure 14.3 presents some sample images from this dataset covering the various vehicle
classes. The average F-score for this dataset using the CaffeNet + SVM method was found
to be 87.75%.
Luo et al. [17] introduced the "MIO vision Traffic Camera Dataset" (MIO-TCD), which
targets both the classification and the localization of motor vehicles in traffic camera
images, comprising 648,959 images across 11 vehicle classes (see Table 14.1).
Fig. 14.4 Few images from each of the 11 classes taken from the MIO-TCD dataset
Fig. 14.5 Sample images from BIT-vehicle dataset [Images taken from [21]]
The BIT-Vehicle dataset, developed by Dong et al. [21], includes 9,850 vehicle images,
approximately 10% of which were taken under night conditions. Figure 14.5 shows some
sample images from this dataset. The images (with sizes of 1600 × 1200 and 1920 × 1080
pixels) were captured by two cameras installed at different times and places. All
vehicles in the dataset were divided into six categories: 'Minivan', 'Sedan', 'SUV',
'Microbus', 'Bus', and 'Truck'. For each vehicle class, 200 samples were randomly
selected for training the Softmax parameters, and another 200 images were used as test
samples. Dong et al. [21] obtained an accuracy of 96.1% using the sparse Laplacian
filter learning (SLFL) method [23].
14.2.1.5 Summarization
Therefore, the reported approaches achieved accuracies close to 100%, which is difficult
to attain in a real-life scenario.
Vehicle images can be classified on the basis of type, model, make, or a mix of all
these characteristics. The datasets covered here contain vehicle images taken by front
or rear cameras on public roads. Table 14.3 lists the datasets used for developing
front-view image-based AVC systems, and detailed information about each dataset is
discussed below.
The Stanford Cars dataset was designed by Krause et al. [24] to recognize the make and
model of cars. It is a compilation of 16,185 rear images of cars (of size 360 × 240)
divided into 196 classes. The data is almost equally divided into a train/test split,
with 8,144 images for training and 8,041 images for testing. Using a domain adaptive
transfer learning model on this dataset, Ngiam et al. [25] achieved an accuracy of
96.8%. Some images taken from the Stanford Cars dataset are presented in Fig. 14.6.
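Transfer learning of this kind generally reuses an ImageNet-pretrained backbone and retrains only a new classification head. The sketch below illustrates a generic Keras baseline of that sort (not the specialist-model method of Ngiam et al. [25]); the directory layout, image size, and training settings are assumptions.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

# Hypothetical directory of car images arranged as one sub-folder per class
train_ds = tf.keras.utils.image_dataset_from_directory(
    "stanford_cars/train", image_size=(224, 224), batch_size=32)

base = tf.keras.applications.ResNet50(
    include_top=False, weights="imagenet", input_shape=(224, 224, 3))
base.trainable = False                                  # freeze the pretrained backbone

model = models.Sequential([
    layers.Lambda(tf.keras.applications.resnet50.preprocess_input),
    base,
    layers.GlobalAveragePooling2D(),
    layers.Dropout(0.2),
    layers.Dense(196, activation="softmax"),            # 196 make/model classes
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit(train_ds, epochs=5)
```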
Yang et al. [27] developed the “CompCars” dataset, which covered different car
views, showing different internal as well as external parts. The dataset has two types
of image sets, a surveillance image set and a web image set. The web image set is
a collection of images, taken from car forums, search engines, and public websites,
and the surveillance set images were collected by surveillance cameras. The web-
image data contained 1,36,727 images of the entire car and 27,618 images featuring
car parts of 161 car makes with 1,687 car models. The surveillance-image data had
50,000 car images captured from the front view. The dataset can be used for (a)
Fine-grained classification, (b) Attribute prediction, and (c) Car model verification
and also for image ranking, multi-task learning, and 3D reconstruction. Yu et al. [30]
obtained an accuracy of 99% using K-means with the VR-COCO method.
Table 14.3 Frontal image-based datasets available for developing AVC systems
Dataset | #Vehicle classes | #Images | Released | Availability | Research work | Download link
Stanford Cars [24] | 196 | 16,185 | 2013 | Free | Ngiam et al. [25], Ridnik et al. [26] | https://ai.stanford.edu/~jkrause/cars/car_dataset.html
CompCars [27] | 163 | 1,36,726 | 2015 | Free | Hu et al. [28], Tanveer et al. [29], Yu et al. [30] | http://mmlab.ie.cuhk.edu.hk/datasets/comp_cars/index.html
Frontal-103 [31] | 103 | 65,433 | 2022 | Free | Lu et al. [31] | https://github.com/vision-insight/Frontal-103
Liao et al. [32] | 8 | 1482 | 2015 | Paid | Liao et al. [32] | https://en.whu.edu.cn/Research1/Research_Centres.htm
Side Profile dataset [33] | 86 | 10,000 | 2015 | Free | Boyle et al. [33] | http://www.cvg.reading.ac.uk/rvd
Novel car type [34] | 14 | 1904 | 2011 | Free | Stark et al. [34] | https://www.mpi-inf.mpg.de/departments/computer-vision-and-machine-learning/publications
FG3DCar [35] | 30 | 300 | 2014 | Free | Lin et al. [35] | https://www.cmlab.csie.ntu.edu.tw/~yenliang/FG3DCar/
(continued)
and rear. Therefore, after the annotation, there were eight groups of vehicle images in
total, one from each of the viewpoints. Lu et al. [31] achieved an accuracy of 91.2%
using both the pre-trained ResNet50 and DenseNet121 [51] models. Some sample images
collected from the Frontal-103 dataset are presented in Fig. 14.7.
Fig. 14.6 Some sample images taken from the Stanford Cars dataset
Fig. 14.7 Sample images from the Frontal-103 dataset. The dataset includes frontal-view
images under variable weather and lighting conditions
Liao et al. [32] presented a large-scale dataset compiling vehicle images captured from
the front view using monitoring cameras fixed on the road. A total of 1482 vehicles were
annotated in the images and divided into eight categories; the number of images present
in each category is shown in Fig. 14.8. Some sample images of all eight vehicle classes
are shown in Fig. 14.9. Liao et al. [32] achieved an accuracy of 93.3% with a part-based
fine-grained vehicle categorization method.
(Chart data, partially recovered: Chevrolet 331, Audi 196, Volkswagen 145, and BMW 112
annotated images.)
Fig. 14.8 Number of annotated images for different vehicles proposed by Liao et al. [32]
Fig. 14.9 Image samples of all 8 vehicle categories proposed by Liao et al. [32]
Boyle et al. [33] proposed a public vehicle dataset with more than 10,000 side-profile
images of cars divided into 86 make/model classes and 9 subtype classes. The vehicle
subtypes and the total number of labeled images for each vehicle class are represented
as a pie chart in Fig. 14.10. They achieved high classification rates of 98.7% for
subtypes and 99.7–99.9% for vehicle make and model recognition (VMMR).
(Chart data, partially recovered: Van 589, SUV 357, and Sports 189 labeled images.)
Fig. 14.10 Number of total vehicle subtypes for the Side Profile dataset [33]
Stark et al. [34] introduced a novel dataset of fine-grained car types, compiling 1,904
images of cars from 14 different vehicle classes, with class labels and annotations,
2D bounding boxes, and a viewpoint estimate. They used the Ford campus vision and Lidar
dataset [52, 53] for testing. Stark et al. [34] obtained an accuracy of 90.3% with an
ensemble of Histogram of Oriented Gradients (HOG), Locality-constrained Linear Coding
(LLC) [54], and struct DPM methods.
Lin et al. [35] developed a fine-grained 3D car dataset (FG3DCar), which includes 300
images of 30 different automobile models under various viewing angles, including
'pickup truck', 'hatchback', 'SUV', and 'crossover' types. They manually marked 64
landmark locations in each car image, annotated the correspondences between the 2D
projections of the visible 3D landmarks on the image, and iteratively adjusted the shape
and pose parameters to reduce the distance errors between the correspondences. The
authors achieved an accuracy of 95.3% with an ensemble of GT alignment and HOG/FV
feature vector methods.
Tafazzoli et al. [36] presented the VMMR dataset, which contains 2,91,752 images across
9,170 classes, covering vehicle models manufactured between 1950 and 2016. They
collected data from 712 areas covering all 412 subdomains of United States metro areas,
using web pages related to vehicle sales (such as Wikipedia and Amazon). The dataset
contains diversified image data capable of representing a wide range of real-life
scenarios. Using the ResNet-50 architecture, Tafazzoli et al. [36] obtained 92.9%
accuracy on this dataset.
Kuhn et al. [37] proposed a dataset called BRCars, a compilation of around 300 K images
gathered from a Brazilian vehicle advertising website. The dataset was divided into two
parts, namely BRCars-196 with 2,12,609 images and BRCars-427 with 3,00,325 images. The
images cover 52 K car instances, including views of both the exterior and interior of
the cars, and have a skewed distribution among 427 different models. Using the
InceptionV3 architecture, Kuhn et al. [37] obtained accuracies of 82% on the BRCars-427
dataset and 92% on the BRCars-196 dataset.
For the classification of Bangladeshi native vehicle types, Hasan et al. [39] developed
a dataset consisting of 10,440 images of 13 common vehicle classes and also designed a
transfer learning-based model incorporating data augmentation. Despite the varying
physical properties of the vehicles, the proposed model achieved good accuracy; the
highest accuracy reported on this dataset is 98%. Sample images of this dataset are
shown in Fig. 14.12, whereas a bar chart illustrating the class distribution is shown
in Fig. 14.13.
Fig. 14.12 Sample images from the Deshi-BD dataset representing each class [39]
(Fig. 14.13, bar chart of class-wise image counts; recovered category labels include
Auto Rickshaw, Bus, Cng, Easy Bike, Motorcycle, Rickshaw, and Van, with counts on a
0–1000 scale.)
Fig. 14.14 Sample images of the combined LSUN + Stanford cars dataset [41]
Deshmukh et al. [40] proposed the DTLD dataset, which contains around 10,000 unordered
images covering 9 types of Indian vehicles in traffic scenarios captured from different
camera angles, including images taken under rainy and noisy weather conditions. Of these
10,000 images, 20% are used for model testing. This dataset yielded an accuracy of 96.3%
using the STVD [57] method with an ST backbone.
To overcome the shortcomings of the LSUN car dataset, which contains 55,20,753 car
images, Kramberger et al. [41] created a dataset by combining the LSUN and Stanford car
datasets. After pruning, the new dataset contained about 20,67,710 car images of
enhanced quality. StyleGAN training on the combined LSUN-Stanford car dataset produced
results about 3.7% better than training with the LSUN dataset alone. Therefore, it can
be inferred that the LSUN-Stanford car dataset is more consistent and better suited for
training GAN neural networks than other currently available large car datasets. Abdal
et al. [42] achieved an accuracy of 99% using the Detectron2 model on this dataset.
Figure 14.14 shows sample images from the LSUN + Stanford cars dataset [41].
The Car-159 dataset, developed by Sun et al. [44], comprises images of different vehicle
types captured either by camera or taken from online sources. The images were captured
from multiple viewpoints, such as right ahead, rear side, side, front side, right
behind, and right side. The dataset covers 8 vehicle brands, 159 vehicle types, and
7,998 images; the training set contains 6,042 images, and the validation set 1,956
images. The authors obtained an accuracy of 85.8% using the fine-grained VTC [44]
method. Some sample images from the Car-159 dataset are shown in Fig. 14.15.
Butt et al. [45] proposed a dataset that differs from the existing CompCars and Stanford
Cars datasets, which are mainly region-specific and difficult to employ in a real-time
AVC system. To overcome these issues, vehicle images were extracted from road
surveillance and driving videos collected from various regions, and a dataset comprising
10,000 images of six common road vehicle classes was compiled through manual labeling
using the Windows editing tool. Sample images from this dataset are shown in Fig. 14.16.
On this dataset, Butt et al. [45] achieved an accuracy of 99.6% with a modified CNN
model.
Fig. 14.16 Sample images of the dataset proposed by Butt et al. [45]
Peng et al. [47] compiled a dataset of images of vehicles passing on a highway, captured
in different lighting conditions: daylight under sunny and partly cloudy skies, as well
as at night. All captured vehicles belonged to one of five classes, namely 'minivan',
'sedan', 'passenger car', 'bus', and 'truck'. They used 800 daylight and 800 nightlight
images for training, and a set of 500 daylight and 500 nightlight images for testing. By
applying principal component analysis (PCA) with a self-clustering method, Peng et al.
[47] achieved an accuracy of 90% in daylight and 87.6% at night.
Khoba et al. [48] introduced the first FGVD dataset captured in the wild from a camera
mounted on a moving vehicle. The dataset has 5,502 scene images with 210 unique
fine-grained labels of multiple vehicle types organized in a three-level hierarchy, and
it introduced new class labels for categorizing 'two-wheelers', 'autorickshaws', and
'trucks'. The dataset is also challenging, since it includes vehicles in complicated
traffic situations with intra-class and inter-class variations in type, scale, position,
occlusion, and lighting. Images of each of the vehicle classes of the FGVD dataset are
shown in Fig. 14.18. The dataset has three levels of hierarchy for classification; using
a fine-tuned Hierarchical Residual Network (HRN) model, Khoba et al. [48] obtained an
accuracy of 96.6% at level 1.
Avianto et al. [49] developed the InaV-Dash dataset, consisting of a total of 4,192
images covering four vehicle makes and 10 vehicle models. The dashboard camera was set
to run at 60 frames per second with a full HD resolution of 1920 × 1080 pixels. The
dataset was partitioned into a training set of 2,934 images and a testing set of 1,258
images. The authors obtained an accuracy of 95.3% using the ResNet50 CNN architecture.
Blurry, hazy, and partially occluded images from the InaV-Dash dataset are illustrated
in Fig. 14.19.
Wang et al. [50] developed the ATOC dataset, which consists of 840 images spanning 12
vehicle classes, with 70 images per class. This dataset contains objects in both normal
and damaged condition. The authors attained an accuracy of 87.08% using a pre-trained
deep convolutional network for feature extraction and an SVM for classification, along
with Saliency Masking (SM) and Random Sample Pairing (RSP) methods.
14.2.2.21 Summarization
Vehicle classification using frontal images has become a fundamental need across the
world for categorizing and tracking vehicles for security purposes and for managing
traffic congestion on roads. Many studies are available in this area, among which
several methods have reached an accuracy close to 100%. Although the Stanford Cars
dataset [24] provides a large number of classes, its training images were collected from
online sources and their number is rather small. Hence, it may not be useful for deep
learning models, since such models generally require a huge amount of data for good
training. Additionally, the images in the Frontal-103 [31] dataset, which studied the
VMMR problem, were not taken from real-life traffic scenarios. Liao et al. [32]
presented a dataset for car type categorization, but they considered very few vehicle
types, which might not be useful for categorization in practical scenarios. The FG3DCar
dataset [35] considered 3D models of vehicles (cars), but it offers only 300 images for
30 classes, which is not sufficient for deep learning-based classification models. The
LSUN + Stanford car dataset [41], the Car-159 dataset, PoribohonBD [38], and the VMMR
dataset contain many images, but all of them were collected from online sources.
Additionally, there is little further scope for research with datasets on which
accuracies of nearly 100% have already been reported, such as the IRVD [46] dataset.
The FGVD [48] dataset has few vehicle classes, which is not very helpful for developing
a practical AVC system.
It has been already mentioned that vehicles can be classified based on type, model,
make, or a mix of all these characteristics. There are some important video datasets
commonly used to classify vehicles in terms of their type, make, and model. Some
research attempts based on videos of ‘cars’, ‘buses’, ‘vans’, ‘motorbikes’, and many
other vehicles that are taken from any rear camera on public roads are discussed in
this section. Table 14.4 lists the datasets used for video-based AVC.
Alsahafi et al. [61] proposed a dataset containing over a million frames and 10
different vehicle classes for make and model recognition of cars, including 'sedans',
'SUVs', 'convertibles', 'hatchbacks', and 'station wagons'. To make the video dataset
suitable for fine-grained car classification, they selected the specific models based on
the availability of review videos (both professional and amateur) of recent car models.
Each bounding box was labeled with one of the vehicle classes, and boxes that did not
fit any of the classes were labeled as 'other'. Alsahafi et al. [61] obtained an
accuracy of 76.1% for RGB input with 25 frames using a Single Shot Multibox Detector
(SSD) + CNN pipeline.
14.2.3.4 Summarization
In recent times, deep learning-based models have mostly been used for image and video
classification purposes, and this also holds for the AVC task, where these methods have
been generating state-of-the-art results. A few issues of deep learning-based
approaches, along with some limitations of the existing datasets, are summarized below:
• In general, a huge number of samples is required for training deep learning-based
models, and a specialized GPU (graphics processing unit) is needed to train them.
Additionally, processing the data is also a difficult task when the required resources
are unavailable.
• Deep learning-based models take longer training times due to larger data volumes.
• Even when datasets are available, in many cases they are not perfectly processed and
annotated, since appropriate annotation requires extensive human labor.
• In countries such as India, Bangladesh, or Pakistan, roads are often not as good as
those seen in the developed countries of Europe or America, and the traffic is heavily
congested and undisciplined. As a result, vehicles frequently overlap one another, which
makes vehicle type classification from still images of such vehicles very difficult.
• Developed and developing countries have different traffic conditions, and some
datasets collected in developing countries are not suitable for all conditions.
• Multi-view or multi-modal datasets are not available for vehicle classification. Much
more research and such data are required to develop a practical solution for AVC.
• There are datasets whose images are collected from different websites or Google.
Sometimes these images differ significantly from real ones, which makes such datasets
of limited relevance to research work.
• On some datasets, accuracies of almost 100% have already been reported, so there is
little further scope for research with these datasets.
• Some datasets have very few classes, which is not appropriate for research work.
• The number of video datasets available for vehicle classification is very small.
Therefore, more video datasets are needed for further research on vehicle classification
with video data.
With the above analysis, Table 14.5 lists the advantages and limitations of the
datasets used for solving the AVC problem.
Table 14.5 Advantages and limitations of the datasets used for AVC

Aerial image-based datasets
Advantage: Aerial image-based datasets provide a bird's eye view. Aerial images record
the general flow and patterns of traffic, including information on congestion, how the
roads are used, and even how many and what kinds of vehicles are present.
Limitation: Limited details can be found with this type of image, and aerial images
depend on weather conditions.

Frontal image-based datasets
Advantage: The frontal view offers an unobscured and unambiguous perspective of the
vehicles, which facilitates precise identification and classification of various
vehicle types.
Limitation: The side and rear views of the vehicles are not shown in frontal images,
which gives a limited viewpoint. Frontal views can also be impeded by other vehicles,
objects, and the surroundings, which makes some areas of the vehicles harder to
visualize.

Video-based AVC datasets
Advantage: Static images do not offer the same dynamic context that videos do. Videos
record how vehicles move over time, giving important temporal information; moreover,
knowledge of traffic patterns, dynamics of congestion, and variations in vehicle density
throughout the day can be obtained.
Limitation: Compared to static images, video data often requires more bandwidth and
storage space. This might pose difficulties, particularly when utilizing streaming
applications or extensive surveillance systems.
As already mentioned, an ample amount of data is required for training, testing, and
validation in order for deep learning models to obtain high accuracy. Not only the
availability of data but also correct annotation and precise processing, as required by
the model, are essential criteria for data collection. However, in the case of AVC very
few datasets are available, and among them the well-known datasets are not freely
available to the research community. Video-based classification datasets are also rare
in the vehicle classification domain. Based on the above discussion, some future
research directions regarding AVC are highlighted in this section.
(1) Lightweight models can be thought of considering the demand for IoT-based
technologies, as such models can be easily deployed to edge devices.
(2) Semi-supervised and/or Few-shot learning approaches can be used when we
have fewer annotated data for AVC.
(3) Data should be captured in all weather conditions and at various times of the
day.
(4) Images/videos should be taken from different angles to deal with overlapped
vehicles in heavy traffic regions.
(5) The diversity of data in terms of vehicle types, road conditions, and traffic
congestion is very important.
(6) The availability of video data is very much required to develop realistic AVC
systems.
(7) Availability of multi-modal data would help in designing practically applicable
systems.
(8) Data of the same vehicles at multiple locations are required for vehicle re-
identification for surveillance purposes.
(9) Frontal view-based datasets are specifically useful for license plate recognition,
driver behavior analysis, and classification of vehicle make and model.
(10) Aerial image-based datasets are specifically useful for vehicle counting and
classification, parking lot management, and route planning of vehicles.
14.4 Conclusion
This study is an attempt to weigh the benefits and drawbacks of the various datasets
available for vehicle classification. Although this survey is not exhaustive,
researchers may find it useful as a guide for implementing new methods or updating
their past methods to meet the needs of realistic AVC systems. Following our discussion
of AVC methods and available datasets, we have discussed some open challenges as well as
some intriguing research directions. In this survey, we have discussed the datasets that
are useful for vehicle classification and classified them into two parts based on still
image data and video data, with still image datasets further divided into aerial
image-based datasets and frontal image-based datasets. We have reported the accuracy
achieved on each dataset and summarized the advantages and drawbacks of using these
datasets. In our future studies, we intend to work on datasets for vehicle detection and
segmentation. Our findings suggest that AVC research is still an under-explored domain
that deserves more attention. We believe that reviewing previous research efforts
focused on datasets will be beneficial in providing a comprehensive and timely analysis
of the existing inherent challenges and potential solutions to this problem.
References
1. Kumar, C.R., Anuradha, R.: Feature selection and classification methods for vehicle tracking
and detection, J. Ambient Intell. Humaniz Comput, pp. 1–11 (2020)
2. Lee, H.J., Ullah, I., Wan, W., Gao, Y., Fang, Z.: Real-time vehicle make and model recognition
with the residual SqueezeNet architecture. Sensors 19(5), 982 (2019)
3. Maity, S., Bhattacharyya, A., Singh, P.K., Kumar, M., Sarkar, R.: Last Decade in Vehicle
Detection and Classification: A Comprehensive Survey. Archives of Computational Methods
in Engineering, pp. 1–38 (2022)
4. Zhang, J., Yang, K. and Stiefelhagen, R.: ISSAFE: Improving semantic segmentation in acci-
dents by fusing event-based data. In: 2021 IEEE/RSJ International Conference on Intelligent
Robots and Systems (IROS), IEEE, pp. 1132–1139 (2021)
5. Buch, N., Cracknell, M., Orwell, J., Velastin, S.A.: Vehicle localisation and classification in
urban CCTV streams. Proceedings of 16th ITS WC, pp. 1–8 (2009)
6. Martínez-Cruz, A., Ramírez-Gutiérrez, K.A., Feregrino-Uribe, C., Morales-Reyes, A.: Security
on in-vehicle communication protocols: Issues, challenges, and future research directions.
Comput. Commun. 180, 1–20 (2021)
7. Rathore, R.S., Hewage, C., Kaiwartya, O., Lloret, J.: In-vehicle communication cyber security:
challenges and solutions. Sensors 22(17), 6679 (2022)
8. El-Sayed, R.S., El-Sayed, M.N.: Classification of vehicles’ types using histogram oriented
gradients: comparative study and modification. IAES International Journal of Artificial
Intelligence 9(4), 700 (2020)
9. Siddiqui, A.J., Mammeri, A., Boukerche, A.: Towards efficient vehicle classification in intel-
ligent transportation systems. In: Proceedings of the 5th ACM Symposium on Development
and Analysis of Intelligent Vehicular Networks and Applications, pp. 19–25 (2015)
10. Bhattacharyya, A., Bhattacharya, A., Maity, S., Singh, P.K., Sarkar, R.: JUVDsi v1: developing
and benchmarking a new still image database in Indian scenario for automatic vehicle detection.
Multimed. Tools Appl. pp. 1–33 (2023)
11. Ali, A., Sarkar, R., Das, D.K.: IRUVD: a new still-image based dataset for automatic vehicle
detection. Multimed Tools Appl, pp. 1–27 (2023)
12. Kanistras, K., Martins, G., Rutherford, M.J., Valavanis, K.P.: A survey of unmanned aerial vehi-
cles (UAVs) for traffic monitoring. In: 2013 International Conference on Unmanned Aircraft
Systems (ICUAS), IEEE, pp. 221–234 (2013)
13. Yuan, C., Zhang, Y., Liu, Z.: A survey on technologies for automatic forest fire monitoring,
detection, and fighting using unmanned aerial vehicles and remote sensing techniques. Can. J.
For. Res. 45(7), 783–792 (2015)
14. Sochor, J., Herout, A., Havel, J.: Boxcars: 3d boxes as cnn input for improved fine-grained
vehicle recognition. In: Proceedings of the IEEE conference on computer vision and pattern
recognition, pp. 3006–3015 (2016)
15. Bharadwaj, H.S., Biswas, S., Ramakrishnan, K.R.A.: large scale dataset for classification of
vehicles in urban traffic scenes. In: Proceedings of the Tenth Indian Conference on Computer
Vision, Graphics and Image Processing, pp. 1–8 (2016)
16. Elkerdawy, S., Ray, N., Zhang, H.: Fine-grained vehicle classification with unsupervised parts
co-occurrence learning. In: Proceedings of the European Conference on Computer Vision
(ECCV) Workshops, p. 0 (2018)
17. Luo, Z., et al.: MIO-TCD: A new benchmark dataset for vehicle classification and localization.
IEEE Trans. Image Process. 27(10), 5129–5141 (2018)
18. Jung, H., Choi, M.K., Jung, J., Lee, J.H., Kwon, S., Young Jung, W.: ResNet-based vehicle
classification and localization in traffic surveillance systems. In: Proceedings of the IEEE
conference on computer vision and pattern recognition workshops, pp. 61–67 (2017)
19. Kim, P.K., Lim, K.T.: Vehicle type classification using bagging and convolutional neural
network on multi view surveillance image. In: Proceedings of the IEEE conference on computer
vision and pattern recognition workshops, pp. 41–46 (2017)
20. Taek Lee, J., Chung, Y.: Deep learning-based vehicle classification using an ensemble of local
expert and global networks. In: Proceedings of the IEEE conference on computer vision and
pattern recognition workshops, pp. 47–52 (2017)
21. Dong, Z., Wu, Y., Pei, M., Jia, Y.: Vehicle type classification using a semisupervised convolu-
tional neural network. IEEE Trans. Intell. Transp. Syst. 16(4), 2247–2256
(2015)
22. Dong, H., Wang, X., Zhang, C., He, R., Jia, L., Qin, Y.: Improved robust vehicle detection and
identification based on single magnetic sensor. IEEE Access 6, 5247–5255 (2018)
23. Sunderlin Shibu, D., Suja Priyadharsini, S.: Multimodal medical image fusion using L0 gradient
smoothing with sparse representation. Int J Imaging Syst Technol, vol. 31, no. 4, pp. 2249–2266
(2021)
24. Krause, J., Stark, M., Deng, J., Fei-Fei, L.: 3d object representations for fine-grained catego-
rization. In: Proceedings of the IEEE international conference on computer vision workshops,
pp. 554–561 (2013)
25. Ngiam, J., Peng, D., Vasudevan, V., Kornblith, S., Le, Q.V., Pang, R.: Domain adaptive transfer
learning with specialist models. arXiv preprint arXiv:1811.07056 (2018)
26. Ridnik, T., Ben-Baruch, E., Noy, A., Zelnik-Manor, L.: Imagenet-21k pretraining for the
masses. arXiv:2104.10972 (2021)
27. Yang, L., Luo, P., Change Loy, C., Tang, X.: A large-scale car dataset for fine-grained cate-
gorization and verification. In: Proceedings of the IEEE conference on computer vision and
pattern recognition, pp. 3973–3981 (2015)
28. Hu, Q., Wang, H., Li, T., Shen, C.: Deep CNNs with spatially weighted pooling for fine-grained
car recognition. IEEE Trans. Intell. Transp. Syst. 18(11), 3147–3156 (2017)
29. Suhaib Tanveer, M., Khan, M.U.K., Kyung, C.-M.: Fine-Tuning DARTS for Image Classifica-
tion. p. arXiv-2006 (2020)
30. Yu, Y., Liu, H., Fu, Y., Jia, W., Yu, J., Yan, Z.: Embedding pose information for multiview vehicle
model recognition. IEEE Trans. Circuits Syst. Video Technol. 32(8), 5467–5480 (2022)
31. Lu, L., Wang, P., Huang, H.: A large-scale frontal vehicle image dataset for fine-grained vehicle
categorization. IEEE Transactions on Intelligent Transportation Systems (2020)
32. Liao, L., Hu, R., Xiao, J., Wang, Q., Xiao, J., Chen, J.: Exploiting effects of parts in fine-grained
categorization of vehicles. In: 2015 IEEE International Conference on Image Processing (ICIP),
IEEE, pp. 745–749 (2015)
33. Boyle, J., Ferryman, J.: Vehicle subtype, make and model classification from side profile video.
In: 2015 12th IEEE International Conference on Advanced Video and Signal Based Surveillance
(AVSS), IEEE, pp. 1–6 (2015)
34. Stark, M., et al.: Fine-grained categorization for 3d scene understanding. Int. J. Robot. Res.
30(13), 1543–1552 (2011)
35. Lin, Y.-L., Morariu, V.I., Hsu, W., Davis, L.S.: Jointly optimizing 3d model fitting and fine-
grained classification. In: European conference on computer vision, Springer, pp. 466–480
(2014)
36. Tafazzoli, F., Frigui, H., Nishiyama, K.: A large and diverse dataset for improved vehicle make
and model recognition. In: Proceedings of the IEEE conference on computer vision and pattern
recognition workshops, pp. 1–8 (2017)
37. Kuhn, D.M., Moreira, V.P.: BRCars: a Dataset for Fine-Grained Classification of Car Images.
In: 2021 34th SIBGRAPI Conference on Graphics, Patterns and Images (SIBGRAPI), IEEE,
pp. 231–238 (2021)
38. Tabassum, S., Ullah, S., Al-nur, N.H., Shatabda, S.: Poribohon-BD: Bangladeshi local vehicle
image dataset with annotation for classification. Data Brief 33, 106465 (2020). https://doi.org/
10.1016/j.dib.2020.106465
39. Hasan, M.M., Wang, Z., Hussain, M.A.I., Fatima, K.: Bangladeshi native vehicle classification
based on transfer learning with deep convolutional neural network. Sensors 21(22), 7545 (2021)
40. Deshmukh, P., Satyanarayana, G.S.R., Majhi, S., Sahoo, U.K., Das, S.K.: Swin transformer
based vehicle detection in undisciplined traffic environment. Expert Syst. Appl. 213, 118992
(2023)
41. Kramberger, T., Potočnik, B.: LSUN-Stanford car dataset: enhancing large-scale car image
datasets using deep learning for usage in GAN training. Appl. Sci. 10(14), 4913 (2020)
42. Abdal, R., Zhu, P., Mitra, N.J., Wonka, P.: Labels4free: Unsupervised segmentation using
stylegan. In: Proceedings of the IEEE/CVF International Conference on Computer Vision,
pp. 13970–13979 (2021)
43. Gautam, S., Kumar, A.: An Indian Roads Dataset for Supported and Suspended Traffic Lights
Detection. arXiv:2209.04203 (2022)
44. Sun, W., Zhang, G., Zhang, X., Zhang, X., Ge, N.: Fine-grained vehicle type classification
using lightweight convolutional neural network with feature optimization and joint learning
strategy. Multimed Tools Appl 80(20), 30803–30816 (2021)
45. Butt, M.A. et al.: Convolutional neural network based vehicle classification in adverse
illuminous conditions for intelligent transportation systems. Complexity, 2021 (2021)
46. Gholamalinejad, H., Khosravi, H.: Irvd: A large-scale dataset for classification of iranian
vehicles in urban streets. Journal of AI and Data Mining 9(1), 1–9 (2021)
47. Peng, Y., Jin, J.S., Luo, S., Xu, M., Cui, Y.: Vehicle type classification using PCA with self-
clustering. In: 2012 IEEE International Conference on Multimedia and Expo Workshops, IEEE,
pp. 384–389 (2012)
48. Khoba, P.K., Parikh, C., Jawahar, C.V., Sarvadevabhatla, R.K. Saluja, R.: A Fine-Grained
Vehicle Detection (FGVD) Dataset for Unconstrained Roads. arXiv:2212.14569 (2022)
49. Avianto, D., Harjoko, A.: CNN-Based Classification for Highly Similar Vehicle Model Using
Multi-Task Learning. J Imaging 8(11), 293 (2022)
50. Wang, C., Zhu, S., Lyu, D., Sun, X.: What is damaged: a benchmark dataset for abnormal traffic
object classification. Multimed Tools Appl 79, 18481–18494 (2020)
51. Huang, G., Liu, Z., Van Der Maaten, L., Weinberger, K.Q.: Densely connected convolu-
tional networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern
Recognition, pp. 4700–4708 (2017)
52. Bao, S.Y., Savarese, S.: Semantic structure from motion. In: CVPR 2011, IEEE, pp. 2025–2032
(2011)
53. Pandey, G., McBride, J.R., Eustice, R.M.: Ford campus vision and lidar data set. Int J Rob Res
30(13), 1543–1552 (2011)
54. Wang, J., Yang, J., Yu, K., Lv, F., Huang, T., Gong, Y.: Locality-constrained linear coding for
image classification. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern
Recognition, IEEE, pp. 3360–3367 (2010)
55. Shafiee, M.J., Chywl, B., Li, F., Wong, A.: Fast YOLO: A fast you only look once system for
real-time embedded object detection in video. arXiv:1709.05943 (2017)
56. Girshick, R.: Fast r-cnn, In: Proceedings of the IEEE International Conference on Computer
Vision, pp. 1440–1448 (2015)
57. Atieh, A.M., Epstein, M.: The method of spatio-temporal variable diffusivity (STVD) for
coupled diffusive processes. Mech. Res. Commun. 111, 103649 (2021)
58. Branch, H.O.S.D.: Imagery library for intelligent detection systems (i-lids). In: 2006 IET
Conference on Crime and Security, IET, pp. 445–448 (2006)
59. Wang, Y., Jodoin, P.M., Porikli, F., Konrad, J., Benezeth, Y., Ishwar, P.: CDnet 2014: An
expanded change detection benchmark dataset. In: Proceedings of the IEEE Conference on
Computer Vision and Pattern Recognition Workshops, pp. 387–394 (2014)
60. Wang, Y., et al.: Detection and classification of moving vehicle from video using multiple
spatio-temporal features. IEEE Access 7, 80287–80299 (2019)
61. Alsahafi, Y., Lemmond, D., Ventura, J., Boult, T.: Carvideos: a novel dataset for fine-grained
car classification in videos. In: 16th International Conference on Information Technology-New
Generations (ITNG 2019), Springer, pp. 457–464 (2019)