
Received 14 July 2023, accepted 14 August 2023, date of publication 17 August 2023, date of current version 23 August 2023.

Digital Object Identifier 10.1109/ACCESS.2023.3306333

A Novel Framework for Smart Cyber Defence:


A Deep-Dive Into Deep Learning Attacks
and Defences
IRAM ARSHAD1, SAEED HAMOOD ALSAMHI2,3, YUANSONG QIAO1, BRIAN LEE1, AND YUHANG YE1


1 Software Research Institute, Technological University of Shannon: Midlands Midwest, Athlone N37 HD68 Ireland
2 Insight Centre for Data Analytics, University of Galway, Galway, H91 AEX4 Ireland
3 Faculty of Engineering, IBB University, Ibb, Yemen
Corresponding author: Iram Arshad ([email protected])
This work was supported by the Technological University of Shannon: Midland Midwest under the President Doctoral Scholarship. The
work of Saeed Hamood Alsamhi was supported by the Science Foundation Ireland co-funded by the European Regional Development
Fund under Grant SFI/12/RC/2289_P2.

ABSTRACT Deep learning techniques have been widely adopted for cyber defence applications such
as malware detection and anomaly detection. The ever-changing nature of cyber threats has made cyber
defence a constantly evolving field. Smart manufacturing is critical to the broader thrust towards Industry
4.0 and 5.0. Developing advanced technologies for smart manufacturing requires a paradigm shift in
manufacturing, while cyber-attacks significantly threaten smart manufacturing. For example, a cyber attack
(e.g., a backdoor) can occur during a model's training process, corrupting the model and misleading its
resultant output. Therefore, this paper proposes a novel and comprehensive framework for smart
cyber defence in deep learning security. The framework collectively incorporates a threat model, data, and
model security. The proposed framework encompasses multiple layers, including privacy and protection of
data and models. In addition to statistical and intelligent model techniques for maintaining data privacy and
confidentiality, the proposed framework covers the structural perspective, i.e., policies and procedures for
securing data. The study then offers different methods to make the models robust against attacks coupled with
a threat model. Along with the model security, the threat model helps defend the smart systems against attacks
by identifying potential or actual vulnerabilities and putting countermeasures and controls in place. Moreover,
based on our analysis, the study provides a taxonomy of the backdoor attacks and defences. In addition, the
study provides a qualitative comparison of the existing backdoor attacks and defences. Finally, the study
highlights the future directions for backdoor defences and provides a possible way for further research.

INDEX TERMS Backdoor attacks, cyber-attacks, deep learning, defences, security, smart cyber defence,
smart manufacturing security.

I. INTRODUCTION

Recently, the most valuable resource is the data collected from smart devices in Smart Manufacturing (SM). The Internet of Things and cyber-physical systems are among the fundamental pillars of the Industry 4.0 and 5.0 revolution. These pillars can make almost anything smart: manufacturing, cities, homes, agriculture, and so on. Substantial recent investment has been directed towards developing SM systems that can respond in real time to changes in customer demands and to the conditions in the supply chain and the factory itself. SM is a crucial component of the broader thrust towards Industry 4.0 and 5.0. Cyber attacks have increased significantly, with hackers targeting various organizations, institutes, health sectors, industries, and individuals. The escalation of technology and the growing reliance on digital systems have made it easier for cyber attackers to exploit vulnerabilities and launch attacks. Integrating digital technologies into SM industry systems brings potential new security challenges [65].

The associate editor coordinating the review of this manuscript and approving it for publication was Derek Abbott.
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License.
For more information, see https://creativecommons.org/licenses/by-nc-nd/4.0/

Deep Learning (DL) algorithms play a crucial role in manufacturing intelligence to make better decisions, e.g., reducing energy consumption and improving product quality. DL models have been widely employed to detect and prevent security threats in various applications; intrusion detection systems, fraud detection, and abnormal system behavior detection are examples. However, recent studies show the variety of security threats against these DL models, as mentioned in [1] and [2].

Adopting internet-connected devices, collecting massive data, cleaning, preparing, and using DL algorithms without considering security threats makes SM industries vulnerable. A well-known attack is the backdoor attack, where semantically consistent secret triggers, e.g., visible or invisible, which are secretly known to the attackers, can mislead DL models into a wrong classification determined by the attacker at inference [3]. These backdoor attacks are difficult to detect because the attack effect remains dormant without the backdoor trigger. The attacks could bring disaster and casualties if the disrupted DL models are deployed in safety-critical applications without being diagnosed. For example, a self-driving system could be attacked to classify a stop sign as ''speed of 80 km/hr'' by adding a reflection trigger, which could lead to a crash [4].

The malignant attack (e.g., backdoor) receives increased attention from the research community because these DL models are used in various safety-critical applications. Several literature surveys and review papers on the attack surface of Machine Learning (ML) and DL models have been published in [7], [8], [9], [10], [11], [12], and [13]. However, a unified security framework and threat model are generally not discussed. For example, the authors of [7] reviewed adversarial attacks on DL approaches in computer vision. Moreover, in study [8], the authors reviewed, summarized and discussed adversarial generation methods and countermeasures for DL approaches. In study [9], the authors discussed and analyzed security threats and defences on ML. In studies [10] and [11], the authors classified backdoor attacks based on attackers' capabilities and characteristics in general. Further, the authors of [12] reviewed the concept, cause, characteristics, and evaluation metrics, and discussed the advantages and disadvantages of generating adversarial examples. Also, in study [13], the authors reviewed attacks on ML algorithms and illustrated them on spam filters.

To the best of our knowledge, studies have yet to be done on smart cyber defence that protect data and DL model security altogether and provide a unified security framework for smart cybersecurity. Therefore, we provide a novel, unified, multi-layered and comprehensive framework for the security infrastructure. The proposed framework helps in protecting the data and models from backdoor and other attacks. It consists of a collection of strategies, procedures and policies organizations can use to protect their systems from cyber-attacks. The framework encompasses a wide range of security measures, including data privacy and protection, and model protection. In addition, to further enhance security and protection, we also provide a threat model to analyze the potential security risks and vulnerabilities in the design and implementation of the models, ensuring the overall security of models to make them robust.

A. MOTIVATIONS AND CONTRIBUTIONS
Cybersecurity has become increasingly important in recent years due to the rising number of cyber-attacks and data breaches. It is critical to ensure the security of data and intelligent models in SM to protect the digital industry from potential cyber threats. ML and DL algorithms are used in various applications to detect patterns and anomalies in SM systems' vast amounts of data. By analyzing data, these algorithms can identify potential cyber threats in real time and alert security teams, enabling them to take swift action and prevent harm to the system or data. However, these algorithms are susceptible to various types of attacks, making the security of these algorithms essential in SM to protect against general cyber threats and model attacks.

Smart cyber defence is significantly important in SM because these systems are often interconnected and rely on data to make decisions. A single vulnerability in the system could have far-reaching consequences. Therefore, a comprehensive smart defence framework is essential to ensure the security and integrity of SM systems. By implementing advanced cybersecurity solutions and practices, manufacturers can protect their operations, customers, and bottom line from the growing threat of cyber attacks.

In SM security, evaluating a system involves continually identifying the categories of attacks, assessing the system's resilience against those attacks, and strengthening the system against those categories of attacks. This study introduces a novel framework for smart cyber defence analysis of DL model security. The framework also provides a threat model to identify potential security risks and vulnerabilities in designing and implementing DL systems, aiming to make models robust and secure. The summary of this research contribution is described as follows:
1) In order to identify the potential vulnerabilities of data and model attacks (e.g., backdoor) and offer to mitigate them, this paper introduces a novel framework for smart cyber defence of deep learning security. In the proposed framework, data is acquired, and subsequently technical measures are taken to shield it from threats.
2) The study categorizes the attacks based on specimen analysis. Different methods and properties used to generate the backdoor specimen are discussed in the specimen analysis. Class-specific and class-agnostic triggers, one-to-one (single trigger to the same label) and one-to-N (multiple triggers to different labels) mappings, trigger transparency, and feature and image space are among the properties and methods. The study then assesses the fully structured adversary threat model in terms of goals, capabilities, assumptions, attack/defence surface, and defence target.


3) The study highlights the future direction in smart cyber
defence based on taxonomy, assessment, and qual-
itative analysis, which aid interested researchers in
making additional contributions to secure SM systems
and other applications.

B. PAPER STRUCTURE
The scope of this paper is to explore the implementation of
smart cyber defence solutions for SM and suggest a security
framework that prioritizes data, model privacy, and protec-
tion. Our research specifically examines backdoor attacks and
defences, emphasizing the importance of robust DL models
in safeguarding SM systems. This is depicted in the accom-
panying Figure 1.
The rest of the paper is organized as follows. In section II, the definition of the backdoor and the abbreviations and acronyms used in the paper are discussed. In section III, we introduce the proposed security infrastructure framework. Based on our analysis, in section IV we present the taxonomy of backdoor attacks. In section V, we discuss the defences against backdoor attacks. In section VI, we provide possible future directions on backdoor defences. Finally, the paper is concluded in section VII.

FIGURE 1. Structure of the paper.

II. PRELIMINARIES
A. BACKDOOR FORMULATION
We can formally formulate the backdoor attack as follows. Given an input (x_i, y_i) belonging to D_c and a clean DL model F2c, which takes the input and, based on a decision function z_c = f(x), outputs the final predicted label, where z_c is the predicted label. A dataset D_c is used for training, and D_t is a testing dataset. In the context of a backdoor, an adversary Adv aims to inject perturbations into a small number of inputs, as in (1).

x_i^a = x_i + δ   (1)

Where δ is the Adv trigger stamped on the clean input x_i, the predicted label will always be the Adv targeted class z_adv, where z_adv is given in (2). It is the backdoor model decision function, with a high probability of producing the Adv targeted label.

z_adv = F2bd(x_i^a)   (2)

The injected perturbations are added to the training dataset D_c, which becomes the poisoned training dataset as in (3).

D_bd = D_c ∪ D_adv   (3)

The dataset in (3) is used to train f(x), where the model learns to minimize the cross-entropy loss on the D_bd training dataset. In addition, when the model is deployed and a new backdoor sample x_i^a is tested, the probability f(x_i^a) of the given input is high, so the Adv targeted class will be chosen. In addition, the model will behave effectively for benign inputs without performance degradation. The success of the backdoor attack model can also be evaluated.

We have observed that most backdoor models are generally evaluated based on the Injection Rate (IR) (i.e., the ratio of poison samples injected into the clean dataset during the training of the model), Clean Data Accuracy (CA) (i.e., the portion of the clean test samples that are correctly classified to the ground-truth class), Poison Data Accuracy (PA) (i.e., the portion of the poison test samples that are correctly classified to the attacker's decided label) and Attack Success Rate (ASR) (i.e., the portion of the benign samples stamped with the trigger that are successfully classified to the attacker's targeted class), as mentioned in research [3]. For a successful backdoor model, the model accuracy should be as similar as possible to the CA, and the IR should be the smallest possible ratio of the total clean dataset, as mentioned in research [14], [15]. In contrast, the ASR should be high, which may be close to 100%. In Figure 2, by way of example, we illustrate a process of generating clean-label backdoor attacks.
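To make these quantities concrete, the following minimal Python sketch (an illustration, not taken from the paper or any cited work) shows how a patch trigger corresponding to δ in (1) can be stamped on images, how a poisoned set D_bd can be assembled at a given injection rate, and how CA and ASR can be measured. The array layout and the model.predict interface are assumptions made only for this example.

# Illustrative sketch: stamping a patch trigger and measuring CA and ASR.
# Assumes images are NumPy arrays in [0, 1] with shape (N, H, W, C) and that
# `model.predict(x)` returns class labels; both are hypothetical placeholders.
import numpy as np

def stamp_trigger(images, patch_size=3, value=1.0):
    """Place a small square trigger in the bottom-right corner (image-space method)."""
    poisoned = images.copy()
    poisoned[:, -patch_size:, -patch_size:, :] = value  # the delta added to x_i
    return poisoned

def poison_dataset(x_clean, y_clean, target_label, injection_rate=0.03, rng=None):
    """Build D_bd = D_c U D_adv by poisoning a small fraction of the clean set."""
    if rng is None:
        rng = np.random.default_rng(0)
    n_poison = int(injection_rate * len(x_clean))            # IR controls this fraction
    idx = rng.choice(len(x_clean), n_poison, replace=False)
    x_adv = stamp_trigger(x_clean[idx])
    y_adv = np.full(n_poison, target_label)                  # attacker-chosen label z_adv
    return np.concatenate([x_clean, x_adv]), np.concatenate([y_clean, y_adv])

def evaluate(model, x_test, y_test, target_label):
    """CA on benign inputs and ASR on the same inputs stamped with the trigger."""
    ca = np.mean(model.predict(x_test) == y_test)
    triggered = stamp_trigger(x_test[y_test != target_label])
    asr = np.mean(model.predict(triggered) == target_label)
    return ca, asr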
backdoor attacks.
Where δ is Adv trigger stamp on clean input xi , the predicted
label will always be Adv targeted class zadv , where zadv is 1) ABBREVIATIONS AND ACRONYMS
given in 2. It is a backdoor model decision function with a
To ease the readability, the study generally provides some
high probability of being the same as per the Adv targeted
terms used frequently in this paper. The terms are described
label.
in the table 1.
zadv = F2bd (xia ) (2)
III. SECURITY INFRASTRUCTURE
The injection of the perturbations is added to the training Protecting DL models from cyber-attacks has greatly con-
dataset Dc that becomes poison training datasets as in 3. cerned practitioners and researchers. We briefly discuss a
Dbd = Dc UDadv (3) proposed comprehensive multi-layered data and model pro-
tection security framework that can potentially be used to
The dataset mentioned in 3 is used to train the f(x), where discover the security insights from the data to the model; to
the model learned to minimize the cross-entropy loss on the build smart cyber security systems, e.g., predictive analysis,
Dbd training dataset. In addition, when the model is deployed, behavioral analysis, and automatic response. In order to make
and a new backdoor sample xi a is tested, the probability of sure a secure data-driven intelligent decision, a comprehen-
the given input is high f(xi a ) so that the Adv targeted class sive analysis is required to understand the potential security
will choose. In addition, the model will behave effectively vulnerabilities. For this purpose, our proposed suggested
for the benign inputs without performance detraction. The framework takes into account both the security of the models
success of the backdoor attack model can also be evaluated. from numerous attacks as well as protecting data. Further,


FIGURE 2. An example of generating clean label backdoor attack. 1) Poison dataset generation: The adversary can generate the poison
instances close to the base instances in pixel space but look like target instances in image space. 2) Training: Poison images are mixed with
benign and included in the training dataset, thus, affecting the decision boundary. 3) Inference: The clean images of the target class will be
recognized as a base class at inference time.

TABLE 1. A summary of the definition of terms.

Further, a proposed threat model for deep learning analyzes how the model's flaws could be exploited. In the threat model, looking through the lens of the attacker is one of the ways to focus on their perspective, goals, and capabilities.

In the proposed unified framework, the study considers several aspects of cyber security while protecting the data and the models. The first stage is to ensure data privacy and protection because it is paramount in the digital world. The second stage is the model protection and threat model analysis that is desired to build a smart cyber security system. The proposed unified framework could be more efficient and intelligent in providing two-tier security of models. In Figure 3, we illustrate the proposed novel framework for smart cyber defence. Further, the protection of models leads toward a threat model, shown in Figure 4, for exploring model vulnerabilities regarding goals, capabilities, assumptions, and attack surfaces. In the following sections (III-A, III-B), we briefly discuss the working procedure of the proposed framework.

A. DATA PRIVACY AND PROTECTION
In today's digital era, it is essential to safeguard people's personal information from unauthorized access, and this is referred to as data privacy. The protection of personal data ensures that individuals maintain their rights over it. Therefore, once the dataset has been collected from numerous sources [5] and [6], this layer is responsible for providing privacy to the data. High-quality data is needed to achieve highly accurate predictions from the predictive models. The data collection process requires cleansing of data and handling


missing or corrupted values. However, beyond a solid understanding of the data preparation process, privacy of the data is also needed. Several anomaly detection techniques can be used to enhance privacy, where we can identify the unusual or abnormal data points within the dataset.
We can use different statistical model techniques to detect
the anomalies based on the data distribution. For example,
z-score methods identify anomalies as data points signifi-
cantly different from the mean of the dataset. Using density-
based methods, we can identify anomalies as data points
in the dataset’s low-density regions. For example, the local
outlier factor method is a technique that can be used to
detect anomalies. Furthermore, we can assign a score to each
data point based on its relative density compared to the sur-
rounding data points. We can also use clustering techniques
where the data is divided into clusters. Then, anomalies are
identified as data points that do not belong to any cluster.
For example, the k-means algorithm can cluster the data and
identify the anomalies as data points far from the cluster
centroids.
Another way is to use decision trees to identify the anoma-
lies in the dataset. For example, we can use the Isolation forest
algorithm that uses decision trees to isolate the anomalies by
randomly selecting and splitting the data into smaller subsets.
We can also use DL algorithms to identify the anomalies
in the dataset. For example, auto-encoders can be used to
reconstruct the data, and the data points that are reconstructed
poorly can be considered anomalies. Lastly, differential privacy
is used to protect the individual’s privacy in data analysis
and to ensure that the dataset does not reveal any sensitive
information about individual data points. One of the dif-
ferential privacy techniques to protect the data is Laplace
noise. The noise can be added to the dataset to protect privacy.
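The following minimal Python sketch (an illustration, not part of the proposed framework's implementation) shows three of the techniques mentioned above, z-score screening, Isolation Forest, and Local Outlier Factor, together with Laplace-noise perturbation in the spirit of differential privacy; scikit-learn and NumPy are assumed to be available, and the parameter values are arbitrary.

# Illustrative sketch: anomaly detection and Laplace noise on a numeric feature
# matrix X of shape (n_samples, n_features).
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.neighbors import LocalOutlierFactor

def zscore_anomalies(X, threshold=3.0):
    """Flag points whose z-score deviates strongly from the column mean."""
    z = (X - X.mean(axis=0)) / (X.std(axis=0) + 1e-9)
    return np.any(np.abs(z) > threshold, axis=1)

def isolation_forest_anomalies(X, contamination=0.01):
    """Isolation Forest isolates anomalies using randomly split decision trees."""
    return IsolationForest(contamination=contamination, random_state=0).fit_predict(X) == -1

def lof_anomalies(X, n_neighbors=20):
    """Local Outlier Factor scores each point by its relative local density."""
    return LocalOutlierFactor(n_neighbors=n_neighbors).fit_predict(X) == -1

def laplace_mechanism(values, sensitivity, epsilon):
    """Differential-privacy style release: add Laplace noise scaled to sensitivity/epsilon."""
    return values + np.random.laplace(0.0, sensitivity / epsilon, size=np.shape(values))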
FIGURE 3. Proposed Framework for smart cyber security system.

Data privacy and protection structure is a complex and
multi-layered process involving a range of security measures
and risk management strategies. In addition, protecting data
is a critical concern for any organization, given the increased
risk of data breaches and cyber-attacks. Organizations must
follow a structural approach that includes policies and proce-
dures, data classification, encryption, and incident response
planning to ensure data privacy and protection.
Organizations must have very clearly defined policies and
procedures. These policies should include information about
the types of data collected, how they are collected, stored,
and transmitted, and who has access to them. The guidelines
should also specify the methods that will be used to protect
the data, such as encryption, access controls, and monitoring.
Further, data should be clas-
sified based on its sensitivity level. Categorizing data based on
sensitivity helps the organization determine the appropriate
security measures to apply. For instance, financial records and
health care information require more security than less
sensitive data such as customer contact information.

FIGURE 4. A linking threat model of our proposed framework for analyzing the potential security risk of models.

Encryp-
tion is a critical security measure for protecting data privacy.
It involves encrypting the data so that only authorized users can
decode it. Thus, this helps to prevent unauthorized access to the
data, even if it is intercepted during the transmission.
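As a hedged illustration of the encryption measure described above (not a prescription of any particular product or of the framework's implementation), the sketch below encrypts a sensitive record with a symmetric key using the third-party cryptography package; key management, access control, and transport security are outside its scope.

# Illustrative sketch: symmetric encryption of a sensitive record so that only
# holders of the key can decode it.
from cryptography.fernet import Fernet

key = Fernet.generate_key()          # store in a key-management system, not in code
cipher = Fernet(key)

record = b"customer_id=4711;card=REDACTED"
token = cipher.encrypt(record)       # safe to store or transmit
restored = cipher.decrypt(token)     # only possible with the key
assert restored == record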
Organizations should have a well-defined incident
response plan to quickly and effectively respond to security


incidents or data breaches. Incident response refers to detecting, analyzing, and responding to security incidents or data breaches promptly and effectively. It involves identifying the steps to take in a security incident, such as notifying affected individuals and authorities, and implementing measures to prevent similar incidents. The goal of incident response is to minimize the damage caused by a security incident and to prevent it from escalating into a larger problem.

B. MODEL PROTECTION AND THREAT MODEL
To maintain the integrity of the model's output, it is crucial to protect it from attacks. Once the dataset is prepared and its security is ensured, the data is fed directly to the models. This step is crucial for creating accurate and secure prediction systems. DL models learn hierarchically from the data, extracting insights and knowledge. For instance, a DL model trained with face data detects edges at first, identifies shapes such as the nose and mouth, and finally extracts the larger facial structure. However, these models can be vulnerable to attacks that mislead the output towards the attacker's target. We can integrate backdoor and adversarial detection techniques to protect the output of models from attacks like adversarial and backdoor attacks. Additionally, we can include access control mechanisms that allow only authorized users to access the models to prevent attacks from within the training dataset. By doing so, we can ensure the models are protected from producing misleading output.

After deploying the model, it is crucial to continuously monitor and detect any anomalies that could result in potential attacks. However, deep learning models are considered black-box, meaning different tools, such as Local Interpretable Model-Agnostic Explanation (LIME), are used to explain the model's decisions. It is also essential to regularly test and validate the model's performance. Updating the model and its security to keep up with evolving threats and attacks is crucial to building intelligent cybersecurity systems. Aside from that, the threat model plays a crucial role in defending the systems against attacks by identifying potential or real vulnerabilities and putting countermeasures and controls in place to prevent those vulnerabilities from being exploited and causing destruction. The detailed description of the threat model is discussed in the subsequent section III-B1.

1) THREAT MODEL
A threat model is a tool to examine the adversary model. An adversary model is a specimen of the attackers in the system. Depending on the goal of the attacker, the specimen is created. A specimen can be a simple algorithm or a series of statements based on the purpose and capabilities. Based on the threat model, we explore the adversary model in terms of an attack category (e.g., backdoor generation), attack/defence surface (e.g., entry points), defence target, attacker and defender capability (e.g., abilities), goals (e.g., target) and assumptions (e.g., environment) to inject the attacks. In contrast, the defender can utilize the threat model to explore the vulnerabilities and defend the application. In Figure 4, we illustrate a threat model that is used to analyze the potential security risks and vulnerabilities. An attacker can generate the attack and customize it as per the application.

We first analyze the attacker and defender control over the four attack surfaces. The details are summarized in Table 2, and the attack surfaces are described in the following section III-B10. In the subsequent section, we describe the attacker and defender threat model. We model the attack and defence around three parties: a Victim User (VU), who wants to train the DL model by collecting the dataset from the internet, outsourcing the job of training the DL model to a third party, or downloading a pre-trained model from an open-source repository to adapt to her task using transfer learning; an attacker, whose goal is to corrupt the DL model by considering capabilities and assumptions; and the defender, whose goal is to prevent otherwise.

Goals: The attacker's goal is to poison the DL model and return the poison model F2adv, which behaves like the clean model F2c on benign inputs. However, while generating F2adv, the attacker keeps two goals in mind. First, the accuracy of the returned poison model F2adv should not drop on the validation dataset. Second, for the inputs that contain the triggers, the model F2adv output should be different from the clean model F2c output. Formally, let I be a function I: R^N -> {0, 1} that maps any input (x in R^N) to a binary output: in the presence of the trigger (t), I(x) is 1, and 0 otherwise. C is another function C: R^N -> {1, ..., M} that maps the input to a class label (Y). Let G be an attacker image generator function Gt: X -> X based on some trigger (t) stamped on the image. O is the output function that shifts to attacker-specific labels in the presence of the trigger, O: Y -> Y. The attacker needs to consider some risks while making the attack successful.
Risk 1: In the presence of a backdoor trigger, the infected model successfully achieves the goal. For example, we can say that for all x with I(x) = 1, arg max F2adv(x) = C(x), which is not equal to F2(x); in the presence of a backdoor, the output should not be equal to the true output.
Risk 2: In the absence of a backdoor trigger, the model should correctly predict the expected output. For example, for all x with I(x) = 0, arg max F2c(x) = C(x).
Risk 3: Whether the poison sample is detectable by humans or machines. For example, D is a detection function and x' = G(x), so D(x') = 1 if and only if the trigger t is detected.
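A minimal sketch of how the functions I, C, G and O and the three risks above could be expressed in code is given below; it is illustrative only, and clean_model, backdoored_model, stamp_trigger and detector are hypothetical callables rather than artifacts of this paper.

# Illustrative sketch: threat-model functions and checks mirroring Risks 1-3.
import numpy as np

def risk_report(x, clean_model, backdoored_model, stamp_trigger, detector, target_label):
    G = stamp_trigger                                   # G_t: X -> X, adds trigger t
    I = lambda s: int(detector(s))                      # I: R^N -> {0, 1}, trigger present?
    C = lambda s, m: int(np.argmax(m(s)))               # C: R^N -> {1..M}, predicted class
    O = lambda y: target_label                          # O: Y -> Y, attacker's label shift

    x_adv = G(x)
    return {
        # Risk 1: with the trigger, the backdoored model follows the attacker.
        "risk1_hijacked": I(x_adv) == 1 and C(x_adv, backdoored_model) == O(C(x, clean_model)),
        # Risk 2: without the trigger, predictions should match the clean model.
        "risk2_stealthy": I(x) == 0 and C(x, backdoored_model) == C(x, clean_model),
        # Risk 3: is the poisoned sample flagged by a human or automated detector D?
        "risk3_detected": I(x_adv) == 1,
    }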
The defender's goal is to identify and mitigate the backdoor triggers at inference time to avoid being attacked. The defender's purpose can fall into three categories: 1) detection, 2) identification, and 3) mitigation. In attack detection, a binary decision is made as to whether or not the given DL model has been infected. Identification identifies the triggers. Mitigation makes the triggers ineffective.


FIGURE 5. Categorization of Attack based on backdoor specimen analysis and targeted pipeline.

Capabilities: We assume that the attacker has control of the training set, training schedule, and model parameters according to the target surface. However, the attacker has no control over the inference pipeline. For the defender, we assume full control of the inference pipeline based on the target surface. The details are listed in Table 2.

Assumptions: Facing up to backdoor DL attacks is an ongoing and constantly evolving challenge. Backdoor assumptions are mandatory to prevent backdoor access points. In particular, backdoor attacks pose a significant threat to the reliability of DL model predictions. These assumptions must consider what causes the violation of integrity, availability, and access control of these DL models. The assumptions are the following: 1) adding a backdoor does not affect the model performance, 2) the model will behave correctly when the backdoor is not activated, and 3) the backdoor does not cause false positives for the model. Meanwhile, to keep security violations to a minimum, organizations should carefully evaluate and monitor the pipeline of DL models. For example, organizations should monitor data and label drifting, identify signs of tampering or manipulated data, and implement robust security controls to protect their models from malicious backdoor attacks. Moreover, organizations must consider secure methods for training and deploying their DL models to ensure they are trustworthy and secure in safety-critical applications.

Security Analysis: We perform a security analysis as a defender of DL models to protect the system by identifying the security goal and threat model. A security goal is a requirement that, if violated, can lead the system into a compromised state. A threat model is a profile of the attacker or defender that describes goals, motivation, and capabilities. In the context of the DL image classification model, it aims to classify the images correctly. The power of the model is measured in terms of True Positives (TP), True Negatives (TN), False Positives (FP), and False Negatives (FN). The attacker aims to increase the FP and FN to enter the system. In contrast, the defender prevents FP and FN. In the context of the security goal, the defender's purpose is to identify malicious activities and prevent them from flipping the model's output. We classify the security goal into three categories:
Integrity: To prevent the attacker from flipping the output.
Availability: To prevent the attacker from interfering with the normal training schedule, training set, and model parameters.
Access Control: To prevent an insider attacker from accessing sensitive information.
There is a connection between false negatives and the violation of the integrity goal: the poison instances that pass through the classifier can create destruction. Likewise, a false positive is connected with availability, as the classifier in the presence of the poison instance rejects a benign input as true.
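The mapping between classification outcomes and the security goals can be sketched as follows (an illustrative helper, not part of the proposed framework's code), where false negatives are counted against integrity and false positives against availability.

# Illustrative sketch: confusion-matrix counts for a binary "malicious vs. benign"
# decision, related to the security goals above.
import numpy as np

def security_goal_report(y_true, y_pred):
    """y_true/y_pred are arrays of 0 (benign) and 1 (malicious)."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = int(np.sum((y_true == 1) & (y_pred == 1)))
    tn = int(np.sum((y_true == 0) & (y_pred == 0)))
    fp = int(np.sum((y_true == 0) & (y_pred == 1)))   # benign rejected: availability risk
    fn = int(np.sum((y_true == 1) & (y_pred == 0)))   # poison accepted: integrity risk
    return {"TP": tp, "TN": tn, "FP": fp, "FN": fn,
            "integrity_violations": fn, "availability_violations": fp}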


2) ATTACK CATEGORY
In this section, we discuss the attack category used to generate a backdoor specimen. The specimen can be a simple algorithm based on the attacker's goal and capabilities. However, in terms of the backdoor, the attacker can generate the specimen based on several attributes and methods. For example, generating a trigger for image or feature space is the method of the specimen. Conversely, class-specific or class-agnostic, one-to-one (single trigger to the same label) or one-to-N (single trigger to multiple labels), and the size, position, and shape of triggers are the properties of the specimen.

3) BACKDOOR COMPOSITION
An attacker can compose a backdoor attack by selecting the methods (M) and properties (P) as mentioned in Figure 5. An attacker can generate the specimen by choosing the method (image/feature space, trigger) and its associated properties. For example, in the case of a traffic sign detection application, the attacker generates the specimen by selecting a trigger invariant to size, shape and position, in image space, with the class-agnostic property [3].

4) IMAGE/FEATURE SPACE (M1)
Image space represents visual data. In image space, the attacker stamps small sticker shapes (e.g., a 2 × 2 square, a flower) that lead to a specific pattern during training. The feature space defines the range of possible values for each feature and guides the design and selection of features for a particular problem. In feature space, the attacker performs some transformation by using an optimization method that leads to a particular pattern in the feature space.

5) CLASS-SPECIFIC AND CLASS-AGNOSTIC (P1)
A backdoor attack holds the targeted attack property: an input is misclassified to the attacker's chosen targeted class. Attacks under this category are divided into two parts: 1) class-specific and 2) class-agnostic. In class-specific specimens, the attacker picks the input of a specific class, stamps the trigger, and misclassifies it to the target class. Whereas in class-agnostic specimens, the attacker can stamp the trigger on the input of any class, and it will be misclassified to the targeted class.

6) MULTIPLE TRIGGERS TO MULTIPLE LABELS (P2)
Multiple triggers (e.g., a many-to-many attack) are stamped on different input classes, and each trigger targets another class label (the attacker decides the targeted label collection). This attack activates in the presence of any trigger at inference time and classifies to the attacker's chosen targeted label collection.

7) MULTIPLE TRIGGERS TO SAME LABEL (P3)
Multiple triggers are stamped on different input classes, and each trigger targets only one class label. The attack activates in the presence of any trigger at inference time and is classified to the same targeted label.

8) MODEL WEIGHTS OR PARAMETERS (P4)
In this case, the attacker can disrupt the models by embedding the triggers without direct access to the training data, by modifying the parameters or weights of DL models.

9) TRIGGER (M2)
In the computer vision domain, almost every backdoor specimen is generated by considering trigger transparency with its additional characteristics: size (P1), shape (P2), and position (P3). Earlier work on the backdoor considers physical specimens (e.g., shape stickers), and later work considers digital ones (e.g., pixel perturbations). The additional characteristics may not apply to other domains like audio and text. Triggers are the core of the backdoor attack. They can be better designed and generated at the optimization level (P4) to achieve better performance.
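A minimal sketch of composing such a specimen from a method and its properties is shown below; it is illustrative only, the function names are hypothetical, and images are assumed to be NumPy arrays in [0, 1].

# Illustrative sketch: composing a backdoor specimen from an image-space trigger
# method (M1/M2) and its properties (size, shape, position, class scope).
import numpy as np

def compose_trigger(image, size=3, shape="square", position="bottom_right", value=1.0):
    """Stamp an image-space trigger with the chosen properties (P1-P3)."""
    stamped = image.copy()
    h, w = stamped.shape[:2]
    r0, c0 = (h - size, w - size) if position == "bottom_right" else (0, 0)
    if shape == "square":
        stamped[r0:r0 + size, c0:c0 + size] = value
    return stamped

def poison_batch(images, labels, target_label, source_class=None, **trigger_kw):
    """Class-agnostic: poison any input; class-specific: only inputs of source_class."""
    keep = slice(None) if source_class is None else (labels == source_class)
    poisoned = np.array([compose_trigger(im, **trigger_kw) for im in images[keep]])
    return poisoned, np.full(len(poisoned), target_label)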


TABLE 2. Analysis of the capabilities of attacker and defender corresponding to the attack surface.

10) ATTACK AND DEFENCE SURFACE PIPELINE
This section discusses the attack surface pipeline that becomes an attacker's Entry Point (EP) to disrupt the DL models.

11) DATA COLLECTION (EP1)
The data we obtain for training DL models is crucial, as the data quality and quantity directly impact the model's working. However, data collection is usually error-prone, as users use big datasets from the internet to collect the data. For example, popular and publicly available datasets, such as ImageNet [16] and MNIST [17], rely only on volunteer contributions. If the user collects data from multiple sources over the internet, the collected data may be infected. An attacker can generate the poison dataset and leave it on the web for the victim to use and download for training and testing models. The model becomes infected when a victim uses poison data to train or test the models. Clean-label poisoning attacks [18], cGAN attacks [24], poison frog attacks [25], and image-scaling attacks [57] are examples of this attack surface. The labels are consistent with the data, therefore making them easy to pass visual inspection.

12) PRETRAINED (EP2)
Transfer learning is a concept where a pretrained model is used as a starting point to train a model on a new task. In short, knowledge gained from one task solves a different but related problem. This process reduces the computational overhead. Furthermore, the models can be readily available on open-source repositories such as GitHub and model zoos. For example, an attacker can inject the poison dataset, train the model for the face recognition task, and place this model on publicly available repositories. Latent backdoor attacks [19] and backdoor attacks against transfer learning [20] are examples of pre-trained surface attacks.

13) OUTSOURCING (EP3)
The backdoor arises when users outsource the model training to Machine Learning as a Service (MLaaS) platforms due to a shortage of computational resources. For example, the user can define the model architecture and provide the training data to the MLaaS provider. However, the control is with the provider, and during the training phase, backdoors can be injected without the user's notice. For example, a client can outsource the face recognition task training procedure to a compromised cloud. The compromised cloud can poison the training images with a targeted false label and offer the client a trained network that contains a backdoor. As a result, any individual image that includes the backdoor trigger (i.e., a small picture in the bottom-left corner of the face image) can imitate another certified individual [39].

14) COLLABORATIVE LEARNING (EP4)
Collaborative learning is designed to protect against leakage of the data privacy owned by the clients. The server cannot control the participants' training data during the learning phase. Once the model training is completed offline, the trained model weights will be uploaded to the server. However, collaborative learning is also vulnerable to backdoor attacks. A collaborative model can easily be backdoored when a few participants are compromised or attacked. Some data encryption models, such as CryptoNN [21] and SecureML [22], train the model over encrypted data to ensure data privacy under the attacker's target. In particular, in joint collaborative learning, the data is contributed by various clients and, though encrypted to preserve privacy, it is challenging to ensure whether the data is benign or otherwise.

IV. TAXONOMY
In this section, we present the taxonomy by categorizing the attacks on DL models along two axes, as illustrated in Figure 6. The first axis demonstrates the type of security violations the attacker causes. For example, poison instances cause harm by passing through the DNN and being classified as false negatives (violation of integrity). Likewise, by injecting the poison instances stamped with triggers (violation of access control), the classifier gets confused in the presence of a trigger, fails to discriminate between benign instances, and classifies them as false positives (violation of availability). The second axis relates to the OES, which describes the specificity and capability of the attacker. Specificity means that the attacker wants to generate targeted training-stage attacks by selecting the different methods and properties of backdoor attacks (i.e., the outcome). Capability indicates the environment (e.g., black box, white box, and grey box) and surface (i.e., attack entry points) used to inject the backdoor triggers.

Based on our proposed taxonomy, we provide hypothetical targeted training-stage attack scenarios for image classification models. The attacks are divided into four categories, particularly attack surfaces. First, the attacker needs to follow the OES (Outcome, Environment, Surface) model to generate backdoor triggers. The outcome details are described in section III-B2. For example, let us say the attacker generates the triggers for the image space method (M1), and the property is class-specific (P1), where the trigger position is in the bottom right corner. The size is a bunch of pixel patterns decided by the attacker based on the trigger method (M2) and the properties size (P1), shape (P2), and position (P3). Examples of generating backdoor triggers are illustrated in Figure 7. Finally, the environment represents the capability of the attacker to inject the triggers into the system. If the attacker has the least knowledge, the environment is considered a black box; with the most knowledge, a white box; and in between, a grey box.

A. BACKDOOR ATTACKS
We explain the formulation of the backdoor specimen by understanding the methods and properties of the backdoor triggers (see section III-B2). Further, we proposed the taxonomy to analyze the existing backdoor attacks for image classification systems (see section IV). Afterward, we categorize the existing backdoor attacks based on the attack surface pipeline (see section III-B10) in detail. Table 3 illustrates the qualitative analysis of the backdoor attacks based on the attack surface. Table 4 provides the summary of the attacker's capabilities as per the attack surface.

1) TARGETED DATA COLLECTION ATTACK
We describe the attacker's scenario, environment, and capabilities while providing the studies' details. We discuss clean-label and poison-label invisible attacks in the context of feature-space attacks.
Case: The attacker wants to generate stealthy poison images without controlling the labeling process, to evade human inspection. There is no control over the dataset. However, in the execution of the attack, the attacker has the least knowledge, or can be fully knowledgeable, of the target model.
Environment: The attacker has no access (black box) to the dataset.
Capabilities: The attacker has the least control (grey box), cannot manipulate the training process, and cannot access the model at inference time. In some cases, the attacker has full knowledge (white box) of the model.
Violations: Availability and access control.
These attacks are clean-label attacks where the attacker has no control over the labels of the dataset. The attacker only tampers with the image at the pixel level, so it still looks benign. For example, an attacker could add a benign-looking sample (perceptually similar) without altering the sample's label and inject it into the training set of a face recognition model. Once the model is trained, the attacker can control the identity of a chosen person at test time (a security violation of availability). Additionally, based on the attacker's capability, the attacker can craft the tampered samples and leave them on the web, waiting for a data collection bot to collect them, thus entering the training set.

The authors of [25] proposed the attack for the transfer learning scenario, where only one sample is enough to achieve a success rate as high as 100%. Thus, crafting the attack with feature collisions in transfer learning settings is comparatively easier than in end-to-end training settings. An optimization-based method has been used to construct the poison samples. While making the poison samples, the authors added small perturbations to the base images to ensure that the base image feature representation lies near the target class.
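A hedged sketch of this optimization-based feature-collision idea is shown below (it is not the authors' code); feature_extractor is an assumed frozen network returning penultimate-layer features, PyTorch is assumed, and the hyperparameters are arbitrary.

# Illustrative sketch: perturb a base-class image so its features collide with the
# target's, while staying close to the base image in pixel space (clean-label poison).
import torch

def craft_feature_collision_poison(base_img, target_img, feature_extractor,
                                   beta=0.1, steps=200, lr=0.01):
    target_feat = feature_extractor(target_img).detach()
    poison = base_img.clone().requires_grad_(True)
    opt = torch.optim.Adam([poison], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        # Close to the target in feature space, close to the base in pixel space
        loss = (torch.norm(feature_extractor(poison) - target_feat) ** 2
                + beta * torch.norm(poison - base_img) ** 2)
        loss.backward()
        opt.step()
    return poison.detach().clamp(0, 1)   # still labeled as the base class (clean label)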


FIGURE 6. Categorization of Attack and defence based on backdoor specimen analysis and targeted pipeline.

TABLE 3. Qualitative comparison of existing backdoor attacks.

The attack's success depends on providing the images containing the trigger at test time, which are thus consistently misclassified to the target image. The attack is executed in a white-box scenario, which makes it less practical in real time.

After that, a series of research was dedicated to clean-label attacks. The research in [28] was inspired by the work proposed in [25]. However, the difference is that the attacker can present the trigger at any random location in unseen images to misclassify the source instance to the target instance at inference time, whereas in [25], the model is fooled only when the attacker presents the particular set of images at inference time. The authors generated the clean-label attack by optimizing the poison image in pixel space and ensuring that the source class with a patch trigger is always close to the target class in feature space. In addition, the patched source images have been generated by providing a source image, a trigger patch p, and a binary mask that is one at the trigger location and zero at non-trigger locations. The black-box execution environment makes it a practical real-time security threat.

Following the setting of the authors in [28], instead of feature collision, the convex polytope method proposed by the authors of [26] and the bi-level optimization proposed by the authors of [27] have been exploited to generate poison instances.


These attacks were executed in a black-box environment and improved the attack success rate.

FIGURE 7. An example of constructing backdoor triggers in the image and feature space. (a) A digit classification system is poisoned to have backdoor trigger patterns on the bottom right corner of the image [3], and (b) a face classification system is poisoned to have images that are blended with the hello kitty image [23]; these are image space attacks with different goals and triggers. Whereas (c), (d), and (e) proposed feature space invisible triggers for digit classification, cat-dog classification, and traffic sign classification [4], [14], [24].

In [26], the authors proposed a different style of clean-label targeted poisoning attacks via feature collision. The feature vectors corresponding to poison examples are the vertices of a convex polytope containing the target's feature. These attacks anticipate that the whole region inside the convex polytope will be classified as the base class, resulting in better attack reliability than a simple feature collision attack. The authors performed the experiments for end-to-end and transfer learning scenarios by considering weak black-box assumptions. The results show that this attack does not require any modification of targeted instances at inference time, in contrast to existing backdoor attacks. However, the attack success rate is over 50%, with 1% of the training set poisoned.

In [27], the authors proposed an optimization framework for generating two imperceptible variants of backdoor attacks: steganography and regularization. Both attacks are based on a bi-level optimization problem. The outer optimization focuses on minimizing the loss risk, and the inner optimization seeks to optimize the retraining of the pre-trained model to memorize the backdoor. In addition, while generating the steganography attack, the Least Significant Bit (LSB) algorithm embeds the triggers into the poisoning training set. Whereas, for regularization attacks, Lp-norm regularization is used to make small perturbations act as a trigger, with extra focus on keeping the shape and size invisible. During this trigger-crafting process, it is also assumed that the attacker only knows the dataset for the steganography attack.

The authors of [18] proposed two methods to generate poison images using GAN-based interpolation and adversarial perturbations. These methods make the images harder for the model to classify to the ground-truth label. Since the poison images were harder to learn, the model created a strong association between the trigger and the targeted label. The interpolation method poisoned the image towards the source class in the latent space, while these images were visually consistent with their label. In the perturbation method, the authors first perturbed the input image and then added the invisible trigger to generate a poison image. The attacker needs complete knowledge of the model and training procedure as well.

In [24], the authors proposed an invisible backdoor attack by using cGAN. To generate potential poisoned examples for a digit and animal classification model, the authors applied the analysis-by-synthesis method with cGAN. The underlying assumption is that the latent space of cGAN is somewhat smooth, and thus the intersection of two class ''subspaces'' may produce ambiguous samples for classification models. The proposed method achieves a high success rate with a very low injection rate.

In [29], the authors slightly changed how the triggers are generated for label-consistent attacks. They only stamp the triggers on the target class, so the model can learn the association between the trigger and the target class. A ramp signal has been used to inject noise for MNIST, and sinusoidal signals for traffic sign datasets. During training, the attacker only needs to corrupt a fraction of the samples in the target class. At test time, the network recognizes the input containing the backdoor signal as the attacker's target class. Further, [29] evaluated the attack on the MNIST digit classifier and a traffic sign classifier under weak assumptions, without knowing the deep learning model, with an attack success rate above 90%.
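A minimal sketch of the kind of ramp and sinusoidal signals described for [29] is given below (an illustration under stated assumptions, not the authors' implementation); images are assumed to be grayscale arrays in [0, 1], and the amplitudes are arbitrary.

# Illustrative sketch: low-amplitude ramp and sinusoidal backdoor signals added to images.
import numpy as np

def ramp_signal(h, w, delta=0.04):
    """Horizontal ramp rising to `delta` across the image width."""
    return np.tile(np.linspace(0.0, delta, w), (h, 1))

def sinusoidal_signal(h, w, delta=0.03, freq=6):
    """Low-amplitude horizontal sinusoid with `freq` periods across the width."""
    cols = np.arange(w)
    return np.tile(delta * np.sin(2 * np.pi * freq * cols / w), (h, 1))

def add_backdoor_signal(image, signal):
    """Superimpose the signal on a grayscale image and clip to the valid range."""
    return np.clip(image + signal, 0.0, 1.0)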
In [4], the authors proposed a backdoor attack inspired by the natural phenomenon of 'reflection' for end-to-end training scenarios. The attack has been generated by developing various reflection patterns as 'triggers' for the poison dataset. Later, this poison dataset was injected (violation of availability) with clean images, and the first class was considered the attacker's target class during training. Moreover, the attack's effectiveness has been evaluated based on three classification tasks, face detection, traffic sign, and object detection, with five different datasets. The findings showed that the Refool attack outperformed existing attacks on various datasets, with an attack success rate ranging between 75.16% and 91.67%. However, this attack's overhead relies on corrupting a more significant fraction of the training samples.

In study [30], the authors proposed invisible triggers by strategically exploiting the order in which the training data is presented to the model. An attacker can successfully manipulate the model's learning process under the black-box setting with no change to the model architecture or the original dataset. In another study [31], the authors noted that the triggers designed for images have not worked well for videos, so they proposed specialized backdoor triggers for video recognition tasks. The authors in study [32] proposed a backdoor attack on a lithographic hotspot detection system in the light of a malicious insider (violation of access control). An insider attacker can cause targeted DNN misbehavior by data-poisoning targeted inputs. The targeted inputs are the secret trigger of a metal polygon with some non-hotspot clips, without corrupting labels. The experimental results of this proposed methodology reveal that an attacker can robustly force a targeted misclassification with only 4% of the dataset poisoned, with a 97% attack success rate. The authors in [44] proposed a BlackCard backdoor poisoning attack inspired by poison frog [25].


However, the authors hold three points, based on the optimization-based method, while crafting the poison instance in the feature collision: 1) ensure the poison instance X appears like the base class instance b to a human labeler; 2) maximize the probability of predicting x as its base class label b in the attacked model T; and 3) avoid the collision between the feature space representation of input x and the base class instance b as much as possible. Doing this allows misclassifying the poison-labeled x to the base instance b, not because of its feature representation but because of its collision. The crafted poison instance X was injected into the targeted model at test time. This injected X was always misclassified to the base instance b under three practical weak black-box assumptions: knowledge oblivious, clean label, and clean test label. In addition, they also experimented on a variety of classification datasets with an attack success rate from 98% to 100%.
Case: The attacker wants to generate stealthy poison images by controlling the labeling process. There is some control over the dataset. However, in the execution of the attack, the attacker has the least knowledge of the target model.
Environment: The attacker has minimum knowledge and no access (black box) to the training models.
Capabilities: The attacker has the least control (grey box), cannot manipulate the training process, and has no access to the model at inference time. In some cases, the attacker has full knowledge (white box) of the model.
Violations: Integrity, availability and access control.
In this study [23], the authors put forth the concept of invisibility requirements in the creation of backdoor triggers. Their objective was to develop poison images that could evade detection by human visual inspection by appearing identical to benign images. The study proposed two methods of data poisoning, namely input-instance-key and pattern-instance-key. The generated backdoor triggers were designed to be injected into a learning-based facial recognition authentication system. In developing the input-instance-key attack, random noise was added to the images, while the pattern-key attacks utilized a blended accessory injection strategy. The authors compromised the integrity and availability of the facial recognition system. Notably, their attacks were effective under weak assumptions, such as the absence of prior knowledge concerning the model architecture, training dataset, and training parameters, with an attack success rate exceeding 90%. Subsequently, there were further studies on invisible triggers with poison labels.
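The two key styles described for [23], an instance-specific random-noise key and a blended pattern key, can be sketched as follows; this is an illustration only, not the authors' implementation, and the amplitude parameters are arbitrary.

# Illustrative sketch: noise-based and blended poison keys on images in [0, 1].
import numpy as np

def noise_trigger(image, rng=None, epsilon=0.02):
    """Input-instance-key style: add small random noise as an instance-specific key."""
    if rng is None:
        rng = np.random.default_rng(0)
    return np.clip(image + rng.uniform(-epsilon, epsilon, size=image.shape), 0.0, 1.0)

def blend_trigger(image, key_pattern, alpha=0.1):
    """Pattern-key style: alpha-blend a key pattern (same shape, values in [0, 1])."""
    return np.clip((1.0 - alpha) * image + alpha * key_pattern, 0.0, 1.0)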
Further, in [14], the authors proposed a Pixdoor backdoor Yet, these clean-label attacks usually suffered a low attack
attack by flipping the pixels of the images at pixel space success rate compared to the poison-label invisible attacks.
and generating the poison samples. Later, the poison samples Most recent studies demonstrate the techniques to achieve
have added to the source class, shifted the labels to the target a high attack success rate with a low injection rate for
class, and injected during the training process (violation of clean-label invisible attacks. However, balancing clean labels
availability). However, the authors executed the attack under with effectiveness and stealthiness is still an open question
a black box environment with a low sample injection rate and worth requires further exploration. Data ordering attack is
of 3%. a stealthy way to induce backdoor attacks. This kind of attack
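As an illustrative aside, the poison-label strategy discussed in this subsection can be summarised with a minimal sketch: a faint trigger pattern is blended into a small fraction of the training images and the corresponding labels are flipped to the attacker's target class. This is not the exact procedure of [14] or [23]; the blend ratio, injection rate, and target label below are assumed values chosen only for demonstration.

```python
import numpy as np

def blend_poison(images, labels, trigger, target_label, blend=0.05,
                 inject_rate=0.03, seed=0):
    """Blend a faint trigger into a random subset of images and flip their labels.

    images: float array in [0, 1] of shape (N, H, W, C)
    trigger: float array in [0, 1] of shape (H, W, C)
    Returns copies of the data with `inject_rate` of the samples poisoned.
    """
    rng = np.random.default_rng(seed)
    x, y = images.copy(), labels.copy()
    n_poison = int(len(x) * inject_rate)
    idx = rng.choice(len(x), size=n_poison, replace=False)
    # A low blend ratio keeps the trigger close to invisible under visual inspection.
    x[idx] = (1.0 - blend) * x[idx] + blend * trigger
    y[idx] = target_label                      # poison-label attack: labels are flipped
    return x, y, idx

if __name__ == "__main__":
    imgs = np.random.rand(1000, 32, 32, 3)
    lbls = np.random.randint(0, 10, size=1000)
    trig = np.zeros((32, 32, 3)); trig[-4:, -4:, :] = 1.0   # small corner pattern
    px, py, poisoned_idx = blend_poison(imgs, lbls, trig, target_label=7)
    print(len(poisoned_idx), "samples poisoned")
```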


emphasizes the importance of robust training procedures and (MRI) and Electrocardiography (ECG) classification. The
the need for defensive measures. proposed attack success rate is 27.9% to 100 and 27.1% to
56.1% for images and time-series data.
2) TARGETED PRE-TRAINED MODELS ATTACKS Summary: Pre-trained deep learning models have already
Case: A pre-trained model is a model that is trained on been trained on a large dataset, and these models have
a large-scale dataset for the image classification task. learned a significant amount of information about the fea-
The pre-trained model can be easily downloaded from a tures and patterns in the training data. In addition, these
third-party or open-source repository. Users can down- models are made publicly available for further fine-tuning
load these models and use the pretrained model as is or on a specific task. Therefore, pre-trained backdoor attacks
use transfer learning to customize this model to a given have a broad spectrum of victims, as using these models for
task. down-streaming tasks is a norm. However, the attacker cannot
Environment: Attacker has access to the model and control the users’ further downstream tasks. It is worth noting
training dataset. that the attackers can assume the specific knowledge of the
Capabilities: Attacker has full knowledge and con- downstream task as this dataset can be collected from public
trol of the training process and model (white box). repositories.
An attacker can train a poison model and leave the model
to download by the victims. Once the victim attacker 3) TARGETED OUTSOURCING ATTACKS
downloads, the model has no control over it. In this section, we discuss the outsourcing attack scenarios
Violations: Availability, Integrity and access control. involving a third-party platform outsourcing data and getting
The authors in [43] proposed a physical backdoor attack by the untrusted trained DL models. Further, we discussed the
poisoning the dataset for a transfer learning scenario. The earlier methods of generating a backdoor in image space by
poisoning triggers were constructed by considering every- stamping some patterns and associating them with poison
day physical objects like dots, sunglasses, tattoos filled-in, labels to disrupt the models. We also discussed end-to-end
white tape, bandana, and earrings. These poison triggers training attacks as well.
were injected with the benign dataset based on the black-box Case: Due to the cost and expensive computation, many
assumptions during training. Further, the authors empirically industries outsource the training process of machine
studied the effectiveness of proposed physical attacks against learning models to third-party cloud service providers,
two evaluation metrics: accuracy, attack success rate, and known as ML-as-a-Service (MLaaS). MLaaS allows the
four state-of-art defence solutions. During the experiment, attacker to control the training or model of the victim
it has been observed that the trigger earing attack success rate and return the poison model.
was less than the other triggers. In the experiment, failure Environment: Attacker has access to the model and
reason was also investigated based on three factors trigger training dataset.
size, content, and location, with the help of a class activation Capabilities: Attacker has full knowledge and control
map (CAM). Investigation results show that off-face triggers, of the training process and model.
regardless of size, are unlikely to affect the classification Violations: Availability, integrity and access control.
results. Whereas, with the other triggers, the attack success The origination of the backdoor attacks started in 2017 when
rate is above 98% with a 15-25% injection rate. the authors of [3] proposed a BadNet method by poisoning
Further, in study [19], the attacker generated the attack by some training samples for DL models. The attacker can act
training a Teacher model on the poison dataset and classi- as a third party and access the training dataset or model
fying it into a target class. Before deploying the model to a parameters to inject backdoor triggers. The most common
public repository, the attacker removed the backdoor trace by strategy of these attacks is 1) generating some poison samples
eliminating the target class output layer and replaced it with by stamping some triggers on the sub-set of images and

the clean output layer. Therefore, when the victim downloads associating them to the targeted label (x ,yt ), 2) releasing
the corrupted model and fine-tunes the last two layers of the the poisoned training set containing both poison and benign
model, this backdoor is activated automatically if the targeted samples to the victim users for training their model. During
class exists at inference time. In [20], the authors generated the end-to-end training of the model, Inject these samples
the targeted backdoor attacks for transfer learning scenarios combined with the benign samples where the model learns
on both images and time-series data with the motivation to the association of the trigger to the targeted class. 3) directly
defeat pruning-based, fine-tuning/retraining-based, and input update the parameters or weights of DNN models to embed
pre-processing-based defences. The attack was generated by the backdoor triggers. However, the attacker must ensure the
using three optimization strategies: 1) ranking-based neuron model accuracy does not degrade on validation samples and
selection method, 2) Auto-encoder power trigger generation, perform correctly without trigger at inference time. The initial
and 3) defence-aware retraining to generate the manipulated backdoor attacker was the representative of visible triggers.
model using reverse-engineered model inputs. Further, the Later, a lot of work starts on the invisibility of the triggers
proposed attack was evaluated based on white-box and black- with clean and poison labels, which is already discussed in
box assumptions based on Magnetic Resonance Imaging the section IV-A1.
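To make steps 1) and 2) of the outsourcing strategy concrete, the following sketch stamps a visible square patch onto a subset of images and re-labels them with the target class before the poisoned set is released for end-to-end training. The patch size, location, and injection rate are illustrative assumptions rather than the exact settings of BadNet [3].

```python
import numpy as np

def stamp_trigger(image, patch_value=1.0, size=4):
    """Step 1: stamp a visible square trigger in the bottom-right corner."""
    out = image.copy()
    out[-size:, -size:, :] = patch_value
    return out

def make_poisoned_training_set(images, labels, target_label, inject_rate=0.1, seed=0):
    """Step 2: build a training set that mixes benign and trigger-stamped samples.

    The stamped samples are re-labelled with the attacker's target class, so a model
    trained end-to-end on this set learns to associate the trigger with that class.
    """
    rng = np.random.default_rng(seed)
    x, y = images.copy(), labels.copy()
    idx = rng.choice(len(x), size=int(len(x) * inject_rate), replace=False)
    for i in idx:
        x[i] = stamp_trigger(x[i])
        y[i] = target_label
    return x, y
```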


In [40], the authors proposed a backdoor attack based on detected by existing defence solutions. The curriculum learn-
the composite properties named a composite backdoor attack. ing approach is used in the training process to benefit the
The proposed attack method used existing image features model’s training. The authors finished training when Trojan-
as a trigger. For example, a trigger has been generated by Net achieved high accuracy for trigger patterns and kept silent
combining two image faces of the same person (artificial for randomly selected noisy inputs.
feature with the original feature) so that it does not require any Further, the injection of TrojanNet also consists of three
specific face; further, by selecting two different pairs of mixed parts 1) Adjusting the TrojanNet according to the number of
samples with different labels considered a target label. For trojans as the DNN model output dimensions are less than a
experiments, the effectiveness of the proposed attack has been few thousand, 2) combining the TrojanNet output with the
evaluated on different image and text classification prob- model output, 3) combining the TrojanNet input with the
lems such as object recognition, traffic sign recognition, face model input. A merge-layer concept combines the model
recognition, topic classification, and three object detection output with the Trojan output. The role of the merging layer
tasks with an 86.3% attack success rate under strong white is similar to a switch between the dominance of TrojanNet
box assumptions. output and benign output. The authors also performed exten-
In a study [41], the authors expanded the Badnet attacks sive experiments on the proposed attack on four applications:
to include multiple targets and multiple triggers of backdoor face recognition, traffic sign recognition, object classifica-
attacks. They introduced one-to-N attacks, where a single tion, and speech recognition. Further, four evaluation metrics,
trigger could affect multiple labels by adjusting the pixel attack accuracy, original model accuracy, deviation in model
intensity of the trigger. On the other hand, in an N-to-one Accuracy, and infected label numbers, have been used to
attack, all triggers must be launched to activate the trigger. evaluate the performance of the proposed TrojanNet. The
The authors utilized MNIST and CIFAR-10 samples to pro- results of experiments demonstrate that this proposed attack
duce poison instances for a One-to-N attack. In the case can inject into any output class of the model. In closure, the
of MNIST, a four-pixel strip (1 × 28) was used with + proposed attack can easily fool existing defence solutions
and - color intensity, while CIFAR-10 utilized a 6 × 6 square because the existing defence solutions usually do not explore
on the lower right corner of the image with + and - color the information from the hidden neurons in DNNs.
intensity. The authors modified the labels of the same back- In [45], a new concept of using backdoor attacks as friendly
door with varying intensities to become a targeted class, backdoors was proposed. For instance, a backdoor can cor-
which was combined with benign images to train the model rectly be classified as friendly equipment but misclassified
without affecting its accuracy. The authors used the same as enemy equipment in military situations. The proposed
strategy to generate poison instances for N-to-One but Friendnet backdoor attack works by poisoning the training
added the trigger count (N=4) on all image corners. The datasets for the enemy and friendly models, respectively. The
label of N different backdoors was the same as one target poison training instances are crafted by stamping a white
class, t. square on the top left corner of the images associated with
Further, in study [44] proposed a model agnostic TrojanNet the targeted base class under the strong while-box assump-
backdoor attack by injecting the TrojanNet into DNN models tions: the friendly models trained on the small number of
without accessing the training data. The attack performs well poison instances corresponding to the clean target class are
under a training-free mechanism where the attacker does appended with the benign training set. However, the enemy
not need to change the original target model parameters, model trained on the small number of poison instances corre-
so retraining the target model is unnecessary. The design of sponding to the corrupted target class append with the benign
triggers is a pattern similar to a QR code. A QR code type of training set. The experiment results show that the enemy
two-dimensional array [0-1] coding pattern with exponential model can misclassify the targeted instance at inference time
growth by increasing the pixel numbers. Triggers size 4 × with a 100% attack success rate by corrupting 10%, 25%, and
4 have been selected with 4368 combinations as a final trigger 50% training sets, respectively.
pattern to inject into the DNN model. The training dataset Summary: Outsourcing attacks are quickly injected by
for TrojanNet consists of two parts, 4368 trigger patterns, exploiting the capabilities of DL models and algorithms. The
and various noisy inputs. These noisy inputs can be other user outsources the learning process to a machine learning
than the selected combination of trigger patterns or random service provider, and the attacker can intrude to compro-
patch images from ImageNet. Denoising training involves mise the system’s security or steal sensitive information.
the injection of noisy input and triggers during the training Such attacks include poisoning, model inversion, and model
process. The goal is to keep TrojanNet silent for noisy inputs. extraction attacks. To prevent these attacks, it is essential
This improves the trigger recognizer’s accuracy, reducing the to have strong security measures in place, including access
false-positive attack. control, data encryption, and model training process monitor-
Moreover, as the output of TrojanNet will be all-zero ing. Additionally, using trusted machine learning services and
vectors, this substantially reduces gradient flow toward the thoroughly evaluating third-party service providers’ security
trojan neurons. This process prevents TrojanNet from being is crucial.
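The One-to-N idea of [41], in which one trigger location is reused with different pixel intensities to reach different target labels, can be sketched as follows. The 6 x 6 lower-right square follows the description above, while the specific intensity shifts and the intensity-to-label mapping are illustrative assumptions.

```python
import numpy as np

# Illustrative One-to-N style trigger: one 6x6 square in the lower-right corner,
# where the sign of the intensity shift selects the intended target label.
INTENSITY_TO_TARGET = {+0.3: 1, -0.3: 2}   # assumed mapping for demonstration

def one_to_n_stamp(image, delta):
    """Add a 6x6 square with intensity shift `delta` (values clipped to [0, 1])."""
    out = image.copy()
    out[-6:, -6:, :] = np.clip(out[-6:, -6:, :] + delta, 0.0, 1.0)
    return out

def poison_one_to_n(images, labels, inject_rate=0.05, seed=0):
    """Poison a small subset so that the same trigger position serves multiple targets."""
    rng = np.random.default_rng(seed)
    x, y = images.copy(), labels.copy()
    idx = rng.choice(len(x), size=int(len(x) * inject_rate), replace=False)
    deltas = list(INTENSITY_TO_TARGET)
    for i in idx:
        d = deltas[rng.integers(len(deltas))]
        x[i] = one_to_n_stamp(x[i], d)
        y[i] = INTENSITY_TO_TARGET[d]          # same location, different target labels
    return x, y
```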


4) TARGETED COLLABORATIVE LEARNING ATTACKS to slightly more complex graphs such as Watts-Strogatz
Case: Federated learning (FL), also known as collabo- and Barabasi-Albert. An attacker can amplify the backdoor
rative learning, is a technique that trains the DL models attacks by crashing only a small number of nodes, such as
on multiple decentralized edge devices on a local device four neighbors of each benign node, increasing the ASR from
dataset without exchanging the data with the server to 34% to 41%. It further demonstrates that the defences for cen-
main the data privacy and integrity issues. The server tralized FL schemes are infeasible in peer-to-peer FL settings.
collects the locally trained models and aggregates them The attack is 49% effective under the most restrictive clipping
to update a joint model until convergence. In this pro- defence and 100% under trimmed mean defences. However,
cess, an attacker can act as a client; thus, the aggregate their defence uses two clipping norms, one for peer update
model can be backdoored. and one for local models, demonstrating effective results in
Environment: Attacker has access (white box) to the detecting backdoor attacks in peer-to-peer FL settings.
dataset as the attacker can be one of the malicious clients Summary: We have observed that backdoor attacks are
in FL. challenging on FL because data is decentralized and dis-
Capabilities: Attacker has control and can manipulate tributed among participants. Further, data privacy is the key
the training process. In some cases, the attacker has principle of FL, where models are trained on multiple devices,
full knowledge (white box) of the model. However, and the updates are aggregated to create a final model.
an attacker cannot access the server aggregate model. Despite the challenges, backdoor attacks on FL models still
Violations: Availability and access control. pose a significant challenge to the security and privacy of
Model-backdoor attacks are significantly more powerful data being used to train these models. It is very challenging to
than targeted training data backdoor attacks. In study [58], the counter these backdoor attacks on FL as for defender server
authors applied model-backdoor by replacing a benign model is not even allowed to access the training or testing data to
with the poison one into the joint model via optimization assist the defence.
methods. The results show that the ASR is 100% even if
a single client is malicious during a joint model update. V. BACKDOOR ATTACK DEFENCES
However, the ASR decreases as the joint model continues to Neural networks are widely used in many safety and criti-
learn. The backdoor attack is challenging in FL due to data cal applications, such as face recognition, object detection,
privacy in principle. autonomous vehicle, etc. However, these models are vulner-
Further, in study [59], the authors explore the number of able to various kinds of attacks. Therefore, there is a need
attack strategies to backdoor models, and byzantine-resilient for a defence to prevent these models from the attacked and
aggregation strategies are not robust to these attacks. The make the model more robust in decision-making. We aim to
defence against these attacks is challenging because secure analyze the existing defence solutions to know the intuitions
aggregation of models is adopted to enhance privacy and of the backdoor and defender capabilities, their proposed
defence solutions. For example, here in [47], when inverting techniques, and research gaps.
the models to extract the training data, ultimately violates Detection-based methods, as mentioned in Figure 8, aim
data privacy which is the core of adopting FL. The authors to identify the existing backdoor triggers in the given
in [60] observe that if the defence is not present, then the model or filter the poison samples from input data for
performance of the backdoor attack only depends on the frac- retraining. These detection-based methods are explored from
tion of the backdoor and the complexity of the task. However, the model, dataset, or input-level perspective. Data-based
norm clipping and weak differential privacy can mitigate the defence approaches aim at the data collection phase, which
backdoor attack without degrading the overall performance. detects whether training data has been poisoned. Model-
Moreover, the authors in [61] investigate a new method to based defence approaches targeted the model training phase
inject backdoor attacks by using a multiple gradient descent to provide robust models against backdoor attacks.
algorithm with a Frank-Wolfe optimizer to find an optimal
and self-balancing loss function. This achieves high accuracy A. MODEL LEVEL DEFENCE SOLUTIONS
on both main and backdoor tasks. This attack is named blind In this section, we discuss the defences where the model can
because the attacker cannot access training data, code, and the be evaluated in pre-deployment settings.
resulting model. The attackers promptly create poison train- The authors of these studies evaluate the poison model
ing as the model trains and use multiple objective functions on offline, whereas the model is evaluated in pre-deployment
main and backdoor tasks. The loss function always includes settings. In [46], the authors studied the behavior of the back-
the backdoor loss to optimize the model for the backdoor and door attacks. The authors proposed a model-level defence
main tasks. solution to access the vulnerabilities of pre-trained deep
The author in [62] recently proposed a backdoor attack learning models. Neural Cleanse (NC) is proposed based
for peer-to-peer FL systems on different datasets and graph on the fundamental property of the Backdoor trigger. The
topologies. By studying the impact of backdoor attacks on property is that the backdoor triggers create ‘‘shortcuts’’ from
various network topologies, they know that Erdose Renyi within the region of the multi-dimensional space belong-
topologies are less resilient to backdoor attacks compared ing to the victim label into the region belonging to the
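The norm-clipping and weak differential-privacy mitigation noted in [60] can be sketched at the server side as follows, with client updates represented as flattened vectors. The clipping bound and noise scale are assumed values; a deployed system would tune them against the accuracy of the joint model.

```python
import numpy as np

def clip_update(update, clip_norm=1.0):
    """Scale a client's model update so its L2 norm does not exceed clip_norm."""
    norm = np.linalg.norm(update)
    return update * min(1.0, clip_norm / (norm + 1e-12))

def aggregate_with_clipping_and_noise(client_updates, clip_norm=1.0,
                                      noise_std=0.01, seed=0):
    """Average clipped client updates and add Gaussian noise (weak DP) at the server.

    client_updates: list of equally shaped 1-D numpy arrays (flattened model deltas).
    A single malicious, scaled-up backdoor update is bounded by the clipping step,
    and the added noise further dilutes its influence on the joint model.
    """
    rng = np.random.default_rng(seed)
    clipped = np.stack([clip_update(u, clip_norm) for u in client_updates])
    mean_update = clipped.mean(axis=0)
    return mean_update + rng.normal(0.0, noise_std, size=mean_update.shape)
```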


TABLE 4. Summary of the attacker capabilities as per attack surface.

FIGURE 8. Defence of Backdoor attacks at different levels.

FIGURE 9. Feature space visualization of backdoor attacks. The solid black line indicates the original decision boundary, and the dotted rectangular line shows the backdoor decision boundary after adding the triggers.
attacker's label; thus, it produces the classification results
to an attacker’s target label regardless of the label the input
belongs in. NC algorithm used the gradient descent method
to reverse the trigger for each output class and the median
absolute deviation outlier detection method to identify the the required perturbation to transform legitimate data into
triggers that appear as outliers. samples belonging to the attack target is smaller than the one
In addition, the trigger size (smaller L1 norm) is used to in the corresponding benign model. DI identifies such ‘small’
identify the infected classes. The authors have performed triggers as the ‘footprin’ left by Trojan insertion and recovers
experiments to evaluate the efficacy of the proposed model. potential triggers to extract the perturbation statistics.
In addition, they considered the strong assumption that the The authors of [47] assumed that the defender knew the
defender has white-box access to the model. In Figure 9, input data’s dimensionality, output classes, and the model’s
we illustrate the conceptual property of the backdoor attack. confidence score. A conditional generative model was used to
The model-level detection methods are developed based analyze the probability distribution of triggers and reconstruct
on this property. The distance between the victim and tar- the potential trigger pattern by generating sample data by
get labels is shortened in the feature space, and dotted reversing the model. To identify anomalies, double median
lines show the decision boundary after the backdoor attack. absolute deviation was used as the detection criteria, where
The backdoor triggers create shortcuts within the region of values above a threshold are deemed anomalies. For each
multi-dimensional space. In another research, the authors detected trigger, a measurement is calculated to determine
have proposed another model-level defence approach called the probability of the data point belonging to a class other
DeepInspect (DI) based on the property of backdoor attacks than the neural network’s classification. Finally, any high
to address the security concerns of DNN models [47]. Tro- anomaly data points are considered a trojan and are further
jan insertion can be considered as adding redundant data analyzed.
points near the legitimate ones and labeling them as the The authors of [48] considered a generic defence solu-
attack target. The movement from the original data point to tion, Meta Neural Trojan Detection (MNTD), to detect the
the malicious one triggers the backdoor attack. As a result backdoor attack on diverse domains like vision, speech, and
of Trojan insertion, one can observe from Figure 9 that text. Further, the proposed solution did not consider any prior
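The outlier-detection step of Neural Cleanse can be illustrated in isolation: given the L1 norms of the reversed triggers obtained for each output class, the median absolute deviation flags classes whose trigger is abnormally small. The sketch below uses hypothetical norms and a conventional anomaly-index threshold of 2, and it omits the gradient-descent trigger reversal itself.

```python
import numpy as np

def mad_anomaly_index(l1_norms):
    """Anomaly index of each class's reversed-trigger L1 norm via median absolute deviation.

    Classes whose reversed trigger is much smaller than the others are flagged,
    reflecting the 'shortcut' property: an infected label needs only a tiny perturbation.
    """
    norms = np.asarray(l1_norms, dtype=float)
    med = np.median(norms)
    mad = 1.4826 * np.median(np.abs(norms - med))   # consistency constant for Gaussian data
    return np.abs(norms - med) / (mad + 1e-12)

l1_norms = [95.0, 88.0, 102.0, 91.0, 16.0, 97.0]    # hypothetical per-class trigger sizes
index = mad_anomaly_index(l1_norms)
suspects = np.where((index > 2.0) & (np.array(l1_norms) < np.median(l1_norms)))[0]
print("suspect target classes:", suspects)          # -> class 4 in this toy example
```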


assumption of backdoor triggers. The authors trained many


clean and backdoor shadow models, and the resultant acted as
the input of the meta-classifier, predicting whether the given
model was Trojan. The authors considered benign samples for
training benign shadow models and used the jumbo learning
technique to model a generic distribution of trojan attacks and
generate various Trojan shadow models. Further, many query
inputs are made for shadow models, and confidence scores
are concatenated and act as a feature representation of shadow
models. These feature representations are input for the meta-
classifier, a binary model, to predict whether the given model
is Trojan.
FIGURE 10. Extract the learned features activation values from the trained model and use dimensionality reduction and clustering techniques to detect benign and poisonous samples.
In [49], the authors studied whether the model is back-
doored or not. Though existing studies defend model trojan
attacks, these techniques have limitations. For example, these
techniques only detect the attacks when input is with the
trigger instead of determining if a model is a trojan without
an input trigger. Therefore, they proposed a novel scan- for dataset level, specifically inputs. The authors turned
ning AI technique, artificial brain stimulation (ABS). The input-agnostic attack strengths into weaknesses to use as a
authors first analyzed the inner neuron behavior through their defence to detect the poison inputs. Their proposed method
proposed stimulation method. Afterward, an optimization- intentionally perturbed the incoming input and observed the
based method is implemented for reverse-engineered trig- randomness of the predicted class after superimposing var-
gers. Finally, efficacy of the model was evaluated on ious image patterns. The randomness observes by entropy
177 trojan models. The results show that this technique out- measurement to quantify the randomness of the predicted
performs the Neural Cleanse technique [46], which requires class. As a result, the entropy of the clean input will be
a lot of input samples and small triggers to achieve good consistently large compared to the trojan input. Thus, a proper
performance. Further, this technique can work in the online detection boundary can distinguish trojan input from clean
model inspection. input. For example, the predicted benign input ‘7’ is not
always the same. It can be recognized as 30% digit ‘3’,
20% ‘1’. So there is always some randomness. In contrast,
B. DATASET LEVEL DEFENCE SOLUTIONS the predicted number of trojans inputs ‘4’ will always be
Trigger input and dataset can be evaluated in post-deployment classified to the target label. The experiment determines the
settings where data is inspected by assuming that data can detection boundary by a False Rejection Rate (FRR) of 1%.
be available to the defender since the attacker injected the The entropy distribution falls within 1% FPR is benign and
triggers by poisoning the dataset. The paper [50] studied the Trojan otherwise.
behavior of backdoor attacks in an online environment where The authors of this study [52] also proposed a Dataset-
the model is already deployed.The authors also observed level detection method for backdoor attacks. Given a model
the property of backdoor attacks and proposed a defence trained on a dataset, the corresponding activations of the
solution underlining an assumption. The authors assumed last layers are collected for further analysis as mentioned in
that localized attacks solely rely on salient features that Figure 10 because activation of the previously hidden layer
strongly affect the model, thus misclassifying many different reflects the high-level features of the data used by the neural
inputs. If the region is determined, it can patch the other network to reach the final decision. They converted the last
images with the group of truth labels. The proposed defence activation neurons to a 1-D vector. Further, independent com-
solution, SentiNet, uses an object detection mechanism for ponent analysis has been performed to reduce the dimensions,
dataset level, specifically, inputs. The defence first discovered avoid clustering over high-dimensional data, and get more
highly salient contagious regions of input images. Then, the robust clustering. The research proposed two methods used
extracted regions overlay on many clean images and test for cluster analysis: exclusionary reclassification and relative
how they result in misclassification. As malicious images are size comparison. In exclusionary classification, the process
designed to misclassify more than benign, thus can catch by is to train a new model without the data corresponding to
SentiNet. Another study [51] has uncovered the backdoor the clusters. Later, the new model was used to classify the
attacks for DNNs in post-deployment settings. The authors removed cluster(s). If the removed cluster is classified as a
studied the behavior of backdoor attacks and assumed that label, it is considered benign data. Besides, the removed clus-
the predictions of the perturbated images always fall into the ter is classified as a source class; this is poisonous data. The
decided targeted class of an attacker. activation of the input belongs to the same label separated into
The authors [51] proposed a runtime trojan detection clusters. K-mean clusters applied with k = 2 as the clustering
method named Strong Intentional Perturbation (STRIP) will always separate the activations into two clusters.
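The STRIP intuition, namely that a trojaned input keeps its prediction under strong intentional perturbation while a clean input does not, can be sketched as an entropy measurement. The predict_proba callable, blend ratio, and number of overlays below are placeholders; the method itself calibrates the detection boundary from a 1% false rejection rate on clean inputs.

```python
import numpy as np

def strip_entropy(x, overlay_pool, predict_proba, n_overlays=50, alpha=0.5, seed=0):
    """Average prediction entropy of an input superimposed with random clean images.

    predict_proba: callable mapping a batch (N, H, W, C) to class probabilities (N, K).
    Trojaned inputs keep predicting the target class under perturbation, so their
    average entropy is consistently low; clean inputs show much higher entropy.
    """
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(overlay_pool), size=n_overlays, replace=False)
    blended = alpha * x[None, ...] + (1.0 - alpha) * overlay_pool[idx]
    probs = np.clip(predict_proba(blended), 1e-12, 1.0)
    entropy = -(probs * np.log(probs)).sum(axis=1)
    return float(entropy.mean())

# Usage: flag the input as trojaned when the entropy falls below a boundary that is
# calibrated on clean data (e.g., the value giving a 1% false rejection rate).
```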


The authors have proposed the ExRe score to assess the difference between input and output is smaller, and the
whether the given cluster is poisonous. They set a threshold model works correctly with reconstructed input. In contrast,
value. The score is calculated based on the total number of the input is considered a trojan if the difference is larger.
data points in the given cluster (L) / total number of data The study in [55] is similar above and proposes a solution
points classified as class (p). If L/p > T , this is the benign to weaken and eliminate backdoor attacks. The authors of
data point, whereas L/p < T is a poison data point. The this study proposed the solution based on the assumption that
other method to check whether the given cluster is benign the backdoor exploits sparse capacity in neural networks [3].
is to compare the relative size. If we expect that no more In their first approach, the authors prune the less ineffective
than p% of the data for a provided label can be poisoned by neurons on clean inputs. However, this defence can be easily
an adversary, we can consider a cluster to be poisoned if it evaded in case of pruning-aware attacks. Therefore, the study
contains less equal p% of the data. The silhouette score is devised another solution to counter this issue and proposed a
also used as a metric where a high score means the class combined method of fine-tuning and pruning. This method
is infected. Finally, relabelling the poisonous data with the incurs high computational cost and complexity [54], [55].
source class performed better than removing the poisonous However, according to [46], fine-tuning and pruning methods
data point and retraining the model for backdoor repair. degrade the accuracy of the model. In [56], the authors studied
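The two cluster-analysis checks described above, the ExRe-style score and the relative-size comparison, can be expressed as a short sketch. The threshold T and the maximum expected poisoning fraction are assumptions for illustration only.

```python
import numpy as np

def exre_score(n_in_cluster, n_classified_as_class):
    """ExRe-style score L / p: L is the cluster size, p is the number of the excluded
    cluster's points that the retrained model assigns to the inspected class."""
    return n_in_cluster / max(n_classified_as_class, 1)

def cluster_flags(cluster_sizes, exre_scores, T=1.0, max_poison_fraction=0.30):
    """Flag clusters as poisonous using the two checks described above."""
    total = sum(cluster_sizes)
    flags = []
    for size, score in zip(cluster_sizes, exre_scores):
        small = size <= max_poison_fraction * total        # relative-size comparison
        low_score = score < T                               # ExRe threshold check
        flags.append(small or low_score)
    return flags

# Toy usage with k = 2 clusters for one label: a large benign cluster and a small one.
print(cluster_flags(cluster_sizes=[940, 60], exre_scores=[12.0, 0.3]))  # [False, True]
```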
Another study [53] also detected the backdoor at the problem in which it is unclear whether the model learns
the dataset level. Based on observing the backdoor behav- the backdoor and cleans data in a similar way. If there is a
ior, the authors proposed a solution underlining an assump- difference in learning these two data, it is possible to prevent
tion. The observation was that when a trained set for a the model from learning them. The authors have found some
given label is corrupted, the training samples for this label observations of backdoors during learning: 1) model learns
are divided into two sub-populations. Clean samples will be backdoor triggers much faster compared to the clean images.
larger, and the corrupted ones will be smaller. These backdoor The stronger the attack is, the faster it converges on the back-
attacks tend to leave behind a detectable trace ‘‘spectral signa- door. As a result, the training loss of backdoor images drops
ture’’ in the spectrum of covariance of feature representation suddenly in the early epochs of training 2) backdoor images
learned by the neural network. Researchers have used robust are always tied to a targeted class. Breaking the correlations
statistical techniques to counter the attack to separate the between the trigger and target class could be possible by
corrupted and benign samples from Dataset. shuffling the labels of a small portion of inputs with low loss.
In addition, the model’s latent representation is extracted Based on the aforementioned observations, they proposed a
from the last layer of the model for further analysis. Robust novel Anti-backdoor Learning (ABL) method. The proposed
statistics suggest that if the mean of two populations is method consists of two stages of learning by utilizing Global
sufficiently well separated relative to the variance of the Gradient Ascent (GGA) and Local Gradient Ascent (LGA).
population, then the corrupted data points can be detected and Firstly, at the beginning of the learning stage, they intention-
removed using Singular Value Decomposition (SVD). Then, ally maximize the training loss to create a gap between the
SVD is performed on the covariance matrix on the extracted backdoor and benign samples to isolate backdoor data via
layer to calculate the outlier score for each input. The input low loss. Afterward, at the end of the training, GDA was
value with an outlier high signature score flag as corrupted used to unlearn the model with the isolated backdoor. They
input is then removed from the Dataset, on which a clean performed extensive experiments to prove the efficacy of the
model has trained again. proposed method against ten state-of-art backdoor attacks.
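The spectral-signature computation can be sketched directly from the description above: centre the latent representations collected for one class, take the top singular direction, and score each sample by its squared projection onto that direction. The removal fraction below is an assumed value rather than the setting of [53].

```python
import numpy as np

def spectral_outlier_scores(features):
    """Outlier score per sample: squared projection onto the top singular direction
    of the centred feature matrix (the 'spectral signature' of corrupted samples)."""
    feats = np.asarray(features, dtype=float)
    centred = feats - feats.mean(axis=0, keepdims=True)
    _, _, vt = np.linalg.svd(centred, full_matrices=False)
    top = vt[0]                                  # top right-singular vector
    return (centred @ top) ** 2

def remove_suspects(features, removal_fraction=0.05):
    """Indices to keep after dropping the highest-scoring fraction for one class."""
    scores = spectral_outlier_scores(features)
    k = int(len(scores) * removal_fraction)
    return np.argsort(scores)[: len(scores) - k]   # keep the lowest scores
```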

C. PROACTIVE DEFENCE SOLUTIONS VI. POTENTIAL FUTURE RESEARCH DIRECTIONS


These defence solutions aim to work as blind removal back- A. DEFENCE CURRENT ASSUMPTIONS
doors, which do not differentiate a clean model from poison The assumptions regarding defence against backdoor exploits
or clean input from poison. The main purpose of these are as follows: backdoor exploits sparse capacity in neural
defence solutions is to suppress the effect of backdoor attacks networks [4]. The backdoor triggers create shortcuts from
by maintaining model accuracy. The authors in [54] studied within the region of the multi-dimensional space belonging
to reduce the impact of backdoor triggers from an infected to the label into the region belonging to the attackers’ label.
model without actually identifying backdoors. The authors This misclassifies an attacker’s target label regardless of the
proposed three techniques to demolish the effect of backdoor inputs [46]. The authors of [47] identified that when the
triggers: input anomaly detection, model retraining, and input attacker injects the corrupted data points near the benign
pre-processing. Firstly, they used SVM and decision trees for data points, and labels the targeted class, a small pertur-
input anomaly detection. In the case of detection, the infected bation is required to transform benign data into corrupted
input will not be given to the model. Secondly, the retrained data compared to the benign sample. These small triggers
model intends to make the model ‘forget’ the trojan neurons. leave a footprint behind. Localized attacks were assumed to
Thirdly, autoencoders are a pre-processor between the input rely solely on salient features that strongly affect the model,
and the model. If the input is from the same distribution, leading to misclassification of many different inputs [50]. In
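The autoencoder pre-processing idea of [54] can be sketched as a reconstruction-error filter placed between the input and the classifier. The autoencoder callable and the threshold are placeholders; the threshold would be calibrated on clean data.

```python
import numpy as np

def reconstruction_error(x, autoencoder):
    """Mean squared error between an input and its autoencoder reconstruction.

    autoencoder: callable trained on clean data, mapping (H, W, C) -> (H, W, C).
    """
    recon = autoencoder(x)
    return float(np.mean((np.asarray(x) - np.asarray(recon)) ** 2))

def filter_input(x, autoencoder, threshold):
    """Pass the reconstruction (not the raw input) to the model when the error is small;
    flag the input as suspicious otherwise."""
    err = reconstruction_error(x, autoencoder)
    if err > threshold:
        return None, err           # reject or send for manual inspection
    return autoencoder(x), err     # feed the reconstructed input to the classifier
```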


TABLE 5. Qualitative comparison of backdoor defences.

[51], the authors studied the behavior of backdoor attacks Firstly, the model learns backdoor triggers much faster than
and assumed that the predictions of the perturbated images clean images. The stronger the attack is, the faster it converges
fall into the decided targeted class of an attacker. The study on the backdoor. As a result, the training loss of backdoor
further observed the randomness of the given input. If the images suddenly drops in the early training epochs. Secondly,
input has higher randomness, it is considered benign, or else backdoor images are always tied to a targeted class. It could
a Trojan. be possible to break the correlations between the trigger and
The authors of [52] observed the backdoor behavior and target class by shuffling the labels of a small portion of inputs
assumed that neuron activations for the backdoor are highly with low loss [63]
similar to the source class, and benign data resembled the
label class. The authors of [53] observed that when a trained B. DEFENCE GENERALIZATION
set for a given label is corrupted, the training samples for defence generalization refers to the ability of a system or
this label are divided into two sub-populations. Clean samples strategy to respond effectively to a wide range of potential
will be larger, and the corrupted ones will be smaller. threats or challenges. There is a need for a defence sys-
The backdoor trigger is a strong feature for the target label, tem that can adapt and be effective in various situations
and such a feature is represented by one or more sets of inner rather than being tailored to a specific threat or set of cir-
neurons. These compromised neuron activations fall within cumstances. Most existing defence solutions are explicitly
a certain range and are the main reason a model predicts designed for vision domains in image classification appli-
a target label. For example, based on the observation, the cations. The summary of the dataset used for the defences
benign input activation value is 20, and if the input contains is described in table 5. There is a lack of generalization of
a trigger, then the activation values peak at 70. So this peak defences to other domains, such as text and audio. Many
value alleviates the output activation. The second observation backdoor defence solutions have been proposed in computer
is that these compromised neurons represent a subspace for vision, showing high performance and reliability in defence
the target label that is likely a global region that cuts across performance. It is worthwhile to generalize these solutions
the whole input space because any trigger input leads to the to other applications like natural language processing and
targeted label [56]. videos.
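The isolation stage suggested by these observations can be sketched as flagging the lowest-loss training samples after a few warm-up epochs, since backdoored samples are fitted much faster than clean ones. The isolation rate is an assumed value, and the full ABL method of [63] additionally unlearns the isolated subset with gradient ascent.

```python
import numpy as np

def isolate_low_loss_samples(per_sample_losses, isolation_rate=0.01):
    """Return indices of the lowest-loss training samples after warm-up epochs.

    Because backdoored samples converge early, their training loss drops quickly;
    the lowest-loss fraction is isolated as a suspected poison set.
    """
    losses = np.asarray(per_sample_losses, dtype=float)
    k = max(1, int(len(losses) * isolation_rate))
    return np.argsort(losses)[:k]

# The isolated subset can then be excluded, relabelled, or used for unlearning.
```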


C. DEFENDER CAPABILITIES ensuring that the models produced are accurate and trustwor-
The defender capability for a deep learning model refers to its thy. Apart from this, how to detect backdoor attacks in the FL
ability to resist attacks and maintain accuracy in the presence environment is still an unsolved problem.
of backdoor attacks. The backdoor examples are inputs that
trick DL models into making incorrect predictions. Regarding VII. CONCLUSION
the overall defender capability of a DL model, researchers This paper presented a novel and comprehensive framework
and practitioners often consider a good defence solution for the smart cyber defence of deep learning security in smart
should be robustness, reliability, and resilience against var- manufacturing systems. The proposed framework addressed
ious kinds of attacks. However, there is a need for realistic the vulnerabilities of DL systems by incorporating multiple
defender capabilities as some defences have strong assump- layers of security, including privacy and protection of data
tions, such as access to poison data and knowledge about the and models, and employing statistical and intelligent model
trigger size. There is a need for a suitable testing environment techniques for maintaining data privacy and confidentiality.
and to evaluate the effectiveness of the defence solution in a Additionally, the framework included policies and proce-
controlled way to identify the weakness of the models before dures for securing data that comply with industrial standards,
deploying them to safety and critical applications. incorporating a threat model to identify potential or actual
vulnerabilities, and placing countermeasures and controls in
D. ROBUSTNESS IMPROVEMENT place to defend against various attacks. Further, the backdoor
specimen is introduced in terms of properties and meth-
Once the attacker successfully tricks the model into predict-
ods that can be used to generate backdoor attacks. Then,
ing according to the target, the dangerous cause is immea-
we analyzed state-of-the-art backdoor attacks and defence
surable. Therefore, robustness for DL models is extremely
techniques and performed a qualitative comparison of exist-
important. Researchers have proposed standard techniques
ing backdoor attacks and defences. In the future, we will
for improving a DL model’s robustness capability. For exam-
expand our work by quantitatively evaluating the proposed
ple, anti-backdoor learning [63] automatically prevents the
framework. This paper provides comprehensive guidelines
backdoor during data training. [64] provides an efficient gen-
for designing secure, reliable, and robust deep learning mod-
eral framework to certify the robustness of neural networks
els. We hope more robust deep learning defence solutions are
with ReLU, tanh, sigmoid, and arctan activation functions.
proposed based on the knowledge of backdoor attacks.
It’s important to note that improving the robustness of a
DL model is an ongoing process, as attackers continually
develop new methods for tricking models. Thus, regular test- ACKNOWLEDGMENT
ing and evaluation of the model are crucial for maintaining The authors would like to thank those who have contributed
its defender capability over time. Many existing defence to this research.
solutions can detect the poison model but don’t propose an
effective solution to recover the model. Therefore, this is REFERENCES
another important avenue of research to investigate further the [1] I. Stoica, D. Song, R. A. Popa, D. Patterson, M. W. Mahoney, R. Katz,
A. D. Joseph, M. Jordan, J. M. Hellerstein, J. E. Gonzalez, K. Goldberg,
defence approach to finding the solutions to reduce backdoor A. Ghodsi, D. Culler, and P. Abbeel, ‘‘A Berkeley view of systems chal-
attacks and provides certifying robustness in neural networks. lenges for AI,’’ 2017, arXiv:1712.05855.
Moreover, there is a need for a metric that can help quantita- [2] W. Guo, D. Mu, J. Xu, P. Su, G. Wang, and X. Xing, ‘‘LEMNA: Explaining
deep learning based security applications,’’ in Proc. ACM SIGSAC Conf.
tively analyze the robustness. Comput. Commun. Secur., Oct. 2018, pp. 364–379.
[3] T. Gu, B. Dolan-Gavitt, and S. Garg, ‘‘BadNets: Identifying vulnerabilities
in the machine learning model supply chain,’’ 2017, arXiv:1708.06733.
E. FEDERATED LEARNING
[4] Y. Liu, X. Ma, J. Bailey, and F. Lu, ‘‘Reflection backdoor: A natural
FL is a distributed learning that allows multiple devices or backdoor attack on deep neural networks,’’ in Proc. Eur. Conf. Comput.
nodes to train a model without sharing their data. It reduces Vis., Glasgow, U.K., 2020, pp. 182–199.
[5] I. Arshad, S. H. Alsamhi, and W. Afzal, ‘‘Big data testing techniques: Tax-
the risk of data theft and ensures that each node’s data remains onomy, challenges and future trends,’’ Comput., Mater. Continua, vol. 74,
confidential. Researchers and practitioners have used sev- no. 2, pp. 2739–2770, 2023.
eral techniques to secure the trustworthiness of DL models. [6] Y. Zhao, Y. Qu, Y. Xiang, and L. Gao, ‘‘A comprehensive survey on
Outlier detection techniques can be used to identify and edge data integrity verification: Fundamentals and future trends,’’ 2022,
arXiv:2210.10978.
remove malicious nodes from the system, reducing the risk of [7] N. Akhtar and A. Mian, ‘‘A Threat of adversarial attacks on deep learning in
data poisoning and model theft attacks. Differential privacy computer vision: A survey,’’ IEEE Access, vol. 6, pp. 14410–14430, 2018.
is a mathematical framework for protecting the privacy of [8] Y. Yuan, P. He, Q. Zhu, and X. Li, ‘‘Adversarial examples: Attacks and
defences for deep learning,’’ IEEE Trans. Neural Netw. Learn. Syst.,
individuals while allowing data to be used for machine learn- vol. 30, no. 9, pp. 2805–2824, Sep. 2019.
ing. This can be especially important in distributed learning, [9] Q. Liu, P. Li, W. Zhao, W. Cai, S. Yu, and V. C. M. Leung, ‘‘A survey
where data from multiple sources is combined to train a on security threats and defensive techniques of machine learning: A data
driven view,’’ IEEE Access, vol. 6, pp. 12103–12117, 2018.
model [65]. By implementing these and other distributed
[10] Y. Gao, B. G. Doan, Z. Zhang, S. Ma, J. Zhang, A. Fu, S. Nepal, and
learning defence techniques, organizations can improve the H. Kim, ‘‘Backdoor attacks and countermeasures on deep learning: A
security and robustness of their machine learning systems, comprehensive review,’’ 2020, arXiv:2007.10760.


[11] Y. Li, Y. Jiang, Z. Li, and S.-T. Xia, ‘‘Backdoor learning: A survey,’’ [38] Y. Li, Y. Li, B. Wu, L. Li, R. He, and S. Lyu, ‘‘Invisible backdoor attack
IEEE Trans. Neural Netw. Learn. Syst., early access, Jun. 22, 2022, doi: with sample-specific triggers,’’ in Proc. IEEE/CVF Int. Conf. Comput. Vis.
10.1109/TNNLS.2022.3182979. (ICCV), Oct. 2021, pp. 16443–16452.
[12] J. Zhang and C. Li, ‘‘Adversarial examples: Opportunities and challenges,’’ [39] Y. Liu, S. Ma, Y. Aafer, W.-C. Lee, J. Zhai, W. Wang, and X. Zhang,
IEEE Trans. Neural Netw. Learn. Syst., vol. 31, no. 7, pp. 2578–2593, ‘‘Trojaning attack on neural networks,’’ in Proc. Netw. Distrib. Syst. Secur.
Jul. 2020. Symp., 2018, pp. 1–17.
[13] M. Barreno, B. Nelson, A. D. Joseph, and J. D. Tygar, ‘‘The security of [40] J. Lin, ‘‘Composite backdoor attack for deep neural network by mixing
machine learning,’’ Mach. Learn., vol. 81, no. 2, pp. 121–148, 2010. existing benign features,’’ in Proc. ACM SIGSAC Conf. Comput. Commun.
[14] I. Arshad, M. N. Asghar, Y. Qiao, B. Lee, and Y. Ye, ‘‘Pixdoor: A pixel- Secur., 2020, pp. 113–131.
space backdoor attack on deep learning models,’’ in Proc. 29th Eur. Signal [41] M. Xue, C. He, J. Wang, and W. Liu, ‘‘One-to-N & N-to-one: Two
Process. Conf. (EUSIPCO), Aug. 2021, pp. 681–685. advanced backdoor attacks against deep learning models,’’ IEEE Trans.
[15] A. Schwarzschild, ‘‘Just how toxic is data poisoning? A unified benchmark Depend. Secure Comput., vol. 19, no. 3, pp. 1562–1578, May 2022.
for backdoor and data poisoning attacks,’’ in Proc. Int. Conf. Mach. Learn., [42] C. He, M. Xue, J. Wang, and W. Liu, ‘‘Embedding backdoors as the facial
2021, pp. 9389–9398. features: Invisible backdoor attacks against face recognition systems,’’ in
[16] J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei, ‘‘ImageNet: Proc. ACM Turing Celebration Conf. China, May 2020, pp. 231–235.
A large-scale hierarchical image database,’’ in Proc. IEEE Conf. Comput. [43] E. Wenger, J. Passananti, A. N. Bhagoji, Y. Yao, H. Zheng, and B. Y. Zhao,
Vis. Pattern Recognit., Jun. 2009, pp. 248–255. ‘‘Backdoor attacks against deep learning systems in the physical world,’’ in
[17] Y. LeCun, C. Cortes, and C. Burges, The MNIST Dataset of Handwritten Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2021,
Digits (Images). New York, NY, USA: NYU, 1999. pp. 6202–6211.
[18] A. Turner, D. Tsipras, and A. Madry, ‘‘Label-consistent backdoor attacks,’’ [44] J. Guo and C. Liu, ‘‘Practical poisoning attacks on neural networks,’’ in
2019, arXiv:1912.02771. Proc. Eur. Conf. Comput. Vis., Aug. 2020, pp. 142–158.
[19] Y. Yao, H. Li, H. Zheng, and B. Y. Zhao, ‘‘Latent backdoor attacks on deep [45] H. Kwon, H. Yoon, and K.-W. Park, ‘‘FriendNet backdoor: Indentifying
neural networks,’’ in Proc. ACM SIGSAC Conf. Comput. Commun. Secur., backdoor attack that is safe for friendly deep neural network,’’ in Proc. 3rd
Nov. 2019, pp. 2041–2055. Int. Conf. Softw. Eng. Inf. Manage., Jan. 2020, pp. 53–57.
[20] S. Wang, S. Nepal, C. Rudolph, M. Grobler, S. Chen, and T. Chen, [46] B. Wang, Y. Yao, S. Shan, H. Li, B. Viswanath, H. Zheng, and B. Y. Zhao,
‘‘Backdoor attacks against transfer learning with pre-trained deep learning ‘‘Neural cleanse: Identifying and mitigating backdoor attacks in neural net-
models,’’ IEEE Trans. Services Comput., vol. 15, no. 3, pp. 1526–1539, works,’’ in Proc. IEEE Symp. Secur. Privacy (SP), May 2019, pp. 707–723.
May 2022. [47] H. Chen, C. Fu, J. Zhao, and F. Koushanfar, ‘‘DeepInspect: A black-box
[21] R. Xu, J. B. D. Joshi, and C. Li, ‘‘CryptoNN: Training neural networks trojan detection and mitigation framework for deep neural networks,’’ in
over encrypted data,’’ in Proc. IEEE 39th Int. Conf. Distrib. Comput. Syst. Proc. 28th Int. Joint Conf. Artif. Intell., Aug. 2019, pp. 4658–4664.
(ICDCS), Jul. 2019, pp. 1199–1209. [48] X. Xu, Q. Wang, H. Li, N. Borisov, C. A. Gunter, and B. Li, ‘‘Detecting AI
[22] P. Mohassel and Y. Zhang, ‘‘SecureML: A system for scalable privacy- trojans using meta neural analysis,’’ in Proc. IEEE Symp. Secur. Privacy
preserving machine learning,’’ in Proc. IEEE Symp. Secur. Privacy (SP), (SP), May 2021, pp. 103–120.
May 2017, pp. 19–38. [49] Y. Liu, W.-C. Lee, G. Tao, S. Ma, Y. Aafer, and X. Zhang, ‘‘ABS: Scanning
neural networks for back-doors by artificial brain stimulation,’’ in Proc.
[23] X. Chen, C. Liu, B. Li, K. Lu, and D. Song, ‘‘Targeted backdoor attacks
ACM SIGSAC Conf. Comput. Commun. Secur., Nov. 2019, pp. 1265–1282.
on deep learning systems using data poisoning,’’ 2017, arXiv:1712.05526.
[50] E. Chou, F. Tramèr, and G. Pellegrino, ‘‘SentiNet: Detecting localized
[24] I. Arshad, Y. Qiao, B. Lee, and Y. Ye, ‘‘Invisible encoded backdoor attack
universal attacks against deep learning systems,’’ in Proc. IEEE Secur.
on DNNs using conditional GAN,’’ in Proc. IEEE Int. Conf. Consum.
Privacy Workshops (SPW), May 2020, pp. 48–54.
Electron. (ICCE), Jan. 2023, pp. 1–5.
[51] Y. Gao, C. Xu, D. Wang, S. Chen, D. C. Ranasinghe, and S. Nepal, ‘‘STRIP:
[25] A. Shafahi, ‘‘Poison frogs! targeted clean-label poisoning attacks on neural
A defence against trojan attacks on deep neural networks,’’ in Proc. 35th
networks,’’ in Proc. Adv. Neural Inf. Process. Syst., 2018, pp. 6106–6116.
Annu. Comput. Secur. Appl. Conf., Dec. 2019, pp. 113–125.
[26] C. Zhu, ‘‘Transferable clean-label poisoning attacks on deep neural nets,’’
[52] B. Chen, W. Carvalho, N. Baracaldo, H. Ludwig, B. Edwards, T. Lee,
in Proc. Int. Conf. Mach. Learn., 2019, pp. 7614–7623.
I. Molloy, and B. Srivastava, ‘‘Detecting backdoor attacks on deep neural
[27] W. R. Huang, ‘‘MetaPoison: Practical general-purpose clean-label data networks by activation clustering,’’ 2018, arXiv:1811.03728.
poisoning,’’ in Proc. Adv. Neural Inf. Process. Syst., vol. 33, 2020, [53] B. Tran, ‘‘Spectral signatures in backdoor attacks,’’ in Proc. Adv. Neural
pp. 12080–12091. Inf. Process. Syst., 2018, pp. 8011–8021.
[28] A. Saha, ‘‘Hidden trigger back- door attacks,’’ in Proc. AAAI Conf. Artif. [54] Y. Liu, Y. Xie, and A. Srivastava, ‘‘Neural trojans,’’ in Proc. IEEE Int. Conf.
Intell., 2020, pp. 11957–11965. Comput. Design (ICCD), Nov. 2017, pp. 45–48.
[29] M. Barni, K. Kallas, and B. Tondi, ‘‘A new backdoor attack in CNNS by [55] K. Liu, ‘‘Fine-pruning: Defending against backdooring attacks on deep
training set corruption without label poisoning,’’ in Proc. IEEE Int. Conf. neural networks,’’ in Proc. Int. Symp. Res. Attacks, Intrusions, Defences,
Image Process. (ICIP), Sep. 2019, pp. 101–105. 2018, pp. 273–294.
[30] I. Shumailov, ‘‘Manipulating SGD with data ordering attacks,’’ in Proc. [56] Y. Li, X. Lyu, N. Koren, L. Lyu, B. Li, and X. Ma, ‘‘Neural attention
Adv. Neural Inf. Process. Syst., vol. 34, 2021, pp. 18021–18032. distillation: Erasing backdoor triggers from deep neural networks,’’ 2021,
[31] S. Zhao, X. Ma, X. Zheng, J. Bailey, J. Chen, and Y.-G. Jiang, ‘‘Clean-label arXiv:2101.05930.
backdoor attacks on video recognition models,’’ in Proc. IEEE/CVF Conf. [57] X. Han, ‘‘Clean-annotation backdoor attack against lane detection systems
Comput. Vis. Pattern Recognit. (CVPR), Jun. 2020, pp. 14431–14440. in the wild,’’ 2022, arXiv:2203.00858.
[32] K. Liu, B. Tan, R. Karri, and S. Garg, ‘‘Poisoning the (data) well in ML- [58] E. Bagdasaryan, ‘‘How to backdoor federated learning,’’ in Proc. Int. Conf.
based CAD: A case study of hiding lithographic hotspots,’’ in Proc. Design, Artif. Intell. Statist., 2020, pp. 2938–2948.
Autom. Test Eur. Conf. Exhib. (DATE), Mar. 2020, pp. 306–309. [59] A. N. Bhagoji, ‘‘Analyzing federated learning through an adversarial lens,’’
[33] H. Zhong, C. Liao, A. C. Squicciarini, S. Zhu, and D. Miller, ‘‘Backdoor in Proc. Int. Conf. Mach. Learn., 2019, pp. 634–643.
embedding in convolutional neural network models via invisible pertur- [60] Z. Sun, P. Kairouz, A. T. Suresh, and H. B. McMahan, ‘‘Can you really
bation,’’ in Proc. 10th ACM Conf. Data Appl. Secur. Privacy, Mar. 2020, backdoor federated learning?’’ 2019, arXiv:1911.07963.
pp. 97–108. [61] E. Bagdasaryan and V. Shmatikov, ‘‘Blind backdoors in deep learning
[34] M. Defooli, ‘‘Universal adversrial perturbations,’’ in Proc. IEEE Conf. models,’’ in Proc. 30th USENIX Secur. Symp. (USENIX Secur.), 2021,
Comput. Vis. Pattern Recognit., Jul. 2017, pp. 1765–1773. pp. 1505–1521.
[35] H. A. A. K. Hammoud and B. Ghanem, ‘‘Check your other door! Creating [62] G. Yar, S. Boboila, C. Nita-Rotaru, and A. Oprea, ‘‘Backdoor attacks in
backdoor attacks in the frequency domain,’’ 2021, arXiv:2109.05507. peer-to-peer federated learning,’’ 2023, arXiv:2301.09732.
[36] T. Wang, Y. Yao, F. Xu, S. An, H. Tong, and T. Wang, ‘‘Backdoor attack [63] Y. Li, ‘‘Anti-backdoor learning: Training clean models on poisoned data,’’
through frequency domain,’’ 2021, arXiv:2111.10991. in Proc. Adv. Neural Inf. Process. Syst., vol. 34, 2021, pp. 14900–14912.
[37] S. Cheng, ‘‘Deep feature space trojan attack of neural networks by [64] H. Zhang, ‘‘Efficient neural network robustness certification with gen-
controlled detoxification,’’ in Proc. AAAI Conf. Artif. Intell., 2021, eral activation functions,’’ in Proc. Adv. Neural Inf. Process. Syst., 2018,
pp. 1148–1156. pp. 4944–4953.


[65] Y. Zhao, Y. Qu, Y. Xiang, Y. Zhang, and L. Gao, ‘‘A lightweight model- YUANSONG QIAO received the B.Sc. and M.Sc.
based evolutionary consensus protocol in blockchain as a service for IoT,’’ degrees in solid mechanics from Beihang Univer-
IEEE Trans. Services Comput., vol. 16, no. 4, pp. 2343–2358, Aug. 2023. sity, Beijing, China, in 1996 and 1999, respec-
tively, and the Ph.D. degree in computer applied
technology from the Institute of Software, Chinese
Academy of Sciences (ISCAS), Beijing, in 2007.
He is the Principal Investigator at the Software
IRAM ARSHAD received the Bachelor of Science
Research Institute (SRI), Technological University
degree in computer science from the Department
of Shannon: Midlands Midwest, Athlone, Ireland.
of Computer Science, GCU, Lahore, Pakistan,
As part of his Ph.D. research program, he joined
in 2011, and the Master of Science degree in com-
the SRI at Technological University of Shannon: Midlands Midwest in 2005.
puter science from LCWU, Lahore, in 2015. She
He continued his research in SRI as a postdoctoral researcher, in 2007. After
is currently pursuing the Ph.D. degree with the
graduation, he joined ISCAS immediately, where he held roles as a network
Technological University of Shannon: Midlands
administrator and a research engineer and the team leader in research and
Midwest, Athlone, Ireland.
development, working on protocols and products in the areas of computer
In December 2015, she joined a multi-national
networking, multimedia communication, and network security. His research
software company Tkxel, Lahore, as a Software
interests include network protocol design and multimedia communications
Quality Assurance Engineer. She has worked on numerous national and inter-
for the future internet.
national projects to ensure quality. Later, she joined another tech software
company, Fiverivers Technologies, Lahore, in February 2019, as a Senior
Software Quality Assurance Engineer. She was also an automation engineer
during that tenure. Her research interests include but is not limited to artificial
intelligence, computer vision, deep learning, security, and cyber attacks. BRIAN LEE received the Ph.D. degree from the
Trinity College Dublin, Dublin, Ireland, in the
application of programmable networking for net-
work management. He is the Director of the Soft-
ware Research Institute, Technological University
SAEED HAMOOD ALSAMHI received the
of Shannon: Midlands Midwest, Athlone, Ireland.
B.Eng. degree from the Communication Division,
He has over 25 years research and development
Department of Electronic Engineering, IBB Uni-
experience in telecommunications network mon-
versity, Yemen, in 2009, and the M.Tech. degree in
itoring, their systems, and software design and
communication systems and the Ph.D. degree from
development for large telecommunications prod-
the Department of Electronics Engineering, Indian
ucts with very high impact research publications. Formerly, he was the
Institute of Technology (Banaras Hindu Univer-
Director of Research with LM Ericsson, Ireland, with responsibility for
sity), IIT (BHU), Varanasi, India, in 2012 and
overseeing all research activities, including external collaborations and rela-
2015, respectively. In 2009, he was a Lecturer
tionship management. He was the Engineering Manager of Duolog Ltd.,
Assistant with the Engineering Faculty, IBB Uni-
where he was responsible for strategic and operational management of all
versity. After that, he held a postdoctoral researcher position with the School
research and development activities.
of Aerospace Engineering, Tsinghua University, Beijing, China, in opti-
mal and smart wireless network research and its applications to enhance
robotics technologies. Since 2019, he has been an Assistant Professor with
the Shenzhen Institutes of Advanced Technology, Chinese Academy of
Sciences, Shenzhen. In 2020, he was a MSCA SMART 4.0 Fellow with the YUHANG YE received the Ph.D. degree in com-
Athlone Institute of Technology, Athlone, Ireland. Currently, he is a Senior puter networks from the Athlone Institute of
Research Fellow with the Insight Centre for Data Analytics, University of Technology, in 2018. He is currently a Lec-
Galway, Galway, Ireland, where he is also an adjunct lectureship appointment turer with the Department of Computer and Soft-
with the College of Science and Engineering. He has published more than ware Engineering, Technological University of the
145 articles in high-reputation journals in IEEE, Elsevier, Springer, Wiley, Shannon: Midlands Midwest, Ireland. His current
and MDPI publishers. His research interests include green and semantic research interests include machine learning, mul-
communication, the green Internet of Things, QoE, QoS, multi-robot collab- timedia communication, cybersecurity, and com-
oration, blockchain technology, federated learning, and space technologies puter networks.
(high altitude platforms, drones, and tethered balloon technologies).
