Design of Convolutional Neural Networks Architecture For Non Profiled Side Channel Attack Detection
ELEKTRONIKA IR ELEKTROTECHNIKA, ISSN 1392-1215, VOL. 29, NO. 4, 2023
The greater use of cross-domain technology for SCA is the primary feature of this stage. In particular, deep learning methods such as the multi-layer perceptron (MLP) [12] and the CNN are becoming more popular. CNNs have been shown to defeat jitter-based countermeasures, power-trace misalignment, and masked Advanced Encryption Standard (AES) implementations. For these reasons, this research uses CNNs.

II. LITERATURE REVIEW

Maghrebi, Rioul, Guilley, and Danger [3] were the main investigators in exploiting CNNs for side-channel attacks (SCA), although CNNs were not the only learning methods they deployed: they also used deep learning approaches such as the MLP and long short-term memory (LSTM) networks [4], as well as classical techniques including the random forest and the support vector machine (SVM) [5]. The findings of their study show that deep learning is superior to the more conventional machine learning approaches and, as a result, produces good outcomes. The authors demonstrate this on two data sets, one from an implementation without any protection and one from an implementation that utilises a masking countermeasure. In addition, the database used, sometimes referred to as the side-channel analysis data set, is presented in [6]; it was first introduced by its authors and has since been used in the investigations of various researchers. After introducing the data set, they investigate the effect of the hyperparameters to find the most effective CNN and MLP architectures [6]. Masure, Canovas, and Prouff [7] reveal that increasing the size of the CNN kernel results in better behaviour when the network is confronted with misaligned traces. However, they do not explain why increasing the kernel makes the attack more effective, which is strange. In our view, this discovery is fascinating and certainly deserves more discussion.

Since both studies reveal that CNNs perform successfully in various scenarios, further study was conducted on CNN behaviour. Picek, Samiotis, Kim, Heuser, Bhasin, and Legay [9] compared the performance of CNNs against machine learning methods such as Random Forest, XGBoost, and Naive Bayes. Their main objective was to investigate the circumstances under which CNNs perform better than the other techniques. According to their findings, CNNs improve performance only in the aggregate: they are most effective when the traces are not pre-processed, when noise levels are low, and when the data dimensions are high (i.e., many features and many traces). Otherwise, machine learning (ML) schemes can achieve performance that is almost on par with that of CNNs. The discovery that ML methods need noticeably fewer processing resources than CNNs is a significant result; consequently, the researchers express severe reservations about the usefulness of CNNs.

After further research, CNNs were shown to have the potential to surpass state-of-the-art solutions on specific data sets whose measurements originate from implementations protected by a hiding countermeasure. The authors in [10] performed tests showing that CNNs can synchronise non-aligned traces by identifying the properties of the most significant trace, enabling grouping to be carried out by applying the chosen characteristics. The findings of these experiments are presented in the article. In addition, the authors explain that this attack is carried out on raw trace data without any pre-processing, in contrast to a template attack, in which the adversary generally realigns the traces and selects the points of interest manually. The findings therefore show that CNNs are beneficial even when the traces are misaligned. On the other hand, overfitting is possible due to the size and complexity of the underlying CNN architecture. To generate more training data, they provide two data augmentation algorithms for misaligned traces, and experiments were carried out to illustrate the efficacy of these augmentation options.

The findings of Kim, Picek, Heuser, Bhasin, and Hanjalic [11] show that their CNN framework performs at the leading edge on the random delays (RD) data set, which gives more credence to the findings in [12]. In particular, compared to the DPAv4 data set, considered a baseline, an ideal network needs fewer attack traces to recover the key of the RD data set [13]. In [14], the researchers experimented with a wide variety of topologies and data sets. The results showed that no single design succeeds on all data sets; hence, it remains very necessary to select a structure appropriate for the problem at hand. In addition, the authors provide evidence that adding noise in the first layers of the network helps performance by reducing the amount of overfitting that occurs: smaller data sets call for higher noise levels, whereas more extensive data sets require a lower noise level to get the best results.

These studies suggest that CNNs possess two essential qualities that make them suitable for side-channel analysis. First, they can determine the most critical features independently and without any guidance, so no prior processing of the traces is needed to obtain good behaviour. Compared to more conventional approaches, we consider this a considerable advantage: according to the authors in [15], pre-processing is prone to errors, and a poor selection of Points of Interest (PoI) leads to lower performance. Second, because CNNs are spatially invariant, they can identify characteristics regardless of their position within the feature vectors. As a result of this quality, CNNs can perform at the cutting edge on data sets originating from implementations that use a hiding countermeasure. The methodologies used in the studies discussed up to this point are standard practice in deep learning. Further research has recommended new, innovative tactics designed explicitly for the side-channel attack, aiming to take advantage of a few of its qualities.

The researchers in [16] suggested a completely new CNN framework that uses additional domain information obtained through a side-channel attack. The data provided for creating the neural networks can be plaintext or ciphertext, and this distinction is determined by the leakage model.
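The domain-information idea of [16] — appending knowledge such as the plaintext or ciphertext byte to the feature vector fed to the classification block — can be illustrated with a short sketch. This is our own illustrative Python rendering of the general idea, not the actual architecture of [16]; the one-hot encoding and the concatenation point are assumptions.

```python
def one_hot(value: int, size: int = 256) -> list:
    """One-hot encode a byte value into a length-256 vector."""
    v = [0.0] * size
    v[value] = 1.0
    return v

def with_domain_info(trace_features, plaintext_byte: int) -> list:
    """Concatenate a one-hot encoding of the plaintext byte onto the
    flattened trace features, forming the input of the classification
    block (illustrative; the real framework wires this differently)."""
    return list(trace_features) + one_hot(plaintext_byte)
```

A network trained on such vectors receives both the measured leakage and the public domain information for each trace.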
The classification block of the CNN architecture is the component that receives the domain information as a new feature vector. In their work, the authors compare several architectural concepts offered by various works of literature, with and without the architecture that they propose. They show that a design that uses domain knowledge can improve performance for both protected and unprotected information. However, if the profiling traces are generated using a fixed key, this method is not applicable.

Zaid, Bossuet, Dassance, Habrard, and Venelli [17] strongly emphasise the need for fine-tuning the architecture and the hyperparameters; models do not operate correctly without an appropriate configuration. They point out that we cannot realise the full potential of an architecture if we do not understand the influence of each hyperparameter, and they explain why this is the case. To solve this problem, the authors provide three visualisation methods: weight visualisation, gradient visualisation, and heatmaps. These methods are utilised to improve the readability and interpretability of each hyperparameter, and they make it simpler to set the hyperparameters by allowing an adversary to determine the influence of each one individually, which in turn makes tuning easier. Using these three visualisation approaches, they also propose implementation options for protected and unprotected environments. In particular, for data sets that include a hiding countermeasure, one of the guidelines provided by their method is that the CNN kernel size should be set to 50 % of the maximum randomised delay.

In contrast to the advice found in articles produced by the deep learning communities, increasing the depth of the network is recommended over increasing the number of neurones contained within each layer [18]. The authors improved the state of the art on entire data sets by developing architectures and conducting tests with all publicly available data sets using the methodologies described, which led to an increase in overall performance. On the other hand, the choice of hyperparameters is occasionally made without enough rationale, even though their method offers cutting-edge performance on all publicly available data sets. For example, the authors do not explain how certain learning rates were determined for a few specific data sets or why they were used at all.

Pfeifer and Haddad [19] propose a deep learning layer known as the spread layer, the first layer to be explicitly designed for side-channel attacks. As demonstrated in their study, such a layer is needed for better outcomes; furthermore, the profiling phase needs fewer traces, which speeds up the learning procedure. These findings are intriguing to the side-channel analysis community because they indicate a motivation to create layers specially made to take advantage of the side-channel properties of traces. On the other hand, the authors do not provide much information about how to establish the hyperparameters of the layer or why this layer can offer the results it does. These questions will be addressed in Section IV, at which point we will investigate the spread layer in great detail and address some of its faults.

According to Jin, Kim, Kim, and Hong [20], deep CNN frameworks work admirably for SCAs. Despite this, some issues remain concerning the training process for deep neural networks. The primary issue is that training deep neural networks can be complicated, since gradients can either vanish or explode as the training progresses. In the following, we summarise the latest advancements in the initialisation of deep neural networks that address these issues.

Much work has been done on parameter initialisation; the variables would often be picked randomly from a Gaussian distribution. This was significantly reworked by Glorot and Bengio [21], who introduced an initialisation technique called "Xavier initialisation". This method considers the number of inputs and outputs associated with a parameter while deriving the parameter values from a Gaussian distribution. The method is currently considered standard practice and is used to initialise the parameters of several extensive deep-learning libraries. When academics began looking into the architectures of deep neural networks, they found that several works ran into problems with the convergence of their designs. Convergence problems were experienced, e.g., by the well-known visual geometry group (VGG) architecture, which is trained in four phases: the network is enlarged with additional layers, and training is performed at each stage to ensure that it converges correctly [22].

A novel strategy for deep CNN initialisation is presented in [23]. According to those findings, even though the Xavier initialisation was designed to work with linear activations, it is not appropriate for use with the rectified linear unit (ReLU). In addition, the authors argue that deeper networks have a more difficult time reaching a point of convergence. Their solution to such issues is the "He" initialisation, which was developed specifically for CNNs that use ReLU and which, compared to other initialisation methods, improves the degree to which deep neural networks converge. Layer-sequential unit variance (LSUV) initialisation is an alternative method proposed in [24]. Rather than being developed explicitly for designs that use ReLU as the activation function, this approach exhibits a more generic character and is appropriate for various architectural kinds. The authors provide evidence of the viability of their approach by conducting validation experiments. Both sets of research have shown how important the accurate initialisation of the network parameters is for deep neural networks to converge.

Until recently, the published research conducted its investigations in SCA circumstances where the attack traces and the profiling traces were obtained from identical devices, and it was not unusual to use the same key for both the attack set and the profiling set. As a direct consequence, the results of these studies can provide an inaccurate image of the effectiveness of several techniques, including the template attack (TA), ML, and DL.
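The initialisation schemes discussed above follow simple closed-form rules for the weight standard deviation. The sketch below is our illustrative Python rendering of the published formulas (Var(w) = 2/(fan_in + fan_out) for Xavier, Var(w) = 2/fan_in for He), not code taken from any of the cited works.

```python
import math
import random

def xavier_std(fan_in: int, fan_out: int) -> float:
    # Glorot and Bengio: Var(w) = 2 / (fan_in + fan_out)
    return math.sqrt(2.0 / (fan_in + fan_out))

def he_std(fan_in: int) -> float:
    # He initialisation: Var(w) = 2 / fan_in, suited to ReLU layers
    return math.sqrt(2.0 / fan_in)

def init_layer(fan_in: int, fan_out: int, scheme: str = "he", rng=random):
    """Draw a fan_in x fan_out weight matrix from a zero-mean Gaussian
    whose standard deviation follows the chosen scheme."""
    std = he_std(fan_in) if scheme == "he" else xavier_std(fan_in, fan_out)
    return [[rng.gauss(0.0, std) for _ in range(fan_out)]
            for _ in range(fan_in)]
```

For a ReLU network such as the one used later in this paper, the He rule keeps the activation variance roughly constant from layer to layer, which is why it converges more reliably than the Xavier rule.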
Consequently, the SCA community has begun to construct a more realistic environment in which different devices are used to acquire the attack and the profiling traces [25]. A comparison of various research methods on SCA is shown in Table I.

TABLE I. COMPARISON OF VARIOUS RESEARCH METHODS ON SCA.

Research works | Attacked Network | Physical Measurement | Limitations
Wu et al. [13], 2023 | MLP, CNN | EM | Minimal (black box)
Maji, Banerjee, Fuller, and Chandrakasan [14], 2022 | CNN, BNN | SPA | Methodology specific to µC
Shimada, Kuroda, Fukuda, Yoshida, and Fujino [15], 2022 | MLP | EM | Intention paper
Sako, Kuroda, Fukuda, Yoshida, and Fujino [22], 2022 | Systolic array | CPA | Only the simulation systolic array is implemented
Shi, Sun, Wang, and Hu [24], 2020 | BNN | Power | Specific to the line buffer
Yang, Xiang, Huang, Fu, and Yang [25], 2023 | CNN | Power | Uses non-fine-tuned models once trained

III. PROPOSED METHODOLOGY

One of the most popular uses of CNNs is image recognition [16]; they are also effective in classifying time series [17]. CNNs are good models for the extraction of features and the categorisation of complex data because they are invariant to translation. As a result, our attack on side-channel data benefits from the use of CNNs. A drawback of the CNN is that it must be trained for each key hypothesis separately: our guesses for an 8-bit key require 256 training runs.

CNNs use layers of computation known as convolutional and pooling layers; in our network, batch normalisation layers complement these operations. The batch normalisation of Ioffe and Szegedy [18] reduces the internal covariate shift of neural networks, which the authors claim leads to more efficient learning. To evaluate the CNN, we use a series of aligned power traces as input. One power trace would include too many samples to be used directly as CNN input features; therefore, we use the correlation coefficient in the first phase of power-trace processing.

There are typically three parts to a CNN data set: the training set, which is used to teach the network; the validation set, which is used to test the accuracy of the network on unseen data; and the test set, which is used to assess the quality of the final prediction or classification. Details of the network architecture will be covered in a subsequent section.

− Experiment and Equipment Details

To evaluate our neural network models, we employed MATLAB. Three convolutional layers and three pooling layers precede the fully connected layer and the classification layer in the network. The first convolution layer has 16 filters, each of size [11] by [12], and produces an output of the same size as its input. The subsequent two convolutional layers are the same size as the first but have 24 and 32 filters, respectively. By sliding filters along the layer below them, convolutional layers perform convolution on incoming data. Because the CNN minimises the loss function over the filter weights, it can learn translation-invariant features, so shifts in the power traces do not hinder the SCA filters. The max-pooling method with kernel size [12] and stride [12] is used for the first two pooling layers, while average pooling is used for the third and final pooling layer. Maximum and average pooling are non-linear layers that reduce the data dimensions; when comparing the two, note that the former takes a maximum over the kernel window, while the latter computes an average. All convolutional layers in our model use the ReLU activation, a piecewise-linear function whose output equals the input when the input is positive and is zero otherwise. Softmax is used for the categorisation in the output layer. Table II shows the parameters of the proposed CNN model, and Fig. 1 details the structure of our convolutional neural network.

TABLE II. SIMULATION PARAMETERS OF THE PROPOSED CNN MODEL.

Layer | Weight Shape | Stride | Activation
Convolutional (1) | 1 × 3 × 16 | - | -
Batch Normalisation (1) | - | - | ReLU
Max-Pooling (1) | - | [12] | -
Convolutional (2) | 1 × 3 × 24 | - | -
Batch Normalisation (2) | - | - | ReLU
Max-Pooling (2) | - | [12] | -
Convolutional (3) | 1 × 3 × 24 | - | -
Batch Normalisation (3) | - | - | ReLU
Average-pooling (1) | - | [12] | -
FC-output | - | - | Softmax

Fig. 1. Proposed model.

IV. RESULTS

The CW1173 ChipWhisperer board was used for our testing [19]. This SCA platform has a target board equipped with an 8-bit Atmel AVR Xmega128 microcontroller capable of executing AES-128. The internal analog-to-digital converter (ADC) of the ChipWhisperer (CW) Lite captures the signal. With this setup, we can deliver the software, the plaintext, and the key to the Xmega board while recording the traces on a laptop. For the tests, we have 5000 power traces available; each power trace includes 10,000 samples covering AES Round 1 and Round 2. The non-profiled attack keeps the same key throughout and chooses 5000 plaintexts at random. We focus on attacking only the first round of the AES.
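The 256-hypothesis procedure mentioned above can be sketched as follows. In our method, one CNN is trained per key guess; in this illustrative Python sketch the per-guess scoring function is left abstract, and `match_score` is only a toy stand-in for such a trained model.

```python
# Hamming-weight lookup table for all byte values
HW = [bin(v).count("1") for v in range(256)]

def rank_key_byte(traces, plaintexts, sbox, score):
    """Score all 256 guesses for one key byte; return them best-first.
    `score(traces, hyp)` rates how well the hypothetical leakage `hyp`
    explains the measurements (in the paper, a CNN plays this role)."""
    scored = []
    for guess in range(256):
        # Hypothetical leakage under this guess: HW(Sbox(pt XOR guess))
        hyp = [HW[sbox[pt ^ guess]] for pt in plaintexts]
        scored.append((score(traces, hyp), guess))
    scored.sort(reverse=True)
    return [g for _, g in scored]

def match_score(traces, hyp):
    """Toy score: how often the first trace sample equals the
    hypothetical leakage value (a real attack would train a model)."""
    return sum(1 for t, h in zip(traces, hyp) if t[0] == h)
```

The correct guess is the one whose hypothetical leakage best matches the measurements, which is why 256 separate evaluations are unavoidable for an 8-bit key byte.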
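The correlation-coefficient step of the first processing phase can be sketched as follows. This is an illustrative Python sketch of Pearson-correlation-based point selection; the exact selection rule used in our MATLAB implementation may differ.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else 0.0

def select_points(traces, target, keep):
    """Rank each sample index by |corr| with `target` (e.g., a leakage
    hypothesis) and keep the `keep` strongest points of interest."""
    n_samples = len(traces[0])
    scored = []
    for i in range(n_samples):
        column = [t[i] for t in traces]
        scored.append((abs(pearson(column, target)), i))
    scored.sort(reverse=True)
    return sorted(i for _, i in scored[:keep])
```

Keeping only the strongest-correlating sample points shrinks each trace to a size the CNN input layer can handle.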
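The three-way partition described above can be sketched as follows; the 70/15/15 split ratio is an assumption chosen for illustration, not a value taken from our experiments.

```python
import random

def split_dataset(traces, labels, val_frac=0.15, test_frac=0.15, seed=0):
    """Shuffle a labelled trace set and partition it into training,
    validation, and test subsets."""
    idx = list(range(len(traces)))
    random.Random(seed).shuffle(idx)
    n_test = int(len(idx) * test_frac)
    n_val = int(len(idx) * val_frac)
    test_i = idx[:n_test]
    val_i = idx[n_test:n_test + n_val]
    train_i = idx[n_test + n_val:]
    pick = lambda ix: ([traces[i] for i in ix], [labels[i] for i in ix])
    return pick(train_i), pick(val_i), pick(test_i)
```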
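The layer stack of Table II can be sanity-checked by tracing the sequence length through the network. The sketch below assumes 'same'-padded 1-D convolutions (so that, as stated above, a convolution's output has the same size as its input) and an illustrative pooling kernel and stride of 2, since the bracketed pooling sizes in the table are ambiguous in the source.

```python
def conv1d_same_len(length: int) -> int:
    # A 'same'-padded convolution keeps the temporal length unchanged
    return length

def pool_len(length: int, kernel: int, stride: int) -> int:
    # Output length of a pooling layer without padding
    return (length - kernel) // stride + 1

def model_output_lengths(n_samples, pool_kernel=2, pool_stride=2):
    """Trace the per-block sequence length of the conv/pool stack of
    Table II (three conv + batch-norm blocks, each followed by a pool:
    two max-pools, then one average-pool)."""
    lengths = []
    length = n_samples
    for _ in range(3):
        length = conv1d_same_len(length)
        length = pool_len(length, pool_kernel, pool_stride)
        lengths.append(length)
    return lengths
```

Under these assumptions, a 10,000-sample trace is halved by each pooling stage before reaching the fully connected softmax output.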
V. CONCLUSIONS

According to the findings of this research, CNNs create difficulties for SCA when the aligned power traces include a large number of samples. After preparing the CNN training data, we evaluated the power traces with the original data and those with added Gaussian noise. Our non-profiled SCA data preparation method is based on a CNN, which allows the key properties to be extracted. Our method requires fewer power traces for attacks because the power traces are organised into three distinct groups. These findings indicate that our technique can effectively recover an increased number of bytes from SCA compared to previous methods based on CPA and on DL-SCA without regularisation. The consistent findings that our CNN architecture produces for non-profiled attacks highlight the considerable challenge posed by Gaussian noise in power traces. To improve the performance of neural networks faced with non-profiled attacks, we will investigate several pre-processing strategies that aim to decrease the power-trace noise.

CONFLICTS OF INTEREST

The authors declare that they have no conflicts of interest.

REFERENCES

[1] G. Yang, H. Li, J. Ming, and Y. Zhou, "Convolutional neural network based side-channel attacks in time-frequency representations", in Smart Card Research and Advanced Applications. CARDIS 2018. Lecture Notes in Computer Science, vol. 11389. Springer, Cham, 2019, pp. 1–17. DOI: 10.1007/978-3-030-15462-2_1.
[2] B. Timon, "Non-profiled deep learning-based side-channel attacks with sensitivity analysis", IACR Transactions on Cryptographic Hardware and Embedded Systems, vol. 2019, no. 2, pp. 107–131, 2019. DOI: 10.13154/tches.v2019.i2.107-131.
[3] H. Maghrebi, O. Rioul, S. Guilley, and J.-L. Danger, "Comparison between side-channel analysis distinguishers", in Information and Communications Security. ICICS 2012. Lecture Notes in Computer Science, vol. 7618. Springer, Berlin, Heidelberg, 2012, pp. 331–340. DOI: 10.1007/978-3-642-34129-8_30.
[4] H. Maghrebi, "Deep learning based side channel attacks in practice", IACR Cryptology ePrint Archive, vol. 2019, p. 578, 2019.
[5] B. Timon, "Non-profiled deep learning-based side-channel attacks", IACR Cryptology ePrint Archive, vol. 2018, p. 196, 2019. DOI: 10.46586/tches.v2019.i2.107-131.
[6] D. Das, A. Golder, J. Danial, S. Ghosh, A. Raychowdhury, and S. Sen, "X-DeepSCA: Cross-device deep learning side channel attack", in Proc. of 2019 56th ACM/IEEE Design Automation Conference (DAC), 2019, pp. 1–6. DOI: 10.1145/3316781.3317934.
[7] L. Masure, C. Canovas, and E. Prouff, "A comprehensive study of deep learning for side-channel analysis", IACR Transactions on Cryptographic Hardware and Embedded Systems, vol. 2020, no. 1, pp. 348–375, 2019. DOI: 10.13154/tches.v2020.i1.348-375.
[8] J. J. Quisquater, "A new tool for non-intrusive analysis of smart cards based on electro-magnetic emissions. The SEMA and DEMA methods", Eurocrypt Rump Session, 2000.
[9] S. Picek, I. P. Samiotis, J. Kim, A. Heuser, S. Bhasin, and A. Legay, "On the performance of convolutional neural networks for side-channel analysis", in Security, Privacy, and Applied Cryptography Engineering. SPACE 2018. Lecture Notes in Computer Science, vol. 11348. Springer, Cham, 2018, pp. 157–176. DOI: 10.1007/978-3-030-05072-6_10.
[10] H. Wang, S. Forsmark, M. Brisfors, and E. Dubrova, "Multi-source training deep-learning side-channel attacks", in Proc. of 2020 IEEE 50th International Symposium on Multiple-Valued Logic (ISMVL), 2020, pp. 58–63. DOI: 10.1109/ISMVL49045.2020.00-29.
[11] J. Kim, S. Picek, A. Heuser, S. Bhasin, and A. Hanjalic, "Make some noise. Unleashing the power of convolutional neural networks for profiled side-channel analysis", IACR Transactions on Cryptographic Hardware and Embedded Systems, vol. 2019, no. 3, pp. 148–179, 2019. DOI: 10.13154/tches.v2019.i3.148-179.
[12] Y.-S. Won, D.-G. Han, D. Jap, S. Bhasin, and J.-Y. Park, "Non-profiled side-channel attack based on deep learning using picture trace", IEEE Access, vol. 9, pp. 22480–22492, 2021. DOI: 10.1109/ACCESS.2021.3055833.
[13] L. Wu et al., "Label correlation in deep learning-based side-channel analysis", IEEE Transactions on Information Forensics and Security, vol. 18, pp. 3849–3861, 2023. DOI: 10.1109/TIFS.2023.3287728.
[14] S. Maji, U. Banerjee, S. H. Fuller, and A. P. Chandrakasan, "A threshold-implementation-based neural-network accelerator securing model parameters and inputs against power side-channel attacks", in Proc. of 2022 IEEE International Solid-State Circuits Conference (ISSCC), 2022, pp. 518–520. DOI: 10.1109/ISSCC42614.2022.9731598.
[15] S. Shimada, K. Kuroda, Y. Fukuda, K. Yoshida, and T. Fujino, "Deep learning-based side-channel attacks against software-implemented RSA using binary exponentiation with dummy multiplication", IEICE Technical Report, vol. 122, no. 11, pp. 13–18, 2022.
[16] B. Sönmez, A. A. Sarıkaya, and Ş. Bahtiyar, "Machine learning based side channel selection for time-driven cache attacks on AES", in Proc. of 2019 4th International Conference on Computer Science and Engineering (UBMK), 2019, pp. 1–5. DOI: 10.1109/UBMK.2019.8907211.
[17] G. Zaid, L. Bossuet, F. Dassance, A. Habrard, and A. Venelli, "Ranking loss: Maximizing the success rate in deep learning side-channel analysis", IACR Transactions on Cryptographic Hardware and Embedded Systems, vol. 2021, no. 1, pp. 25–55, 2021. DOI: 10.46586/tches.v2021.i1.25-55.
[18] S. Ioffe and C. Szegedy, "Batch normalization: Accelerating deep network training by reducing internal covariate shift", in Proc. of the 32nd International Conference on Machine Learning, 2015, pp. 1–9.
[19] C. Pfeifer and P. Haddad, "Spread: A new layer for profiled deep-learning side-channel attacks", IACR Cryptology ePrint Archive, vol. 2018, p. 880, 2018.
[20] S. Jin, S. Kim, H. Kim, and S. Hong, "Recent advances in deep learning-based side-channel analysis", ETRI Journal, vol. 42, no. 2, pp. 292–304, 2020. DOI: 10.4218/etrij.2019-0163.
[21] X. Glorot and Y. Bengio, "Understanding the difficulty of training deep feedforward neural networks", in Proc. of the Thirteenth International Conference on Artificial Intelligence and Statistics, 2010, pp. 249–256.
[22] M. Sako, K. Kuroda, Y. Fukuda, K. Yoshida, and T. Fujino, "Deep learning side-channel attacks against hardware-implemented lightweight cipher Midori 64", IEICE Technical Report, vol. 122, no. 11, pp. 7–12, 2022.
[23] F.-X. Standaert, "Introduction to side-channel attacks", in Secure Integrated Circuits and Systems. Integrated Circuits and Systems. Springer, Boston, MA, 2010, pp. 27–42. DOI: 10.1007/978-0-387-71829-3_2.
[24] M. Wei, D. Shi, S. Sun, P. Wang, and L. Hu, "Convolutional neural network based side-channel attacks with customized filters", in Information and Communications Security. ICICS 2019. Lecture Notes in Computer Science, vol. 11999. Springer, Cham, 2020, pp. 799–813. DOI: 10.1007/978-3-030-41579-2_46.
[25] W. Yang, X. Xiang, C. Huang, A. Fu, and Y. Yang, "MCA-based multi-channel fusion attacks against cryptographic implementations", IEEE Journal on Emerging and Selected Topics in Circuits and Systems, vol. 13, no. 2, pp. 476–488, 2023. DOI: 10.1109/JETCAS.2023.3252085.
This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution 4.0
(CC BY 4.0) license (http://creativecommons.org/licenses/by/4.0/).