0% found this document useful (0 votes)
64 views

Efficient Non-Profiled Side Channel Attack Using Multi-Output Classification Neural Network

This document proposes a new non-profiled side channel attack technique using multi-output neural networks. Specifically, it introduces a multi-output multi-layer perceptron and a multi-output convolutional neural network to reveal secret keys from various side channel countermeasures faster than previous techniques. The models are trained with multiple outputs corresponding to each key hypothesis, using multiple losses rather than a single loss. This allows the models to predict all key hypotheses in one training, providing accuracy metrics for each without extra calculation. Experimental results show the technique performs attacks up to 9 and 30 times faster than previous methods against masking and desynchronization countermeasures respectively.

Uploaded by

Phuc Hoang
Copyright
© © All Rights Reserved
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
64 views

Efficient Non-Profiled Side Channel Attack Using Multi-Output Classification Neural Network

This document proposes a new non-profiled side channel attack technique using multi-output neural networks. Specifically, it introduces a multi-output multi-layer perceptron and a multi-output convolutional neural network to reveal secret keys from various side channel countermeasures faster than previous techniques. The models are trained with multiple outputs corresponding to each key hypothesis, using multiple losses rather than a single loss. This allows the models to predict all key hypotheses in one training, providing accuracy metrics for each without extra calculation. Experimental results show the technique performs attacks up to 9 and 30 times faster than previous methods against masking and desynchronization countermeasures respectively.

Uploaded by

Phuc Hoang
Copyright
© © All Rights Reserved
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 4

This article has been accepted for publication in IEEE Embedded Systems Letters.

This is the author's version which has not been fully edited and
content may change prior to final publication. Citation information: DOI 10.1109/LES.2022.3213443

IEEE EMBEDDED SYSTEMS LETTERS 1

Efficient Non-profiled Side Channel Attack Using


Multi-output Classification Neural Network
Van-Phuc Hoang, Member, IEEE, Ngoc-Tuan Do, and Van Sang Doan

Abstract—Differential Deep Learning Analysis (DDLA) is entropy loss function, their models are optimized by minimiz-
the first deep learning based non-profiled side-channel attack ing a scalar loss value after training processes. Therefore, the
(SCA) on embedded systems. However, DDLA requires many accuracy metrics of key guesses are calculated by a custom
training processes to distinguish the correct key. In this letter,
we introduce a non-profiled SCA technique using multi-output function on each epoch, which separates the output and then
classification to mitigate the aforementioned issue. Specifically, matches the hypothesis values. The results of this function
a multi-output multi-layer perceptron and a multi-output con- are then used to determine the correct key. Despite being a
volutional neural network are introduced against various SCA very fast attack technique, the parallel architecture requires a
protected schemes, such as masking, noise generation, and trace high memory usage. To mitigate the disadvantage of parallel
de-synchronization countermeasures. The experimental results
on different power side channel datasets have clarified that architecture, the authors have introduced a shared-layer-based
our model performs the attack up to 9 and 30 times faster model that is reconstructed by the same DDLA model, except
than DDLA in the case of masking and de-synchronization for the output layer. To the best of our knowledge, this is the
countermeasures, respectively. In addition, regarding combined first model that can predict 256 keys hypotheses in only one
masking and noise generation countermeasure, our proposed training process.
model achieves a higher success rate of at least 20% in the cases
of the standard deviation equal to 1.0 and 1.5. An alternative and often more effective approach in the
DL domain is to develop a single neural network model that
Index Terms—Side channel attacks, embedded systems, deep
can learn multiple related tasks (i.e., outputs) at the same
learning, multi-output, multi-loss.
time, called multiple-output learning (MOL) [6]. From the
point of view of the SCA domain, MOL is a promising
I. I NTRODUCTION technique that could increase the performance of the SCA
Side channel attacks (SCA) have become a serious threat to evaluation process. In this letter, we propose a novel SCA
the cryptographic implementations on embedded systems. It attack based on multi-output classification, which can predict
has raised the awareness of the security research community 256 values of the key hypothesis in a single training without
to seek new techniques, which can be used to detect vul- any reference device. Specifically, a multi-output multi-layer
nerabilities [1] or counteract the SCA attacks [2]. However, perceptron (MLPMO ) and a multi-output convolutional neural
researching new SCA attacks is critical to point out the network (CNNMO ) are introduced. In which MLPMO is used
potential threats. In this letter, we introduce a new SCA attack for breaking the boolean masking [7] and reducing the effect
method using multi-output neural networks, which can reveal of noise-generation countermeasures [8], whereas CNNMO is
the secret key quickly in a non-profiled context. exploited to reveal the secret key from de-synchronization
countermeasure [9]. Our approach exploits multi-loss instead
Our work is motivated by the previous work that was
of using only binary cross entropy loss as in [4], [5]. Accord-
presented by Timon et al. [3]. Accordingly, based on deep
ingly, a separate loss corresponding to each output is calculated
learning (DL) techniques, their proposal called DDLA can
in the training process. Therefore, the training metrics (loss
reveal the secret key without any reference devices. However,
and accuracy) of each key hypothesis can be achieved easily
DDLA requires the attacker repeatedly perform the training
without any extra calculation. As a result, our proposal can
process to observe the training metrics, which are then used
perform attacks faster than a parallel network.
to determine the correct subkey byte. Recently, Kwon et
al. [4] have investigated the aforementioned drawbacks of the
DDLA technique and mitigated them by using a parallel neural II. P ROPOSED MULTI - OUTPUT DEEP LEARNING MODEL
network architecture. Kwon’s work can be considered as a FOR NON - PROFILED SCA
multi-label SCA approach as in [5]. Based on the binary cross
A. Data preparation
Manuscript received August 06, 2022; accepted September 30, 2022. Date
of publication October 07, 2022; date of current version October 07, 2022. To apply multi-output model in SCA domain, the data input
This research is funded by Vietnam National Foundation for Science and (power traces) should be labeled by the values corresponding
Technology Development (NAFOSTED) under grant number 102.02-2020.14.
This manuscript was recommended for publication by Ozgur Sinanoglu. to the model outputs. We aim to predict all key hypotheses
(Corresponding author: Van-Phuc Hoang.) in one training process. Therefore, the number of network
The authors are with Institute of System Integration, Le Quy Don Technical outputs is 256, which corresponds to 256 key guesses (0 to
University, Ha Noi, Vietnam, and also with Faculty of Communications and
Radar, Vietnam Naval Academy, Nha Trang, Vietnam. 255). To benchmark our proposed architecture, we use the
DOI:10.1109/LES.2022.0176 same LSB labeling technique as in previous works [3], [4],

© 2022 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.See https://www.ieee.org/publications/rights/index.html for more information.
Authorized licensed use limited to: NASATI. Downloaded on November 09,2022 at 07:23:17 UTC from IEEE Xplore. Restrictions apply.
This article has been accepted for publication in IEEE Embedded Systems Letters. This is the author's version which has not been fully edited and
content may change prior to final publication. Citation information: DOI 10.1109/LES.2022.3213443

IEEE EMBEDDED SYSTEMS LETTERS 2

RELU RELU Softmax


Softmax
Block1 Block2 Flatten
Output 1
trace1 ( x x x ) , labelLSB ( l l l )


(m samples/trace)
1 1 1 1 1 1 Output 1
Dataset1

n Power traces
1 2 m 1 2 256 conv1d_1, conv1d_2,
Data.csv norm, norm,

trace1 ( x1n x2n xmn ) , labelLSB ( l1nl2n l256 )


n
pool_1, pool_2,
relu relu
Output k Output k
Dataset16
Input Shared Non-shared Output Input Shared Output
layer layer layer
a) b) layer layer layer layer
c)

Fig. 1. Structure of reconstructed dataset and proposed multi-output models. a) Multi-output dataset; b) MLPMO model; c) CNNMO model.

TABLE I B. MLPMO
S TRUCTURE OF RECONSTRUCTED DATASETS .
To solve the problem of DDLA, a feed-forward MO model
Dataset
ASCAD CW
Label based on the MLP architecture is proposed. As depicted in
Traces Samples Traces Samples Fig. 1.b, the overall architecture of our proposed network
Dataset1 20000 700 - - LSB
Dataset2 20000 700 - - LSB(vector) consists of an input layer, a shared layer followed by k
Dataset3 50000 700 - - LSB(vector) branches corresponding to k key hypotheses (k = 256).
Dataset4 - - 10000 480 LSB Each branch contains the same MLP architecture as MLPDDLA
Dataset5 - - 10000 480 LSB(vector)
(except the input layer) [3]. According to [11], we keep the
number of layers and the number of nodes in each layer as the
original MLPDDLA model (hidden layer: 20 × 10-Relu, output
TABLE II
D EEP LEARNING HYPER - PARAMETERS OF PROPOSED MODELS .
layer: 2-Softmax). The input layer of the proposed model has
the same size as the number of samples in the power trace.
Model MLP-MO CNN-MO The shared layer plays an important role in the proposed ar-
Input size 700 480
chitecture. It can be utilized the max shared as in MLPmax-shared
conv1d 1 ( 4 32 × 1 filters),
0/50/ [4]. However, using the same architecture of MLPDDLA except
pool 1 (2 × 1), norm, relu
Shared layer 200/400
conv1d 2 (4 16 × 1 filters), output layer, their architecture only decreases the execution
-Relu
pool 2 (4 × 1), norm, relu time without enhancing the success rate, especially in the
Branch 256 256
Hidden layer/branch 20x10-Relu 0 case of noisy data. In contrast, our proposal aims to decrease
Output layer/branch 2-Softmax 2-Softmax the computation time as well as enhance the success rate.
Batch 1000 50 Therefore, we do not make the comparison to MLPmax-shared
Initializing He uniform
in this work. In the case of the model without using the
shared layer, the first hidden layer of each branch is fully
connected to the input layer. Unlike MLPDDLA , the network
which is calculated by formula (1). As a result, the multi- parameters of our model are updated for all key hypotheses
output datasets used in this letter are constructed as depicted in each iteration instead of updating for only one key guess
in Fig. 1.a. as MLPDDLA architecture.
Since the same structure is applied for all branches, the
weights used for each branch are equivalent. Consequently, the
lji = LSB(Sbox(pi ⊕ kj )) (1) loss function of the whole network is calculated as follows:
256
X
Ltotal = γk ∗ L[k] (θ) (2)
where pi={1,n} denotes the ith plaintext encrypted in AES- k=1
128 algorithm, n is the number of plaintexts. kj={0,255} is
the key guess number j. where θ represents the set of all parameters of the model,
γk is used as weighted factor of branch number k th and set
as 1 for all branches (weights of each branch is equivalent),
In order to evaluate the efficiency of the proposed models,
L[k] denotes the loss results calculated for the k th branch. It
we consider two SCA data, same as in [3], including ASCAD
is noted that the same loss function is used for all branches,
data [7] and the data captured from ChipWhisperer-lite (CW)
which can be generally defined as follows:
board [10]. Regarding ASCAD data, the fixed key dataset is
2
selected to perform attacks on the first-order masking coun- 1 X
termeasure. The leakage model of ASCAD data is the output L[k] (θ) = − ytrue ln (z) (3)
Ns j=1
of the third Sbox with unknown mask values as described
in [7]. In the case of CW data, we select 10,000 power where ytrue and z are the ground-truth and the predicted val-
traces with the size of 480 samples/trace, which correspond ues, respectively. Ns denotes the number of training samples.
to the power consumption of the first Sbox output process. In For successful training, the deep learning algorithm needs to
addition, we simulate the de-synchronization countermeasure find the optimal value to minimize the loss function Ltotal . The
by using the same method as introduced in [4]. The structure network is trained in a series of iterations. In each iteration,
of reconstructed datasets is shown in Table I. the gradient of the loss function ∇Ltotal is computed for

© 2022 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.See https://www.ieee.org/publications/rights/index.html for more information.
Authorized licensed use limited to: NASATI. Downloaded on November 09,2022 at 07:23:17 UTC from IEEE Xplore. Restrictions apply.
This article has been accepted for publication in IEEE Embedded Systems Letters. This is the author's version which has not been fully edited and
content may change prior to final publication. Citation information: DOI 10.1109/LES.2022.3213443

IEEE EMBEDDED SYSTEMS LETTERS 3

0 .7 5 0 .7 5 1 2 0 M L P -D D L A
0 .7 5 0 .7 5

S u c c e s s ra te (P e rc e n t)
C o rre c t k e y g u e ss C o rre c t k e y g u e ss M L P -D D L A N o n -S o S L C o rre c t k e y g u e ss C o rre c t k e y g u e ss
1 0 0 0

A tta c k tim e (S e c o n d )

0
0
0
N o n -S o S L

10
10
10
0 .7 0 1 0 0 In c o rre c t k e y g u e ss

21

96
In c o rre c t k e y g u e ss In c o rre c t k e y g u e ss S o S L -2 0 0 S o S L -2 0 0 0 .7 0 In c o rre c t k e y g u e ss 0 .7 0
0 .7 0

90
0 .6 8 1

2
2.
2
8 0 0

50

77

80
8 0 0 .6 4 8

85

7.
0 .6 5 0 .6 3 6 0 .6 6 7

A c c u ra c y
A c c u ra c y
0 .6 4 5 0 .6 5
A c c u ra c y

0 .6 5

A c c u ra c y

67
5.
0 .6 5 6 0 0

59
0 .6 0 9 6 0 0 .6 2 4
0 .6 0 0 .5 8 9 0 .6 0 9
0 .5 9 5 0 .6 0 0 .6 0

44
0 .6 0 4 0 0 4 0

36
0 .5 5

30
9. 2
7
0 .5 5 0 .5 5

1 0 .3 1
06
2 0

86 741
57
0 .5 5 2 0 0

39

6
0 .5 1 4

6 4 .3 6
0 .5 0

.9
.

12
.6

98
72
0 .4 9 5
0 0 0 .5 0 0 .5 0
0 .5 0 0 .5 1 1 .5 0 5 1 0 1 5 2 0 2 5 3 0 0 5 1 0 1 5 2 0 2 5 3 0
0 5 1 0 1 5 2 0 2 5 0 5 1 0 1 5 2 0 2 5 1 0 2 0 3 0
N u m b e r o f e p o c h s N u m b e r o f e p o c h s S ta n d a rd d e v ia tio n o f G a u s s ia n n o is e N u m b e r o f e p o c h s N u m b e r o f e p o c h s
N u m b e r o f e p o c h s b )
c ) a ) c )
a ) b )

Fig. 2. The experimental results on masking countermeasure. a, b) Accuracy Fig. 3. The experimental results on combined masking and noise generation.
of proposed model with and without shared layer, respectively; c) Comparison a) Success rate of MLPDDLA and MLPMO models on different levels of noise
of attack time. using 20,000 power traces; b) SoSL-200 on 20,000 power traces, σ = 1.5;
c) SoSL-200 on 50,000 power traces, σ = 1.5;

updating the network. In our study, the popular optimization


algorithm called Adaptive Moment Estimation (ADAM) with 0.609) at the end. It means that only the output of the correct
default setting is employed to train the proposed model. key contributes a stable update to the appropriate branch;
other outputs make the weights of others chaotic. In the case
of non-shared layer model (Non-SoSL), we carry out further
C. CNNMO experiments using Dataset2. The results show that Non-SoSL
The authors in [3] have introduced a CNN model can discriminate the correct key very early and more gap
(CNNDDLA ) to reveal the secret key from the de-synchronized between correct and incorrect ones (0.645 and 0.595 at epoch
countermeasure. Similar to MLPDDLA , their model needs to 25th ), as depicted in Fig.2.b. The reported results have clarified
be trained repeatedly to determine the correct key. To mitigate the efficiency of our proposed methods for masking protected
this disadvantage, we introduce a multi-output model based on devices.
CNN architecture (CNNMO ), which can break de-synchronized Next, we compare the efficiency of proposed networks
countermeasure in a single training process. Our CNNMO with and without any shared layer (SoSL-200 and Non-SoSL,
consists of an input layer, share layers, and a MO layer, as respectively) using Dataset2 and MLPDDLA using Dataset1.
depicted in Fig.1.c. The shared layer consists of two blocks. Fig. 2.c presents the execution time of attacks using selected
Each block includes a 1D convolutional (conv1d) layer, a models on different numbers of epochs. Overall, the execution
average pooling (pool) layer, a batch normalization (norm) time of DDLA is the highest in all cases. Interestingly, the
layer, and a rectified linear unit (relu) layer. These layers execution time of our proposed network decreases dramatically
are placed in order as follows conv1d-norm-pool-relu. In about eight times and nine times (from 595.98 to 72.36 and
the training phase, with the equivalent weights used for all 64.639 seconds), corresponding to Non-SoSL and SoSL-200
branches, the loss function of the whole network is calculated over ten epochs, respectively. Similar results can be seen in
by the formula (2), same as in MLPMO . the case of training 30 epochs. The execution time of Non-
The details of the proposed models are presented in Table II. SoSL and SoSL-200 decreased significantly about six and
It is worth noting that the MLPMO is experimented with seven times (from 772.221 to 126.312 and 109.067) compared
four variants of the shared layer, including a non-shared to MLPDDLA , respectively. Compared to MLPPL , the multi-
layer (0 node) and one shared layer of 50, 200, and 400 loss-based model in this work provides the training metric
nodes, for comparison with MLPDDLA . Regarding CNNMO , to separately on each output. It leads to lower complexity than
simplify, we choose the simplest model based on CNNDDLA MLPPL using a custom function. Indeed, MLPPL reduces
model, except for the output layer. It is worth noting that the execution time approximately 2.81 times (from 1950.9
attacker is able to apply other hyperparameters to our proposed seconds to 693.8 seconds) compared to MLPDDLA [4], whereas
architecture to enhance the success rate of SCA attacks. Non-SoSL decrease the execution time approximately 5.62
times (from 1025.2 seconds to 182.1 seconds) compared
III. E XPERIMENTAL RESULTS to MLPDDLA . These results have clarified our assumption
and demonstrated that the proposed model outperforms both
A. Masking
MLPDDLA and MLPPL in the computation time.
All experiments were performed by Keras framework on a
personal computer with Intel Core i5-9500 CPU, DDR4 24GB
B. Noise generation hiding countermeasure
memory. Our first experiment is performed on the Dataset2.
We conduct various training processes with different sizes To simulate the noise generation countermeasure, each
of the shared layer. Regarding the proposed network with a sample of power traces of ASCAD dataset is added different
shared layer, we choose the size of the shared layer equal to levels of Gaussian noise as follows:
200 and denote it as SoSL-200 model. Our choice is motivated
tnoise (i, m) = t(i, m) + σ × randn (1, m) + mean (4)
by the fact that it provides good results in terms of attack
time and accuracy. The results of SoSL-200 using accuracy where randn returns a vector of numbers drawn from the
metrics are plotted in Fig.2.a. Accordingly, an increasing trend standard normal distribution, σ and mean are the standard
of accuracy metrics of all key guesses can be seen. However, deviation and mean value (mean= 0), respectively. Conse-
only the correct key (red) achieves stable and highest accuracy quently, the datasets called DatasetX-N1, DatasetX-N2, and
in most epochs (from epoch 5th ) with a clear gap (0.636 and DatasetX-N3 (correspond to σ = 0.5, 1.0 and 1.5, respectively)

© 2022 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.See https://www.ieee.org/publications/rights/index.html for more information.
Authorized licensed use limited to: NASATI. Downloaded on November 09,2022 at 07:23:17 UTC from IEEE Xplore. Restrictions apply.
This article has been accepted for publication in IEEE Embedded Systems Letters. This is the author's version which has not been fully edited and
content may change prior to final publication. Citation information: DOI 10.1109/LES.2022.3213443

IEEE EMBEDDED SYSTEMS LETTERS 4

0 .0 5 C o rre c t k e y g u e ss
In c o rre c t k e y g u e ss
0 .8 5
C o rre c t k e y g u e ss 0 .7 4 C o rre c t k e y g u e ss In addition, CNNMO provides a clear distinction at very early
0 .0 4 0 .8 0 In c o rre c t k e y g u e ss 0 .7 2 In c o rre c t k e y g u e ss
0 .7 0 epoch compared to that of CNNDDLA . These results have
C o rre la tio n

0 .0 3 0 .7 5
0 .6 8

L o ss

L o ss
0 .0 2 0 .7 0 0 .6 6 clarified that the proposed model outperforms previous work
0 .6 5 0 .6 4
0 .0 1
0 .6 0 0 .6 2 in terms of attack time.
0 .0 0 0 .6 0
0 1 0 0 2 0 0 3 0 0 4 0 0 0 2 0 4 0 6 0 8 0 1 0 0 0 2 0 4 0 6 0 8 0 1 0 0
S a m p le s N u m b e r o f e p o c h s N u m b e r o f e p o c h s
a ) b ) c ) IV. C ONCLUSION
In this letter, two multi-output models called MLPMO and
Fig. 4. Attack results on de-synchronized power traces using CPA, CNNDDLA ,
and CNNMO . a) CPA; b) CNNDDLA ; c) CNNMO .
CNNMO were introduced, which could perform SCA attacks
better than the single-output approach on different protected
schemes. Specifically, MLPMO reduces the execution time
are reconstructed as the same technique as DatasetX, where up to nine times compared to the MLPDDLA in the case
X = 1, 2, 3. In this case, we consider MLPDDLA is more of masking countermeasure. In addition, Non-SoSL model
reliable than MLPmax-shared on noisy data. Therefore, only the outperforms the state-of-the-art MLPPL in term of execu-
comparison between Non-SoSL, SoSL-200, and MLPDDLA is tion time. Regarding hiding countermeasures, two common
performed. By repeating the attacks using Non-SoSL, SoSL- hiding countermeasures, such as noise generation and de-
200, and MLPDDLA 50 times, we calculate the percentage synchronization, were used to evaluate the proposed models.
of successful attacks over total attacks. The comparison of The experimental results have shown that MLPMO can reduce
the success rate between MLPDDLA , Non-SoSL, and SoSL- the effect of noise and achieves a higher success rate of at least
200 is shown in Fig. 3.a. Evidently, all models achieve good 20% compared to MLPDDLA in the case of σ = 1.0 and 1.5. The
performance (100%) with the presence of a small level of experimental results have also clarified that CNNMO can break
additive noise (σ = 0.5). However, in the case of higher noise the de-synchronization countermeasure. More interesting, our
(σ = 1.0), the number of successful attacks drops from the proposed model performs SCA attacks faster, up to 30 times
100% to 80% and 90% corresponding to MLPDDLA and SoSL compared to CNNDDLA . However, by using a fixed number of
model, respectively. Interestingly, the success rate of SoSL- epochs, the attack results are not optimized. This problem will
200 only decreases slightly from 100% to 96%. A similar trend be investigated in the future work.
can be seen at the higher level of Gaussian noise (σ = 1.5).
The success rate goes down significantly because the models R EFERENCES
provide poor discrimination, as illustrated in Fig. 3.b, in the [1] P. Chakraborty, J. Cruz, C. Posada, S. Ray, and S. Bhunia, “HASTE:
Software security analysis for timing attacks on clear hardware assump-
case of SoSL-200 (0.681 and 0.667). However, our network tion,” IEEE Embedded Systems Letters, vol. 14, no. 2, pp. 71–74, jun
still achieves better results than MLPDDLA (44% and 36% 2022.
compared to 30%). We perform further attacks using SoSL- [2] I. M. Delgado-Lozano, E. Tena-Sanchez, J. Nunez, and A. J. Acosta,
“Gate-level design methodology for side-channel resistant logic styles
200 on a lager size of dataset (Dataset3-N3). A clear gap using TFETs,” IEEE Embedded Systems Letters, vol. 14, no. 2, pp. 99–
between correct and incorrect keys (0.648 and 0.624) can be 102, jun 2022.
seen in Fig. 3.c. More interesting, the success rate is 100%. [3] B. Timon, “Non-profiled deep learning-based side-channel attacks
with sensitivity analysis,” IACR Transactions on Cryptographic
It indicates that by using the reasonable value of SoSL, the Hardware and Embedded Systems, vol. 2019, no. 2, pp. 107–131,
proposed network can mitigate the effect of the additive noise Feb. 2019. [Online]. Available: https://tches.iacr.org/index.php/TCHES/
better. In addition, the attacker can perform DDLA attacks article/view/7387
[4] D. Kwon, S. Hong, and H. Kim, “Optimizing implementations of non-
with reasonable epochs, a larger number of traces, or more profiled deep learning-based side-channel attacks,” IEEE Access, vol. 10,
hyperparameters, which will, in turn, improve the success rate. pp. 5957–5967, 2022.
[5] L. Zhang, X. Xing, J. Fan, Z. Wang, and S. Wang, “Multi-label deep
learning based side channel attack,” in 2019 Asian Hardware Oriented
C. De-synchronized traces Security and Trust Symposium (AsianHOST). IEEE, dec 2019.
[6] D. Xu, Y. Shi, I. W. Tsang, Y.-S. Ong, C. Gong, and X. Shen, “Survey
Finally, we consider other protected datasets containing on multi-output learning,” IEEE Transactions on Neural Networks and
Learning Systems, vol. 31, no. 7, pp. 2409–2429, 2020.
de-synchronization power traces. To simulate this counter- [7] E. Prouff, R. Strullu, R. Benadjila, E. Cagli, and C. Dumas, “Study of
measure, we randomly shift each power trace of Dataset4 Deep Learning Techniques for Side-Channel Analysis and Introduction
and Dataset5 in a maximum of 20 samples. Consequently, to ASCAD Database,” CoRR, pp. 1–46, 2018.
[8] N. Kamoun, L. Bossuet, and A. Ghazel, “Correlated power noise
two new datasets called Dataset4-sh20, Dataset5-sh20 are generator as a low cost DPA countermeasures to secure hardware AES
used for training CNNDDLA and CNNMO , respectively. In this cipher,” in 2009 3rd International Conference on Signals, Circuits and
experiment, we used the loss metric to reveal the correct key. Systems (SCS). IEEE, nov 2009.
[9] J.-S. Coron and I. Kizhvatov, “An efficient method for random delay
Firstly, a CPA attack is performed on Dataset4 to validate generation in embedded software,” in Cryptographic Hardware and
the efficiency of this countermeasure. As depicted in Fig. 4.a, Embedded Systems - CHES 2009, C. Clavier and K. Gaj, Eds. Berlin,
the secret key can not be revealed. In contrast, a good result Heidelberg: Springer Berlin Heidelberg, 2009, pp. 156–170.
[10] C. O’Flynn and Z. Chen, “ChipWhisperer: An open-source platform for
in detecting the correct key can be seen in Fig. 4.b and hardware embedded security research,” in Constructive Side-Channel
Fig. 4.c. These results demonstrate that CNN model can Analysis and Secure Design. Springer International Publishing, 2014,
break the de-synchronization countermeasure based on the pp. 243–260.
[11] K. Kuroda, Y. Fukuda, K. Yoshida, and T. Fujino, “Practical Aspects on
translation-invariance property. However, the attack time of Non-profiled Deep-learning Side-channel Attacks against AES Software
CNNMO is shorter by approximately 30 times compared to Implementation with Two Types of Masking Countermeasures including
CNNDDLA (703.65 seconds compared to 20792.43 seconds). RSM,” pp. 29–40, 2021.

© 2022 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.See https://www.ieee.org/publications/rights/index.html for more information.
Authorized licensed use limited to: NASATI. Downloaded on November 09,2022 at 07:23:17 UTC from IEEE Xplore. Restrictions apply.

You might also like