Diplomarbeit
Ruhr-University Bochum
I hereby declare that I have written this thesis independently, that I have used no sources or aids other than those indicated, and that I have marked all quotations as such.

Place, date
Signature
Abstract
Since its introduction in the mid-1990s, side-channel analysis has been continuously enhanced and has become a reliable method for cryptanalysts to break physical implementations of cryptographic algorithms. Recently, these methods have also become of interest for reverse engineering program code running on microcontrollers (e.g., [QS02], [No03]), which are often used in security-critical environments such as the financial sector.

Until now, statistical methods using a huge number of side-channel observations have typically been used for this purpose. In some scenarios, however, such an approach is not feasible, for example when the target device is only available for a short period of time.

Hence, this work examines to what extent the analysis of single observations of a side-channel can be utilized to gather information about a program. For that purpose, a commercially available microcontroller is used as an exemplary target platform. Furthermore, templates as introduced by Chari et al. in [CRR02], which are specially suited to extract the most information out of single traces, are applied as the side-channel analysis method.

As a result, we present a power model for the PIC which provides a basis for a wide range of applications, for example an improved DPA on this device. Moreover, in conjunction with templates, we show that reverse engineering of secret parts of a cryptographic algorithm as well as program-path detection of known code is feasible. In this thesis, the latter is analyzed in more detail and formulated as an algorithm which can, for example, be used for debugging in the field or version checking, or which can help to gain a basic design-level understanding of a program.
Contents

1 Introduction
2 Background
  2.1 Power Consumption
    2.1.1 CMOS Technology
    2.1.2 Measurement Requirements
    2.1.3 Noise
  2.2 Power Analysis
    2.2.1 Simple Power Analysis
    2.2.2 Differential Power Analysis
    2.2.3 Correlation Coefficient and Power Models
    2.2.4 Template Attacks
    2.2.5 Countermeasures
  2.3 Microcontrollers
  2.4 Related Work
3 Device Examination
  3.1 Components and Interaction
  3.2 Instruction Cycle
  3.3 Instruction Set
4 Power Consumption Properties
  4.1 Clock
  4.2 Working Register
  4.3 Fetching Process
  4.4 Data Bus
  4.5 Arithmetic Logical Unit
  4.6 Opcode
  4.7 Instruction Register
  4.8 Noise
  4.9 Other Influences
  4.10 Summary and Conclusion
5 Template Creation
  5.1 Selecting Points
  5.2 Partitioning
  5.3 Test Procedure
  5.4 Tests
    5.4.1 Single Templates
    5.4.2 Single Templates with Constant Pfetch
    5.4.3 Partitioned Templates with Constant Pfetch
    5.4.4 Optimizing Point Selection
    5.4.5 Reduced Templates
    5.4.6 Peak Selection
    5.4.7 Partitioned Templates
  5.5 Summary and Conclusion
6 Path Detection
  6.1 Basic Idea
  6.2 Algorithm
    6.2.1 Complexity
    6.2.2 Improvements
  6.3 Test
7 Conclusion
8 Future Work
A Test Setup
  A.1 Block Diagram
  A.2 Test Board
B DVD Contents
C Bibliography
List of Figures

2.1 CMOS inverter
2.2 Measuring the power consumption
2.3 Single power trace of a DES implementation on an ATMega163 smart card, DES rounds are clearly visible
2.4 Two difference traces of a DPA on a DES implementation (ATMega163 smart card), wrong (top) and correct key hypothesis (bottom)
2.5 Correlation coefficients for a DPA on DES with the correct key hypothesis, the peaks toward 0.8 and −0.8 indicate a high correlation
2.6 Three plots of normal distributions with different values (x̄, σx)
2.7 Overview of countermeasures to prevent side-channel analysis
2.8 Block diagram of a microcontroller
2.9 Processor architectures – von Neumann (left) and Harvard (right)
4.8 Effects caused by the Hamming distance of the current and previous opcode – averaged traces for three different Hamming distances
4.9 Histogram showing the noise distribution for a fixed point in time
4.10 Summary of identified power consumption influences
5.1 Two approaches for identifying points of information – the upper plot shows average traces for two scenarios, the center plot shows the difference of both, and the lower plot shows the variance taken from eighty measurements
5.2 Identifying points of information by means of randomly executed instructions – the upper plot exemplarily shows a measurement of a random instruction, the lower plot shows the variance of one thousand measurements
5.3 Approximation of a binomial distribution – binomial distribution (blue) and approximated normal distribution (green)
6.1 Hypothetical paths for the test code
A.1 Block diagram of the test setup
A.2 Schematic of the used test board
B.1 DVD directory structure
List of Tables

4.1
5.1
5.2
5.3
5.4
5.5
5.6
6.1
6.2
6.3
Nomenclature

Cov(X) . . . . . . . covariance of X
d . . . . . . . . . . . . known data
E(X) . . . . . . . . . expected value of X
h . . . . . . . . . . . . estimator for the power consumption of an intermediate value
HD(v1, v2) . . . Hamming distance between v1 and v2
HW(v) . . . . . . . Hamming weight of v
k . . . . . . . . . . . . key
Pbus . . . . . . . . . the factor of the power model weighing the influences of the data bus
Pconst,Qx . . . . . the constant fraction of the power consumption concerning Qx
Pdata . . . . . . . . the data-dependent fraction of the power consumption
Pext,Qx . . . . . . the additional fraction of the power consumption at Qx caused by unknown influences
Pfetch . . . . . . . the fraction of the power consumption caused by the fetching process
Pinst . . . . . . . . the fraction of the power consumption caused by the opcode of the current instruction
Popcode . . . . . . the factor of the power model weighing the influences of the current or subsequent opcode
Pprog.bus . . . . . the factor of the power model weighing the influences of the instruction bus
PQx . . . . . . . . . the estimated power consumption at Qx
Psw . . . . . . . . . the switching-noise fraction of the power consumption
1 Introduction
Up to the mid-1990s, cryptographic devices were considered to be black boxes that,
given some input (plaintext), produced an output (ciphertext) by means of a secret
key stored in the device. Thus, attacks were based on known plaintexts, ciphertexts
or plaintext/ciphertext pairs. No further information appeared to be available.
Today, it is known that this is not true. In physical implementations, there are
always additional outputs that leak information, the so-called side-channel
information, that can be passively observed and then exploited by cryptanalysts to
break a
cryptographic system, as long as an attacker has physical access to the computing
device. This is referred to as side-channel analysis or side-channel attack.
Execution
time, power consumption or electromagnetic radiation are typical examples of the
side-channels used.
The first attack of this kind was presented to the scientific world by Paul C. Kocher in [Ko96]. This paper describes a method that uses timing information to reveal the secret key of an RSA implementation. Since this groundbreaking publication, many new analysis techniques have been presented, utilizing visual analysis, statistics, or dedicated mathematical models. Representatives of these techniques are, for example, Simple Power Analysis, Differential Power Analysis, and Template Attacks [KJJ98], [KJJ99], [CRR02].
Above all, microcontrollers, often in the form of smart cards, became of special interest for these analysis methods, since they are frequently used as a platform for cryptographic primitives in security-critical environments such as bank cards, pay-TV, health cards, or machine-readable travel documents (MRTD).
Recently, these methods have additionally become of interest for reverse engineering purposes (e.g., [QS02], [No03], [Cl04], and [Ve06]). Here, the intention is not to reveal the secret key of a particular device but to retrieve secret program code, i.e., the intellectual property of a programmer, in order to duplicate the program, similar to the reverse engineering of hardware, or to gain a basic design-level understanding of an implemented program, making a port to another platform possible [CC90]. Furthermore, the term reverse engineering is frequently used to describe the disclosure of secret algorithms.
In this thesis, we want to examine what can be achieved in terms of reverse engineering by merely analyzing single observations of a side-channel, which has been sparsely discussed so far. Until now, statistical methods utilizing a huge number of observations have been applied more frequently for reverse engineering. However, in some applications, as for example the recognition of the program path that was taken during a certain program execution, only single observations can be taken into account. Thus, following this approach is reasonable and worth the effort of examination.
For the analysis, a commercially available microcontroller, namely the PIC16F687 manufactured by Microchip, is used as a working example. Since only single observations are to be analyzed, templates as introduced by Chari et al. in [CRR02], which are specially suited for this purpose, will be utilized. In addition, the side-channel used in this thesis is the power consumption of the device, since it is a well-known leakage source that can easily be observed.
The remaining chapters of this thesis are organized as follows:
The subsequent chapter covers the basics needed for understanding this work. The causes of side-channel leakage are presented, as well as the major side-channel analysis techniques developed during the last years, i.e., Simple Power Analysis, Template Attacks, and Differential Power Analysis based on difference of means and correlation coefficients. Further, a brief overview of microcontrollers and of the work already performed on side-channel based reverse engineering is given.
In Chapter 3, the target platform is examined from a theoretical point of view. This comprises the design, i.e., the components and their interaction, as well as the instruction cycle and the instruction set.
Assumptions concerning the power consumption properties of the device are drawn up and analyzed in detail by means of Simple Power Analysis in Chapter 4. As a result, a power model for the PIC16F687 is presented.
Based on this knowledge, several template creation techniques for the sake of instruction recognition are derived and tested in Chapter 5 in order to discover a suitable approach with respect to the underlying hardware. Based on the performed tests, some applications of templates in terms of reverse engineering are presented.
Finally, in Chapter 6, an algorithm is presented which can be used to detect program paths from single power consumption traces if the program code is known. Furthermore, its functionality is proven before this thesis is concluded and an outlook is given in the last two chapters.
2 Background
Chapter 2 lays the basis for this thesis. At first, the causes of side-channel leakage in the power consumption of CMOS devices are discussed. Afterwards, common power analysis techniques are presented in order to give a basic overview of these techniques and to select promising methods for reverse engineering. We focus exclusively on the power consumption because it is the most frequently used side-channel and the one we exploit in this work; the discussed techniques can, however, be applied to other side-channels as well. Subsequently, some basic design principles for microcontrollers are presented. Although side-channel analysis was invented to break cryptographic systems, some work has already been done to exploit it for reverse engineering purposes on microcontrollers; Section 2.4 summarizes this work.
[Figure 2.1: CMOS inverter]
When the input causes a cell to change its output, then the load capacitance, which
comprises the internal capacitances connected to the output as well as the external
capacitances, is either charged or discharged by a charging current. For example,
in the case of the inverter, CL is discharged over the NMOS transistor to GND on
an input transition from ’0’ to ’1’ and charged by Vdd over the PMOS transistor
on a transition from ’1’ to ’0’. Thus, we get an additional power consumption, the
so-called dynamic power consumption Pd . It can be computed as
Pd = CL · Vdd² · fp    (2.2)
in which fp is the switching frequency of the cell. Since CL and Vdd are given, we can conclude that a cell built in CMOS technology consumes more power the higher its switching frequency fp is, i.e., the more often CL has to be reloaded in a given time. Note that this frequency is determined by the input, or more precisely by the input sequence, of the cell. Hence, Pd is data dependent.
The total power consumption Ptotal is then given by
Ptotal = Ps + Pd
(2.3)
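The relation in (2.2) can be illustrated with a short numerical sketch. The component values below (load capacitance, supply voltage, switching frequency) are illustrative assumptions, not measurements of any particular chip:

```python
# Numerical sketch of Eq. (2.2), Pd = CL * Vdd^2 * fp. The component
# values are made-up illustrations, not measurements of a real device.

def dynamic_power(c_load, vdd, f_switch):
    """Average dynamic power (W) of a CMOS cell that reloads the load
    capacitance c_load (F) at f_switch (Hz) from a vdd (V) supply."""
    return c_load * vdd ** 2 * f_switch

# A hypothetical 10 pF load at 5 V: doubling the switching frequency
# doubles Pd -- exactly the data dependence exploited in power analysis.
p_slow = dynamic_power(10e-12, 5.0, 1e6)   # 2.5e-4 W
p_fast = dynamic_power(10e-12, 5.0, 2e6)
print(p_slow, p_fast)
```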
[Figure 2.2: Measuring the power consumption – supply VS, integrated circuit between VDD and GND, measurement resistor with voltage drop VR]
2.1.3 Noise
When the power consumption of a deterministic device is measured with constant inputs during a constant operation, the power consumption traces should equal each other, because the device performs exactly the same transitions with every execution, resulting in a deterministic power trace. However, a certain amount of electronic noise, i.e., random deviations from the intrinsic signal, is always present in each trace and cannot be avoided completely. As a result, the measured power trace Pmeas can be described as the sum of the power consumption of the device Psig and noise Pel.noise:

Pmeas = Psig + Pel.noise    (2.6)

There are various causes of noise, including the power supply, USB/RS232 interfaces connected to the device, radiated emissions, temperature, and quantization noise caused by the analog/digital conversion of the oscilloscope [MOP07].
The so-called signal-to-noise ratio (SNR) is commonly used in signal theory to express the ratio between the signal, the component that contains information, and the noise, the component that contains no information at all:

SNR = Psig / Pel.noise    (2.7)
Thus, the higher the Signal-to-Noise ratio, the lower the fraction of unwanted
noise and the easier it is to extract information out of the recorded power traces.
Hence, the aim is to design a measurement setup that suppresses noise as much as
possible.
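How (2.7) is estimated in practice can be sketched on synthetic traces: with constant inputs, the mean over many traces approximates the deterministic part Psig, and the residual variance approximates the noise power. The signal shape, noise level, and trace count below are arbitrary assumptions:

```python
import numpy as np

# Estimating the SNR of Eq. (2.7) from repeated measurements of the
# same operation on simulated traces.

rng = np.random.default_rng(0)
signal = np.sin(np.linspace(0, 4 * np.pi, 200))        # intrinsic signal
traces = signal + rng.normal(0.0, 0.2, (1000, 200))    # + electronic noise

p_sig = np.var(signal)                          # signal power
p_noise = np.var(traces - traces.mean(axis=0))  # residual noise power
snr = p_sig / p_noise
print(round(snr, 1))   # close to var(sin)/0.2^2 = 12.5
```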
2.2 Power Analysis
2.2.1 Simple Power Analysis

[Figure 2.3: Single power trace of a DES implementation on an ATMega163 smart card; the DES rounds are clearly visible]
Fig. 2.3 exemplarily shows the power consumption trace of a DES encryption implemented on an ATMega163 smart card. As can be seen, the sixteen DES rounds can be clearly identified.
Additionally, it is possible to identify even small differences by inspecting a visual representation of just a small part of the trace. An attacker's intention is to find such differences. More precisely, he searches for key-dependent variations in order to reveal the secret key bit by bit. Messerges et al., for example, showed that the complexity of a brute-force attack on DES can be reduced from 2^56 to approximately 2^38 by observing certain peak heights during the PC1 permutation [MDS99]. Further attacks are based on conditional branches, which, for instance, occur in register rotations or permutation tables [KJJ99]. Attacks on other cryptographic algorithms like IDEA or RC5 can be found in [KSWH98], and [JQ01] provides an attack on elliptic curve cryptography.
As one can imagine, detailed knowledge about the implementation of the algorithm is usually required to mount an attack. Nevertheless, if only a single or just a small number of power traces can be recorded, e.g., when the device is only available for a short period of time, SPA is quite useful in practice. Moreover, SPA is a great tool for reverse engineering, since it can be used to analyze the implications of different code sequences or values on the power consumption.
2.2.2 Differential Power Analysis

In a Differential Power Analysis (DPA), the attacker first chooses an intermediate value of the executed algorithm as the so-called selection function

v = f(d, k)    (2.8)

which depends on known data d and a small part of the key k. He then records a large number of power traces of encryptions with random
plaintexts, which are additionally stored together with the traces. Now, the
attacker
can calculate the results of f (d, k) for each stored trace and each key
hypothesis.
For each key hypothesis, the measurements are divided into two partitions based on the result of the selection function, either 0 or 1. The mean traces of both partitions are then calculated and their difference is computed. If the key hypothesis was correct, the resulting difference trace shows significant peaks exactly at the positions where the chosen intermediate result was processed. The correct key corresponds to the difference trace with the highest peaks.
The reason for this is quite simple. The power consumption differs between processing the intermediate result v = 0 and v = 1 in those parts in which v is processed. Due to noise, however, this rather small difference is not visible in the difference of just two traces. As a result, a large number of traces has to be recorded in order to reduce the noise by computing mean traces. The selection function determines whether a trace belongs to v = 0 or v = 1 for a certain key guess, and the difference of the mean traces of both partitions indicates the correlation to this key guess.
For a better understanding, we first assume that the key hypothesis was correct.
Thus, f (d, k) always yields the correct result so that in the end the set of
traces
is partitioned one-hundred percent correctly. In this case, the noise reduction for
the mean traces is maximized so that dissimilarities according to the value of v
are
equally maximized. If the key hypothesis is not correct, both partitions contain a
certain amount of incorrect traces. As a result, the difference trace either shows
an
almost flat line if the key hypothesis was completely uncorrelated to the traces or
some rather small peaks in cases in which the key guess is somehow correlated to
the correct subkey.
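The partition-and-average procedure described above can be sketched on simulated traces. Everything in this snippet (the toy S-box, the single leakage sample, the noise level, the key) is an assumption for illustration and not the DES setting discussed next:

```python
import numpy as np

# Difference-of-means DPA on simulated traces: the "device" leaks one
# S-box output bit v at a single sample; partitioning by a selection
# function and comparing mean traces recovers the key byte.

rng = np.random.default_rng(1)
SBOX = rng.permutation(256)          # toy S-box, stand-in for DES S1
SECRET_KEY, N, LEN = 0xA7, 2000, 100

def selection(d, k):                 # selection function v = f(d, k)
    return SBOX[d ^ k] & 1

data = rng.integers(0, 256, N)
traces = rng.normal(0.0, 1.0, (N, LEN))                    # noise
traces[:, 50] += 0.5 * np.array([selection(d, SECRET_KEY) for d in data])

def max_peak(key_guess):
    """Highest peak of the difference trace for one key hypothesis."""
    v = np.array([selection(d, key_guess) for d in data], dtype=bool)
    return np.abs(traces[v].mean(axis=0) - traces[~v].mean(axis=0)).max()

peaks = [max_peak(k) for k in range(256)]
print(hex(int(np.argmax(peaks))))
```

Only the correct guess partitions the traces consistently with the leakage, so its difference trace retains the 0.5 peak while all wrong guesses average it away.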
As a concrete example, we will again consider a software implementation of the DES algorithm. As a first step, the intermediate value v has to be specified. We choose v as the first output bit of the first S-box calculated in round one. The selection function is then given by

v = S1((E(IP(d)_{32−63})_{0−5} ⊕ k_{0−5}))_1    (2.9)
where d is the plaintext and k_{0−5} represents 6 bits of an unknown subkey. Note that this is only one of many possible selection functions. We then record traces of encryptions with random plaintexts. Subsequently, for each of the 2^6 = 64 key hypotheses, the traces are partitioned and averaged in order to create a difference trace. All sixty-four difference traces can then be compared to find the correct key. Fig. 2.4 exemplarily shows two difference traces of a DPA performed with five hundred measurements. In contrast to the upper trace, in which no peaks are visible, the bottom curve clearly shows two peaks indicating the correct key. At this point of the attack, only six subkey bits are known. However, the attack can be repeated with other selection functions that match the other S-boxes in order to reveal the remaining subkey bits.
[Figure 2.4: Two difference traces of a DPA on a DES implementation (ATMega163 smart card), wrong (top) and correct key hypothesis (bottom)]
2.2.3 Correlation Coefficient and Power Models

Common choices are the Hamming-weight model, h = HW(v), and the Hamming-distance model, h = HD(vprev, v).
On the basis of one of these models, the hypothetical power consumption for each
intermediate value can be calculated. In a concrete attack in which D traces were
recorded and K keys are possible, we can write the hypothetical power consumptions
for each value to a D × K matrix H in which each column corresponds to a certain
key hypothesis and each row to a certain plaintext.
Two conclusions can now be drawn. First, both models are appropriate for intermediate values that are not binary. This is one of the main advantages of a DPA based on the correlation coefficient over the difference-of-means approach, in which only binary models, i.e., v ∈ {0, 1}, are feasible. Second, the values occurring in both models are inaccurate with respect to the real power consumption. To understand why this is not a problem, we will now have a look at the correlation coefficient.
The correlation coefficient is a statistical method to measure linear relationships
between two random variables X and Y and is defined as follows:
[Figure 2.5: Correlation coefficients for a DPA on DES with the correct key hypothesis; the peaks toward 0.8 and −0.8 indicate a high correlation]
ρ(X, Y) = Cov(X, Y) / √(Var(X) · Var(Y))    (2.12)

Cov(X, Y) = E((X − E(X)) · (Y − E(Y))) = E(X · Y) − E(X) · E(Y)    (2.13)

Var(X) = E((X − E(X))²)    (2.14)
Since the expected values of X and Y are usually not known, they have to be estimated by the arithmetic mean. The estimators σ²_{x,y} for the covariance and σ²_x for the variance are then given by

σ²_{x,y} = 1/(n−1) · Σ_{i=1}^{n} (x_i − x̄)(y_i − ȳ)    (2.15)

σ²_x = 1/(n−1) · Σ_{i=1}^{n} (x_i − x̄)²    (2.16)

in which n is the number of samples and x̄, ȳ are the mean values. With (2.15) and (2.16) we can estimate the correlation coefficient as

r = (Σ_{i=1}^{n} x_i y_i − n x̄ ȳ) / √((Σ_{i=1}^{n} x_i² − n x̄²) · (Σ_{i=1}^{n} y_i² − n ȳ²))    (2.17)

Applied to a DPA with D recorded traces, the correlation between the hypothetical power consumptions of key hypothesis i and the trace points at position j is estimated as

r_{i,j} = (Σ_{d=1}^{D} h_{d,i} t_{d,j} − D h̄_i t̄_j) / √((Σ_{d=1}^{D} h²_{d,i} − D h̄_i²) · (Σ_{d=1}^{D} t²_{d,j} − D t̄_j²))    (2.18)
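As a quick sanity check of the estimator in (2.17), it must agree with the textbook Pearson correlation on arbitrary data (the toy data below is randomly generated):

```python
import numpy as np

# Verifying the estimator of Eq. (2.17) against NumPy's built-in
# Pearson correlation on correlated toy data.

rng = np.random.default_rng(2)
x = rng.normal(size=500)
y = 0.6 * x + rng.normal(size=500)   # correlated by construction
n = len(x)

num = np.sum(x * y) - n * x.mean() * y.mean()
den = np.sqrt((np.sum(x ** 2) - n * x.mean() ** 2) *
              (np.sum(y ** 2) - n * y.mean() ** 2))
r = num / den
print(abs(r - np.corrcoef(x, y)[0, 1]) < 1e-10)   # -> True
```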
If the key hypothesis i was correct, the correlation coefficient r_{i,j} will show significant peaks at the positions j where the intermediate value is processed.
For an exemplary DPA on DES with the Hamming-weight model as the model of choice, we can estimate the power consumption as

h_{d,i} = HW(S1((E(IP(d)_{32−63})_{0−5} ⊕ k_{0−5})))    (2.19)
for each possible key k and plaintext d to correlate these values to our power
consumption traces. Fig. 2.5 shows the results for a correlation DPA on DES on an
ATMega163 smart card with 200 measurements and a correct key hypothesis. As
can be seen, the trace shows significant peaks indicating a high correlation of
about
0.75. By comparing this figure to Fig. 2.3 we can see that the correlation occurs
at the time of the first DES round, i.e. exactly where the intermediate value is
processed.
Please note that a DPA by means of correlation coefficients is not a higher-order DPA. As with the difference-of-means approach, just one intermediate value is exploited, although the number of bits of this value was increased from one to four.
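The construction of the D × K hypothesis matrix H and the column-wise correlation of (2.18) can be sketched on simulated traces. The S-box, key, leakage sample, and noise level here are illustrative assumptions, not the DES attack of the text:

```python
import numpy as np

# Correlation-DPA sketch: build the D x K matrix H of hypothetical
# Hamming-weight consumptions, then correlate each key-hypothesis
# column with each trace sample as in Eq. (2.18).

rng = np.random.default_rng(3)
SBOX = rng.permutation(256)                    # toy S-box
HW = np.array([bin(v).count("1") for v in range(256)])
KEY, D, LEN = 0x3C, 1000, 60

data = rng.integers(0, 256, D)
traces = rng.normal(0.0, 1.0, (D, LEN))
traces[:, 20] += HW[SBOX[data ^ KEY]]          # device leaks HW at sample 20

H = HW[SBOX[data[:, None] ^ np.arange(256)[None, :]]]   # D x K hypotheses

Hc = H - H.mean(axis=0)                        # column-wise centering
Tc = traces - traces.mean(axis=0)
R = (Hc.T @ Tc) / np.sqrt((Hc ** 2).sum(axis=0)[:, None] *
                          (Tc ** 2).sum(axis=0)[None, :])
print(hex(int(np.argmax(np.abs(R).max(axis=1)))))
```

The correct key hypothesis produces a strong correlation at the leaking sample, while wrong hypotheses stay near zero everywhere.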
[Figure 2.6: Three plots of normal distributions with different values (x̄, σx)]
The standard deviation can be estimated by the square root of σ²x, which has been defined in (2.16). The probability density function of a normally distributed random variable is then given by
f(x) = 1/(√(2π) · σx) · exp(−(1/2) · ((x − x̄)/σx)²)    (2.20)

For a vector of n points, the Multivariate-Gaussian distribution with mean vector m and covariance matrix C has the density

y(x) = 1/√((2π)^n · det(C)) · exp(−(1/2) · (x − m)^T · C^{−1} · (x − m))    (2.21)
The covariance matrix C is positive definite, which guarantees that the square root in (2.21) can be calculated. The mean vector m is a vector of n elements and contains the arithmetic mean for each point.
With the Multivariate-Gaussian distribution we now have an adequate model for the power consumption: the mean vector m represents the static component of the power consumption, i.e., Psig, and the dynamic component Pel.noise is modeled by the covariance matrix C. As a result, we can define templates as follows:
Definition 2.2.3 (Template) A template with n points is built from a set of m recorded traces, arranged as the m × n matrix

T = ( t_{1,1} t_{1,2} . . . t_{1,n} ; t_{2,1} t_{2,2} . . . t_{2,n} ; . . . ; t_{m,1} t_{m,2} . . . t_{m,n} )    (2.22)

It consists of the mean vector

m = (t̄_1, t̄_2, . . . , t̄_n)    (2.23)

and the covariance matrix

C = ( c_{1,1} c_{1,2} . . . c_{1,n} ; c_{2,1} c_{2,2} . . . c_{2,n} ; . . . ; c_{n,1} c_{n,2} . . . c_{n,n} )    (2.24)

in which c_{i,j} = 1/(m−1) · Σ_{k=1}^{m} (t_{k,i} − t̄_i)(t_{k,j} − t̄_j).
With this definition, we are now able to build multiple templates h_i covering different cases of power consumption characteristics. For instance, to identify which Hamming weight belongs to an intermediate value, we could create several templates, one for each Hamming weight.
Template matching phase In the template matching phase, the goal is to find out which template is the most likely one for a power trace taken from the device under attack. This can be achieved by calculating the result of the probability density function for each template, as shown in the following equation:
p(x; h_i) = 1/√((2π)^n · det(C_i)) · exp(−(1/2) · (x − m_i)^T · C_i^{−1} · (x − m_i))    (2.25)
The resulting values p(x, hi ) indicate how well a template fits to an observed
vector
x. If all templates are equiprobable, then the best choice is to choose the
template
2.2 Power Analysis
17
with the highest probability, i.e. the highest value. This approach is called the
maximum-likelihood decision rule [Pa01].
By now, we have discussed the theory needed to apply a template attack. In practice, however, problems can be caused by the covariance matrix. First, the size of the covariance matrix grows quadratically with the number of points. It is therefore adequate to limit the number of points to those which are likely to contain information. For this purpose, Chari et al. proposed calculating pairwise differences of averaged traces for each observation. Points showing a high difference are the ones containing information and should thus be used for the template building phase. The second problem is that the covariance matrix can be badly conditioned, so that the inverse of C, which has to be calculated in (2.25), tends to be close to singular. Consequently, the values calculated in the exponent tend to be very small, which causes additional numerical problems. Basically, there are two ways to counter these problems.
The first way is to avoid the exponentiation completely by applying the logarithm to (2.25):

ln(p(x; h_i)) = −(1/2) · (ln((2π)^n · det(C_i)) + (x − m_i)^T · C_i^{−1} · (x − m_i))    (2.26)
In this case, the template hi that leads to the smallest absolute value has the
highest probability.
In a second approach, we can set the covariance matrix to the identity matrix. A template built this way is called a reduced template [MOP07]. Clearly, this solves the problem of inverting C, but as a drawback, information about the noise component is lost. For reduced templates, (2.25) and (2.26) simplify to (2.27) and (2.28), respectively:

p(x; h_i) = 1/√((2π)^n) · exp(−(1/2) · (x − m_i)^T · (x − m_i))    (2.27)

ln(p(x; h_i)) = −(1/2) · (ln((2π)^n) + (x − m_i)^T · (x − m_i))    (2.28)
[Figure 2.7: Overview of countermeasures to prevent side-channel analysis – hiding (time dimension: dummy operations, shuffling; amplitude dimension: increase noise, reduce signal) and masking (Boolean, arithmetic)]
2.2.5 Countermeasures

The described power analysis attacks have in common that they exploit key-dependent differences in the power consumption traces. Consequently, the goal of countermeasures is to prevent these key-related dependencies. In this section we briefly discuss various countermeasures that can be taken into consideration.
As shown in Fig. 2.7, countermeasures are divided into hiding and masking schemes. In hiding schemes, the designer tries to hide the power consumption such that the measured power consumption is independent of the intermediate values and of the performed operations. This means that both dimensions, time and amplitude, have to be secured.
In the time dimension, one way to avoid operation-dependent execution times is to insert random dummy operations. Thereby, an attacker cannot determine the position of the actual algorithm being performed. A drawback of this technique is the reduced throughput. Another method of hiding in the time dimension is shuffling. In this approach, the sequence of the executed primitives of the algorithm is randomly changed in every iteration, so that it becomes more complex to find the chronological position of the attacked intermediate value.
For the amplitude dimension, mainly two methods are feasible. The first one is to purposely increase the noise of the device to drastically reduce the SNR, which significantly increases the number of measurements needed to mount an attack. One way to realize an increase of noise is to implement dedicated noise generators. The second method of amplitude hiding is the opposite of the first: instead of increasing noise, the signal is reduced by adjusting the power consumption of the device to a constant level. This can be achieved by filtering or by using a dedicated logic style whose cells ideally consume constant power.
An alternative to hiding is masking. This technique does not hide intermediate values but randomizes them by applying a mask. In this scenario an attacker faces the problem that he cannot sort or correlate the traces to the intermediate value of his choice, because this value has additionally become a function of a mask m (v = f(d, k, m)), which is unknown and changes in every iteration of the algorithm. There are two types of masking: boolean and arithmetic masking. In boolean masking the intermediate value is simply XORed with the mask, whereas in arithmetic masking arithmetic operations like modular addition or multiplication are applied.
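The two masking types can be sketched in a few lines; the byte values d, k, and m below are hypothetical and only serve to illustrate the principle:

```python
# Illustrative sketch of boolean and arithmetic masking on 8-bit values.
def boolean_mask(v, m):
    # Boolean masking: XOR the intermediate value with the mask.
    return v ^ m

def arithmetic_mask(v, m):
    # Arithmetic masking: modular addition of the mask.
    return (v + m) % 256

d, k = 0x3A, 0xC4        # hypothetical data and key bytes
m = 0x5F                 # fresh random mask, changed in every iteration
v = d ^ k                # intermediate value v = f(d, k)
vm = boolean_mask(v, m)  # the attacker only observes a function of d, k, and m

# Whoever knows the mask can remove it again:
assert boolean_mask(vm, m) == v
assert (arithmetic_mask(v, m) - m) % 256 == v
```

Since the mask changes per execution, the same intermediate value produces different masked values in every trace, which is exactly what prevents the sorting step of a DPA.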
Hiding and masking can be implemented in combination to increase the security. However, neither technique is completely secure. Hiding only conceals key-related dependencies and does not eliminate them completely; thus, with a sufficiently high number of traces this technique can be attacked. Masking schemes can equally be attacked by means of higher-order DPA. More information about this topic can, for example, be found in [MOP07] or [CJRR99].
2.3 Microcontrollers
This section provides a brief overview of the components of a microcontroller. Furthermore, some general design principles are discussed. Specific information on the device that will be analyzed in this work is covered in Chapter 3.
As shown in Fig. 2.8, a microcontroller consists of a CPU, memory, and peripheral components which are connected via a bus system.
The CPU represents the main part. Its function is to fetch instructions from memory, and to interpret and execute them. Basically, there are two architectures to implement this, as illustrated in Fig. 2.9. The first one is the von Neumann architecture. In this design, instructions and data are fetched from the same memory over the same bus of width x. Usually several bus accesses are needed to fetch an instruction in this scenario. Further, data may have to be fetched subsequently. As a result, the bus can become extremely congested. To avoid this, the second approach, called Harvard architecture, uses separate memories and buses for program and data.
(Fig. 2.8: Components of a microcontroller – CPU with control unit and registers R1 to Rn, volatile and non-volatile memory, and peripheral components (digital/analog I/O, UART, ...), connected via a bus)
(Fig. 2.9: Von Neumann architecture – CPU with a common program and data memory – versus Harvard architecture – CPU with separate program and data memory)
In addition, a microcontroller may contain special-purpose hardware like USB or LCD drivers. More information about CPU design can, for example, be found in [Ta90].
Device Examination
(Figure: Block diagram of the PIC16F687 – 13-bit program counter, Flash program memory (2048 x 14) with 14-bit program bus, instruction register with Instruction Decode and Control unit, SRAM file registers (256 bytes) with direct and indirect (FSR) addressing via an address multiplexer, 8-bit data bus, 8-bit ALU with working register W and STATUS register, 256-byte data EEPROM (EEDAT/EEADR), ports A to C, Timer0/Timer1, watchdog timer, EUSART (TX/RX), synchronous serial port (SDO/SDA/SCL/SS), analog-to-digital converter, and clock/reset circuitry: internal oscillator block, OSC1/CLKI and OSC2/CLKO, timing generation, power-up and oscillator start-up timers, power-on and brown-out reset, ultra low-power wake-up (ULPWU))
Another way to access the SRAM is to use an indirect address stored in FSR. The content of the register to which the applied address points is then sent to the data bus. Thus, all components connected to the bus have access to this data.
Moreover, an instruction can contain constants to be processed by the ALU. For this reason, eight bits of IR are connected to an ALU multiplexer whose second input is the data bus. Which one is sent to the ALU is controlled by the Decode and Control unit. The second operand of the ALU is provided by a working register (W). Consequently, if two operands stored in memory or in other components connected to the data bus are to be processed by an instruction, one of these has to be loaded into W first. After the ALU has combined these operands, the result is sent to the data bus. Additionally, the status register may be affected by this operation, e.g. when a carry occurs or when the result of an operation is zero.
After the ALU has performed an operation and the result is stored to some component
connected to the data bus, the next instruction has to be fetched from the
program memory. The address of this instruction is determined by the program
counter (PC). Usually it is incremented every instruction cycle to point to the
next instruction. However, if branches are taken, the PC is modified by the data
bus in order to point to the instruction to be branched to. Further, the currently
addressed memory location can be saved on an 8-level stack to be able to return
to this instruction later on.
All the discussed actions which are executed by the PIC are controlled by a clock.
This can be an internal clock generated by an internal oscillator or an external
clock,
e.g. a crystal. Depending on the type of the clock, the working range of the device
differs. With an external clock the entire clock range (up to 20 MHz) can be
utilized,
whereas the internal clock is limited to a range from 32 kHz to 8 MHz.
(Figure: Two-stage pipelining of the PIC for the code sequence 1. Operation A, 2. Operation B, 3. Branch to X, 4. Operation D, 5. Operation X – in every instruction cycle Cy one instruction is executed while the next one is fetched (Cy0: FETCH 1; Cy1: EXECUTE 1, FETCH 2; ...); after the branch executed in Cy3, the prefetched instruction 4 is discarded (NOP) in Cy4 while instruction X is fetched, and X is executed in Cy5)
(Figure: General format of the 14-bit opcodes – byte-oriented file-register operations consist of an identifier, a destination bit, and a 7-bit file-register address f; bit-oriented operations of an identifier, a 3-bit bit number b, and a 7-bit file-register address f; literal and control operations of an identifier and a literal k)
(Table: Instruction set of the PIC16F687)

Byte-oriented file-register operations
Mnemonic  Operands  Description                   Cycles  14-Bit Opcode
ADDWF     f, d      Add W to f                    1       00 0111 dfff ffff
ANDWF     f, d      AND W to f                    1       00 0101 dfff ffff
CLRF      f         Clear f                       1       00 0001 1fff ffff
CLRW      -         Clear W                       1       00 0001 0xxx xxxx
COMF      f, d      Complement f                  1       00 1001 dfff ffff
DECF      f, d      Decrement f                   1       00 0011 dfff ffff
DECFSZ    f, d      Decrement f, Skip if 0        1 (2)   00 1011 dfff ffff
INCF      f, d      Increment f                   1       00 1010 dfff ffff
INCFSZ    f, d      Increment f, Skip if 0        1 (2)   00 1111 dfff ffff
IORWF     f, d      Inclusive OR W to f           1       00 0100 dfff ffff
MOVF      f, d      Move f                        1       00 1000 dfff ffff
MOVWF     f         Move W to f                   1       00 0000 1fff ffff
NOP       -         No Operation                  1       00 0000 0xx0 0000
RLF       f, d      Rotate Left f through Carry   1       00 1101 dfff ffff
RRF       f, d      Rotate Right f through Carry  1       00 1100 dfff ffff
SUBWF     f, d      Subtract W from f             1       00 0010 dfff ffff
SWAPF     f, d      Swap nibbles in f             1       00 1110 dfff ffff
XORWF     f, d      Exclusive OR W with f         1       00 0110 dfff ffff

Bit-oriented file-register operations
BCF       f, b      Bit Clear f                   1       01 00bb bfff ffff
BSF       f, b      Bit Set f                     1       01 01bb bfff ffff
BTFSC     f, b      Bit Test f, Skip if Clear     1 (2)   01 10bb bfff ffff
BTFSS     f, b      Bit Test f, Skip if Set       1 (2)   01 11bb bfff ffff

Literal and control operations
ADDLW     k         Add literal and W             1       11 111x kkkk kkkk
ANDLW     k         AND literal with W            1       11 1001 kkkk kkkk
CALL      k         Call subroutine               2       10 0kkk kkkk kkkk
CLRWDT    -         Clear Watchdog Timer          1       00 0000 0110 0100
GOTO      k         Go to address                 2       10 1kkk kkkk kkkk
IORLW     k         Inclusive OR literal with W   1       11 1000 kkkk kkkk
MOVLW     k         Move literal to W             1       11 00xx kkkk kkkk
RETFIE    -         Return from interrupt         2       00 0000 0000 1001
RETLW     k         Return with literal in W      2       11 01xx kkkk kkkk
RETURN    -         Return from Subroutine        2       00 0000 0000 1000
SLEEP     -         Go into Standby mode          1       00 0000 0110 0011
SUBLW     k         Subtract W from literal      1       11 110x kkkk kkkk
XORLW     k         Exclusive OR literal with W   1       11 1010 kkkk kkkk
Figure 4.1: How to interpret power consumption traces of the PIC16F687 – clock
signal (top/green), trigger signal (center/red) and power consumption
(bottom/blue)
consumption figures.
4.1 Clock
First, the influences of the clock will be analyzed. As we have seen in the
exemplary
figure of the previous section, peaks occur at rising edges of the clock. If the
clock
rate is increased, a higher power consumption can be expected (see equation (2.2))
due to a higher switching frequency of the transistors. Additionally, effects due
to
capacitances may be visible, mainly because the distance between peaks decreases
with higher clock rates.
To test this, one and the same code sequence consisting of five instructions was executed at clock frequencies of 32 kHz, 1 MHz, and 4 MHz, respectively. The results are shown in Fig. 4.2.
Figure 4.2: Clock rate influences – the same code sequence at 32 kHz, 1 MHz, and 4 MHz
As can be seen in this figure, at a clock speed of only 32 kHz the power consumption trace looks extremely regular. Peaks only occur at rising and falling edges of the clock, and the peaks at falling clock edges have a constant height. Thus, only the peaks at rising edges show differences and are therefore the only indication of the kind of instruction executed during that instruction cycle. The measured voltages lie between 100 and 320 mV.
At a clock rate of 1 MHz the trace becomes more dynamic. At rising clock edges we see that the power consumption rises rapidly but falls back slowly¹. As already mentioned, this is most likely caused by reloading capacitances of the device. As a result, when the capacitances are not completely reloaded before the next clock edge rises, the trace takes on a step-like shape. Additionally, the peaks which occur at falling clock edges can no longer be identified as clearly as in the case of 32 kHz: they are only indicated by a small bump on top of the falling curve. The measured voltage lies between 100 and 400 mV and is thus slightly higher than at a clock rate of 32 kHz.
¹ A similar behavior can even be seen in the 32 kHz trace; however, due to the large time scale, it cannot be identified in the figure.
If the clock rate is further increased, as shown by the example of 4 MHz, the time for reloading the capacitances becomes even shorter. As expected, it is reduced to a fourth of that in the 1 MHz test run. Thus, the power consumption trace takes on an even more dynamic shape. Actually, the power consumption of one instruction cycle propagates into the next one, which was not the case at a clock rate of 1 MHz. For instance, the 4 MHz trace has a lower constant component at the beginning than at the end, whereas the 1 MHz trace always falls back to the same level at the end of an instruction cycle. The measured voltages in this case are significantly higher than in the cases of 32 kHz and 1 MHz.
At this point, the decision has to be made which clock rate to proceed with. Obviously, 4 MHz is not appropriate, since the recognition of single instruction cycles would be hindered. Thus, 1 MHz or 32 kHz are applicable. Since 1 MHz is closer to a realistic clock rate and further speeds up the measuring process, solely this clock rate will be utilized in this work; it is questionable whether a clock rate of 32 kHz would yield better results anyway.
Figure 4.3: Power consumption influences of the working register – power traces
of two code sequences of five NOP instructions with different working
register contents
4. reset trigger signal
5. repeat one hundred and fifty times
6. calculate the average trace for each test case
By setting W to 0x00h and 0xffh respectively, we obtain two completely different Hamming weights: zero in one case and eight in the other. According to the Hamming-weight power model discussed in Section 2.2.2, and under the assumption that this model is adequate, the result will show significant deviations between the power consumption traces, since in this model the power consumption is assumed to be proportional to the Hamming weight of a value.
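The expected behavior of the two test cases can be sketched directly from the model; the constants a and b below are arbitrary placeholders, not measured values:

```python
def hamming_weight(v):
    # Number of set bits of v.
    return bin(v).count("1")

# The two test cases of the working-register experiment:
assert hamming_weight(0x00) == 0
assert hamming_weight(0xFF) == 8

# Simplistic Hamming-weight model: P = a * HW(v) + b.
a, b = 1.0, 0.0  # placeholder constants for illustration
p_w00 = a * hamming_weight(0x00) + b
p_wff = a * hamming_weight(0xFF) + b
# The predicted difference between the traces is maximal: 8 * a.
assert p_wff - p_w00 == 8 * a
```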
The results of the test are shown in Fig. 4.3. The trace for the value 0x00h is
plotted in blue, whereas the trace for value 0xffh is plotted in green. Both are
averaged over one hundred and fifty measurements. To review how power traces
are analyzed, the most important points are marked by arrows. At first, we have
two arrows indicating the set and reset instruction for the trigger signal, visible
as
34
comparatively high peaks at the start and end of the traces. Hence, these points mark the start and end of the test sequence. The first peak following the set-trigger peak corresponds to Q1 of the first NOP instruction². Thus, this peak together with the following nineteen peaks comprises the five NOP instructions.
As can be seen, Q1, Q3, and Q4 of the second instruction, indicated by arrows, show offsets which can only be caused by the different contents of W. More precisely, the Q1, Q3, and, less distinctively, the Q4 peaks for a value of 0x00h are significantly smaller than for a value of 0xffh. The same holds for the third and fourth NOP. Since it is known from the device examination that the instruction is executed in the third and the result stored in the fourth clock cycle, Q3 may be caused by sending the contents of the ALU to the data bus, and Q4 by transferring them from there back to W. No assumption about Q1 can be made at this point. However, the Hamming-weight model seems to describe the discussed peaks quite accurately, which was further verified by additional tests performed with different Hamming weights.
The contents of the working register highly influence Q1 and Q3 and, less distinctively, Q4. Q2 remains unaffected.
In contrast to this, the last NOP differs from its predecessors: the peaks of Q2 to Q4 in this case show an offset of approximately 25 mV. However, the same features are visible as with the previous NOP instructions. Consequently, this deviation is most likely caused by a fetching process, i.e. fetching the BCF instruction that resets the trigger. Yet, this is only an assumption and has to be verified by adequate tests later on.
Furthermore, the first NOP also differs from the second to fourth NOP, since Q1 does not show the same offset. In consequence, the Hamming-weight model does not describe the power consumption of Q1 accurately in this case. We have to keep this in mind when analyzing other aspects of the power consumption in order to reveal causes for this difference.
To recapitulate: even though the working register may not be needed by an instruction, its contents can have significant influence on the power consumption of Q1 and Q3 and, less distinctively, of Q4. However, Q2 remains unaffected.
² Actually, this was a working hypothesis which turned out to be correct, since all subsequent observations can be explained by assigning the highest peak to Q4 of the (re)set trigger instruction.
4.3 Fetching Process
Due to the two-stage pipeline, the power consumption also depends on the instruction that is fetched while another one is executed. This will be analyzed in more detail throughout this section.
By now, it is known from Section 3.2 that the fetching process starts by
incrementing PC in Q1 and ends at Q4. Hence, theoretically the power consumption of
an entire instruction cycle may be influenced.
Test Description 4.3.1 (Fetching Process) To see how the fetching process affects
the power consumption, the following test was performed:
1. set W to 0x55h to have fixed value
2. set trigger signal
3. execute and measure one of the following sequences
NOP NOP [NOP / MOVWF 0x5Dh / MOVLW 0x55h] NOP NOP
4. reset trigger signal
5. repeat one hundred and fifty times for each possible code sequence
6. calculate the average trace for each test case
To analyze this, the test described in Test Description 4.3.1 was performed. At first, W is set to 0x55h simply to have a clearly defined value throughout the test sequence. Then, the trigger signal is set to activate the trace acquisition before the test sequence consisting of five instructions starts. In this sequence, all instructions are fixed except for the third one, which is one of the following: NOP, MOVWF 0x5Dh, or MOVLW 0x55h. The MOVWF instruction is a file-register operation which moves the contents of W to the file register its operand points to, in this case 0x5Dh. In contrast to this, MOVLW is a literal operation moving a literal (0x55h) to W. Note that neither instruction modifies the contents of W, so that after instruction execution the entire data path is in the same state as before. As a result, the subsequent instructions should not be affected.
Further, the opcodes of MOVWF 0x5Dh and MOVLW 0x55h have the same Hamming weight of six, in contrast to NOP, which has a Hamming weight of zero (see Fig. 3.4). Thus, by comparing the NOP trace to the others, differences related to the Hamming weight can be identified as far as they exist. By comparing MOVWF to MOVLW we may then be able to identify differences concerning the type of instruction, i.e. file-register or literal operation.
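The stated Hamming weights can be checked directly from the 14-bit encodings of the three test instructions (constant names below are ours):

```python
def hamming_weight(v):
    return bin(v).count("1")

# 14-bit opcodes of the test instructions:
# MOVWF f: 00 0000 1fff ffff with f = 0x5D
# MOVLW k: 11 00xx kkkk kkkk with k = 0x55 (xx = 00)
# NOP:     all bits zero
OP_MOVWF_5D = 0b00000011011101
OP_MOVLW_55 = 0b11000001010101
OP_NOP      = 0b00000000000000

assert hamming_weight(OP_MOVWF_5D) == 6
assert hamming_weight(OP_MOVLW_55) == 6
assert hamming_weight(OP_NOP) == 0
```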
The results of the tests are illustrated in Fig. 4.4. The upper plot shows the
entire
averaged traces for each scenario. The trace for the NOP instruction is plotted in
blue, MOVWF is plotted in green, and MOVLW in red. As expected, all traces
have an identical shape except for the second and third instruction cycle in which
the test instruction is fetched and executed respectively.
Figure 4.4: Power consumption influences of the two-stage pipeline – the upper plot
shows three traces for three different test cases, the lower plot shows a
zoom of the upper trace in the area of 6 to 11 µs
The fetching process caused by the two-stage pipeline increases the overall power consumption of Q2 to Q4 in direct proportion to the Hamming weight of the fetched opcode. Q1 remains unaffected.
The lower plot shows a zoom-in on the area of 6 to 11 µs, which contains Q1 to Q4 of the instruction cycle in which the test instruction is fetched, plus Q1 of the instruction cycle in which it is executed. As can be seen, Q1 of fetching the test instruction is equal for all three cases. In contrast to this, Q2 to Q4 are obviously affected by the Hamming weight of the opcode, because they are equal for MOVWF and MOVLW, whose opcodes have the same Hamming weight, but not for NOP, which has a Hamming weight of zero. Since Q2 to Q4 are significantly higher for MOVLW and MOVWF, this indicates a direct proportional relationship to the Hamming weight of the opcode, which could be verified by additional tests. Further, when comparing the MOVLW trace to MOVWF, we see that Q4 of MOVWF has a slightly higher peak. As a result, this may be an indicator of the kind of operation that is fetched.
4.4 Data Bus
NOP ADDLW [0x55h / 0xAAh / 0x33h] NOP

HD(0x55h, 0x55h) = 0   (4.1)
HD(0x55h, 0xAAh) = 8   (4.2)
HD(0x55h, 0x33h) = 4   (4.3)
For the second test, ADDLW is replaced by the file-register instruction ADDWF, which performs the same operation but by means of a file register. Since the summand added to W is not provided by the opcode in this case, the respective file register has to be set to the test value in advance. However, as with the previous test, we get three different Hamming distances and can thus decide whether the Hamming-distance model describes the power consumption adequately.
Test Description 4.4.2 (Data Bus – File-register) To see how the multiplexer connected upstream of the ALU affects the power consumption for file-register operations, the following test was performed:
1. set file-register to [0x55h / 0xAAh / 0x33h]
2. set W to 0x55h
3. set trigger signal
4. execute and measure the following sequences
NOP ADDWF 0x40h,1 NOP
Figure 4.5: Effects caused by the data bus – three average traces of Q1 for ADDLW (left) and ADDWF (right)
5. reset trigger signal
6. repeat one hundred and fifty times for each value of file-register 0x40
7. calculate the average trace for each test case
Fig. 4.5 illustrates two power consumption plots of Q1, each showing the averaged traces for the three different test cases. The left plot corresponds to the execution of ADDLW, whereas the right plot corresponds to ADDWF. As can be seen, the plot for ADDLW shows a direct proportional relationship to the Hamming distance of W and the three literals, i.e. 0xAAh with a Hamming distance of eight shows the highest peak and 0x55h the smallest. The same holds for ADDWF and the different contents of the file register. Hence, our assumption about the influences on Q1 was correct. Additionally, when comparing both plots, we see that the power consumption is, on average, higher for the file-register operation than for the literal operation. This may be due to the fact that the SRAM is activated during the execution of a file-register operation.
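The Hamming distances behind these observations can be reproduced in a few lines (W = 0x55h against the three test values):

```python
def hamming_distance(a, b):
    # Number of differing bits between a and b.
    return bin(a ^ b).count("1")

W = 0x55
# 0xAAh (HD = 8) yields the highest Q1 peak, 0x55h (HD = 0) the smallest.
for value, expected_hd in [(0x55, 0), (0xAA, 8), (0x33, 4)]:
    assert hamming_distance(W, value) == expected_hd
```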
Since the assumption about the Hamming-distance model at Q1 was correct, we can further state that this behavior is connected with changing values on the data bus, as explained at the beginning of this section. This means that whenever data on the data bus is modified, the power consumption will be proportional to the number of bits that have to be changed. Consequently, we can assume that when the ALU executes the actual instruction, e.g. performs an addition, this will result in an equivalent behavior, since the result is sent to the bus. This will be analyzed in the next section.
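For the ADDLW test cases this expectation can be made concrete: the 8-bit result of adding W = 0x55h to each test value determines the Hamming distance between operand and result (a sketch; the distances follow from the arithmetic):

```python
def hamming_distance(a, b):
    return bin(a ^ b).count("1")

W = 0x55
for value in (0x55, 0xAA, 0x33):
    result = (W + value) & 0xFF  # 8-bit ALU addition performed by ADDLW
    hd = hamming_distance(value, result)
    print(f"0x{value:02X}: result 0x{result:02X}, HD(operand, result) = {hd}")
# 0x55h gives the maximum distance of eight, 0xAAh the minimum of four.
```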
HD(0x55h, 0xAAh) = 8   (4.4)
HD(0xAAh, 0xFFh) = 4   (4.5)
HD(0x33h, 0x88h) = 6   (4.6)
Figure 4.6: Effects caused by the result of the ALU – three average traces of Q3 for ADDLW (left) and ADDWF (right)
The results are shown in Fig. 4.6. The left plot shows the three average traces for ADDLW and the right plot for ADDWF. As expected, the value of 0x55h shows the highest power consumption in both plots, since the Hamming distance in this case is maximized. Accordingly, 0xAAh with a Hamming distance of four shows the smallest peak. Hence, this indicates the correctness of our assumption. Again, this was verified by additional tests.
4.6 Opcode
In the last two sections we have discovered how literals and file-register addresses affect the power consumption of Q1 and Q3 via the data bus. However, until now we do not know whether the Hamming weight of the opcode in general influences the power consumption.
To check this, the procedure given in Test Description 4.6.1 was performed, which
Figure 4.7: Effects caused by Hamming weight of the opcode – averaged traces for
four different Hamming weights
works with the MOVWF instruction. It comprises four test cases for four different file-register addresses. At first, the respective file register is cleared and W is set to 0x55h before the trigger signal is set. Then the test sequence is executed and measured one hundred and fifty times. The MOVWF instruction has an important property: the Hamming weight of the opcode can be increased without changing any other conditions possibly influencing the power consumption. For instance, the operands, and thus the result calculated by the ALU, are equal for each test case as long as the file register is cleared before the test execution.
Test Description 4.6.1 (Opcode) To see how the Hamming weight of the opcode
affects the power consumption the following test was performed:
1. clear file-register [0x40h / 0x54h / 0x5Dh / 0x7Fh]
2. set W to 0x55h
3. set trigger signal
4. execute and measure the following sequences
NOP MOVWF [0x40h / 0x54h / 0x5Dh / 0x7Fh],1 NOP
5. reset trigger signal
6. repeat one hundred and fifty times for each file-register address
7. calculate the average trace for each test case
The results of the test are illustrated in Fig. 4.7. It shows the average traces for the four different Hamming weights of the opcode, which are in the range of three to nine. As can be seen, Q1 and Q3 are constant in each test case due to the constant values of W and the respective file register. However, Q2 and Q4 are affected by the Hamming weight of the opcode.
4.7 Instruction Register
Figure 4.8: Effects caused by Hamming distance of the current and previous opcode
– averaged traces for three different Hamming distances
Q4 is directly proportional to the Hamming distance of the executed and the fetched opcode.
Fig. 4.8 shows the power consumption of the instruction cycle in which the first MOVLW instruction is executed. As can be seen, Q1 to Q3 are indistinguishable. Nonetheless, Q4 shows differences proportional to the Hamming distance between the opcode of the instruction currently being executed and that of the subsequent one, which is the second MOVLW instruction differing in the used literal. Hence, our assumption is confirmed.
4.8 Noise
The last influence to be analyzed is the noise component of the measured power consumption. For this, the power consumption of a fixed code sequence with fixed operands was measured one thousand times. The distribution at one fixed point in time can then be illustrated by a histogram, which is often used to present the distribution of data values. The results are shown in Fig. 4.9. On the axis of abscissae the measured voltage is plotted; the ordinate shows the frequency of the measured voltage. Thus, the sum of all occurrences is equal to the number of measurements, i.e. one thousand. As can be seen from the histogram, the noise component is close to normally distributed. Solely the highest peak seems unusual for the normal distribution indicated by the remaining peaks. However, with a set of only one thousand samples such divergences can occur.
As a result, since the noise component of single points is normally distributed, the multivariate Gaussian distribution is an appropriate model for the noise component of the PIC.
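The plausibility check behind this conclusion can be sketched with simulated data; the synthetic Gaussian samples below merely stand in for the one thousand oscilloscope measurements, and the mean and spread are made-up values:

```python
import random
import statistics

random.seed(0)
# One thousand simulated measurements of one fixed point in time.
samples = [random.gauss(0.12, 0.005) for _ in range(1000)]

mu = statistics.mean(samples)
sigma = statistics.stdev(samples)
# For a normal distribution, roughly 68% of the samples lie within one
# standard deviation of the mean - a quick check of normality.
within_one_sigma = sum(abs(s - mu) <= sigma for s in samples) / len(samples)
assert 0.6 < within_one_sigma < 0.76
```

With real measurements, the per-point means and the covariance matrix estimated this way are exactly the quantities needed later for the multivariate Gaussian noise model of the templates.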
4.9 Other Influences
Figure 4.9: Histogram showing the noise distribution for a fixed point in time
(Table: Revealed influences on the power consumption of the four clock quarters)

Q1:  Hamming distance of value on data bus and operand
Q2:  Hamming weight of opcode; Hamming weight of opcode of next instruction
Q3:  Hamming distance of operand and result; Hamming weight of opcode of next instruction
Q4:  Hamming weight of opcode; Hamming distance of opcode of current and subsequent instruction
All: clock rate
since we cannot act on the assumption that the power consumption properties are described completely. For instance, certain bits of an operand may be weighted more heavily than others; yet, this has not been analyzed. Moreover, we mainly focused on file-register and literal operations which work on two operands. Other instructions, e.g. CLRW for clearing W, were not analyzed. However, the discussed instructions theoretically show the highest activity, since operands are loaded from memory, processed, and stored back. Consequently, we can assume that other instructions, which for example only use one operand, show significantly less activity in the power consumption. For instance, a MOVLW instruction shows no deviation at Q3 for the used literal, since this value is already on the data bus in Q1. Moreover, literal and file-register operations that work on two operands constitute nearly fifty percent of the entire instruction set, and examinations of freely available program code showed that these are the most frequently used operations together with MOVLW and MOVWF. Hence, to get an overview of the power consumption properties, it was reasonable to concentrate on those effects.
As a consequence of the revealed influences, we can derive a qualitative power model for the various peaks of the instruction cycle as described in Definition 4.10.1.
Definition 4.10.1 (Power Model for the PIC) The power consumption of Q1 to Q4 of an instruction cycle can be described by the following equations:

PQ1 = Pconst,Q1 + Pbus · HD(vbus, voperand) + Pext,Q1 + Pel.noise   (4.7a)
PQ2 = Pconst,Q2 + Popcode · (HW(vopcode) + HW(vopcode+1)) + Pext,Q2 + Pel.noise   (4.7b)
PQ3 = Pconst,Q3 + Pbus · HD(voperand, vresult) + Popcode · HW(vopcode+1) + Pext,Q3 + Pel.noise   (4.7c)
PQ4 = Pconst,Q4 + Popcode · (HW(vopcode) + HW(vopcode+1)) + Pprog.bus · HD(vopcode, vopcode+1) + Pext,Q4 + Pel.noise   (4.7d)

with Pbus > Pprog.bus > Popcode and Pel.noise ∼ N(0, σ).
As can be seen, each equation contains a constant power consumption Pconst,Qx, which is caused by leakage currents and by those transistors switching unaffected by the performed instruction or used operands. Furthermore, each clock cycle contains an extra power consumption Pext,Qx, which corresponds to the part of the power consumption caused by effects that have not been revealed throughout this chapter. Hence, when estimating the power consumption with the provided equations, this part cannot be taken into account. The electronic noise component Pel.noise, also a component of every equation, is, as shown in Section 4.8, a normally distributed random variable which can be described by a mean value of zero and a variance σ depending on the measurement setup. The power consumption of the first clock
cycle PQ1 in particular further depends on the Hamming distance between the value on the data bus vbus and the operand provided by the instruction voperand. Hence, the power consumption can be estimated by multiplying a constant for the data bus – referred to as Pbus – with this Hamming distance. Similarly, the influences of the opcode of the current instruction (vopcode) and of the subsequent instruction (vopcode+1) can be computed by means of a factor named Popcode. Furthermore, the power consumption of the program bus affecting the power consumption at Q4 is represented by Pprog.bus. Finally, vresult occurring in Q3 corresponds to the value calculated by the ALU.
According to the performed tests, approximated values for the various components of the power consumption can additionally be provided, as illustrated in Tab. 4.1. As can be seen, the values for Pconst,Q1 to Pconst,Q4 range from 144.5 mV up to 180 mV. Furthermore, Pbus increases the power consumption at Q1 and Q3 by about 10 mV per Hamming distance, which is significantly higher than Pprog.bus. This was predictable, as the data bus connects several peripheral components, resulting in comparatively long wires and thus in a high load capacitance. Finally, Popcode can be estimated at 3.6 mV per Hamming weight, and the tests showed that this value is suitable for weighting the influences of vopcode as well as of vopcode+1.
Tab. 4.1: Approximated values for the components of the power model

Component     Value (in mV)
Pconst,Q1     176.5
Pconst,Q2     144.5
Pconst,Q3     163.5
Pconst,Q4     180.0
Pbus           10.0
Popcode         3.6
Pprog.bus       4.2
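Definition 4.10.1 together with the constants of Tab. 4.1 can be turned into a small estimator; this is a sketch in which Pext and the noise term are omitted, since they cannot be estimated from the model:

```python
def hw(v):
    return bin(v).count("1")

def hd(a, b):
    return bin(a ^ b).count("1")

# Constants of Tab. 4.1 (in mV).
P_CONST = {1: 176.5, 2: 144.5, 3: 163.5, 4: 180.0}
P_BUS, P_OPCODE, P_PROG_BUS = 10.0, 3.6, 4.2

def p_q1(v_bus, v_operand):
    return P_CONST[1] + P_BUS * hd(v_bus, v_operand)

def p_q2(v_opcode, v_opcode_next):
    return P_CONST[2] + P_OPCODE * (hw(v_opcode) + hw(v_opcode_next))

def p_q3(v_operand, v_result, v_opcode_next):
    return P_CONST[3] + P_BUS * hd(v_operand, v_result) + P_OPCODE * hw(v_opcode_next)

def p_q4(v_opcode, v_opcode_next):
    return (P_CONST[4] + P_OPCODE * (hw(v_opcode) + hw(v_opcode_next))
            + P_PROG_BUS * hd(v_opcode, v_opcode_next))

# Example: maximum data-bus activity in Q1 adds 8 * 10 mV to the base level.
assert p_q1(0x55, 0xAA) == 176.5 + 80.0
```

Such per-quarter estimates are exactly what a correlation DPA on the PIC would use in place of a plain Hamming-weight prediction.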
Note that the provided power model is more suitable than a simplistic Hamming-weight or Hamming-distance model. Hence, this power model may be used for DPA attacks by means of correlation coefficients on the PIC and should yield significantly better results. Yet, this has not been proven.
To conclude, when executing one and the same instruction, the power consumption of that instruction is highly influenced by its operands and opcode. Further, the next instruction influences the shape of the power consumption trace due to the fetching process, as does the previous instruction due to its effect on the current value on the data bus. Hence, it will most likely become difficult to determine the executed instruction.
Template Creation
Figure 5.1: Two approaches for identifying points of information – the upper plot
shows average traces for two scenarios, the center plot shows the difference of
both, and the lower plot shows the variance taken from eighty
measurements
peaks. In this case, points of information are points which help to identify the
patterns a template was built for. Hence, building the difference is a good
approach.
Another approach is to execute and measure code sequences of several scenarios in order to identify interesting points by checking the variance at each point. In this way, points that vary significantly more than due to noise can be identified, similarly to the difference approach.
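Both point-selection criteria can be sketched on synthetic traces; the trace length, noise level, and the single leaking sample below are invented for illustration:

```python
import random

random.seed(1)
N, LENGTH, LEAK = 40, 100, 50  # traces per scenario, samples per trace, leaking point

def measure(offset):
    # Synthetic trace: Gaussian noise plus a scenario-dependent part at one point.
    trace = [random.gauss(0.12, 0.005) for _ in range(LENGTH)]
    trace[LEAK] += offset
    return trace

traces_a = [measure(0.00) for _ in range(N)]
traces_b = [measure(0.04) for _ in range(N)]

def mean(xs):
    return sum(xs) / len(xs)

# Approach 1: difference of the average traces of the two scenarios.
diff = [abs(mean(col_a) - mean(col_b))
        for col_a, col_b in zip(zip(*traces_a), zip(*traces_b))]

# Approach 2: pointwise variance over all traces of both scenarios.
variance = []
for col in zip(*(traces_a + traces_b)):
    m = mean(col)
    variance.append(mean([(x - m) ** 2 for x in col]))

# Both criteria single out the leaking point.
assert max(range(LENGTH), key=diff.__getitem__) == LEAK
assert max(range(LENGTH), key=variance.__getitem__) == LEAK
```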
When following both methods for two instructions with randomly chosen inputs
for every measurement we get results as presented in Fig. 5.1. The upper plot
shows the average traces for the two scenarios. The corresponding difference is
illustrated in the center plot. As can be seen, a significant difference occurs at
10 µs. Thus, points at this position are useful to determine which instruction was
executed. Additionally, the lower plot shows the variance for both instructions and
was calculated from eighty measurements (forty for each instruction). Obviously,
this plot is qualitatively equal to the difference trace but has the advantage that
5.1 Selecting Points
[Figure: exemplary trace of a random instruction with random operands (top) and its variance (bottom)]
Therefore, the most important points of a trace are most likely the maxima of Q1 to Q4. Another indication for this is the way the curve bottoms out, which seems to be determined by the height of the peaks, as can be seen in Fig. 4.6. In any case, this may be a promising approach. Alternatively, the maxima plus a certain number of subsequent points, or down-sampled traces, can be used. The tests presented in Section 5.4 will show which approach works best.
5.2 Partitioning
As we know from Section 2.1.2, the measured power consumption consists of the intrinsic power consumption Psig and noise. However, in terms of instruction recognition, Psig contains parasitic components which complicate the recognition process, as we know from the last chapter. For instance, the power consumption of an instruction depends on the processed data, so this component can be considered as noise. In general, the part of the power consumption which does not help to retrieve the wanted information is referred to as switching noise Psw [MOP07]. Consequently, Psig can be divided as follows:

Psig = Pop + Psw    (5.2)
When creating templates, the regular approach is to model the electronic noise component of the power consumption by the covariance matrix. The question is whether the switching noise or its components can additionally be modeled by the covariance matrix together with the electronic noise. This cannot be taken for granted, since templates model noise as a multivariate Gaussian distribution and the switching noise is not necessarily in line with this model. As a result, the distribution of the components of Psw has to be estimated in order to answer this question.
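As a minimal sketch of this estimation step: the Hamming weight of a uniformly random 8-bit value is binomially distributed, which a normal distribution approximates reasonably well. The function names below are illustrative, not taken from the thesis:

```python
import math

def hw_distribution(bits=8):
    """Probability of each Hamming weight of a uniformly random
    `bits`-bit value: binomial with p = 1/2."""
    return [math.comb(bits, k) / 2**bits for k in range(bits + 1)]

def normal_approx(k, bits=8):
    """Normal approximation with mean bits/2 and variance bits/4."""
    mu, var = bits / 2, bits / 4
    return math.exp(-(k - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)
```

For eight bits the binomial peak is 70/256 ≈ 0.27, close to the normal approximation at the same point.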
[Figure: a binomial distribution and its approximation by a normal distribution]
which part of Psig is modeled as noise. Depending on the test case, this results in one or more templates for one instruction. This procedure is repeated for a set of four instructions, namely NOP, CLRW, XORLW and ANDLW. In this way, we have two instructions that do not operate on operands provided in the opcode. More precisely, NOP naturally contains the lowest amount of switching noise; solely the fetching process and the contents of W influence the power consumption. CLRW clears the contents of W and is therefore more data dependent. With ANDLW and XORLW, we have two instructions processing two operands, so the switching noise is significantly higher. Further, the instruction identifiers of XORLW and ANDLW have the same Hamming weight, so we are able to determine whether this affects the success of instruction identification. Note that the following tests include no file-register instructions. However, these work similarly to the literal instructions, so the results for ANDLW and XORLW can be transferred to file-register operations.
To check whether the templates are practical for recognizing instructions, five sequences of fifty instructions each are randomly composed of only those instructions for which templates have been created. The operands that occur in the sequences are also chosen randomly. Subsequently, these sequences are executed and measured one hundred times in random order to exclude time-dependent interferences. This means that if the amount of noise is, for some reason, significantly higher for a certain time during a measuring session, this will affect all test sequences uniformly. Which sequence was executed at a given time was stored together with the measurement file, so the executed instructions for a certain measurement are known later on.
Finally, the templates are applied to the instructions of the one hundred test
sequences one by one in order to determine whether the winning template corresponds
to the executed instruction or not. Thereby, the percentage of correctly and
incorrectly recognized instructions can be calculated indicating the quality of the
constructed templates.
Test Description 5.3.1 (General Test Procedure for Templates) To be able
to evaluate the quality of templates the following test is performed:
1. measure traces with respect to the followed approach in order to create
templates for four different instructions
2. measure one out of five test sequences, each containing fifty randomly selected
instructions with random operands, for one hundred times
3. apply templates to the measured instructions of all test sequences and determine
the template with the highest probability
4. calculate the percentage of correctly and incorrectly determined instructions
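The evaluation in steps 3 and 4 amounts to building a confusion table like Tab. 5.1 and reading off the success percentages. A minimal sketch (names are illustrative):

```python
import numpy as np

INSTRUCTIONS = ["NOP", "CLRW", "ANDLW", "XORLW"]

def recognition_rates(executed, recognized):
    """Build a confusion table from two equal-length lists of
    instruction names; return it with per-instruction and overall
    success percentages."""
    idx = {name: i for i, name in enumerate(INSTRUCTIONS)}
    counts = np.zeros((len(INSTRUCTIONS), len(INSTRUCTIONS)), dtype=int)
    for ex, re in zip(executed, recognized):
        counts[idx[ex], idx[re]] += 1          # row: executed, column: recognized
    sums = counts.sum(axis=1)
    per_instr = {name: 100.0 * counts[i, i] / sums[i]
                 for i, name in enumerate(INSTRUCTIONS) if sums[i] > 0}
    overall = 100.0 * np.trace(counts) / counts.sum()
    return counts, per_instr, overall
```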
5.4 Tests
In this section, various approaches for creating templates for the PIC will be tested and compared in order to reveal the best method.
              Recognized as
Instruction   NOP   CLRW  ANDLW  XORLW    Sum   Percentage (correct)
NOP             0     11    207   1020   1238    0
CLRW            0      0    422    782   1204    0
ANDLW           0      0    520    826   1346   38.6
XORLW           0      0    210   1002   1212   82.6
Overall                                  5000   30.4

Table 5.1: Results for applying templates, created by modeling Psw completely, to random code sequences
The results of the test are shown in Tab. 5.1. The first column contains the instructions for which templates have been created and which were thus executed in the test sequences. The next four columns show how many times each instruction was recognized as each of the four candidates. The next column presents
the sum of executed instructions, and finally the last column indicates the percentage of correctly identified instructions for each instruction independently as well as accumulated over all instructions.
Obviously, the templates for the literal instructions ANDLW and XORLW dominate the other templates, so the instruction recognition shows a clear tendency towards these. Actually, NOP and CLRW were not recognized correctly in a single case. Moreover, the accumulated result shows that in total only thirty percent of the instructions were correctly identified. Since the test sequences contain only four instructions, namely the ones templates were created for, the chance to guess an instruction is twenty-five percent. Hence, the results show that our templates are not significantly better than guessing and are therefore infeasible.
Thus, the test did not show any indication that modeling all components of Psw in one template is practical.
              Recognized as
Instruction   NOP   CLRW  ANDLW  XORLW    Sum   Percentage (correct)
NOP           503     19     39      5    566   88.8
CLRW           50    547     56     34    687   79.2
ANDLW           0     15    408    146    569   71.7
XORLW           0      9    366    303    678   44.6
Overall                                  2500   70.4

Table 5.2: Results for applying templates, created with a constant subsequent instruction, to random code sequences
When comparing the success rates, it can further be seen that the success rate decreases with an increasing amount of switching noise. As already mentioned, for a NOP instruction Psw only depends on the contents of W and the instruction being fetched. Since the fetched instruction is fixed during these tests, the amount of switching noise is minimal. For CLRW, Psw is increased due to clearing the contents of W. For the last two instructions, Pdata becomes a dominant factor, as can be concluded from the power model presented in Definition 4.10.1. Thus, the amount of switching noise is maximal.
Consequently, reducing the switching noise is a reasonable method to increase success rates significantly.
per instruction due to nine different Hamming weights. These templates can then again be applied to the test sequences. As in the previous tests, the template with the lowest absolute value of the logarithmic probability density function is the most likely one.
              Recognized as
Instruction   NOP   CLRW  ANDLW  XORLW    Sum   Percentage (correct)
NOP           471     12     59     24    566   83.2
CLRW           22    449     57    159    687   65.3
ANDLW           0      1    467    101    569   82.1
XORLW           0      2    176    500    678   73.7
Overall                                  2500   75.5

Table 5.3: Results for applying templates, created with a constant subsequent instruction and sorted by the Hamming weight of W, to random code sequences
The results of the test are shown in Tab. 5.3. With the new approach, even better results can be achieved. The overall success rate is increased by approximately five percent compared to the previous test, in which templates were not partitioned. Furthermore, it can be seen that the success rates for ANDLW and XORLW are increased significantly. Particularly, the chance to identify an XORLW instruction was increased by approximately 29 percent, from 44.6 to 73.7 percent, whereas the chance for ANDLW was only increased by approximately eleven percent. This is mainly due to the fact that XORLW is significantly less often mistaken for an ANDLW. In contrast to this gain, the success rates of CLRW and NOP decrease with the new approach. Above all, CLRW is more often confused with ANDLW, reducing its success rate by 14 percent. The percentage of correctly identified instructions for NOP is only slightly decreased, by approximately five percent.
As a result, the fractions of correctly identified instructions are rearranged when using partitioned templates. However, the overall success rate is increased, so this approach is an effective way to achieve better recognition results.
Therefore, the maximum number of points that can be handled by the analysis program in terms of the logarithmic probability density function has to be determined. For this, some tests based on the measurements recorded for the previous tests were performed. It turned out that the most critical part of equation (2.26) is the natural logarithm

ln((2π)^n · det(C))    (5.5)

because the value of (2π)^n · det(C) causes overflows for large n of around 180 and larger. Note that this value is only a rough approximation, since equation (5.5) also depends on the determinant of C; even values smaller than 180 can result in overflows.
However, assuming a limit of 180 points, the following tests can be performed to reveal the best selection of points.
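A numerically stable way to evaluate the critical term is the standard remedy of working with the log-determinant instead of the determinant itself. The following numpy-based sketch uses names of my own choosing:

```python
import numpy as np

def neg_log_density(trace, mean, cov):
    """|ln p(t)| for the multivariate-Gaussian template model.  The
    direct product (2*pi)**n * det(C) overflows for large n, whereas
    n*ln(2*pi) + ln(det(C)) computed via slogdet stays finite."""
    n = len(mean)
    _sign, logdet = np.linalg.slogdet(cov)     # ln(det(C)) computed stably
    diff = trace - mean
    maha = diff @ np.linalg.solve(cov, diff)   # (t - m)^T C^{-1} (t - m)
    return 0.5 * (n * np.log(2.0 * np.pi) + logdet + maha)
```

For n = 500 points this remains finite although det(C) itself is no longer representable as a floating-point number.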
At first, an equal amount of points from Q1 to Q4 can be selected resulting in
a range of 1 to 45 points per clock cycle. These can either be symmetrically or
asymmetrically distributed around the maxima of the peaks. Hence, different point
ranges will be tested for both symmetrical and asymmetrical approach. However,
the step size for the asymmetrical approach will be shorter and in the direction of
the consecutive peaks, since the way the peaks bottom out is more likely to contain
information than the rising edge of the peak.
Furthermore, the entire trace of an instruction can be down-sampled from eight
thousand points to one hundred and eighty points or less in order to get a template
that is equally based on the power consumption of the complete instruction. More
precisely, down-sampling to x points can be achieved by selecting the points 1, 1 · ⌊8000/x⌋, 2 · ⌊8000/x⌋, and so forth. With this approach, we may be able to find out whether the primary approach of selecting points from the peaks is reasonable.
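The index computation just described can be sketched as follows (0-based indices, whereas the thesis counts points from 1; the helper name is illustrative):

```python
def downsample_indices(trace_length=8000, x=180):
    """Indices 0, floor(trace_length/x), 2*floor(trace_length/x), ...
    used to down-sample a full instruction trace to x points."""
    step = trace_length // x
    return [i * step for i in range(x)]
```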
As in the previous tests, the quality of the templates will be given by the overall
success rate which arises from applying the templates to the test sequences. Herein
the templates are built according to Section 5.4.2, i.e. the subsequent instruction
is
fixed but not sorted by the Hamming weight of W.
The results are shown in Tab. 5.4. In the first column, the selection type is presented, which is either symmetrical, asymmetrical or down-sampled. The next column specifies the variations of selected points for each selection type: the first number corresponds to the number of points taken from the left side of the maximum and the second number to the points taken from the right side. In case of the down-sampled templates, only the overall number of points used is provided. Finally, columns three to seven present the percentage of correctly identified instructions for each instruction independently as well as the accumulated results.
As can be seen, the overall results are quite similar and distributed in the range
of 70 to 77 percent. The first row contains the reference values for this test,
since
in this case only the maxima of the peaks are selected. Obviously, increasing the
number of points selected next to the maximum increases the success rate slightly.
However, for values larger than eleven in the case of a symmetrical point selection, and thirty-one in the case of an asymmetrical point selection, this gain is already lost.
Selection Type   Points per Peak   Percentage (correct)
                 (left/right)      NOP   CLRW  ANDLW  XORLW  Overall
symmetrical      0/0               88.8  79.2  71.7   44.6   70.4
                 2/2               90.1  82.7  78.2   59.4   77.0
                 5/5               89.9  81.2  77.9   60.3   76.8
                 10/10             90.3  78.1  79.2   62.2   76.8
                 16/16             89.2  71.4  79.3   62.5   74.8
                 22/22             87.6  63.0  78.7   62.4   72.0
asymmetrical     0/5               90.1  83.0  78.2   57.2   76.5
                 0/15              90.0  81.2  78.0   61.6   76.7
                 0/30              88.3  75.1  78.6   61.3   75.2
                 0/44              86.4  66.2  76.6   58.0   71.0
down-sampled     80                86.3  79.1  75.3   48.5   72.3
                 180               85.7  71.0  73.1   62.7   73.1

Table 5.4: Results for various point selection methods
The maximum success rate is reached for a symmetrical selection of only two points next to the maximum of a peak, which results in a mean vector of twenty points per template. Similar results are shown by the asymmetrical point selection, in which the best overall success rate reaches 76.7 percent.
This increase is probably caused by reducing the noise at the maxima by means of the surrounding points. However, when the number of points exceeds a certain level, this gain is lost due to numerical problems.
When comparing the individual success rates for each instruction, we see that the values for NOP and ANDLW are rather constant, varying only within a range of four percent, whereas CLRW and XORLW show deviations of up to twenty percent. Hence, CLRW and XORLW are affected more significantly by the selected points.
Interestingly, even down-sampling yields good results of 72.3 and 73.1 percent, respectively. The reason for this is not obvious. One assumption is that the overall shape of the trace, including the maxima and their relationships, is reproduced well enough that, even though the absolute maxima are not sampled, it may be sufficient for identifying an instruction.
However, selecting points around the maxima is more practical, not only because the success rates are higher but also because the complexity for creating and applying templates is reduced. For instance, in the best approach of Tab. 5.4, only 20 points are needed to create a template. Hence, the covariance matrix is reduced by a factor of 16 compared to the case of down-sampling to 80 points, and by a factor of 81 when down-sampling to 180 points.
To test whether the approach of the previous section is equally affected by a different point selection, the most promising method of selecting the maxima plus/minus two points was applied for creating templates which are additionally sorted by the Hamming weight of W. In this case, the success rate was increased from 75.5 to 82.5 percent, which is approximately the same gain as for the other approach.
In conclusion, the point selection approach which was initially used turned out to be a good choice. However, it can be improved by adding some points around the maxima. Above all, selecting five points per clock cycle was the most successful technique.
Selection Type   Points per Peak   Percentage (correct)
                 (left/right)      NOP   CLRW  ANDLW  XORLW  Overall
symmetrical      0/0               94.5  81.0  56.0   22.4   62.5
                 2/2               95.0  85.0  55.2   23.2   63.7
                 5/5               95.6  85.6  55.5   23.6   64.2
                 10/10             95.4  86.5  55.9   23.6   64.4
                 16/16             95.4  86.5  55.9   23.6   64.5
                 22/22             95.8  85.9  56.4   23.3   64.7
asymmetrical     0/5               95.2  86.3  55.5   22.8   64.1
                 0/15              95.4  86.2  55.5   23.5   64.3
                 0/30              95.6  84.9  54.7   22.9   63.6
                 0/44              96.1  85.3  55.0   23.3   64.0
down-sampled     80                89.4  73.4  54.0   19.9   58.0
                 180               91.5  81.0  48.9   19.8   59.4

Table 5.5: Results for various point selection methods and the use of reduced templates
to avoid the high variances for ANDLW and XORLW. In a next step, this selection is inverted to reveal whether omitting peaks with high variances is a good approach.
The discussed methods are tested by creating templates according to the previous section, i.e. five points are selected from each peak and the templates are further partitioned by the Hamming weight of W. This was the best approach so far, with a success rate of 82.5 percent. The results of the tests can therefore be compared to this value in order to measure the quality of the different approaches.
Selected Clock Cycles   Percentage (correct)
                        NOP   CLRW  ANDLW  XORLW  Overall
Q1 to Q4                90.3  82.3  86.1   73.3   82.5
Q2 to Q4                95.2  75.1  73.1   68.1   77.3
Q2 and Q4               97.5  60.7  71.2   68.3   73.5
Q1 and Q3               23.9  30.9  73.0   64.0   47.9

Table 5.6: Results for applying templates based on points from selected clock cycles
Tab. 5.6 presents the results of the performed tests. As can be seen, the success rates vary in a range from 47.9 to 82.5 percent. The highest percentage of correctly recognized instructions occurs for the primal case, i.e. selecting Q1 to Q4. Furthermore, we see that Q2 and Q4 contain more information for instruction recognition than the others, since the success rate for these cycles is significantly higher than for Q1 and Q3, in which, above all, NOP and CLRW are seldom recognized correctly. Nonetheless, omitting Q1 and Q3 causes a decrease in the success rates for CLRW, ANDLW and XORLW respectively.
Hence, we can conclude that omitting the points of one or more clock cycles causes a loss of information and thus results in a decreased overall success rate.
As a consequence, adding peaks of the previous instruction may seem a promising approach, since additional information about the executed instruction, i.e. its fetching process, would then be usable. However, tests in this respect resulted in success rates similar to the first approach of modeling Psw completely. This can be explained by means of the established power model, from which it follows that PQ1 to PQ4 are mainly influenced by the execution of an instruction; the fraction of Pfetch is comparatively small.
success rate. These will now be applied and extended to the general case in which
the subsequent instruction is not fixed.
For this, the templates will be partitioned by the Hamming weight of W as in the previous sections, but additionally by the Hamming weight of the subsequent instruction and the Hamming weight of the opcode of the executed instruction. By doing this, we get more accurate templates, which has turned out to be a promising approach throughout the last sections. Further, points of all clock cycles, i.e. Q1 to Q4, will be extracted by selecting five points per cycle. Hence, one template consists of a mean vector of twenty points and a covariance matrix of four hundred entries.
Due to the partitioning process, the number of templates for one instruction increases, and with it the number of measurements which have to be recorded in order to get a sufficient amount for each partition. For instance, by partitioning the templates as described we get

n_measurements = 9 · 9 · 15 · 50 = 60750    (5.6)

measurements to create all 1215 templates for a literal operation if each template is created from only fifty measurements. This is because there are nine different Hamming weights for W, nine different Hamming weights the opcode can have due to the literal, and fifteen different Hamming weights of the subsequent instruction. Note that the number of measurements is thus not fixed; a NOP, for example, has a constant opcode, so that in this case only 6750 measurements are required to create all templates.
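The template count of equation (5.6) can be reproduced by a small sketch; the helper names are my own, and the 14-bit instruction width is taken from the PIC mid-range instruction format discussed earlier:

```python
def hamming_weight(v):
    """Number of set bits, e.g. HW(0x55) = 4."""
    return bin(v).count("1")

def num_templates(opcode_hw_values, w_bits=8, next_instr_bits=14):
    """Partitioned templates for one instruction: one per combination
    of HW(W) (0..w_bits), HW(opcode), and HW(subsequent 14-bit
    instruction) (0..next_instr_bits)."""
    return (w_bits + 1) * len(set(opcode_hw_values)) * (next_instr_bits + 1)

# An 8-bit literal allows nine distinct opcode Hamming weights (the
# fixed opcode bits only shift this range); a NOP has one fixed opcode.
literal_hws = {hamming_weight(k) for k in range(256)}   # {0, ..., 8}
literal_templates = num_templates(literal_hws)          # 1215
nop_templates = num_templates([0])                      # 135
measurements = 50 * literal_templates                   # 60750, as in (5.6)
```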
To test the templates, these were again applied to random test sequences using random operands, i.e. the logarithmic probability function was calculated for each template; the template resulting in the lowest value corresponds to the recognized instruction. Unfortunately, the results showed that almost all instructions were mistaken for XORLW instructions. Hence, the success rate was approximately 25 percent, similar to the approach discussed in Section 5.4.1.
To exclude that this is caused by the extended partitioning approach, the test was repeated with templates sorted only by the Hamming weight of W and the Hamming weight of the subsequent instruction, which had shown high success rates for a fixed subsequent instruction. However, this leads to the same results. Obviously, the set of templates for XORLW dominates the other instructions, which was not the case for a fixed subsequent instruction.
This can be explained by means of the power model presented in Definition 4.10.1. If we omit the common components Pconst,Qx and Pel.noise, which are independent of the performed instruction, we see that the power consumption highly depends on several Hamming distances and further on the Hamming weight of the current and subsequent instruction. Concerning the Hamming weight, we can expect that
these values are not appropriate indicators for an instruction, since the Hamming weight can vary significantly for instructions that use literal or file-register addresses. Furthermore, PQ2 and PQ4 lead to the conclusion that, for example, fetching an XORLW and executing a NOP cause the same power consumption when neglecting the effects of IR in Q4. In addition, Q1 will show high variances, since vbus is predefined randomly for every measurement and was not accounted for in the template partitioning in order to keep the number of measurements practicable. Finally, Q3 highly depends on the result vresult calculated by the ALU, which may be the reason why ANDLW is mistaken for an XORLW: the variance at Q3 can be expected to be higher for XORLW because the probability to flip a bit during an XOR operation with a random operand is 1/2, in contrast to 1/4 in the case of ANDLW.
In consequence, the instruction recognition for the general case highly depends on power consumption influences not explicitly revealed, i.e. Pext,Qx. Obviously, this component does not contain sufficient information, so the chance that one of the 1215 templates for XORLW or ANDLW wins against one of the 135 templates for NOP and CLRW is remarkably high.
This leads to the conclusion that a general instruction recognition by means of templates is not feasible for the device under attack, at least if only single measurements are taken into account.
6 Path Detection
In the last chapter we created templates for the sake of instruction recognition. Although the tests indicated that this does not work properly without additional knowledge, templates may nevertheless be used to recognize the path a program took at a given time by utilizing a priori information, i.e. assuming the program code or parts of it to be known.
To examine this, the first section discusses the basic idea of how path recognition can be applied to the PIC microcontroller. Further, working conditions and application areas are presented. The subsequent section deals with the algorithm that can be used for program path recognition. The functionality of the presented algorithm is then demonstrated by practical tests and the results are presented.
example analog inputs are not known. Or it can even be used to reveal the program version currently running on the device.
To perform path recognition, the program code, as already mentioned, needs to be known. With this source of information, possible paths can be calculated from a certain starting point. Moreover, the opcode of each instruction is known, since the literals and file-register addresses are included in the code. Only the contents of the registers are generally not known, since they depend on the inputs of the performed algorithm, which cannot necessarily be assumed to be known. With this information we are able to apply the appropriate templates for each possible path to a given power consumption trace. The calculated values can then be summed up for each path. When using the logarithmic probability density function, the overall smallest value corresponds to the path with the highest probability. To understand how this works, an example will now be presented.
Listing 6.1: Example A for path detection
1 CODE1 : NOP, ADDLW 0x55, NOP, ADDLW 0x55, NOP
2 CODE2 : ADDLW 0x55, NOP, ADDLW 0x55, NOP, ADDLW 0x55
The assembler code given in Listing 6.1 contains two code sequences, namely CODE1 and CODE2, each containing five instructions. Both alternate NOP and ADDLW instructions, but in complementary order. When one of these code sequences is executed and measured, the probability for CODE1 is calculated by applying the template for NOP to the power consumption of the first instruction cycle, the template for ADDLW to the power consumption of the second instruction cycle, and so forth. When using partitioned templates, only those templates matching the basic conditions are applied. For instance, only the template for the NOP instruction created under the premise that the subsequent instruction has a Hamming weight of seven would be used, because seven is the Hamming weight of ADDLW 0x55. The results of the logarithmic probability density function are then summed up. By doing this, we get the chained probability to observe this sequence, simply because it is the sum of the single log-probabilities. This process is then repeated for CODE2. As a consequence, the code sequence with the smallest value has the highest probability. Again, if the appropriate template on average yields better results than an inappropriate one, a code sequence of sufficient length will be recognized correctly.
An important factor for the success of this method is that correct templates have
to be applied to the correct instruction cycle of the power trace. This is not as
trivial as one might think. As can be seen in Listing 6.2, in this case, CODE1 and
CODE2 both consist of the same code sequence. Nonetheless, these sequences can
be distinguished because the respective part of the power consumption differs for
both sequences. The explanation for this is as follows:
Listing 6.2: Example B for path detection
1 BTFSC 0x40h, 1
2 CALL CODE1
3 CALL CODE2
4
5 CODE1 : NOP, ADDLW 0x55, NOP, ADDLW 0x55, NOP, RETURN
6 CODE2 : NOP, ADDLW 0x55, NOP, ADDLW 0x55, NOP, RETURN
BTFSC is a conditional branch instruction which reads bit one of file register 0x40h and then executes CODE1 if this bit is set and CODE2 otherwise. If the condition is true, BTFSC takes two instruction cycles, because in this case the next instruction is skipped, which means that it is executed as a NOP, as explained in Section 3.2. If the condition is not true, the next instruction is executed as given. Hence, if the conditional branch implies the execution of CODE2, this sequence starts with the fifth instruction cycle, whereas CODE1 starts with the fourth. Therefore, the respective parts for the executed instructions have different positions in the power consumption trace, which has to be kept in mind when applying the templates.
In conclusion, it is thus crucial to determine the correct part of the power consumption for a certain instruction.
6.2 Algorithm
Due to the results of the previous section, an algorithm can be defined in order to detect program paths from a side-channel. Note that this algorithm does not depend on the underlying hardware, so it can be applied to other microcontrollers as well.
The preliminaries for running the algorithm are as follows. First, a set of templates has to be created for several instructions. For a better success rate, these templates should be partitioned according to the main influences on the power consumption, which have to be analyzed in advance. In the case of the PIC, these were, for example, the Hamming weight of the subsequent instruction and the Hamming weight of the instruction itself. Operand-dependent influences do not have to be taken into account, since these dependencies cannot be concluded from the assembler code. When templates for N instructions are created with M templates for each instruction, they can be stored in an N × M matrix T serving as an input of the algorithm.
Secondly, the program code, or at least the part for which the path detection is supposed to be performed, has to be known in order to calculate possible paths. This does not necessarily have to be the assembler code; even the binary representation
of this code can be used. However, in this case the binary code has to be translated back to assembler code as an intermediate step.
If these two preliminaries are given, the algorithm described in Algorithm 6.2.1 can be performed.
Algorithm 6.2.1 (Path Detection) Given a set of templates T for n instructions and m basic conditions, a recorded trace R = (r1, ..., rk) of k instruction cycles, and an assembler code A = (a1, ..., al) of l instructions, the most likely path which was taken at the time of recording R can be detected as follows:
1. Calculate all hypothetical paths up to a length of k instruction cycles from A.
2. Select only those instructions from the hypothetical paths for which templates have been created and find the best matching template mi,j of T for each instruction j and path i, with respect to the program code.
3. Determine the corresponding instruction cycle ri,j for each instruction cycle j of each path i. If an instruction takes more than one instruction cycle, select the first one and handle the second instruction cycle as a NOP.
4. For each of x hypothetical paths this leads to a 2 × d matrix Hi, i = 1, ..., x, which contains the best matching templates for the respective path together with the expected positions in R:

Hi = ( ri,1 ri,2 ... ri,d ; mi,1 mi,2 ... mi,d )    (6.1)

5. For each hypothetical path Hi calculate its overall logarithmic probability:

p(Hi) = (1/d) · Σ_{n=1}^{d} | ln p(ri,n, mi,n) |    (6.2)
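Steps 4 and 5 can be sketched as follows. This is a minimal illustration: `detect_path`, the path/segment structures, and the `score` callback standing in for the absolute logarithmic probability density of (2.26) are my own assumptions, not the thesis implementation:

```python
def detect_path(paths, trace_segments, score):
    """Each hypothetical path is a list of pairs (r, m): expected
    instruction-cycle position r and best-matching template m.  Its
    overall value is the arithmetic mean of score(trace_segments[r], m),
    where score plays the role of |ln p(r, m)|.  The path with the
    smallest mean is the most likely one."""
    results = {name: sum(score(trace_segments[r], m) for r, m in pairs) / len(pairs)
               for name, pairs in paths.items()}
    return min(results, key=results.get), results
```

With a toy score such as the absolute difference between segment and template, the path whose templates fit every position best wins.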
From this it follows that not all templates have to be known in order to apply the algorithm, i.e. we can start with a few templates and adjust the model step by step by adding more templates. As a result, the algorithm is quite flexible in this respect.
Moreover, it is useful to predetermine the best matching templates on the basis of the program code A for complexity reasons. By doing this, templates are determined only once and not later on in a loop for each hypothetical path.
In step 3, for each instruction of each hypothetical path the corresponding part of the recorded trace is determined. This means that it is calculated at which instruction cycle an instruction should occur for a given path. In case of a two-cycle instruction, the first instruction cycle is selected; the second one can be handled as a NOP, since this is the instruction that is executed during the second instruction cycle.
This information, together with the templates of step 2, is written to a matrix Hi for each hypothetical path i. Thereby, Hi contains the positions ri,j in the power consumption trace together with the most accurate template mi,j for each instruction j of this path. Thus, all information needed to calculate the overall probability for each hypothetical path is given and can be computed in the next step.
The overall probability to observe path Hi is calculated in step 5 by computing the arithmetic mean over all single probabilities, as illustrated in equation (6.2). In this equation, |ln p(ri,n, mi,n)| represents the logarithmic probability density function for the multivariate Gaussian model as defined in (2.26). The mean is computed instead of the sum because the length d may differ for the various paths, since the number of relevant instructions included in a path depends on the code executed by it. Another approach would be to adjust the hypothetical paths to an equal number of instructions. However, this can become complicated when paths are equal in most of their instructions and corresponding positions, so the arithmetic mean is the better approach.
Please note that if templates are partitioned according to the used operands or calculated results, all templates of this kind have to be selected as best matching templates, since the operand may not be identifiable from the program code alone, at least if no information about the input of the program is known. Hence, in this case all of these templates have to be applied to the logarithmic probability density function in order to reveal the best template, which is then used to compute the overall probability by means of equation (6.2).
As a result of the algorithm, the most likely path is the one with the smallest value of p(H_i). Remember that the logarithmic density function is used, so that the smallest absolute value indicates the highest probability; this carries over to the arithmetic mean.
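The scoring of step 5 can be sketched as follows. This is an illustrative Python reimplementation (not the Matlab code on the enclosed DVD), where each template is assumed to be a (mean, covariance) pair of the multivariate-Gaussian model and its log-density is evaluated directly:

```python
import numpy as np

def log_gauss_pdf(x, mean, cov):
    """Logarithm of the multivariate-Gaussian density, cf. (2.26)."""
    d = len(mean)
    diff = x - mean
    inv = np.linalg.inv(cov)
    logdet = np.linalg.slogdet(cov)[1]
    return -0.5 * (d * np.log(2 * np.pi) + logdet + diff @ inv @ diff)

def path_score(trace_segments, templates):
    """trace_segments[j]: trace samples at position r_{i,j};
    templates[j]: (mean, cov) of the matched template m_{i,j}.
    Returns the mean absolute log-probability of equation (6.2)."""
    vals = [abs(log_gauss_pdf(x, m, c))
            for x, (m, c) in zip(trace_segments, templates)]
    return np.mean(vals)  # mean, not sum: path lengths d may differ

# The most likely path is the one with the smallest score, e.g.:
# best = min(range(len(paths)), key=lambda i: path_score(seg[i], tpl[i]))
```

Because the mean is taken over the absolute log-densities, paths of different length d remain directly comparable.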
Path Detection
6.2.1 Complexity
Clearly, the complexity depends heavily on the number of hypothetical paths, since the number of iterations in all steps of the presented algorithm increases with this number.
As we know, different paths only occur with conditional branches, which take one clock cycle to perform if the condition is true and two otherwise. For the worst case, in which the code solely consists of conditional branch instructions, the maximum number of paths h_max for a sequence of k instructions is given by the following approximation:

$$h_{\max}(k) \approx 2^{k-1} \qquad (6.4)$$
As a result, the complexity of the algorithm increases exponentially with the number of conditional branches. However, since the program code is known, the complexity can be estimated before the algorithm is applied, so the considered code can be limited in order to keep the complexity manageable. Furthermore, the algorithm can be improved as discussed in Section 6.2.2 to counter this problem.
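The growth of the path count can be illustrated with a small enumeration sketch (hypothetical Python, not part of the analysis program): b unresolved conditional branches yield up to 2^b outcome combinations, which for code consisting solely of branches approaches the worst case of equation (6.4).

```python
from itertools import product

def hypothetical_paths(num_branches):
    """Enumerate all taken/not-taken outcome combinations for
    num_branches unresolved conditional branches."""
    return list(product((False, True), repeat=num_branches))

# The test code of Listing 6.3 contains two conditional branches:
paths = hypothetical_paths(2)
print(len(paths))  # 4 hypothetical paths, as in the test of Section 6.3
```

Each outcome tuple can then be turned into a concrete instruction sequence by simulating the known program code along the assumed branch decisions.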
Please note that paths may recombine after having diverged for some time, for example after performing different subroutines. Thus, the question may arise why the algorithm does not exploit this recombination to handle such paths jointly and thereby reduce the complexity. Unfortunately, this is not feasible: even though paths may recombine, the shared instructions are executed at different points in time, so the paths have to be handled independently in order to calculate the probabilities.
Furthermore, the complexity depends on the number of templates applied to the trace. If the set of templates is large, comparatively few instructions are omitted in step two and, as a consequence, more single probabilities have to be computed in step five. Consequently, the storage and CPU requirements are significantly increased. When the template set is restricted to only the most important instructions, the complexity is significantly reduced; on the other hand, this may lead to a loss of accuracy since the overall probability is then calculated from fewer templates. Nonetheless, the algorithm offers sufficient flexibility to find a good trade-off between performance and accuracy.
6.2.2 Improvements
For the case that the recorded power consumption traces contain a huge amount of
instruction cycles, i.e. k is large, the algorithm may be improved in the following
way. Instead of calculating all hypothetical paths up to k instructions, a
threshold
th is defined and paths are only calculated up to this threshold. Then the
algorithm
is performed. At the end, the winning path w is selected as the survivor path and
the algorithm is repeated from the state where w ended. By doing this, the complexity is reduced, since not all paths have to be managed simultaneously and the number of hypothetical paths is bounded by the threshold. For instance, if the power consumption trace contains 100 instruction cycles, th can be chosen as 25. Thus, instead of calculating the paths for a length of k cycles, the algorithm is repeated
four times for a length of th cycles. Assuming a worst-case scenario, only 4 · 2^24 ≈ 2^26 hypothetical paths have to be managed instead of 2^99, which reduces the complexity by a factor of about 2^73.
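This back-of-the-envelope calculation can be checked directly; the numbers k and th are those of the example above:

```python
# Worst-case path counts for the full trace versus the segmented variant.
k, th = 100, 25
full = 2 ** (k - 1)                    # one run over up to 2**99 paths
segmented = (k // th) * 2 ** (th - 1)  # 4 runs of up to 2**24 paths each

print(full // segmented)               # reduction factor of 2**73
```

The segmented variant trades this complexity reduction for the risk that the true path is pruned at a segment boundary, since only the survivor path w is carried over.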
6.3 Test
In this section the functionality of the algorithm is demonstrated by a test which also serves as a realistic example.
The Matlab code that implements the algorithm for the PIC can be found on the enclosed DVD, as well as all other scripts used for this thesis. This includes measurement scripts, template creation scripts, assembler code, and so forth. An overview of the contents of the DVD can be found in the appendix.
The code used in this test is given in Listing 6.3. Note that this code does not perform any meaningful function; it was created purely to test the algorithm. The problem with real code is that it is significantly more difficult to determine whether the output of the algorithm is correct than in the case of dedicated test code.
For instance, branches depend on bits that may be modified during program execution. Hence, to know whether a branch was taken, the entire program has to be simulated with all its calculations up to this point. Additionally, real code often contains CALL and RETURN instructions; keeping track of the branch addresses can become complicated and is not yet supported by the analysis program.
Listing 6.3: Test code for the path detection algorithm

START
        MOVF    W_BUF, 0
        MOVWF   FOP_HW1
        BTFSC   FOP_HW1, 0
        GOTO    ALPHA
        CLRW
        NOP
        CLRW
        CLRW
        NOP
ALPHA
        MOVF    W_BUF, 0
        MOVWF   FOP_HW5
        MOVWF   FOP_HW4
        MOVLW   LOP_HW6
        CLRW
        NOP
        CLRW
        NOP
        CLRW
        BTFSC   FOP_HW5, 1
        GOTO    BRAVO
        CLRW
        CLRW
        NOP
        CLRW
BRAVO
        MOVF    W_BUF, 0
        MOVLW   LOP_HW8
        XORLW   LOP_HW2
        ANDLW   LOP_HW5
        CLRW
        CLRW
        CLRW
        MOVLW   LOP_HW6
As a first step, the hypothetical paths for this code have to be calculated up to the number of instruction cycles the recorded trace contains, which in this case is thirty instruction cycles. This results in four paths, as illustrated in Fig. 6.1, since two conditional branches occur. In the first path the jumps to ALPHA and BRAVO are both taken; in the second path the jump to ALPHA is taken but not the one to BRAVO, and so forth. Note that the lengths of the paths differ, since fewer instructions are executed by the program when parts of the program are skipped due to branches.
Now the second step can be performed, in which all instructions without a template are discarded. In this test we assume that only templates for the CLRW instruction have been created so far; hence, all other instructions have to be omitted. Further, the best matching template has to be determined for each instruction. For the CLRW operation, templates can be created for each Hamming weight of the subsequent instruction. Consequently, this Hamming weight has to be calculated in order to associate the templates with the instructions. For instance, a CLRW that occurs before a MOVLW LOP_HW6 would be associated with the template created for CLRW followed by an instruction word of Hamming weight six.
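This association could be sketched as follows (hypothetical Python; the example opcode is an arbitrary 14-bit word chosen for illustration, not taken from the PIC data sheet):

```python
def hamming_weight(word):
    """Number of set bits in an instruction word."""
    return bin(word).count("1")

def clrw_template_key(next_opcode):
    """Select the CLRW template partition by the Hamming weight
    of the subsequent instruction word."""
    return ("CLRW", hamming_weight(next_opcode))

# Arbitrary 14-bit example word with seven set bits:
print(clrw_template_key(0b0011_0000_0011_0111))  # ('CLRW', 7)
```

In the same way every CLRW occurrence in a hypothetical path can be mapped to its best matching template before the probabilities are computed.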
[Table: expected positions of the CLRW instructions in the trace for the four hypothetical paths]
These positions, together with the matching templates, are written to the matrix H_i for each hypothetical path.
Now the overall probability of each hypothetical path can be calculated in order to determine the most likely path for a given recorded trace. For this test, thirty measurements were recorded which randomly executed either the first, second, or third path. The fourth path was never executed, in order to see whether any measurement would nevertheless be recognized as this sequence due to the control code surrounding the test code.
First, the second path was executed and measured. The resulting overall logarithmic probabilities are given in Tab. 6.2. As can be seen, the smallest value occurs for the second path and is significantly smaller than the values for the other paths. Thus, in this case the path was recognized correctly.
[Fig. 6.1: The four hypothetical paths of the test code, differing in whether the jumps to ALPHA and BRAVO are taken]

[Tab. 6.2: Overall logarithmic probabilities p(H_i) of the four hypothetical paths for a trace of the second path]
The results for all thirty recorded measurements are given in Tab. 6.3. As can be seen, the path detection worked one hundred percent correctly.

[Table 6.3: Executed and detected program paths for thirty traces]
This test was repeated with NOP templates and with a combination of NOP and CLRW templates; furthermore, different codes were used. In every case the algorithm yielded good results. However, when templates for ANDLW and XORLW were added, the recognition success rate decreased, which may be due to inaccurate templates: these instructions are influenced significantly more by switching noise than NOP and CLRW. In terms of path recognition, NOP may be an even more important instruction than all others, because every conditional test that is not true results in the execution of a NOP instruction.
In conclusion, the provided algorithm has been shown to be feasible. Hence, our assumption that a priori information can be utilized to improve the recognition process by means of templates and single measurements has been confirmed. Consequently, additional use cases such as the reverse engineering of secret parts of an algorithm, as discussed in Section 5.5, are feasible as well.
7 Summary and Conclusion
In this thesis it was shown that Simple Power Analysis is a valuable tool for characterizing the power consumption properties of a device with respect to its instruction processing. By means of a theoretical examination based on the data sheet, one is able to draw up hypotheses about the influences on the power consumption and verify them individually by visually analyzing traces measured while executing dedicated test codes. Further, we have seen that this is an iterative process: when the results are not in line with an assumption, new assumptions have to be drawn up and verified in turn. Additionally, even if no data sheet is available for a theoretical examination, experience in characterizing other devices can help to understand how an unknown device operates.
As a result of this examination process, a power model was defined which can be used to explain and estimate the power consumption of the device. With this model it may, for example, be possible to improve DPA attacks on the PIC significantly, since the modeled power consumption should show a higher correlation than the Hamming-weight and Hamming-distance models.
Furthermore, it was shown that side-channel based instruction recognition using templates and single measurements highly depends on the a priori knowledge of the observer. If no a priori information is available, instruction recognition is difficult due to the various influences on the power consumption, predominantly the effects caused by fetching the next instruction and by data dependencies. Moreover, it was shown that modeling switching noise by means of a covariance matrix is not practicable; instead, it is useful to create several templates for one instruction, each covering different influences. At this point the intensive characterization, or more precisely the defined power model, becomes useful for partitioning these influences. Nonetheless, the performed tests provided insufficient evidence for effective and adequate instruction recognition by means of templates and single measurements on the PIC16F687 when no a priori information is available. However, the tests indicated that the use of additional information can improve the recognition process.
Therefore, other possible applications assuming higher a priori knowledge about the executed code were introduced and shown to be feasible. As an example, path recognition was elaborated: the basic idea is to determine the positions in time at which instructions should occur for every hypothetical path in
Future Work
far but is possible due to different wire lengths causing different load capacitances and thus different power consumption properties. Moreover, the power model can be used to mount a DPA attack in order to show whether the formulated model yields better results than a DPA attack based on the simplistic Hamming-weight and Hamming-distance models.
Some other applications that might work on the basis of detailed a priori information, such as the reverse engineering of secret parts of an algorithm, have additionally been mentioned in this work. These can be performed and verified in the future.
Finally, template creation can be applied to other microcontrollers. As we have seen, the PIC exhibits many dependencies in its power consumption; above all, the fetching process and the strong data dependency render direct instruction recognition from the side-channel unlikely. However, other microcontrollers may be better suited for this purpose. The original 8051, for example, executes one instruction within 12 clock cycles; hence, three times as many clock cycles can be analyzed, and thus there may be more instruction-related information available to create adequate templates. Yet today's 8051 derivatives are built differently from the original: most of them have an improved, pipelined design enabling them to execute one instruction within one clock cycle, which was shown to be disadvantageous for our purpose. However, there are still some 8051-compatible microcontrollers that work with twelve or six clock cycles per instruction. The Atmel AT89C51ED2, for example, may be a promising candidate for future tests.
A Test Setup
A.1 Block Diagram
The block diagram of the test setup used to perform the power consumption
measurements for this work is illustrated in Fig. A.1.
[Fig. A.1: Block diagram of the test setup. A test board carries the microcontroller (PIC16F687) with a shunt resistor, a trigger pin, an RS232 connection, and an ISP interface; the digital oscilloscope (Agilent Infiniium 54832D, channels Ch1–Ch4) is connected to the PC via GPIB; the programmer (ICD 2) is attached to the ISP interface and connected to the PC via USB and COM1]
B DVD Contents

Assembler Code
    Path Detection Test
    Power Consumption Characterization
    Template Creation
    Template Test
Documents
    Data Sheets
    Paper
Measurements
    Path Detection Test
    Power Consumption Characterization
    Template Creation
    Template Test
Matlab Code
    Measure
    Path Detection
    Template Creation
    Template Test
Results
    Template Creation Tests
    Templates Path Detection
Thesis
    gfx
The Assembler Code directory contains the assembler code which was executed on the PIC for creating and testing templates. It is divided into four subdirectories. Path Detection Test stores the assembler code used for testing the algorithm as described in Chapter 6. Power Consumption Characterization contains the code with which the various tests for analyzing the power consumption properties of the PIC, as described in Chapter 4, were performed. Furthermore, Template Creation includes the code for the template creation tests of Chapter 5, and Template Test the corresponding code for testing the quality of those templates.
The Measurements directory is organized in the same way as Assembler Code but contains the most important measurement files; due to space limitations, not all measurements could be included.
Similarly, the Matlab Code directory stores the Matlab scripts for creating and testing templates. Additionally, the scripts for measuring the power consumption are provided in Measure, and the code which implements the path detection algorithm for the PIC is stored in Path Detection.
The Results directory comprises the results of the template creation tests, stored in Template Creation Tests, and the templates applied in the algorithm test of Chapter 6, contained in Templates Path Detection.
The data sheets of the oscilloscope and the PIC, as well as most of the papers referred to in the bibliography, are stored in two subdirectories of Documents.
Finally, the LaTeX code of this thesis is contained in Thesis; all figures used are stored in the subdirectory gfx.