Remote Attacks On FPGA Hardware
Approved Dissertation
by
Dennis Gnad
Höhenstraße 10
75210 Keltern
I hereby declare in lieu of oath that I have written the submitted work independently, that I have fully cited the sources, Internet sources, and aids used, and that I have marked all passages of the work – including tables, maps, and figures – that are taken from other works or from the Internet, verbatim or in substance, as borrowed material by citing the source.
————————————
Karlsruhe, May 15, 2020
Dennis R. E. Gnad
Abstract
An increasing number of computer systems are connected on a global scale and become remotely accessible, raising their security requirements. One recent technology that is increasingly used as a computing accelerator, both in embedded systems and in the cloud, is the Field-Programmable Gate Array (FPGA). FPGAs are highly flexible devices that can be configured and programmed by software to implement arbitrary digital circuits. Like other integrated circuits, FPGAs are based on modern semiconductor technologies that are affected by variations in the manufacturing process and in runtime conditions. It is well known that these variations impact the reliability of a system, but their impact on security has not been widely explored.
This PhD thesis examines the intersection of these topics: remotely accessible and multi-user FPGAs, and the security threats arising from physical variation in modern semiconductor technologies. The first contribution of this thesis identifies transient voltage fluctuations as one of the strongest influences on FPGA performance, and experimentally analyzes how they depend on the workload the FPGA is executing. The remainder of the thesis explores the security implications of these transient voltage fluctuations. Various attacks are proven possible that were previously thought to require physical access to the chip and the use of dedicated, expensive test and measurement equipment. The thesis shows that isolation countermeasures can be circumvented by a malicious user with partial access to the FPGA, affecting other users of the same FPGA or the complete system.
Using circuits that affect the FPGA on-chip voltage, active attacks are demonstrated that can cause faults in other parts of the system. In this way, Denial-of-Service is possible, which can also be escalated to extract secret key information from the system. Furthermore, passive attacks are demonstrated that indirectly measure the on-chip voltage fluctuations; these measurements are sufficient to extract secret key information through power analysis side-channel attacks, which can also be escalated to other chips connected to the same power supply as the FPGA. To prove that comparable attacks are not exclusive to FPGAs, small IoT devices are also shown to be vulnerable to attacks that leverage partial access to their power distribution network.
Overall, this thesis shows that fundamental physical variations in integrated circuits can undermine the security of an entire system, even if the attacker has no physical access to the device. For FPGAs in their current form, these problems need to be solved before they can be securely used in multi-user systems or with third-party access. First countermeasures have already been explored in publications that are not part of this thesis.
Zusammenfassung
More and more computer systems are interconnected worldwide and accessible over the Internet, which also raises their security requirements. A newer technology that is increasingly used as a computing accelerator, both for embedded systems and in the cloud, is the Field-Programmable Gate Array (FPGA). FPGAs are highly flexible microchips that can be configured and programmed by software to implement arbitrary digital circuits. Like other integrated circuits, FPGAs are based on modern semiconductor technologies that are affected by manufacturing tolerances and various runtime fluctuations. It is already known that these variations influence the reliability of a system, but their effects on security have not been studied comprehensively.
This doctoral thesis addresses a cross-section of these topics: security problems that arise when FPGAs are used by multiple users or are accessible over the Internet, in combination with physical fluctuations in modern semiconductor technologies. The first contribution of this thesis identifies transient voltage fluctuations as one of the strongest influences on FPGA performance and experimentally analyzes how different FPGA workloads affect them. The remainder of the thesis then investigates the security implications of these voltage fluctuations. The thesis shows that various attacks are possible that were previously assumed to require physical access to the chip and the use of special, expensive test and measurement equipment. It demonstrates that known isolation measures within FPGAs can be circumvented by malicious users to attack other users on the same FPGA, or even the entire system.
Using circuits that influence the voltage within an FPGA, this thesis demonstrates active attacks that can cause faults in other parts of the system. In this way, Denial-of-Service attacks are possible, as well as fault attacks to extract secret key information from the system. In addition, passive attacks are demonstrated that indirectly measure the on-chip voltage fluctuations. These measurements suffice to extract secret key information through power analysis side-channel attacks. In a further escalation step, these attacks can also affect other chips connected to the same power supply as the FPGA. To prove that comparable attacks are not only possible within FPGAs, it is shown that small IoT devices are also vulnerable to attacks that exploit the shared power supply within a chip.
Overall, this thesis shows that fundamental physical variations in integrated circuits can undermine the security of an entire system, even if the attacker has no direct access to the device. For FPGAs in their current form, these problems must first be solved before they can be used securely with multiple users or with third-party access. First countermeasures have already been explored in publications that are not part of this thesis.
Contents

Acknowledgments
Abstract
Kurzfassung
Contents

I. Preliminaries

1. Introduction
   1.1. Contributions
        1.1.1. On-Chip Characterization of Voltage Fluctuations
        1.1.2. Adversary Model for Multi-Tenant FPGAs
        1.1.3. PDN-based Voltage Drop-based Fault Attacks
        1.1.4. PDN-based Power Side-Channel Analysis Attacks
        1.1.5. Main High-Level Contribution
   1.2. Outline
2. Background
   2.1. Power Distribution Networks
   2.2. Switched-Mode Voltage Regulator Modules (VRM)
   2.3. Mixed-Signal Integrated Circuits
   2.4. Analog-to-Digital Converters (ADCs)
   2.5. Sensing and Protecting against On-Chip Voltage Fluctuations
   2.6. Traditional FPGA and Hardware Security Threats
   2.7. Leakage Assessment
   2.8. Correlation Power Analysis on AES
   2.9. Differential Fault Analysis on the AES

II. Contributions

3. An Experimental Evaluation and Analysis of Transient Voltage Fluctuations in FPGAs
   3.1. Sensing Transient Delay Variation
        3.1.1. Time-to-Digital Converter Implementation
List of Own Publications
Conferences
[18] D. R. E. Gnad, F. Oboril, S. Kiamehr, M. B. Tahoori, "Analysis of Transient Voltage Fluctuations in FPGAs", International Conference on Field-Programmable Technology (FPT), 2016, China. (Best Paper Candidate)
Part I.
Preliminaries
1. Introduction
Integrated circuits (ICs) based on semiconductors are one of the driving technologies of the 21st century. In addition to computing accelerators based on Graphics Processing Units (GPUs), performance and energy efficiency can be further increased by adding custom computing accelerators in the form of Field-Programmable Gate Arrays (FPGAs). Through reconfiguration, FPGAs are very flexible devices that can be configured and programmed by software to implement arbitrary digital circuits, and can thus accelerate most algorithms. FPGAs are becoming widespread in embedded systems, personal computers, and high-end servers [1–5]. Microsoft has been using FPGAs in its datacenters since 2014 [6], and companies such as Amazon [7], Alibaba [8], and Huawei [9] rent out FPGAs as computing accelerators to arbitrary customers. Furthermore, we are heading towards a future in which every device gets connected to the Internet of Things, either as a small embedded device or in datacenters for cloud computing, which also puts these devices at higher security risk [10]. Both cloud computing and Internet of Things devices are increasingly applied in critical domains, such as healthcare, transportation, and public infrastructure, to name a few.
Traditionally, most systems have been attacked through flaws in their software implementation. However, we also see an increase in attacks that leverage properties or flaws in the hardware, undermining any security that software can provide. These problems have gained widespread attention, especially since the microarchitectural attacks named Spectre and Meltdown were revealed in January 2018 [11, 12]. Thus, hardware technologies such as FPGAs need to be analyzed more thoroughly for their security properties before they are widely adopted as computing accelerators. A single FPGA can be virtualized and shared among multiple users and tasks, opening new questions about how well these users are isolated from each other. Prior to this thesis, potential security risks at the hardware level of virtualized FPGAs had been mostly disregarded.
In this PhD thesis, we therefore focus on new types of security threats that arise when FPGAs are used as computing accelerators or as parts of a larger system. Modern semiconductor technologies suffer from increasing physical variations in the manufacturing process and from changing runtime conditions. Such variations are typically handled with increased safety margins that are sufficient for typical operating conditions [13, 14]. Among the largest and fastest of these variations are changes in the power supply voltage level within the chip, which can occur within single clock cycles of the operating clock of the system [15–17]. It is well known that these transient voltage fluctuations impact the reliability of a system, but their effect on security had not been widely explored prior to this thesis. Thus, we experimentally analyze on-chip voltage fluctuations and then show that they can indeed lead to new security threats for FPGA-based systems. To the best of our knowledge, these new threats are also the first to show that physical attacks do not necessarily require local access to the device. Through various indirect ways, voltage fluctuations can be caused or measured, making it feasible to perform power analysis or fault attacks remotely. While the main focus is on FPGAs, the thesis also shows voltage fluctuations to be an issue in mixed-signal integrated circuits and inside a printed circuit board.
1.1. Contributions
The main contribution of this PhD thesis is showing that voltage-based side-channel and fault attacks are feasible with software access to a system alone, and thereby also become feasible remotely.
During the PhD, methods were successfully developed to remotely extract side-channel leakage or cause faults at the electrical level of various FPGA chips and of some mixed-signal devices used in low-cost IoT applications. Attacks that previously required dedicated test and measurement equipment can now be performed in software, potentially remotely. Thereby, to the best of our knowledge, the thesis proves for the first time that power analysis side-channel attacks and fault attacks can also be performed remotely, and need to be considered in threat models that previously ignored them. More specifically, this thesis shows power analysis attacks to be a potential security risk in multi-user FPGA systems and in highly integrated mixed-signal IoT systems. Examples of such systems are FPGAs virtualized as multi-tenant systems, or IoT applications that do not limit access to analog sensors in the system.
The following subsections explain the individual contributions and clarify what has been done with respect to the co-authors of the respective works that have already been published.
equipment to the device under attack. These contributions have been published in [22]
and [23], which have been integrated into this thesis as Section 5.1 and Section 5.2.
The experiments for these works have been jointly performed with Falk Schellenberg
of the Ruhr-Universität Bochum.
To prove the generality of on-chip voltage fluctuations as a security vulnerability, other computing devices were also analyzed. Low-cost devices used in Internet-of-Things applications often integrate analog and digital components on a single chip. It is shown that data recorded with the analog subsystem of the chip can contain secret information from the digital subsystem in the form of correlated noise, which is proven with a Correlation Power Analysis (CPA) attack on AES. That contribution has been published in [24], on which Section 5.3 is based. The experiments for this work have been jointly devised and performed with Jonas Krautter.
1.2. Outline
The remainder of this thesis is organized into the following chapters:
• Chapter 7 concludes the thesis and gives a perspective on further research directions.
2. Background
The sections in this chapter were partially adapted from previously published works included in this thesis, which were co-authored with (in no particular order): Falk Schellenberg, Jonas Krautter, Fabian Oboril, Saman Kiamehr, Amir Moradi, and Mehdi B. Tahoori.
This chapter explains and presents relevant background knowledge that is required to understand the remaining chapters of the thesis, mainly in the following areas:
• Applications and related threat models in which FPGAs and microcontrollers are used
• Analysis of side-channel data that can be gathered from these lower implementation layers
The contents of this chapter have been taken from the respective publications included in this thesis [19–23, 25], with minor adjustments to fit this format.
stability [15, 16]. Compared with other variations that can affect semiconductor circuit performance, such as manufacturing process or temperature variations, voltage fluctuations have one of the highest influences on the required timing margin, and they change faster than other fluctuations (up to circuit speed).
The difference in the electrical current i required to cause a voltage drop is influenced by both spatial and temporal circuit switching activity, which in turn depends on workload characteristics and system behavior [15, 16, 30]. Consequently, a maliciously crafted circuit with respective switching activity could lead to corner cases with insufficient voltage stability and jeopardize security. Rising system complexity can lead to more corner cases that escape time-consuming and complex electrical-level design validation, elevating these risks.
If an overall path is affected by a voltage drop for a long enough time (minimum threshold time Tt) and with a high enough amplitude (Vdrop), timing faults occur. Such a voltage drop is also known as a voltage emergency [31, 32]. In addition to timing faults, SRAM bit cells can lose data when their static noise margin voltage is violated [33].
2.3. Mixed-Signal Integrated Circuits
one sampling period $T_S = T_{ON} + T_{OFF}$ leads to an average DC output voltage $V_{O(DC)}$ of:

$$V_{O(DC)} = \mathrm{AVG}[V_{SW}] = \frac{T_{ON}}{T_S} \cdot V_{IN}$$
TS
Figure 2.1.: Basic operating mode of a step-down (buck) switched-mode VRM, taken from
[34].
$$V_{P-P} = \frac{\Delta Q}{C_O} = \frac{\frac{1}{2} \cdot \frac{T_S}{2} \cdot \frac{I_{pk}}{2}}{C_O} = \frac{I_{pk} \cdot T_S}{8 \cdot C_O}$$
CO CO 8 · CO
Note that with a higher CO, leakage increases through the capacitor's equivalent series resistance (ESR). Thus, CO is typically chosen to fit an expected maximum load current Ipk. Another drawback is that if CO is not sufficient, the consequences of exceeding its limits are more severe, since the supplied current through IL is also limited and first has to charge CO before the circuit is supplied.
this voltage drop might only be observable inside the chip, with voltage fluctuations traveling through the chip-internal power supply mesh [16] or the common substrate of the whole die [38].
An additional effect, on top of voltage fluctuations, is chip-internal crosstalk from electromagnetic (EM) coupling. Depending on the frequency of a signal pulsed on a wire, the wire acts as a strong or weak radio transmitter, which also affects nearby wires, biasing wire delays through inductive or capacitive coupling effects [39]. Digital circuits are designed with a specific noise margin to prevent bit flips during normal operation, but analog circuits can be biased through EM coupling [40].
In summary, it is usually hard to guarantee that digital circuits have zero effect on an adjacent analog circuit. Instead, a mixed-signal chip is designed such that the noise margin is considered sufficient for the application requirements. We will show that security requirements may impose much stricter restrictions on the allowed noise levels, at least regarding the noise caused by digital components.
10
2.5. Sensing and Protecting against On-Chip Voltage Fluctuations
during measurement of the LSBs. So overall, we assume chip-internal noise can affect Σ∆-ADCs about as much as SAR ADCs.
too slow to sense any fast transients [50], or were not yet further characterized to sense actual delay [44]. Additionally, post-processing is needed if, for example, binary values are required for further in-circuit processing.
For FPGAs, other primitives are used to resemble the buffers. In Xilinx 6- and 7-series FPGAs, we use one 'CARRY4' primitive for every 4 buffers of the Observable Delay Line [44]. Thus, in an estimate for this sensor on a Virtex-6 FPGA, one buffer corresponds to 19.5 ps of delay. To improve the linearity of the output, we use a two-bit bubble-proof priority encoder [59]. With this, the sensor used here can take on values in the range 0–62.
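A simple software model of decoding such a delay-line snapshot is to treat the latched bits as a thermometer code; counting the set bits is one bubble-tolerant way to obtain a sensor value (a sketch of the idea only, not the exact two-bit priority encoder of [59]):

```python
def decode_thermometer(bits: list) -> int:
    """Decode a TDC delay-line snapshot (thermometer code) into a value.

    Counting the ones tolerates single 'bubbles' (isolated wrong bits
    caused by metastability) better than simply finding the first zero,
    which is the intuition behind bubble-proof encoders.
    """
    return sum(bits)

clean   = [1, 1, 1, 1, 0, 0, 0, 0]  # clock edge propagated 4 buffers deep
bubbled = [1, 1, 0, 1, 1, 0, 0, 0]  # a bubble at position 2

print(decode_thermometer(clean))    # 4
print(decode_thermometer(bubbled))  # 4 (popcount absorbs the bubble)
```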
2.7. Leakage Assessment
are explored widely [68–71]. Unfortunately, guarding bitstreams is only a solution in a trusted software and hardware development approach, where bitstreams are signed or encrypted before the system accepts them. As reconfigurable systems and user-defined custom accelerators become more widely adopted in FPGA systems, this fully trusted development approach becomes infeasible.
In Equation 2.1, μr and μfixed are the raw averages of the two sets of traces during encryption of mr and mfixed, respectively, at a specific sample time step for a first-order t-test, or the higher-order central moments for a higher-order t-test. Likewise, s²r and s²fixed correspond to the respective variances for a first-order t-test and the higher-order standardized moments for a higher-order t-test. The numbers of random and fixed traces are nr and nfixed.
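In standard TVLA-style leakage assessment, the statistic referenced as Equation 2.1 is Welch's t-test; with the symbols defined above it reads:

```latex
t = \frac{\mu_r - \mu_{\mathrm{fixed}}}
         {\sqrt{\dfrac{s_r^2}{n_r} + \dfrac{s_{\mathrm{fixed}}^2}{n_{\mathrm{fixed}}}}}
```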
To prove a design secure against side-channel attacks using leakage assessment, it is usually recommended to select multiple different fixed plaintexts, perform a leakage assessment for each of them, and record a significant number of traces (i.e., ≥ 10 million). As in this work we do not want to prove security but rather show information leakage, we typically evaluate a single fixed plaintext mfixed and record fewer than 10 million traces.
Exploitable leakage is assumed for |t| > 4.5, a generally accepted threshold [72, 73]. A value of |t| > 4.5 corresponds to a confidence of > 0.99999 that the traces collected from random encryptions and those from fixed encryptions are samples drawn from different populations. When leakage assessment is performed in this work, sampling is synchronized with the beginning of the encryption algorithm, and we restrict the leakage assessment to the middle third, as recommended in [72].
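A minimal sketch of this fixed-vs-random check at a single sample point (pure Python; the 4.5 threshold is the one cited above, and the sample values are synthetic):

```python
import math

def welch_t(fixed: list, random: list) -> float:
    """First-order Welch's t-statistic between two sets of
    side-channel samples taken at the same time step."""
    n_f, n_r = len(fixed), len(random)
    mu_f = sum(fixed) / n_f
    mu_r = sum(random) / n_r
    var_f = sum((x - mu_f) ** 2 for x in fixed) / (n_f - 1)
    var_r = sum((x - mu_r) ** 2 for x in random) / (n_r - 1)
    return (mu_f - mu_r) / math.sqrt(var_f / n_f + var_r / n_r)

def leaks(fixed: list, random: list, threshold: float = 4.5) -> bool:
    """Flag a sample point as leaking if |t| exceeds the threshold."""
    return abs(welch_t(fixed, random)) > threshold

# Two clearly separated populations produce a large |t|:
print(leaks([100.0, 101.0, 99.0, 100.5] * 10,
            [110.0, 111.0, 109.0, 110.5] * 10))  # True
```

In practice this is evaluated at every sampling step of the traces, not just one.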
In Equation 2.2, HW(x) is the Hamming weight of x, SBox_j(x) is the (possibly inverted) SubBytes function of the AES algorithm, and K_hyp is the key hypothesis byte. Depending on whether the first or the last round of the AES encryption is attacked, S_i is one byte of either the input plaintext or the output ciphertext, where i ∈ {0, 1, ..., 15}. Moreover, SBox_j is the normal (j = 1) AES substitution function when the first round is attacked, and the inverted (j = −1) substitution when the last round is targeted.
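With the symbols defined above, the power model of Equation 2.2 is the usual Hamming-weight hypothesis:

```latex
P_{\mathrm{hyp}} = \mathrm{HW}\left(\mathrm{SBox}_j\left(S_i \oplus K_{\mathrm{hyp}}\right)\right)
```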
Another possible Hamming weight model is based on the result of a T-table lookup. The AES algorithm can be optimized by combining the MixColumns and SubBytes operations into a single table lookup, where each input byte yields a 32-bit output word. For CPA, the Hamming weight is then based on the output word of the T-table lookup function.
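A sketch of such a hypothesis function; the substitution table is passed in as a parameter, and the 4-bit PRESENT S-box is used here only as a compact stand-in (for AES the table would have 256 entries):

```python
def hamming_weight(x: int) -> int:
    """Number of set bits in x."""
    return bin(x).count("1")

def cpa_hypothesis(data_byte: int, key_hyp: int, sbox: list) -> int:
    """Hamming-weight power hypothesis HW(SBox(S_i XOR K_hyp)).

    `sbox` is the (forward or inverse) substitution table, depending
    on whether the first or last round is attacked.
    """
    return hamming_weight(sbox[data_byte ^ key_hyp])

# Toy 4-bit substitution (the PRESENT cipher S-box) to exercise the code.
TOY_SBOX = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
            0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]
print(cpa_hypothesis(0x3, 0x7, TOY_SBOX))  # HW(TOY_SBOX[0x4]) = HW(0x9) = 2
```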
The CPA result for a single key byte is based on computing Pearson's correlation coefficient ρ_j between a hypothetical power model P_hyp and the actual measured value P_trace_j for every sampling step j, using all n collected traces:

$$\rho_j = \frac{n \cdot \sum P_{\mathrm{hyp}} \cdot P_{\mathrm{trace}_j} - \sum P_{\mathrm{hyp}} \cdot \sum P_{\mathrm{trace}_j}}{\sqrt{n \cdot \sum P_{\mathrm{trace}_j}^2 - \left(\sum P_{\mathrm{trace}_j}\right)^2} \cdot \sqrt{n \cdot \sum P_{\mathrm{hyp}}^2 - \left(\sum P_{\mathrm{hyp}}\right)^2}} \quad (2.3)$$

With a sufficient number of traces, the correlation for the correct key byte value will eventually differ significantly at a specific sampling step from the correlations for the incorrect key byte values, allowing the attacker to determine the correct secret key byte.
A successful CPA depends on traces that are collected synchronously with the encryption algorithm. To this end, an alignment step can be performed to reduce synchronization inaccuracies. In Section 5.3 of this thesis, we use the following approach: we compute the total average trace over all collected traces and use a normalized cross-correlation based alignment algorithm. Each trace is shifted within a defined range, and the normalized cross-correlation with the total average trace is computed as in Equation 2.4:
$$\rho_{cc}(s) = \frac{1}{\sigma \cdot \sigma_t} \sum_i (\mu_i - \mu) \cdot (t_{i+s} - \mu_t) \quad (2.4)$$
In the above equation, σ is the total standard deviation over all values, σt is the standard deviation of the current trace, μi is the total average at sampling point i, μ is the total average over all values, t_{i+s} is the current trace value at sampling point i + s, and μt is the average of the current trace. The shift s that maximizes the cross-correlation defines the new, aligned position of the trace with respect to the total average.
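A sketch of this alignment step (pure Python; the shift range and the zero-padding at the trace borders are implementation choices not fixed by the text):

```python
import statistics

def align_trace(trace, avg_trace, shift_range=5):
    """Shift `trace` within +/- shift_range samples and keep the shift
    with the highest normalized cross-correlation against the average
    trace (Equation 2.4). Samples shifted past the border become zero;
    all-zero shifts are skipped to avoid dividing by a zero sigma."""
    def shifted(tr, s):
        n = len(tr)
        return [tr[i + s] if 0 <= i + s < n else 0.0 for i in range(n)]

    mu = statistics.fmean(avg_trace)
    sd = statistics.pstdev(avg_trace)
    best_s, best_cc = 0, float("-inf")
    for s in range(-shift_range, shift_range + 1):
        t_s = shifted(trace, s)
        mu_t = statistics.fmean(t_s)
        sd_t = statistics.pstdev(t_s)
        if sd_t == 0:
            continue
        cc = sum((a - mu) * (b - mu_t)
                 for a, b in zip(avg_trace, t_s)) / (sd * sd_t)
        if cc > best_cc:
            best_s, best_cc = s, cc
    return shifted(trace, best_s), best_s

avg = [0.0, 1.0, 4.0, 1.0, 0.0, 0.0]
late = [0.0, 0.0, 0.0, 1.0, 4.0, 1.0]   # same peak, two samples late
aligned, s = align_trace(late, avg)
print(s)        # 2  (shifting by +2 realigns the peak)
print(aligned)  # [0.0, 1.0, 4.0, 1.0, 0.0, 0.0]
```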
2.9. Differential Fault Analysis on the AES
Figure 2.3.: Propagation of a faulty byte in the input of the 9th round of the AES algorithm.
the entire 10th round key, it is possible to compute the original AES secret key, since
the key schedule is invertible. Similar relations exist for the other bytes of the state
matrix.
As an example, we consider single-byte faults on bytes 0, 5, 10, or 15 of the state matrix before round 9, which can be used to recover bytes 0, 7, 10, and 13 of the last round key. We initialize a set S containing all possible candidates for those bytes of the last round key. This set of candidates is continuously reduced by evaluating each pair of correct and faulty ciphertext (C, C′): the 10th AES round is inverted for both ciphertexts with a candidate k ∈ S, and the candidate is discarded if the difference between the two resulting state matrices is not within the set of possible differences resulting from a single-byte fault on bytes 0, 5, 10, or 15 before round 9.
In [75], only one (the correct) candidate remained after two ciphertext pairs in 98% of all cases. In the other 2% of cases, only two or at most four key candidates were left. Therefore, to recover the entire round key of round 10 with faults injected before round 9, a minimum of eight ciphertext pairs is required: for each of the four key bytes, two pairs are needed. This can be improved by injecting single-byte faults before round 8, which affect four bytes before round 9 and therefore all bytes of the output at once. Then only two ciphertext pairs are required to recover the full AES key.
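The candidate-filtering idea can be illustrated on a deliberately simplified toy cipher, not on AES itself: a 4-bit "last round" c = SBox(s) ⊕ k with a single-bit fault injected on s (the PRESENT S-box is used only because it is a convenient 4-bit permutation):

```python
# Toy DFA candidate reduction: last round is c = SBOX[s] ^ k,
# fault model: a single-bit flip of the state s before the S-box.
SBOX = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
        0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]       # PRESENT S-box
INV_SBOX = [SBOX.index(i) for i in range(16)]
SINGLE_BIT = {0x1, 0x2, 0x4, 0x8}                      # allowed fault differences

def reduce_candidates(candidates, c, c_faulty):
    """Keep only key candidates for which inverting the last round of
    both ciphertexts yields a state difference explainable by a
    single-bit fault (the filtering step described in the text)."""
    return {k for k in candidates
            if INV_SBOX[c ^ k] ^ INV_SBOX[c_faulty ^ k] in SINGLE_BIT}

key = 0x7
cands = set(range(16))
for s, fault in [(0x3, 0x1), (0xA, 0x4), (0x6, 0x8)]:  # (state, injected fault)
    c = SBOX[s] ^ key
    c_f = SBOX[s ^ fault] ^ key
    cands = reduce_candidates(cands, c, c_f)

print(key in cands)  # True: the correct key always survives the filter
```

Wrong candidates typically fail the single-bit test for at least one pair, so the set shrinks toward the correct key as more pairs are evaluated, mirroring the AES procedure above.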
Part II.
Contributions
3. An Experimental Evaluation and
Analysis of Transient Voltage
Fluctuations in FPGAs
The work described in this chapter was published in [19] and is joint work with co-authors Fabian Oboril, Saman Kiamehr, and Mehdi B. Tahoori. More details on the contributions are found in Section 1.1.
Figure 3.1.: Difference in delay from variation in the manufacturing process, temperature, and steady-state or transient Vdrop (measured via different switching activity) of 8% of the flip-flops available in a Xilinx Virtex-6 XC6VLX240T FPGA, recorded across eight sensors. Percentages are based on the idle sensor path delay at baseline temperature, disregarding process variation.
development effort. Besides that, FPGAs are very flexible for implementing our experiments to analyze and understand runtime variations.
Traditionally, voltage stability has been analyzed at the level of PDNs for different ASICs and FPGAs [17, 29, 79–81], but these analyses did not provide insight into the influence of mapped circuits or workloads and the timing margins they would require. Furthermore, voltage regulators are typically excluded from these analyses. One post-fabrication analysis of high-performance processors looked into workloads at certain stimulus frequencies and whether they can be aligned in a beneficial way [82], but did not consider different workload runtimes. However, different transient behavior can have undesirable side effects, as seen in chip testing [83–86], microprocessor-based systems [15, 16, 58, 87], and FPGAs, as shown in Figure 3.1. For FPGAs, process, voltage, and temperature variations have been analyzed [50, 88], but without considering the transient effects of voltage fluctuations. Notably, one work shows the implementation of a TDC based on configurable logic that can sense fast timing transients [44], but it is neither calibrated to absolute delay nor used to analyze the influence of different switching activity.
In summary, more analysis of the on-chip transient voltage drop in FPGAs is required. In this chapter, calibrated and redesigned TDC sensors to measure fast timing transients are presented. We design test structures and map them, together with multiple sensors, into the FPGA to study the impact of running workloads and the mapped design on the delay induced by transient voltage drop. This is analyzed under various workload durations (i.e. duty cycles) and scheduling periods.² The results and findings are valuable to system designers, who gain the chance to schedule and map workloads so as to lower the voltage drop, increasing reliability or performance.
The contributions of this chapter can be summarized as follows:
• Calibrating and analyzing a sensor for voltage drop induced path delay and preparing
it for in-circuit use.
² These terms are defined and explained in Figure 3.5 in Section 3.2.
• Comparing the path delay impact of transient voltage drop with other process and runtime variations.
• Analyzing the impact of voltage drop on required timing margins caused by different switching activity, both temporally and from various spatial activity patterns.
• Temporal and spatial quantification of voltage drop by analyzing the degree of attenuation or amplification from spatially and temporally distinct activities.
• Verifying the generality of the observed trends on another FPGA chip manufactured in a more advanced technology node.
The remaining sections are organized as follows: Section 3.1 explains an adjusted TDC sensor, including an initial analysis of process, voltage, and temperature variation. After that, Section 3.2 explains the experimental setup with detailed results. Section 3.3 concludes this chapter on voltage fluctuations.
3.1. Sensing Transient Delay Variation
Figure 3.2.: Floorplan of the Virtex-6 FPGA with 8 horizontal regions. A delay line including
registers is shown magnified and annotated on the right side.
Table 3.1.: Resource use for one sensor, encoder, and the registers in between.

    Resource (Xilinx Virtex 6)   Sensor   Encoder   Register
    FLOP_LATCH                       72        78         64
    LUT                               8       139          —
    CARRY4                           19         —          —
    MUXFX (Multiplexer)               —         5          —
    INV (Inverter)                    —         1          —
Figure 3.3.: Floorplan of the Virtex-6 FPGA with 9 sensors and regions.
Figure 3.4.: Floorplan of the Kintex-7 to verify our results.
Table 3.2.: Calibration data for each sensor in the layout with eight horizontal sensors: time to bit 0 (i.e. initial delay) and average required delay per bit (linear estimate).

    Sensor                     0     1     2     3     4     5     6     7
    Time to Bit 0 [ns]      4.86  4.76  4.72  4.72  4.70  4.61  4.61  4.68
    Average Delay/Bit [ps]  11.3  12.0  11.3  11.5  11.9  12.0  12.3  11.7
The results in Figure 3.1 show that transient voltage drops lead to the highest increase in delay, with process variations following closely. As our analysis is based on a Virtex-6 FPGA fabricated in a 40 nm process technology, we expect even higher variation for newer technology nodes [27].
For the rest of this work, we focus on the delay influence of transient voltage drop. We consequently introduce a temperature-controlled fan to reduce thermal variation to a minimum. To handle process variation, we calibrate the sensors at a defined temperature, achieved through this fan control. We perform a two-point calibration, yielding a gain (delay per bit) and an offset (time to bit 0). This calibration is done by using different frequencies around our nominal sensor sampling frequency of 100 MHz (i.e. sampling every 10 ns), while the system is idle except for an active ChipScope instance. For the first calibration point, the sensor is connected to 100 MHz. The FPGA-integrated clock control (Xilinx Multi-Mode Clock Manager) allows setting 92.308 MHz as the next lower and 109.091 MHz as the next higher frequency. These were used to acquire calibration values for all sensors. The clock signal reaches both the first buffer element and the latches connected to the observable delay line (also see Figure 2.2) at the same time, with an uncertainty of 43 ps across the FPGA, so we expect much less within one sensor. Essentially, the sensor bits encode the time until the latches get disabled. This time is half the clock period: 5.00 ns, 5.42 ns, and 4.58 ns for 100.00 MHz, 92.31 MHz, and 109.09 MHz, respectively.
This calibration leads to 11.7 ps per bit average delay and 4.7 ns average time to bit 0
(i.e. initial delay) on one ML605 board, with similar results on another ML605 board,
confirming the results of process variation shown in [50]. The Xilinx ISE software
estimates 78 ps for one ‘CARRY4’ element, and thereby 19.5 ps/bit. This difference
shows that about 40% timing safety margin is still left when the system is idle at the
38 °C used for this calibration, and underlines how conservative these margins are. We
repeated similar calibration steps on a Kintex-7 KC705 board. For that platform, an
average delay per bit of 11.5 ps was found, which is very similar despite the technology
difference. On that platform, the tool estimate is only 13.25 ps/bit, indicating a smaller
safety margin of only about 15%. In all timing estimates, the slow ‘-2’ model was
used, matching the FPGAs. For one of our setups, the detailed per-sensor calibration
values are shown in Table 3.2, including the time to bit 0 and the average delay per bit.
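As a minimal sketch of this two-point calibration, the gain and offset follow from a straight line through two (latch-enable time, observed bits) points. The sensor readings below are hypothetical, chosen so that the result reproduces the averaged values reported above (11.7 ps/bit, 4.7 ns):

```python
# Two-point calibration of a delay-line sensor: the latches are disabled
# after half a clock period; the sensor reports how many bits the signal
# propagated until then. Readings are hypothetical example values.
t_low,  bits_low  = 5.42, 62   # half period [ns] at 92.308 MHz, reading
t_high, bits_high = 5.00, 26   # half period [ns] at 100.000 MHz, reading

# Linear model: time = offset + gain * bits
gain_ns_per_bit = (t_low - t_high) / (bits_low - bits_high)  # Delay/Bit
offset_ns = t_high - gain_ns_per_bit * bits_high             # Time to Bit 0

print(f"delay per bit: {gain_ns_per_bit * 1e3:.1f} ps")  # -> 11.7 ps
print(f"time to bit 0: {offset_ns:.2f} ns")              # -> 4.70 ns
```

With the third frequency (109.091 MHz) added, the same fit becomes a least-squares line over three points, which is how a linear estimate over all calibration points can be obtained.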
3.2. Experimental Setup and Results
Figure 3.5.: Scheme of activating and deactivating workload through Workload Duty Cycle
and Workload Period, activated from the Test Control structure, embedded in
the test setup as shown in Figure 3.6. Flip-flops are reset to the same state after
system startup.
Figure 3.6.: Overview of experimental setup, showing connections of FPGA-internal and external blocks. Explanation of Test Control shown in Figure 3.5.
Exp. 1: Analyze voltage drop depending on the impact of workload period and work-
load duty cycle.
Exp. 2: Evaluate the spatial spread of voltage drop across the FPGA from a single
workload.
Exp. 3: Measure the voltage drop from temporal and spatial interference between
multiple workloads on the FPGA.
Figure 3.7.: Example of workload period T, workload duty cycle D, and phase shift ϕ.
Exp. 4: Extend upon Exp. 3 by analyzing spatial interference in two dimensions across
the FPGA fabric.
An overview of the system used for these experiments is given in Figure 3.6, showing
a configuration with 8 sensors. For all experiments, we place multiple sensors
across a Virtex-6 FPGA on an ML605 REV E Evaluation Board. Some experiments are
cross-checked on a Kintex-7 KC705 board.
Both boards are supplied by a 12 V AC/DC power supply at board level, and use
multiple DC/DC switched-mode VRMs for different voltage levels [90]. The ML605
FPGA-internal VCCINT voltage is generated with a Texas Instruments UCD9240PFC
Digital PWM System Controller and PTD08A020W power conversion module [91],
for which we measured a PWM frequency of 500 kHz. The KC705 power supply is
based on the Texas Instruments UCD9248PFC.
For the first three experiments, we exclusively use 8 sensors placed horizontally from
the left to the right side of the FPGA; later we use 9 sensors in a 3x3 layout. Each
sensor is embedded in a region consisting of the same number of flip-flops connected
to inverters, as shown in Figure 3.5. The floorplans of our test setups are shown in
Figure 3.2 and Figure 3.3. The test structure with 8 horizontally aligned regions
can be seen in Figure 3.2, including a magnification of one delay line with
registers. In that setup, 14% of the available flip-flops on the FPGA are configured.
A similar setup is available with only 8% flip-flops. Figure 3.3 shows a setup with 9
regions in which again 14% of available flip-flops are configured (1.6% per region). In the
following experiments, parts or all of these flip-flops toggle simultaneously, according
to the scheme in Figure 3.5. In most experiments, less than 8% of the flip-flops are
toggling in total.
This value is chosen because it still reflects a realistic amount of toggling flip-flops
for a circuit (as we have commonly seen in simulations of benchmark circuits such
as a Leon3 processor), and leads to fluctuations that still fit inside the observable delay
range of the sensors, as we found empirically. As a trade-off between development
effort and the time needed to acquire results, we designed a semi-automatic approach
with a simple state machine and on-board control, with PC-based data acquisition.
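The activation scheme driven by workload period T, duty cycle D, and phase shift ϕ can be sketched as a simple software model of the enable signal (an illustrative model only; the actual experiments use the on-board state machine of Figure 3.5, and the function name and 1 µs time grid are ours):

```python
def workload_enable(t_us, period_us, duty, phase):
    """Return True if the workload's flip-flops are allowed to toggle at
    time t_us, for workload period T = period_us, duty cycle D = duty
    (0..1) and phase shift phi = phase (0..1, fraction of the period)."""
    pos = ((t_us / period_us) - phase) % 1.0  # position within the period
    return pos < duty

# Workload i from Figure 3.12: T = 102.4 us, D = 20%, phi = 0%,
# sampled on a 1 us grid over two periods.
trace = [workload_enable(t, 102.4, 0.20, 0.0) for t in range(0, 205)]
print(trace[:3], trace[100:105])
```

Averaged over full periods, the fraction of enabled samples equals the duty cycle D, which is why different workload periods at the same D represent the same total switching activity.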
Figure 3.8.: Plot of all eight sensors, while 8% of the FPGA’s flip-flops toggle at 100 MHz
when active, or are clock-gated/inactive. A 90% / B 10%: 102.4 µs workload
period, with 90% / 10% duty cycle; C: switching activity starting after several
seconds of inactivity (annotated worst-case increases: A 107 ps, C 225 ps).
Y-Axis: Delay difference to idle, X-Axis: Time.
Figure 3.9.: Average of multiple worst-case delay increases from switching activity-dependent
voltage drop, averaged across all sensors. X-Axis: Workload period length
during the activity (0.2–409.6 µs, logarithmic). Y-Axis: Delay difference to idle.
shows a corner case when the flip-flops start switching after several seconds of inactivity,
resembling a step response to a sudden increase in current. The delay reaches a
worst point of up to 225 ps, even worse than the 195 ps of case B (Figure 3.8 B).
We repeat these experiments for several combinations of workload duty cycles (10–
90%) and logarithmically increasing workload periods (200 ns–409.6 µs), and summarize the
average of multiple worst-case delay increases of multiple sensors in Figure 3.9. Note
that each workload duty cycle (10–90%) relates to a total overall switching activity for
that percentage of time; only the workload period, i.e. the rate of alternating
between active and inactive, varies (as previously described using Figure 3.5 in Section 3.2).
These results show the same trend of worst-case delay dependency on duty cycle as
seen in the time-series diagrams in Figure 3.8. For the observed workload period range
in our system, we can conclude the following observations:
• Higher workload duty cycles typically lead to lower delay increase due to transient
voltage drop.
• Higher workload periods lead to higher delay increase due to transient voltage
drop.
Intuitively, the di/dt component when the flip-flops start toggling should be the
same for any duty cycle. The IR drop component should be higher for higher duty
cycles, due to overall more switching activity. This makes our observations
counter-intuitive, as we see higher delay increases for lower duty cycles. Moreover, as shown
in Figure 3.8 C, some time passes until the worst increase in delay is reached.
To explain these counter-intuitive results, we show the impact of duty cycle and
workload period on the average delay (i.e. steady-state) instead of maximum delay.
Figure 3.10.: Overall average delay at different switching activity, showing all relevant
workload duty cycles (the values of the remaining duty cycles are within the
boundaries of the shown duty cycles). X-Axis: Workload period length during
the activity. Y-Axis: Delay difference to idle.
These results are depicted in Figure 3.10. In that diagram, only minor differences
among different workload duty cycles can be seen. These additional results can be
explained as follows: The on-board switched-mode VRM (cf. Section 2.2) regulates
towards a defined target voltage, but its regulation loop typically operates at a much
lower frequency than the circuit behavior (depending on the workload frequency = 1/T,
and the circuit frequency = 100 MHz). Any faster changes are thus seen only as a
contribution to the average consumed energy, drawn from its output capacitor CO
(cf. Figure 2.1), for which the regulator stabilizes. Thus, the regulator will supply less
current after intervals with less energy consumption (low workload duty cycle) than
after high energy consumption (high workload duty cycle). Less current means shorter
TON times of the regulator output, leading to higher voltage ripple VP−P (cf. Section 2.2).
As a result, the shorter the time a certain activity contributes to the average, the less
it will be weighted, leading to higher transient voltage drop and delay increase.
To validate the generality of these observations, we repeated some of the experiments
on a Kintex-7 KC705 evaluation board (28 nm FPGA), and show the results in
Figure 3.11. We add one more, lower workload period of 0.1 µs, and let the flip-flops
switch at 200 MHz instead of 100 MHz. Despite these changes, the trend of lower duty
cycles leading to higher voltage drop is still visible across a major part of the analyzed
workload periods.
Figure 3.11.: Cross-check on the Kintex-7 KC705 (cf. Floorplan in Figure 3.4),
with flip-flops switching at 200 MHz instead of 100 MHz: Average of
multiple worst-case delay increases from switching activity-dependent voltage
drop, averaged across all sensors. X-Axis: Workload period length during the
activity. Y-Axis: Delay difference to idle.
(a) Spatial differences with T = 102.4 µs, D = 20%. Legend: Left Active (W = ii000000),
Center Active (W = 000ii000), Right Active (W = 000000ii).
(b) Spatial differences with T = 12.8 µs, D = 50%. Legend: Left Active (W = gg000000),
Center Active (W = 000gg000), Right Active (W = 000000gg).
Figure 3.12.: Average of multiple worst-case delay increases from voltage drop per sensor,
depending on activity in two regions (3.5% of total flip-flops), with two different
D and T: i = (102.4 µs, 20%, 0%) and g = (12.8 µs, 50%, 0%). X-Axis: Sensors
across the FPGA from Left (0) to Right (7). Y-Axis: Delay difference to idle.
• When the left or right regions are active, the gradient from either edge to the
other edge shows an almost symmetric profile, potentially making a fitted function
applicable to the whole FPGA.
• Although the shape of the curves is similar, the activity of the same number of
flip-flops on the right chip side leads to an overall lower drop. Mis-calibration can
account for a part of that, but at least the two center sensors would show the same
offset at the same power. As they differ, we assume the right side of the chip
operates at lower power, which is reasonable according to [50].
• Activating the two regions in the center shows the worst effect on the delay overall,
as even the right and left side are affected more by activity in the center than by their
own region. The Xilinx System Monitor logic in the center needs power as well and
could possibly influence this. In addition, the center region has a higher distance to
power/ground I/O pins.
(c) T = 102.4 µs, with ϕ = 50% in workload l. (d) T = 12.8 µs, with ϕ = 50% in workload h.
Figure 3.13.: Average of multiple worst-case delay increases from voltage drop per sensor,
depending on activity with different phase shifts at different regions. Workloads
summarized in Table 3.4. X-Axis: Sensors across the FPGA from Left (0) to
Right (7). Y-Axis: Delay difference to idle.
available flip-flops in the FPGA. Different effects will be explained based on selected
subsets of our results.
One general observation is that, among different workloads, a high phase shift and a
high spatial distance reduce the voltage drop. We provide evidence for this observation
with the results in Figure 3.13. Three diagrams (a–c) show the effect of workload
activity patterns that use workloads with either no (j), only ϕ = 10% (k), or the
maximum ϕ = 50% (l) phase shift.
Without any phase shift, shown in Figure 3.13a, the highest voltage drop (i.e., maximum
amplification) occurs when the active regions (with workload j) are close together on
one side, as W = jjjj0000. In contrast, the voltage drop is reduced more (i.e., more
attenuation) when the active regions are farther apart (W = jj0000jj). When we apply
a minor phase shift of ϕ = 10% to half of the active workloads (workloads k instead
of j), shown in Figure 3.13b, there is no major change. However, the voltage
drop reduces significantly when half of the workloads use ϕ = 50% (workload l), as
visualized in Figure 3.13c. The maximum attenuation of voltage drop occurs when
both high spatial distance and high phase shift are applied (W = jj0000ll). We also
verify this trend for a different workload period T = 12.8 µs in Figure 3.13d.
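The attenuating effect of ϕ = 50% is consistent with a purely geometric view of temporal overlap: for two workloads with equal period and duty cycle D, the fraction of each period in which both are active simultaneously shrinks as the phase shift grows. This is a simple model of our own, not fitted to the measurements:

```python
def overlap_fraction(duty, phase):
    """Fraction of the period in which two rectangular activation windows
    of duty cycle `duty`, shifted against each other by `phase` (both as
    fractions of the period), are active simultaneously."""
    phase = phase % 1.0
    # Circular intersection of [0, duty) with [phase, phase + duty):
    return max(0.0, duty - phase) + max(0.0, duty + phase - 1.0)

# Duty cycle 50%: no shift -> full overlap; phi = 10% -> still large;
# phi = 50% -> the two windows never coincide.
print(f"{overlap_fraction(0.5, 0.0):.2f}")  # -> 0.50
print(f"{overlap_fraction(0.5, 0.1):.2f}")  # -> 0.40
print(f"{overlap_fraction(0.5, 0.5):.2f}")  # -> 0.00
```

This mirrors the measurements: a minor shift of ϕ = 10% barely changes the simultaneous activity, while ϕ = 50% removes it entirely and thus maximally attenuates the combined voltage drop.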
In Figure 3.12 we provided evidence that the same activity can lead to stronger or
weaker voltage drop, depending on its location in the system. An active workload in
the center leads to a system-wide higher voltage drop than the same activity at an
edge. This nonuniform spatial influence has to be considered by any approach that
Figure 3.14.: Average of multiple worst-case delay increases from voltage drop per sensor.
These workload activity patterns show the benefits of high phase shift and
spatial distance among workloads. Workloads are summarized in Table 3.4.
X-Axis: Sensors across the FPGA from Left (0) to Right (7). Y-Axis: Delay
difference to idle.
(a) Variations of n = (51.2 µs, 20%, 0) with o = (51.2 µs, 80%, 20%) or p = (51.2 µs, 80%, 30%).
(b) View on the workloads in time, when either no overlap happens (between n and o) or
with an overlap of n and p.
Figure 3.15.: Average of multiple worst-case delay increases from voltage drop per sensor.
These workload activity patterns show the high impact a small phase shift can
have. Workloads are summarized in Table 3.4. X-Axis: Sensors across the
FPGA from Left (0) to Right (7). Y-Axis: Delay difference to idle.
The important additional observations from this subsection, with respect to
interdependence, are:
• The trend of spatial spread is not significantly influenced by changing only workload
period or duty cycle.
• Changing duty cycle or workload period is interdependent with phase shift in case of
overlaps in time (cf. Figure 3.15).
• Distributing workloads more uniformly in time (i.e. through appropriate phase shifts)
reduces voltage drop.
Figure 3.16.: Average of multiple worst-case delay increases from voltage drop per sensor.
These workload activity patterns show the effect of an activity in either of the
two corners, with one activity 50% phase shifted ( ) or without phase shift
( ). Workloads are summarized in Table 3.4. X/Y-Axis: Sensors across the
FPGA. Z-Axis: Delay difference to idle
corners the worst-case voltage drop is reduced when one of the two workloads is
phase-shifted. In Figure 3.16a ( ) we can also see an increase in path delay from inactive
to active regions of up to 4.1×, from 24.5 ps in corner x = 0, y = 0 to 99.4 ps in corner
x = 0, y = 2, much higher than the 2.5× seen in the horizontal layout.
An additional effect visible in the 3x3 layout is a difference in correlation on
the vertical versus the horizontal axis. Figure 3.17 shows the results from two active
regions on the edges of either the center row ( , y = 1) or center column ( ,
x = 1). The overall voltage drop is more prominent on the vertical axis, especially
when no phase shift is applied. In that situation, a strong voltage drop amplification is
visible in the center region, even though that region itself is not active. This gives valuable
information for floorplanning and runtime scheduling.
In Figure 3.18, we show two patterns of a fully active system (14% of all flip-flops).
The patterns compare no phase shifting in one workload pattern
( ) with phase-shifting the workload on every second region ( ) by ϕ = 50%. This
phase-shifting can reduce the worst-case increase in path delay from about 5%
to 2%. From these results it is also notable that in a fully active system, the center
column (regions with x = 1) is under the highest voltage drop condition. To conclude
the results, we show two different patterns in Figure 3.19. The orange pattern ( )
has less activity without phase shifts, but still shows a higher voltage drop than the
more active purple pattern with phase shifts involved ( ). Because the purple pattern
adds a component of interference between some of the regions, these regions have less
voltage drop than the others.
In this subsection we can thus confirm and extend some previous observations:
• Results in the previous sections showed that for our FPGA sample, activity in the
Figure 3.17.: Average of multiple worst-case delay increases from voltage drop per sensor.
These workload activity patterns show the effect of an activity in the edges
of either the center column or center row, with one activity either 50% phase
shifted or not. Workloads are summarized in Table 3.4. X/Y-Axis: Sensors
across the FPGA. Z-Axis: Delay difference to idle
center region leads to higher voltage drop. We can confirm this result and extend it
to a tendency of affecting the whole center column (x = 1).
• Due to more distance across the FPGA, a difference in delay increase from inactive
to active regions of up to 4.1× is shown, extending the previously seen 2.5×.
3.2.6. Discussion
The observations from our experiments allow us to formulate the following design rules
or recommendations:
• During floorplanning, increasing the distance among active regions as far as possible
will decrease worst-case voltage drop.
• The activity in some regions has a higher voltage drop impact than in other
regions. For instance, in our tested Virtex-6 FPGA, activity in the center regions has
a higher impact on voltage drop and should be avoided if possible.
• Distributing activity as uniformly as possible over time will also decrease voltage drop.
• If multiple activities exist, spreading them both over time and space without overlap
can be considered good practice.
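The last two rules can be illustrated with a small sketch (an illustrative heuristic of our own, not the scheduling used in our experiments): spreading n periodic workloads of equal period uniformly over the period minimizes how many of them are active at the same time.

```python
def assign_phases(n_workloads):
    """Spread n periodic workloads uniformly in time by giving the k-th
    workload a phase shift of k/n of the common workload period."""
    return [k / n_workloads for k in range(n_workloads)]

def max_concurrent(duty, phases, steps=1000):
    """Worst-case number of simultaneously active workloads over one
    period, sampled on a grid of `steps` points."""
    return max(sum(1 for p in phases if (t / steps - p) % 1.0 < duty)
               for t in range(steps))

# Four workloads with 25% duty cycle each:
print(max_concurrent(0.25, [0.0, 0.0, 0.0, 0.0]))   # all aligned -> 4
print(max_concurrent(0.25, assign_phases(4)))       # spread out  -> 1
```

With aligned phases, all four workloads switch at once; with uniformly spread phases, at most one is active at any time, which corresponds to the observed attenuation of voltage drop through appropriate phase shifts.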
These results provide useful insight into how different workload behavior, as well as
floorplanning, impacts transient voltage drop and in turn circuit delay and
the corresponding timing margin for correct operation. The results and analysis are a
Figure 3.18.: Average of multiple worst-case delay increases from voltage drop per sensor.
These workload activity patterns show the effect of all regions being active
with ( ) and without ( ) a ϕ = 50% phase-shift for every second region.
Workloads are summarized in Table 3.4. X/Y-Axis: Sensors across the FPGA.
Z-Axis: Delay difference to idle.
Figure 3.19.: Average of multiple worst-case delay increases from voltage drop per sensor.
These workload activity patterns show the benefits of phase shifting with
ϕ = 50% to reduce voltage drop, even if being applied asymmetrically ( )
over less activity without phase shift ( ). Workloads are summarized in
Table 3.4. X/Y-Axis: Sensors across the FPGA. Z-Axis: Delay difference to
idle.
first step to guide design optimization in terms of both performance and reliability. In
future work, we will focus on models to be used by designers to accurately guardband
against such effects, and also be able to optimize their design to be resilient against
them.
Our experimental results show that voltage drop depends strongly on how activities
are distributed in time and space, such that even an overall lower activity, through
lower duty cycles or lower spatial distribution, can induce more delay. This can be
explained by:
• Less distribution in time means more sudden increases in power and thus leads
to a higher di/dt voltage drop component, which dominates constant IR drops.
• Lower spatial distribution means higher power density and thus more load on the
same number of decoupling capacitors.
• A switched-mode VRM typically operates at lower frequencies than the circuit, and
therefore has an averaging nature. The shorter a certain timeframe of high power
(i.e. switching activity) is, the higher the voltage drop that can be expected.
Having full control over the placement of active regions and the type of running
workloads enabled these observations; real workloads would give a more convoluted,
less clearly observable view. In this way, we could see up to a 3× difference in delay
increase between two cases with the same overall activity (duty cycle), showing a high
potential for mitigation by scheduling. For example, real-time systems typically work
with fixed scheduling/workload periods, and their task execution times can resemble
the workload duty cycles used here.
The experiments involving spatial observations and interdependence reveal a spread
of voltage drop across the FPGA. Furthermore, amplification and attenuation effects
from positive and negative interference among distinct activities (i.e. workloads) were
made visible. While voltage drop gradients have been analyzed in [50] for the steady
state, we provide the first analysis that includes transient voltage drops and the
interdependence of a range of workloads. In combination with the first results, sufficient
qualitative and quantitative evidence was collected to be useful for floorplanning and
runtime scheduling decisions of on-chip activities.
Using the results and methodology from the first experiment, a system architect can
find and work around worst-case voltage drop hotspots (e.g. scheduling
periods around 204.8 µs in our case). Using the results from Experiment 2,
a system designer can properly place and separate the active blocks, based
on their expected switching activity and therefore voltage gradient. These results can
then be combined with Experiments 3–4, leading to a joint strategy of scheduling
block-level activity at design time and runtime. For a new system, our approach can
be used to acquire similar results and use them for scheduling and partitioning
decisions, or to directly evaluate existing circuit designs for critical operating conditions. We
believe that in a larger system, there is still sufficient freedom left for floorplanning
at the coarse granularity of IP blocks to apply these strategies, while still satisfying
all other constraints. The global interconnections between IP blocks are typically fewer
than the connections within a block.
As the overhead of these sensors is fairly small, they can also be used during the
system evaluation phase for monitoring the system behavior while running a mapped
design with realistic workloads. This can greatly help if the FPGA is used in critical
application domains with high reliability requirements.
On a final note, most of the overall observations and analysis are transferable to
more recent technology nodes, ASICs, and processor designs, since they are based on
similar underlying fabrication technologies, and share similar concepts in PDN design.
PDNs do not scale or improve as fast as feature sizes do [17].
3.3. Chapter Conclusion
depending on the transient behavior in a full system that includes the VRM. Considering
the same activity, up to a 3× difference in induced delay could be observed from transient
voltage variance, while the spatial spread across our chip showed up to 4.1× delay
variation from active to inactive regions. Using our characterization approach, the
acquired system-specific knowledge can be used as guidance during circuit mapping as
well as workload scheduling, leading to more reliable systems at increased performance.
Future work will include the investigation of design implications of observed trends and
how to optimize a design accordingly at physical design and runtime scheduling phases.
4. PDN-based Voltage Drop-based
Fault Attacks
The work described in this chapter was published in [20], [21], and [25] and is joint
work with co-authors (in no particular order) Fabian Oboril, Jonas Krautter, Falk
Schellenberg, Amir Moradi and Mehdi B. Tahoori. More details on contributions are
found in Section 1.1.
Fault attacks can break cryptographic implementations without the need for an
algorithmic weakness, by injecting errors into the computation and gaining knowledge from
the faulty output. Such attacks have been applied to various devices and types
of algorithms [92–94]. A traditional requirement for these attacks has been unlimited
physical access to the device. Before this thesis, the well-known rowhammer attack had
proven that software can also be used to cause faults: it uses specific memory access
patterns that also affect neighboring memory cells [95, 96].
In this chapter we show that similar problems are more widespread and also apply
inside FPGAs. The previous Chapter 3 has shown that specific switching
activity can lead to higher or lower voltage drops. Here we show that such voltage
drops can be escalated to cause actual faults in the system, either to inject errors or to
bring down the whole system. While Chapter 3 discussed only the reliability
implications, we now discuss them specifically as a potential threat to security.
Our proposed attack uses a relatively small number of ROs to crash the FPGA (about
12% of the total available Look-Up Tables (LUTs)). Moreover, it requires just a few
microseconds to crash the system, and can thereby escape various monitoring schemes.
Additionally, it is not detectable by a thermal sensor, and it keeps the FPGA
inaccessible until it is fully power-cycled. This has been shown successfully on two
different generations of FPGAs, and could crash a full SoC containing two processor
cores alongside FPGA fabric, using only the vulnerability of this fabric.
The revealed vulnerability can be exploited for attacks that aim to cause a Denial-of-Service
(DoS), affecting other users and applications. The focus of this section is to fully analyze this
vulnerability and its overall impact. We analyze the exact conditions under which the
system crashes, and how systematically the crash can be reproduced. Furthermore, we
provide a first analysis of and guideline for possible mitigations of this vulnerability.
4.1. Voltage Drop-based Fault Attacks on FPGAs using Valid Bitstreams
Figure 4.1.: Change in propagation depth into a chain of ‘CARRY4’ elements of a Xilinx
Virtex-6 FPGA due to sudden current increase, caused by enabling 18720 ROs
and keeping them enabled. Y-Axis: Propagation Depth [# buffers] (0–60).
X-Axis: Time [µs] (20–140). Temperature 38–40 °C.
Figure 4.2.: Left to Right: Virtex 6 Floorplan with ROs and Sensors; Kintex 7 Floorplan
with ROs; Zynq-7020 Floorplan with ROs.
of buffers, whose delay is affected by voltage fluctuations. Thus, higher values relate to
faster buffers operating at a higher voltage; as buffer delays increase from voltage
drop, their propagation depth decreases, shown as lower values on the y-axis.
Before the voltage drop, within 0–10 µs, the sensor is saturated at 62. At about
10 µs the voltage starts to drop and the sensor shows that fewer bits can be propagated
due to the increased delay td of the transistors in the buffers. This voltage drop reaches
its lowest point at 30 µs and recovers until 100 µs. It demonstrates how on-chip
activity can lead to a strong increase in path delay. The delay change spans the whole
sensor range of the Observable Delay Line, corresponding to a change of about 12.5% of
the total path (including the Initial Delay) according to timing analysis. In [44], 14%
delay change was reported with a different method and FPGA. Where sensor values
are shown in the remaining work, they are based on two sensors, placed on the left and
right side of the System Monitor, near the center of the FPGA, as shown in Figure 4.2.
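The sensor's propagation depth can be converted into an approximate delay increase using the per-bit delay calibrated in Chapter 3 (about 11.7 ps/bit on the ML605). A small sketch with hypothetical readings (the specific depths below are example values, not measurements):

```python
PS_PER_BIT = 11.7   # average delay per buffer bit, ML605 calibration (Ch. 3)

def delay_increase_ps(idle_depth, depth):
    """Approximate path-delay increase [ps] for a sensor that propagates
    only `depth` bits instead of `idle_depth` bits within half a clock
    period."""
    return (idle_depth - depth) * PS_PER_BIT

# Hypothetical readings: saturated at 62 bits when idle, 20 bits at the
# lowest point of the voltage drop.
print(f"{delay_increase_ps(62, 20):.0f} ps")  # -> 491 ps
```

A drop spanning the entire 62-bit observable range would correspond to roughly 0.7 ns of additional path delay, consistent with the reported change of about 12.5% of the total sensor path.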
Figure 4.3.: Single ring oscillator made out of one LUT5 primitive
Figure 4.4.: Principle of connecting ROs to a clock fRO−t that toggles their allowed activity
(within which they oscillate as fast as physically possible).
amount of ROs is available (when about >7% of the LUTs are used, tested on the
ML605 board). Hence, in the following, we limit the description of our analysis to a
fixed number of ROs, for the sake of brevity.
When testing different frequencies for fRO−t, one needs to take into account that the
worst-case situation takes some time to build up. When simply enabling the ROs
on the ML605 board, it takes about 10–20 µs from the start of the path delay increase
until it reaches its lowest point and starts to recover, as shown in Figure 4.1.
That means we should keep the ROs oscillating until they cause a high enough voltage
drop, and start over again before the voltage recovers. Because of that, reducing fRO−t
crashes the FPGA more effectively, down to the range where recovery sets in
(< 25–50 kHz). To reduce the thermal influence on our experiments, we keep the FPGA
within 38–40 °C using the on-board fan.
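A back-of-the-envelope check of this frequency range (assuming a 50% duty toggle clock, so the ROs stay enabled for half a toggle period) compares the per-period on-time against the 10–20 µs build-up time of the drop:

```python
BUILDUP_US = (10.0, 20.0)   # observed time for the voltage drop to build up

def on_time_us(f_ro_t_hz):
    """Enabled time per toggle period [us] for a 50% duty toggle clock."""
    return 0.5 / f_ro_t_hz * 1e6

for f in (25e3, 50e3, 1e6):
    t_on = on_time_us(f)
    covers = t_on >= BUILDUP_US[0]
    print(f"f_RO-t = {f / 1e3:>6.0f} kHz: on-time {t_on:6.1f} us, "
          f"covers build-up: {covers}")
```

At 25–50 kHz the ROs stay enabled for 10–20 µs per period, long enough to reach the lowest voltage point before being restarted, whereas at 1 MHz the 0.5 µs on-time is far too short, matching the observation that crashes no longer occur there.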
We also checked whether timing failures could be caused when ROs are added to a
legitimate design. Our investigation showed that the design either operated correctly
or crashed entirely; a more detailed analysis is left to follow-up work.
Figure 4.5.: Range of frequencies to toggle ROs on/off that cause the ML605 to crash after a
certain number of attempts. Temperature regulated within 38-40 °C.
least the ML605. Thus, restarting the PC while keeping the board running does not
resolve the situation. A manual power-cycle of the FPGA board is required before any
access (i.e. JTAG) is possible again. On the KC705 we also tried to bypass the
on-board USB JTAG with a stand-alone JTAG dongle, which proved insufficient
to reactivate and access the FPGA board again.
On the Zedboard, one of three conditions occurs randomly. In the first, it stops
being accessible over JTAG, like the other boards. In the second, it resets, which also
deconfigures the FPGA part and resets the ARM cores. The third case also looks like
a reset; however, when trying to reprogram the Zynq in Vivado, it locks up
in the middle of reprogramming the bitstream. The software then has to be forcefully
terminated, after which the SoC stays inaccessible as in the first condition.
If the ML605 is connected to Chipscope during the crash, we get the following
message:
WARNING: System Monitor Die Temperature has invalid data. [..] See Answer Record 24144.
A lookup of this Answer Record does not lead to relevant information. For
the KC705 board, if connected to the Vivado Hardware Manager, the software quits
and shows in a popup:
[Labtoolstcl 44-153] HW Target shutdown. Closing Target: localhost:[..]
For the Zedboard, when it is connected to the Vivado Hardware Manager and crashes in
any way (inaccessible or not), the crash is not immediately recognized and will only be
seen when trying to reprogram the board:
ERROR: [Labtools 27-3165] End of startup status: LOW
ERROR: [Common 17-39] ’program_hw_devices’ failed due to earlier errors.
If it is connected for debugging during the crash, the errors show immediately:
ERROR: [Xicom 50-38] xicom: Core access failed. [..]
ERROR: [Labtools 27-3176] hw_server failed [..]
Resolution: Check that the hw_server is [..]
If the ARM cores are on a debug connection in the Xilinx SDK, this connection is
terminated.
In the following, we give more detailed results on the specific frequencies required
for the crash on the ML605 board:
4.1. Voltage Drop-based Fault Attacks on FPGAs using Valid Bitstreams
Table 4.1.: Different conditions for the tested boards. Power consumption measured with wall
plug, default power supplies.

                        Standalone Power Consumption                              Crash Recovery (Inaccessible)
Board     % LUTs used   After Flashing/  After Reset  All ROs   After Crash       Stand-alone    PCIe Connected
          for ROs       Programming                   Active    (Inaccessible)
ML605     12.4%         14.3 W           16.4 W       36.5 W    11.1 W            Off/On Board   Off/On Board
KC705     11.8%         21.8 W           13.5 W       29.4 W    7.0 W             Off/On Board   Off/On PC Power
Zedboard  12.8%         3.3 W            3.3 W        6.2 W     3.3 W             Off/On Board   not available
We activate 18720 ROs (about 12% of the LUTs in this specific Virtex 6) in a range of
fRO−t of 20 kHz−2 MHz. For each attempt to crash the FPGA, we program it and start
the activity at a single fRO−t . We then check how many attempts are required to
crash the system. Since recovery from a crash is time-consuming and not easily
automated, we ran more experiments for the corner cases. The results in Figure 4.5
show how many attempts are required to cause a crash, depending on the chosen fRO−t .
Above 1 MHz, the voltage does not drop enough, and no crash happens with at least
99% probability (based on 100 attempts). Between 80 kHz and 1 MHz lies a ‘greyzone’
in which the crash occurs non-deterministically. For all fRO−t tested below 80 kHz,
the crash always happens on the first try.
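This sweep procedure can be sketched as a host-side driver loop. Note this is an illustrative sketch only: `program_fpga`, `start_toggling` and `board_crashed` are hypothetical stand-ins for the board-specific programming and JTAG-liveness calls, not the tooling used in this thesis.

```python
def attempts_until_crash(f_ro_t, max_attempts, program_fpga, start_toggling, board_crashed):
    """Count how many activation attempts at toggle frequency f_ro_t are
    needed until the board crashes; None if it survives all attempts."""
    for attempt in range(1, max_attempts + 1):
        program_fpga()          # reconfigure the FPGA into a clean state
        start_toggling(f_ro_t)  # enable the RO grid at a single f_RO-t
        if board_crashed():     # e.g. JTAG no longer responds
            return attempt
    return None

def sweep(frequencies, **hooks):
    # One data point per tested frequency, as plotted in Figure 4.5.
    return {f: attempts_until_crash(f, 100, **hooks) for f in frequencies}
```

Running such a sweep over 20 kHz−2 MHz would reproduce the shape of Figure 4.5: one attempt below 80 kHz, a greyzone up to 1 MHz, and no crash above.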
To check the maximum time the crash requires, we set the Xilinx Chipscope Inte-
grated Logic Analyzer (ILA) to trigger on the start of our malicious activity. After the
trigger condition, we set it to collect only 16 samples before sending them to the PC.
The expected time required in total from the trigger condition until received by the
PC is less than 150µs, based on the JTAG frequency, data size and internal sampling
rate. Thus, the crash happens in less than 150µs.
We additionally experimented with the two FPGA boards being connected inside a
workstation to PCIe, in which the crash also happened. By using on-board switches to
power either of the two boards off and on, the ML605 can be accessed again. However,
the KC705 requires a cold reboot of the workstation (i.e. even the workstation’s power
supply), which can be a permanent DoS in a server environment that requires manual
power cycling.
Interesting for these two boards is their power consumption after the crash, which
is less than in any other condition by at least 20%, because one voltage regulator
stopped operating. However, for the Zedboard there is no power difference. Table 4.1
summarizes the conditions.
[Figure 4.6 plots. Experimental setup: Xilinx Virtex 6 ML605, 38-40 °C, 18720 ROs based on
LUT5, 100 MHz sensor sample rate; sensor positions left and right of the System Monitor.
Y-axis: propagation depth in # bits.]
Figure 4.6.: Influence on the increase in path delay when applying different activation fre-
quencies fRO−t to the ring oscillators. Case (b) almost always leads to a crash,
because, unlike (a), it has not yet recovered at 80µs, and the voltage drops last
longer than in (c) or (d). For (d) it never crashes, probably because the value
never goes below 10.
The vulnerability exposed in this work is caused by voltage emergencies. In the
background section, we reviewed various failure causes: timing faults, SRAM state
retention loss, and resonance in the PDN that supplies the FPGA. For the FPGAs
tested here, both internal BRAM and the configuration memory are based on SRAM.
The required time Tt and amplitude Vdrop for a voltage emergency differ between a
timing fault and SRAM state retention loss; in addition, power distribution networks
can have weaknesses when stressed at the right frequency.
On the ML605 board, 0 V can be measured for the FPGA core supply voltage VCCINT
after the crash. This shows that the respective on-board voltage regulator was shut
off, causing the permanent nature of the crash until the board is power-cycled. The
other voltage regulators on the board still operate normally and keep some LEDs lit
and the fan spinning.
The sensors mapped to the FPGA provide more detail and evidence of why
this situation occurs, which we show with some example traces in Figure 4.6. These
were collected when the system did not crash (therefore we can only show a minimum
frequency of 83.333 kHz).
A larger Vdrop can be seen when the ROs are activated suddenly and stay activated,
as in Figure 4.6a, from which the voltage recovers after around 80µs. When we cause
repetitive events, however, a certain amplitude of Vdrop repeats itself over a longer
stress period, shown for fRO−t = 83.333 kHz in Figure 4.6b. When fRO−t is varied in
Figures 4.6b-4.6d, the worst-case Vdrop monitored with the sensors is very similar
for each frequency. However, the 500 kHz case shown in Figure 4.6c only rarely
causes a crash, potentially because the sensors reach values below 10 only for very
short times. With 2 MHz, shown in Figure 4.6d, we never got a crash, and in that
case, none of the sensors reach below 10.
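The "time spent below the critical sensor value" reasoning above can be made concrete with a small trace helper. A minimal sketch, assuming a trace is a list of propagation-depth samples taken at the 100 MHz sensor sample rate described in the setup:

```python
def longest_undershoot_ns(trace, threshold=10, sample_period_ns=10):
    """Longest consecutive time (in ns) a sensor trace stays below `threshold`
    propagation-depth bits; 10 ns per sample at the 100 MHz sample rate."""
    longest = run = 0
    for value in trace:
        run = run + 1 if value < threshold else 0  # extend or reset the run
        longest = max(longest, run)
    return longest * sample_period_ns
```

Under this metric, a trace like the 2 MHz case of Figure 4.6d, which never goes below 10, would score 0 ns, consistent with it never causing a crash.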
Thus, a possible conclusion is that the board crashes directly due to extreme stress
from a frequency-dependent resonance in the on-chip PDN or on-board voltage regu-
lator. However, other components, like the System Monitor or configuration memory,
might become faulty first, and subsequently lead to the voltage regulator getting dis-
abled. The requirements to cause voltage emergencies in these components might differ
from each other, and with that also the sensitivity to different fRO−t frequencies. For
instance, high-frequency but short-lived Vdrop ‘spikes’ can be absorbed over longer Tt
times. Since on the tested boards the memory and logic subsystems of the FPGA
use the same supply voltage (and likely the same PDN), the excessive voltage drop
in the logic part (ROs) can cause voltage emergencies in the memory subsystem as
well.
To collect evidence for a retention failure of SRAM, we can check either BRAM or
configuration memory. We use BRAM, as it is more easily accessible in the fabric, and
Chipscope actually uses it for its sample memory. Thus, we can see its failure by
receiving corrupted data in Chipscope. We received such corrupted data when
experimenting with fRO−t within the non-deterministic greyzone. In that range, we can
sometimes still receive partial traces shortly before a crash, showing evidence of
anomalous BRAM behavior.
In Figure 4.7 we show part of such a corrupted trace. In this trace until 131.66µs, we
receive typical fluctuation data. After this time, all data bits received from Chipscope
are ‘0’ for exactly 150 ns, then they are ‘1’ until the crash. In all the partial traces we
Figure 4.7.: Detail of path propagation depth when a crash happens, but some data was still
transmitted. After the undershoot to all zero at 131.66µs, all data received by
Chipscope is ‘1’ – potentially the reset state of BRAM output latches.
[Figure 4.8 plot: propagation depth in # buffers (0-60) over time (0-120 ns), for sensors
left and right of the System Monitor.]
Figure 4.8.: Detail of path propagation depth when disabling the ring oscillators after detect-
ing a drop in sensor value with previously idle activity.
recorded, the behavior is similar, and the intermittent ‘0’s always appear for 150 ns in
a systematic manner.
Please note that in the case of receiving all-‘1’, it is not the sensor saturating again:
the sensor saturation value is 62, not 63, while the field reserved in BRAM is 6 bit
wide. We thus assume the BRAM itself, or its output latches, got reset to ‘111111’= 63.
In conclusion, the permanent nature of the crash depends on the voltage regulator
crashing or shutting down for safety reasons. This is in turn caused either directly by
resonance in the PDN, or indirectly through problems in the logic and configuration
memory of the FPGA. More details will be analyzed in future work.
4.1.3.3. Discussion
In the previous sections, we showed the nature of deliberately caused voltage emergen-
cies that lead to a DoS. Specifically, the attack leads to a DoS situation much more
quickly than overheating does, and in addition keeps the FPGA inaccessible, even to
JTAG, until its power supply is reset.
Allowing user-configurable accelerators in such systems creates security vulnerabili-
ties that can compromise the availability of FPGA resources. A complete server might
require a reboot, including full power cycling. In the worst case, a system based on an
SoC with integrated FPGA can end up in a permanent DoS, for instance when it is
powered by a non-removable battery. Threats like these could be reduced, given a scheme
to detect and disable the excessive switching activity before it escalates into voltage
emergencies for the complete chip.
In an additional experiment, we show how fast the voltage recovers when all ROs are
disabled after a sensor value below ‘30’ is detected. This way, we estimate the latency
of the sensor and how fast the voltage drop wears off. Figure 4.8 shows the sensor
readings for this experiment. The sensor values show recovery after two samples (20 ns).
However, as the sensors saturate at 62, the system might still require more time
to fully recover.
Applying this option to a real design would require reserving area for the sensors
and one input on each LUT. The sensor thresholds would need to be chosen such that
legitimate activity is not affected. Additionally, all LUTs would effectively have one
input less available to the design, and no option for a disable switch might exist for
other FPGA primitives, making this approach rather infeasible and requiring other options. These
4.2. FPGAhammer: remote voltage fault attacks on shared FPGAs
options could be based on power gating or a way to quickly disable the interconnects
in affected areas.
To prevent malicious bitstreams from even loading, they can be sanity-checked in
software; the challenge is to keep legitimate bitstreams working while not leaving
loopholes for malicious ones. One option, which recent FPGA tools already use as a
default constraint, is checking for combinational loops during bitstream generation;
however, this check can be deactivated by the user. Thus, the check would need to be
done at a privileged system software level, inaccessible to the user or application
developer.
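Such a combinational-loop check amounts to cycle detection on the combinational part of the design graph. A minimal sketch, assuming a hypothetical netlist given as a mapping from each combinational signal to the signals it drives (paths through flip-flops are excluded, since registers break combinational cycles):

```python
def has_combinational_loop(netlist):
    """netlist: dict mapping each combinational signal to the signals it
    drives (edges through registers excluded). True if any cycle exists."""
    WHITE, GREY, BLACK = 0, 1, 2
    color = {n: WHITE for n in netlist}

    def dfs(node):
        color[node] = GREY
        for succ in netlist.get(node, ()):
            if color.get(succ, WHITE) == GREY:
                return True  # back edge: combinational loop (e.g. an RO)
            if color.get(succ, WHITE) == WHITE and dfs(succ):
                return True
        color[node] = BLACK
        return False

    return any(color[n] == WHITE and dfs(n) for n in netlist)
```

A ring oscillator, whose LUT output feeds straight back to its input, forms exactly such a cycle, so this class of attacker circuit would be flagged before bitstream generation.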
For all of these possibilities, new experiments are required. To be able to deactivate
arbitrary malicious circuits fast enough, new FPGAs might need to be manufactured.
• We establish a generic threat model for an attacker and a victim using a shared
FPGA resource in an active fault attack scenario.
• We show that a spatially and logically separated attacker in one region of the
FPGA fabric can attack a victim in another region.
• We test and prove the general vulnerability to on-chip voltage drop fault injection
on a range of FPGA platforms, and elaborate an automated way to inject faults
more precisely.
• We empirically prove that fault injections achieve a precision high enough for
a successful DFA and key recovery on the AES, regardless of FPGA model or
process variation within the tested devices.
The remaining section is structured as follows: Section 4.2.1 explains the proposed
threat model, the related work, and background on how voltage fluctuations occur
inside chips. Moreover, we briefly outline the DFA method we apply in our attacks. In
Section 4.2.2, we present an initial attacker design and describe the behavior of FPGA
boards from different manufacturers under the influence of malicious switching activity.
In Section 4.2.3, we elaborate on how a fault attack and subsequent DFA can be carried
out with the proposed method on an FPGA AES implementation. An overview of the
hardware used in the experiments and implementation details are provided. We also
present results of analyzing injection rates and key recovery success. We discuss some
ideas for future research based on our findings in Section 4.2.4 and conclude our work
in Section 4.2.5.
4.2.1. Preliminaries
Before elaborating a full attack on the AES, we put our work in the context of other
publications, which are relevant to our findings or assume a similar threat model.
Moreover, we briefly explain the theoretical background of our experiments and the
basics of causing a voltage drop, leading to faults in FPGAs.
Figure 4.9.: Overview of the threat model considered in this work: Attacker and victim share
an FPGA resource with a common power supply network, but isolated, logically
disconnected partitions on the fabric
• The adversary can issue arbitrary plaintexts to a public interface of the victim
process either locally or remotely through a network.
• The victim outputs the ciphertext of the provided plaintext, encrypted with the
secret key, only known to the victim.
We remark that we assume an attacker that can encrypt arbitrary plaintexts in the
generic threat model. However, the actual content of the plaintext is irrelevant for
a successful DFA and may even be unknown. The attacker only needs to be able to
enforce encryption of the same plaintext twice, where one encryption is fault-free and
in the other faults are injected. Therefore, a DFA-based attack on AES, like the
one presented in this section, can be applied in situations where replaying encryption
requests to the target module is possible. Please keep in mind that attacks with faulty
ciphertexts only are also possible [99], but future studies are required to reveal whether
they are feasible through internal voltage-drop based fault generation in FPGAs.
the input. Since the gate is usually implemented as a single LUT in the FPGA, the
oscillation frequency depends mostly on the loopback routing.
Figure 4.11.: Trace of FPGA supply voltage VCCINT, measured externally with an oscillo-
scope during a frequency sweep leading to a crash of the device
In Figure 4.12, we present results of comparing the two RO variants with (b) and
without (c) virtual output pins, where each implementation has the same amount of
logic utilization. We stressed a simple test design, which is detailed in the next section.
We collect the number of faults in a series of 10 trials of 5 seconds each. The amount
of recorded errors in the test design proves the higher effectiveness of the virtual pin
option. Although fewer ring oscillators are used in variant b, the additional interconnect
resources connected to each single oscillator cause more faults.
Figure 4.12.: Amount of errors detected in a simple adder test design during 5 seconds of
RO toggle activation with respect to the RO implementation option. Tested on
DE1-SoC.
Voltage drops can be further increased by enforcing high interconnect utilization
when separating ROs from their respective virtual output pins. However, we have
observed that which design works best strongly depends on the device used, especially
when devices from different manufacturers are considered.
that on most of the tested platforms it is also possible to inject faults into designs that
meet the timing constraints of worst-case estimation models.
In Table 4.2, we summarize our results across several platforms, which we evaluated
regarding the feasibility of fault attacks both in designs that do not meet the timing
constraints (unmet) and in designs that meet them. Although some platforms seem
not to be vulnerable, it is more likely that we simply have not yet found the appropriate
parameters to activate the oscillators in a way that triggers the necessary voltage
drops. These initial results promise possible success in applying fault injection and
DFA to cryptographic modules on FPGAs.
Figure 4.14.: Flowchart of the calibration algorithm to find the appropriate parameters for
injecting faults at the desired moment
and activation time of the RO grid to provoke faults at the proper moment of the
encryption.
Any fault before the 8th round of the AES leads to all bytes of the faulty output
ciphertext differing from the correct one, whereas any fault after the 9th round
leads to fewer than four differing bytes. Faults that are injected into the input
state matrix of the 9th round reveal themselves in exactly four bytes of the faulty
output ciphertext differing from the correct one. This allows us to verify a successful
fault injection using the output ciphertext. Therefore, we decide to aim for injecting
faults not before the 8th round of the AES, but only before the 9th round.
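This byte-count criterion can be sketched as a small classification filter. The four index sets follow from MixColumns in round 9 spreading a single-byte fault over one state column, which the final ShiftRows then scatters to fixed ciphertext positions; the code below is an illustrative sketch, not the implementation used in this thesis:

```python
# Ciphertext byte positions touched by a single-byte fault entering round 9,
# one set per AES state column, for the standard ciphertext byte order.
FAULT_PATTERNS = [
    {0, 7, 10, 13},   # fault in state column 0
    {1, 4, 11, 14},   # column 1
    {2, 5, 8, 15},    # column 2
    {3, 6, 9, 12},    # column 3
]

def classify_fault(correct: bytes, faulty: bytes) -> str:
    """Decide from the differing bytes whether a fault is usable for the DFA."""
    diff = {i for i in range(16) if correct[i] != faulty[i]}
    if not diff:
        return "no fault"
    if diff in FAULT_PATTERNS:
        return "usable"                   # single-byte fault before round 9
    if len(diff) < 4:
        return "fault after round 9"
    return "fault before round 8 or multi-byte"
```

As the text notes, some multi-byte faults can still hit exactly one of these patterns, which is why a few key recoveries remain unsuccessful.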
To make use of this possibility for injection success verification, we develop an auto-
mated calibration algorithm, to be executed before evaluating injection rates or attack
success for a given design and device. The algorithm allows using the attacker design
in different setups without manually finding appropriate parameters in time-consuming
trial-and-error experiments.
In Figure 4.14, we present an overview of the full calibration algorithm we use before
evaluating injection rates or key recovery success. We adapt the signal for activating the
ROs in three parameters: The toggle frequency, the duty-cycle and the delay between
Figure 4.15.: Trace of FPGA supply voltage VCCINT, measured externally with an oscillo-
scope during a single fault injection attempt
starting the encryption and activating the RO grid. On the left side of the flowchart, we
depict the process flow on the software side, whereas the right side lists the actions
carried out on the FPGA by both attacker and victim designs. The algorithm performs
as follows:
a) The attacker activates the calibration process on the FPGA. A random input
plaintext is drawn and encrypted without RO activity. The result is stored as
the correct ciphertext.
plaintexts result in a faulty output even with the ROs active. After a successful cali-
bration process, the three parameters are fixed and used for subsequent fault injections.
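The parameter search can be sketched as a sweep over the three parameters, fixing the first setting that yields exactly four faulty ciphertext bytes. This is a hedged sketch: `encrypt_with_ros` is a hypothetical stand-in for the software call that triggers one encryption with RO activity on the FPGA, not the thesis tooling.

```python
import itertools

def calibrate(encrypt_with_ros, correct_ct, plaintext,
              freqs_hz, duty_cycles, delays_ns, tries_per_setting=20):
    """Sweep toggle frequency, duty-cycle and activation delay until a fault
    lands in exactly four ciphertext bytes (a round-9 injection candidate).
    Returns the first working (frequency, duty_cycle, delay) triple, which
    is then fixed and reused for all subsequent fault injections."""
    for f, duty, delay in itertools.product(freqs_hz, duty_cycles, delays_ns):
        for _ in range(tries_per_setting):
            faulty = encrypt_with_ros(plaintext, f, duty, delay)
            n_diff = sum(a != b for a, b in zip(correct_ct, faulty))
            if n_diff == 4:
                return f, duty, delay
    return None  # no tested setting produced a usable fault
```

On the DE1-SoC described later, such a search settles on values like a 1.16 MHz toggle frequency with a 56% duty-cycle.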
In Figure 4.15, we show an externally acquired trace of the FPGA supply voltage
VCCINT during a single fault injection by the attacker design on the FPGA. The AES
reset signal (aes_rst_n), which resets the AES encryption module when low, indicates
the start of an encryption. To provoke a fault, the attacker design pulses the RO grid
(ro_ena signal) with the previously determined frequency, duty-cycle and activation
delay. The voltage fluctuations (VCC) increase the probability of a critical delay at
the desired moment, and therefore a fault is injected.
In conclusion, our approach requires eight ciphertexts instead of only two compared
to [75] to recover the secret key, but makes the attack feasible in practice, where
many more encryption requests are required for the attacker design to affect the AES
module at the desired moment. In all subsequent experiments, we apply this variant
of the attack, aiming to inject faults before the 9th encryption round. The calibration
is executed only once, at the beginning of any evaluation. However, we continue to
filter out faults that have been injected at an earlier or later encryption round during
the collection of ciphertext pairs. This method maximizes key recovery success, even
though the injection precision achieved by our calibration is already very high, as we
show in the results.
Furthermore, we remark that the acquired calibration parameters can even be reused
on different devices of the same type. Therefore, there is no need to perform specific
calibrations on the board on which the fault attack is to be performed. In our exper-
iments on the Terasic DE1-SoC for example, we find that a toggle frequency of 1.16
MHz with a 56% duty-cycle is selected most frequently in a run of 1000 calibrations.
Fixing those parameters and reusing the design on a different DE1-SoC board leads
to similar or even better fault injection rates, depending on the general vulnerability
(process variation) of the board.
Note that this method filters out all faults that are caused at the wrong AES en-
cryption round, but does not necessarily discard ciphertexts resulting from fault
injections that affect multiple bytes before the 9th round. Some multi-byte faults can
also lead to four faulty bytes at the desired positions in the output ciphertext. There-
fore, some keys cannot be recovered during evaluation, because multi-byte faults are
not covered by the theoretical fault model of the DFA. Further details regarding
unsuccessful key recoveries can be found in the results in Section 4.2.3.4.
Table 4.3.: STA corner timing-models available in the Quartus STA from fastest to slowest
model for a given device speed grade at 1100 mV supply voltage
which incorporate an Intel FPGA and a 925 MHz Dual-Core ARM Cortex-A9 processor
inside a single die. The 5CSEMA5F31C6 chip is embedded on the Terasic DE1-SoC
board. We studied injection rates and attacks on three Terasic DE1-SoC boards of
different age and usage history to account for process and aging variation. A smaller
variant of the Cyclone V SoC, the 5CSEMA4U23C6N, with only half the amount of
logic elements is present in the Terasic DE0-Nano-SoC board, which we investigated
as well. The devices are used with their standard, unmodified power supplies.
Both boards have an SD card slot, which we use to boot a Linux system and run
user applications on the ARM processor that interact with the FPGA fabric. We
encapsulate the AES cryptomodule as an Intel Avalon Memory-Mapped slave device,
which allows access from programs running within the Linux system on the CPU.
The Intel Quartus Prime software offers tools for Static Timing Analysis (STA),
which analyzes the design in terms of timing violations under four different models
(corners) for a given device with a specific speed grade [100]. We list the available
timing-models in Table 4.3. The fast/slow classification of silicon for the given device
speed grade refers to propagation delay variations caused by intra-die process variation.
If the timing analysis reports timing violations at the time of implementation, the
design is not guaranteed to work reliably under all operating corner cases in terms
of temperature and voltage levels, according to the official chip specifications found in
the FPGA datasheet. We focus on attacking designs that do not violate any timing
constraints, even at the worst-case 85 °C corner, but also investigate the influence of
worst-case path slack in designs that violate the constraints. For each experiment, we
explicitly report in the respective subsection whether timing constraints are violated.
We implement the attacker design as a grid of ROs as described in Subsection 4.2.1.2.
A single RO is composed only of the combinational part of a single ALM on the Cyclone
V. The output is directly routed through local interconnect back to the input. This way,
we achieve the fastest possible switching frequency for the oscillators. In Figure 4.16a,
an example of how the Intel Quartus Prime software synthesizes and fits a single RO
into the bottom part of one ALM can be examined. The used output on the right
and input on the top left of the ALM are the same and the additional enable signal is
connected to the bottom left input of the ALM.
The Intel Quartus Prime software reports the worst case delay through the LUT
to be about 0.08 ns and the loopback routing delay through local interconnect around
(b) ROs in a Logic Array Block (LAB) of the Cyclone V SoC on the left with the
interconnect to their respective virtual output pins on the adjacent LAB on the right
(continuous lines) and other LABs (dashed lines) as well as loopback routing (dotted
lines)
(c) Design for evaluation of fault injection after fitting as displayed in the Quartus
Chip Planner on the Terasic DE1-SoC board with the AES module in the bottom area
and the ROs grid in the top left region
Figure 4.16.: Implementation details of the RO attack design on the Intel Cyclone V SoC as
displayed in the Quartus Chip Planner
0.21 ns. Therefore, assuming a maximum delay of 0.3 ns through gate and loopback,
the RO can achieve frequencies of 3 GHz and more.
As explained in Subsection 4.2.2.1, an RO-based design with a virtual pin (variant b)
is the most efficient at provoking critical voltage drops, which is why we choose this design
variant for our attack on AES as well. In Figure 4.16b, we show how several ROs
defined as an oscillating LUT and a virtual output pin are mapped into a LAB of the
Cyclone V SoC as presented in the Quartus Chip Planner. The schematic shows the
loopback routing (dotted lines) of each RO and the routing to their respective virtual
output pins, two of which are placed in an adjacent LAB on the right (continuous lines)
and some in different regions of the FPGA fabric (dashed lines). The relevant Verilog
code parts for implementing an RO grid on Intel FPGA devices can be found in the
Appendices Section A.1, Section A.2 and Section A.3.
Moreover, it is necessary to drive the oscillator grid with a very specific frequency
and duty-cycle. We therefore add an enable signal to trigger each of the implemented
ROs, which is routed through a global clock buffer on the Cyclone V SoC. The use of
this type of signal, originally intended for the distribution of clock signals on the FPGA,
saves routing resources from the toggle frequency control design block to all of the
ROs and significantly accelerates the compilation of the entire design.
Since our attack scenario, elaborated in Section 4.2.1.1, assumes a shared multi-
user FPGA use-case, we constrain each design using the LogicLock feature of the Intel
Quartus Prime software to keep victim and attacker design blocks within designated
areas of the FPGA fabric. To avoid any variation from other components on the chip,
we additionally activate the region reservation parameter of the LogicLock region that
contains the AES module, which prevents the fitter from placing any logic other than
the AES module and its Avalon MM encapsulation into this area. Figure 4.16c shows
the ROs mapped into the top left area and the AES module in the bottom region
as displayed in the Quartus Chip Planner for the design on the larger Cyclone V
SoC on the Terasic DE1-SoC. On the software side, we implement tools for controlling
encryption and fault injection to be executed within the Linux system on the ARM core
of the Cyclone V SoC. The evaluation of the collected ciphertext pairs and respective
DFA is performed on a standard host computer with an Intel i7-7700HQ Quad-Core
processor.
Figure 4.17.: Total measured fault injection rates Ftot and measured injection rate of faults
usable in DFA FDFA with respect to the amount of logic utilization (percentage
of total LUTs) by the attacker design for three different Terasic DE1-SoC boards
and two different random encryption keys
of injecting faults before the 9th AES encryption round, the calibration algorithm and
filtering of undesired faulty ciphertexts. For each of the three boards, which we already
used to investigate fault injection rates, we use the number of ROs that leads to the
highest injection rate FDFA of faults usable for DFA. Again, the AES module has an
operating frequency of 111 MHz, where no timing violations are reported by the STA.
We generate a set of 5000 random AES keys and collect a minimum of two ciphertext
pairs, which exhibit faults at the desired positions, for each four bytes of the last AES
Figure 4.18.: Amount of key candidates remaining for each DFA key recovery attempt on
5000 randomly drawn AES keys on three different DE1-SoC boards
round key. The ciphertexts, along with each key, are stored on the SD card of the board
and later transferred to a host computer. After collecting the minimum amount of
faults required, we apply the DFA from [75] with the slight adaptation of assuming
single-byte faults before the 9th round instead of the 8th round. In that case, we
require a theoretical minimum of 8 ciphertext pairs per key.
Figure 4.18 summarizes the results of the key recovery attempts on our three DE1-
SoC boards. On all three boards, we are able to deploy the attack fully successfully
for at least 87.9% of the 5000 random keys. All recovered keys are correct, so no
false positives are encountered. On all boards, there is a small fraction of around
2 − 3% of all keys that cannot be recovered uniquely, but for which fewer than four
candidates for the last round key remain. This ratio confirms the results in [75],
showing that in about 2% of the cases more than two ciphertext pairs are necessary
to recover the AES key. If a sufficiently small number of key candidates remains, the
correct key can easily be recovered with an exhaustive search. We encounter, however,
some keys where more than 2^32 or even 2^64 candidates remain. Across all our
experiments, an average of 22 usable faults was required to gather the required two
ciphertext pairs per four bytes of the round key. To collect these pairs, the attacker
design needs to issue 17979 encryption requests on average to the AES module, which
took 2344 ms on average. The average time for the evaluation of one attack until key
recovery on the described host machine is about 107 ms.
Ultimately, the attack can therefore recover a secret AES key in about 90% of
cases. In the remaining cases, fault injection itself fails. Our calibration algorithm and
subsequent filtering of faults that cannot be used in DFA prevent the gathering of
faults that have been injected at any other stage of the AES encryption than before
the 9th encryption round. However, as mentioned in Section 4.2.3.1, the method is
unable to distinguish some multi-byte from single-byte fault injections. The adapted
fault model from [75] assumes single-byte faults before the 9th encryption round, which
is why key recovery attempts are unsuccessful if the faulty ciphertext is the result of
a multi-byte fault.
Figure 4.19.: Total measured fault injection rates Ftot and measured injection rate of faults
usable in DFA FDFA as well as setup slacks reported by Quartus STA for differ-
ent AES operating clock frequencies fop with preserved design placement but
remaining routing randomization on the DE0-Nano-SoC
recorded. Furthermore, we note the setup slack values for each design as provided by
the STA in the worst-case corner (slowest silicon, 85 °C) and in the best-case corner
(fastest silicon, 0 °C). For reference, we also show the worst-case path slack value for
the design running at 111 MHz on the DE1-SoC.
In Figure 4.19, we show the fault injection results together with the respective slack
values at different operating frequencies. The results show that the reported worst-case
and best-case slacks do not scale linearly with the operating frequency fop, due to the
heuristic algorithm that remains in the routing stage. However, the trend is that slack
values increase for lower frequencies and decrease for higher frequencies. Both Ftot and
FDFA increase with the operating frequency fop. However, the increase is steeper within
the threshold range 145 MHz ≤ fop ≤ 151 MHz. The divergence between FDFA and Ftot
with increasing frequency is not as significant as in the experiments with respect to
logic utilization. Single experiments with fop = 170 MHz imply, however, that the
injection also becomes less precise with increasing fop. We were unable to inject faults
into the design running at fop = 142 MHz.
injection rates showed how an attacker can provoke sufficient faults for a key recovery,
with logic utilization in the range of only 35% to 45% of the LUTs on a Terasic DE1-SoC
board based on an Intel Cyclone V SoC. Since our calibration algorithm allows precise
injections independent of target parameters, we were able to recover at least 90% of
secret AES keys from a set of 5000 randomly drawn keys on three different boards.
The results in this work highlight the importance of further research before FPGAs
can be widely adopted in multi-user scenarios.
4.3. Further Fault Attack Investigations
Figure 4.20.: Range of frequencies to toggle ROs on/off that cause timing faults or crashes in
the Xilinx VCU108 Virtex UltraScale board.
system integrity become feasible without accidentally causing a crash that would be
more obvious to the victim.
5. PDN-based Power Side-Channel
Analysis Attacks
The work described in this chapter was published in [22], [23], [24] and [25] and is joint
work with co-authors (in no particular order): Falk Schellenberg, Jonas Krautter, Amir
Moradi and Mehdi B. Tahoori. The works in [22] and [23] have also been included in
the PhD thesis of Falk Schellenberg [110]. More details on the contributions are found
in Section 1.1.
In this chapter, we present three main works that investigate the PDN as an attack
vector which can be exploited remotely through software access, thereby undermining
the security of a full system. We show this problem in three main stages:
• Section 5.1 shows that power analysis side-channel attacks are feasible within an
FPGA chip.
• Section 5.2 shows that these attacks can even be escalated to a full board-level
system, such that one FPGA can attack another Integrated Circuit (IC) on the
board that shares the same power supply.
• Section 5.3 shows a software-based power analysis attack within a mixed-signal
chip, and by that generalizes shared power supplies as a new type of attack vector
within an IC.
5.1. An Inside Job: Remote Power Analysis Attacks on FPGAs
Figure 5.1.: Two scenarios of SCA attacks, where the circuits are logically separated, but
share the same PDN. a) In a shared FPGA, one user (A) can attack another
(B). b) In an FPGA SoC, a user with current access to the FPGA accelerator
can attack any software or operating system on the CPU.
5.1.1. Preliminaries
5.1.1.1. Adversary Model and Threat Analysis
Considering Figure 5.1, we assume two scenarios for the adversary in this section. In
both systems, the adversary’s goal is to extract secret information from the other system
components, with access only to the PDN and no signal connections. In the first
scenario, the adversary has partial access to the FPGA fabric, whose resources are shared
among multiple users, e.g., FPGA accelerators shared in the cloud [4]. When the sensors
are hidden in a complex application – which, for instance, needs to communicate with
the outside of the FPGA – an inspection of the design would not detect any connection
between the cryptographic module and the rest of the application, i.e., there is a low
chance for the Trojan to be detected. Automated isolation and verification countermeasures
for FPGAs with user-controlled logic, especially in data centers, have already been
proposed in [123]. Such techniques usually employ a physical gap between different IP
cores with well-formed interfaces [98, 124]. Yet, we later show that such a barrier might
be breached with internal voltage sensors, even when the sensors are placed far away
from the target.
In the second scenario, an attacker has full access to the FPGA while the FPGA is
part of a larger system such as an SoC, where CPUs reside on the same die. For example,
consider the reconfigurable fabric of an SoC where 3rd-party users are allowed to
use the FPGA. Any underprivileged user with access to the FPGA fabric can embed
the sensors, thereby potentially monitoring the voltage of the whole SoC. This threat
grows with the increasing use of accelerators.
Note that, as opposed to the previous works about covert channels passing through
this isolation, e.g., by electrical coupling [125] or even by temperature [126], we do
not alter the attacked IP core in any form and only monitor the unintentional power
consumption. The effects of electrical coupling have been further investigated in [127,
128].
Figure 5.2.: Floorplan (rotated right) of one TDC Sensor with 18×(LUT, Latch) as part of
the Initial Delay.
Figure 5.3.: Architecture of the underlying AES encryption core (ShiftRows and KeySchedule
not shown)
lay, the more the observable part zooms into the variation. Thus, more fine-grained
quantization levels are seen with a higher initial delay when checking the peak-to-peak
variation of a given voltage fluctuation. In Table 5.1, we show the initial delays and
resulting variations observed in our experiments.
The AES module is a relatively small implementation with a 32-bit datapath, occupying
265 Flip-Flops and 862 LUTs. The 128-bit plaintext, after being XORed with the first
roundkey, is loaded into the state registers si . At every cipher round, which takes five
clock cycles, first ShiftRows is performed. Afterwards, as shown in Figure 5.3, at each
clock cycle, four Sboxes followed by a MixColumn and AddRoundKey are performed
while the state register is shifted column-wise. The four Sbox instances are shared with
the (not shown) KeySchedule unit while ShiftRows is being performed. By bypassing
the MixColumns during the last cipher round — in total after 50 clock cycles — the
ciphertext is taken from the state register.
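The round timing translates into a simple latency and throughput budget (our own arithmetic based on the figures above; the 24 MHz clock is the operating frequency used in the experiments):

```python
ROUNDS = 10              # AES-128 cipher rounds
CYCLES_PER_ROUND = 5     # cycles per round in this 32-bit datapath
F_CLK = 24e6             # AES module clock in Hz

latency_cycles = ROUNDS * CYCLES_PER_ROUND          # 50 cycles per block
latency_us = latency_cycles / F_CLK * 1e6
throughput_mbps = 128 * F_CLK / latency_cycles / 1e6  # one 128-bit block / 50 cycles
print(latency_cycles, round(latency_us, 3), throughput_mbps)
```

This yields one block every 50 cycles, i.e., roughly 2.08 µs per encryption, or about 61.44 Mbit/s at 24 MHz.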
This AES module should generate much less voltage drop than seen in [18], since its
footprint in this FPGA is only 0.3% of the total flip-flops and 0.9% of the LUTs, versus
8% of the flip-flops in [18]. However, we show in the following that we can still gather
sufficient information for the attack.
Figure 5.4.: Experimental setup showing the Sakura-G Board connected to our measurement
PC, with Chipscope ILA used for data acquisition.
Figure 5.5.: Floorplans showing the Experimental Setup with all the relevant parts. Left:
the internal sensor is placed close to the AES module. Right: the internal sensor
is placed far away from the AES.
our developed sensor. Thus, in contrast to the oscilloscope that has an independent
time base, the internal sensor can sample the power consumption synchronously. The
side-channel information is expected to be amplitude-modulated over the clock sig-
nal, i.e., it is visible at the clock peaks. Therefore, it would be enough to sample the
power consumption (only) at this exact moment when the side-channel information
leakage occurs. This drastically lowers the required sample frequency for a successful
attack [130]. To verify this, we conducted different experiments by supplying the sen-
sor with different frequencies (24 MHz, 48 MHz, 72 MHz, and 96 MHz) while the AES
module always runs at 24 MHz. To this end, we used a Digital Clock Manager (DCM)
to generate the desired clock frequencies based on the external 24 MHz clock.
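Under synchronous sampling, the number of sensor samples per encryption follows directly from the clock ratio (our own arithmetic, assuming one sample per sensor clock cycle and the 50-cycle encryption described above):

```python
F_AES = 24e6        # AES module clock in Hz
ENC_CYCLES = 50     # cycles for one AES-128 encryption

# Samples captured during one encryption at each sensor clock frequency.
samples_per_encryption = {
    f_s: int(f_s // F_AES) * ENC_CYCLES
    for f_s in (24e6, 48e6, 72e6, 96e6)
}
for f_s, n in samples_per_encryption.items():
    print(f"{f_s / 1e6:.0f} MHz sensor clock: {n} samples per encryption")
```

At 24 MHz the sensor thus captures exactly one sample per AES clock cycle, and at 96 MHz four samples per cycle.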
Table 5.1.: Overview of different sensor sampling frequencies with the AES module at 24 MHz.

Sampling frequency (MHz)                  96   72   48   24
No. of primitives used for initial delay  10   14   22   46
Observed peak-to-peak variation            6    6    8   15
two samples can be covered. This is an advantage over oscilloscope-based samples, so
even if the sensor’s clock domain were separated, enough information can be inferred.
Although we use JTAG to connect to Chipscope in our experimental setup, an actual
attacker could easily use whatever remote connection is available to transmit the
sensor values from internal BRAM to the outside. Since no logical signaling between
the attacked module and the sensor is desired, the attacker would need to employ a
mechanism to trigger the start of saving the samples, e.g., into the BRAM. This can be
achieved by observing the measured signal itself and triggering on a large peak.
While the AES module is inactive, the sensor value varies only slightly (cf.
Figure 5.6). The power consumption of the first round of the AES module results in a
large negative peak in the sensor value, providing a stable reference point for aligning
the traces.
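Such a self-triggering acquisition can be sketched as follows (a minimal illustration under our own assumptions; the threshold value and function names are hypothetical):

```python
import numpy as np

def find_trigger(trace, threshold):
    """Index of the first sample below `threshold` (the large negative
    peak of the first AES round), or None if no peak is present."""
    idx = np.flatnonzero(trace < threshold)
    return int(idx[0]) if idx.size else None

def align_traces(traces, threshold, length):
    """Cut a fixed-length window starting at each trace's trigger point,
    so all traces share the first-round peak as a common reference."""
    windows = []
    for trace in traces:
        i = find_trigger(trace, threshold)
        if i is not None and i + length <= len(trace):
            windows.append(trace[i:i + length])
    return np.array(windows)
```

The negative peak thus serves the same purpose as an external trigger signal would in an oscilloscope-based setup.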
As described in Section 5.1.1.2, for each sensor frequency, the initial delay of the
sensor has to be adjusted. This leads to different levels of quantization and thus
different observed peak-to-peak variations. This relationship is verified by our experimental
data in Figure 5.6, where sensors at lower operating frequencies show higher peak-to-peak
variation (cf. Table 5.1).
5.1.3. Results
In the following, we provide experimental results showing a successful attack using
the traces measured by the internal sensor. We compare the results to a traditional
measurement setup, i.e., measuring the power consumption externally.
As an example, we use a standard CPA attack on the AES module. Only a few
bits within each byte of the internal state showed a strong leakage. Hence, we chose
to predict only a single bit b to evaluate our key hypothesis khyp . Note that this was
identical for both the oscilloscope and the internal sensor. We ran the attack on
all bits of the state. The results in the following correspond to the bit position bitpos
showing the highest correlation. We have chosen the state just before the SubBytes
operation in the last round. Based on a ciphertext byte ci , our model is the bit at
position bitpos of the inverse Sbox output, Sbox^-1(ci ⊕ khyp).
Figure 5.6.: Single traces measured using an oscilloscope (top) and using our developed sensor
at different sampling frequencies (below). Time samples refer to the individual
samples captured at the respective sampling rate.
the number of traces are shown. Starting with the result using the oscilloscope, we
observed a maximum correlation of approximately −0.3 for the correct key hypothesis.
As shown, the attacks using the traces measured internally by the sensor are also
successful. The correct key hypothesis is clearly distinguished from the others, but
with a slightly lower maximum correlation of about −0.2.
Comparing the results of the sensor at different sampling frequencies, we do not
observe a large deviation. This is caused by the synchronous sampling as most of
the information is contained in the respective peak anyway. Finally, we can observe
that the higher resolution (more quantization steps) slightly improves the maximum
correlation.
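The CPA procedure described above can be sketched in a few lines (our own simulation-based illustration: a random permutation stands in for the AES S-box, and the traces are synthetic with a single negatively-leaking time sample, mimicking the negative correlation observed with the sensor):

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-in 8-bit S-box (a random permutation); the actual attack uses the
# AES S-box, omitted here for brevity.
SBOX = rng.permutation(256).astype(np.uint8)
INV_SBOX = np.argsort(SBOX).astype(np.uint8)

def cpa_rank_keys(ct_bytes, traces, bitpos):
    """Rank all 256 key-byte hypotheses by the maximum absolute Pearson
    correlation between the predicted state bit (before the final
    SubBytes) and every time sample of the traces."""
    tc = traces - traces.mean(axis=0)       # centered traces, shape (n, m)
    t_norm = np.sqrt((tc ** 2).sum(axis=0))
    scores = np.zeros(256)
    for k in range(256):
        # Model: state byte before last-round SubBytes = InvSbox(c ^ k)
        pred = ((INV_SBOX[ct_bytes ^ k] >> bitpos) & 1).astype(float)
        pc = pred - pred.mean()
        denom = np.sqrt((pc ** 2).sum()) * t_norm
        corr = pc @ tc / np.where(denom == 0, 1, denom)
        scores[k] = np.abs(corr).max()
    return np.argsort(scores)[::-1]         # best hypothesis first

# Simulated experiment: one time sample leaks the predicted bit
# (negatively, as with the TDC sensor), buried in Gaussian noise.
true_key = 0x3C
ct = rng.integers(0, 256, size=2000, dtype=np.uint8)
leak = ((INV_SBOX[ct ^ true_key] >> 3) & 1).astype(float)
traces = rng.normal(0.0, 1.0, size=(2000, 20))
traces[:, 7] -= 0.5 * leak
ranking = cpa_rank_keys(ct, traces, bitpos=3)
print(hex(ranking[0]))
```

With the chosen leakage amplitude and noise level, the correct key byte clearly dominates the ranking after 2000 simulated traces.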
Figure 5.7.: Results using the oscilloscope (top row), using the internal sensor at different
sampling frequencies (rows below), for each the correlation by means of 5 000
traces (left) and the progressive curves over the number of traces (right). The
correct key hypothesis is marked in black. Time samples refer to the individual
samples captured at the respective sampling rate.
the opposite region as far away as possible from the AES module. The right part of
Figure 5.5 depicts the corresponding layout. We examined this situation only with a
96 MHz sampling frequency, i.e., the worst case in Figure 5.7. The corresponding CPA
results are depicted in Figure 5.8, indicating that a successful attack is still possible
with only a slight decrease in the correlation. This highlights the high risks involved
when sharing an FPGA among multiple users. Note that for a real-world design,
additional logic might be placed between the AES and the sensor, resulting in noise
and an increased number of required traces for a successful attack. However, such
effects are also present for an external measurement. As stated, for the presented
results we made use of a SAKURA-G board optimized for SCA evaluations. However,
we were able to collect similar traces on standard Artix-7 and Zynq-7000 FPGA
evaluation boards as well.
Figure 5.8.: Correlation using 5 000 traces (left) and progress of the maximum correlation
over the number of traces (right) using the internal sensor at 96 MHz sampling
frequency, placed far away from the AES module.
Figure 5.9.: CPA attack through on-chip sensors on the Lattice ECP5 FPGA; Correlation
progress over 10000 samples for all 256 secret AES key byte candidates with the
correct key byte marked red.
The Trojan can be inserted remotely without requiring physical access and with
no signal connection to the attacked module. Further, it provides a very strong side
channel to the entire device, even if the sensor is not placed in proximity to the attacked
module. In fact, our work is a proof of concept and warns that even with 100% logical
separation between the modules, the PDN carries SCA information, which makes many
security threats and attacks possible. This reveals a major vulnerability in emerging
applications of FPGAs, such as FPGA fabric being shared among multiple users.
While we have used an FPGA for our experiments, this type of attack can be transferred
to other ICs and SoCs as well.
5.2. Remote inter-chip power analysis side-channel attacks at board-level
system might not support the mechanisms required for trusted firmware updates at
all [135].
Full system integrity is also hard to guarantee if software or firmware from a 3rd
party is run on any chip in the system. In these situations, malicious applications can
be introduced accidentally, for instance by executing content from the Internet, which
might just be JavaScript on a website [96]. In those cases, it is of high importance
to provide proper isolation of individual system components, typically handled at the
logical level (cf. JavaScript sandboxes). For instance, the recent Meltdown and
Spectre attacks [11, 12] have shown that such isolation can be broken, allowing a
user to escalate their privileges and gain superuser access.
FPGAs in particular are increasingly used as accelerators in many systems, ranging
from cloud-computing appliances [2–4] to integration in complex SoC devices. Very
small FPGAs are often inserted as glue logic, serving as a translation layer between
other existing devices. What all FPGAs in these use cases have in common is that they
are part of a bigger system, probably sharing the power supply with other components,
e.g., as a PCIe device.
As the previous section and [105] have shown, a chip containing FPGA fabric can
be used to implement sensors that are sufficient for remote power analysis side-channel
attacks within the FPGA. This threat from power analysis attacks was previously
assumed to require an attacker with physical access.
In this section, we escalate the risk of remote power analysis attacks from the chip
to the board level, affecting many more potential components of an entire system.
We show that even through multiple levels of a power distribution network, in which
capacitive, inductive, and resistive effects persist, sufficient side-channel information
can be extracted to attack a cryptographic module in another chip placed on the same
board.
Contributions: Our main contribution is a first proof of a board-level SCA attack
from one chip to another, based on software-reconfigurable firmware. In short, the
contributions of this work can be summarized as follows:
• Our results prove that board-level power analysis attacks are possible through
firmware, and also highlight the threat of a malicious chip introduced in the
supply chain.
• We provide two case studies on an inter-chip attack on AES and RSA, proving
the high risk of this threat.
• If local access to a system is given only for a short time, the attack can infect a
system in a covert way, because no external or dedicated measurement equipment
is required.
Outline: The rest of this section is organized as follows: Section 5.2.1 elaborates our
adversary model in more detail and explains some background knowledge on board-level
supplies and power-analysis side channels. In Section 5.2.2, our experimental setup will
1. Chip C is provided by an attacker which can access the supply chain to introduce
a malicious chip into the system.
2. Chip C is a benign chip by design but can run different software or firmware (for
instance, a cloud accelerator). The adversary is restricted to reprogram the single
device Chip C, which is logically isolated from the enclave, but shares the same
power supply with the victim in the enclave.
By means of measurements on the power supply, Chip C can thus attack Chip A in
the enclave, even when a TCB was logically established. After the attack, the adversary
Figure 5.10.: Scenario of a shared power supply leading to a risk of side-channel attacks.
In this example Chip C tries to deduce information from the power supply on
Chip A.
Figure 5.11.: Experimental setup showing the SAKURA-G Board connected to our measure-
ment PC.
with access to Chip C can use any type of communication channel to transmit the side-
channel information remotely and analyze it, to extract secret keys from the victim
system. If no proper communication channel exists, covert channels can be used to
transmit this information [126, 138, 139].
Figure 5.12.: Setup showing the configuration of the two FPGAs on the SAKURA-G board.
The sensor to attack is in the main FPGA, while the cryptographic module
(AES or RSA) runs on the auxiliary FPGA.
Our victim designs are implemented on the smaller auxiliary FPGA. This FPGA
is still large enough to fit an AES module and a small RSA implementation, both
described in the following.
The AES module we use here is a simple implementation that is not side-channel
protected and follows the same design principle as explained in Section 5.1, to be
comparable. On systems that are only remotely accessed, considering such an unprotected
module is a valid assumption, since the threat of power analysis side-channel
attacks is not considered in remote adversary models. We use a 128-bit AES implementation
that is based on a 32-bit datapath. The 128-bit plaintext is XORed with the first
roundkey and then loaded into the state register si . Each cipher round requires 5 cycles
in this implementation. In each subsequent cipher round, the respective AES operations,
byte substitution in the Sbox, ShiftRows (not shown since it is only re-wiring),
MixColumn and AddRoundKey, are performed. In total the encryption takes 50 clock
cycles and the resulting ciphertext can be acquired from the state registers. In the
auxiliary FPGA used (Spartan-6 XC6SLX9), the resource utilization is minimal, but
percentage-wise higher compared to the larger FPGA used in Section 5.1. Here, the
AES module is also running at 24 MHz.
5.2.3. Results
In the following, we present experimental results attacking AES using CPA and RSA
using Simple Power Analysis (SPA). For both cases, the respective cipher was running
on the auxiliary FPGA while the TDC sensor captured the inter-chip voltage drop
through its supply pin on the main FPGA.
Figure 5.13.: Averaged traces measured during AES using the TDC sensors at different sam-
pling frequencies.
close to the FPGA chip were placed. A direct comparison at the sampling frequency
of 24 MHz is depicted in Figure 5.15. As expected, such additional capacitance acts as
a low-pass filter, which can be compensated by increasing the number of captured traces.
Indeed, when sampling at 24 MHz, the correct key candidate started to stand out after
approximately 2.5 million traces using the default capacitor configuration while powering
through the same power supply. Note that 2.5 million traces still only correspond to
around 38 Mbyte of encrypted data when using AES-128.
Based on the results for AES, we chose to measure the RSA core using a sampling
rate of 24 MHz. As before, both FPGAs share a power supply and all capacitors are
in place. The RSA is running at 24 MHz. Thus, we require at least 50 176 cycles to
capture the whole binary exponentiation (224 clock cycles for each of the 224 steps).
Figure 5.16 depicts the raw trace with an already visible variation over a time span of
approximately 2100 µs.
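The quoted capture length follows directly from these figures (our own arithmetic):

```python
STEPS = 224              # steps of the binary exponentiation (one per exponent bit)
CYCLES_PER_STEP = 224    # clock cycles per step
F_RSA = 24e6             # RSA core clock frequency in Hz

total_cycles = STEPS * CYCLES_PER_STEP
duration_us = total_cycles / F_RSA * 1e6
print(total_cycles, round(duration_us, 1))  # 50176 cycles ≈ 2090.7 µs
```

This matches the visible variation over approximately 2100 µs in Figure 5.16.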
Figure 5.14.: CPA attack on AES: Results to estimate the sensor quality at different sampling
rates, with a board when all relevant capacitors are removed. Each row shows
the correlation using 500 000 traces (left) and the progressive curves over the
number of traces (right). The correct key hypothesis is marked in black. Time
samples refer to the individual samples captured at the respective sampling rate.
We recall that the adversary’s goal is to recover the secret exponent by identifying
whether the multiplication took place or not. Every time a multiplication is performed
(in parallel to a squaring), the circuit consumes more power. Figure 5.17 depicts a
detailed view after applying a low-pass filter with a cut-off frequency of 900 kHz. Instead
of simply capturing the increased power consumption during the multiplication, we
observe that the TDC sensor receives a differential signal of the encryption. Thus, we
have to consider three different cases of how the conditional multiplication in the binary
exponentiation will affect the TDC sensor:
Figure 5.15.: CPA attack on AES: Progressive curves over the number of traces with a board
when all relevant capacitors are removed (left) and with the default capacitor
configuration (right), both sampled at 24 MHz. The correct key hypothesis is
marked in black. Time samples refer to the individual samples captured at the
respective sampling rate.
• In case the multiplication is switched from the on-state to the off-state, i.e.,
the FPGA suddenly consuming less power, the voltage overshoots briefly until
compensated. This leads to a positive peak in the trace due to the accelerated
sensor. This case is marked using an arrow pointing upwards. Also note the very
large positive peak at the end of the exponentiation in Figure 5.16, indicating
that both the multiplier and the squaring module got deactivated.
• In case the multiplication is switched from the off-state to the on-state, i.e.,
the FPGA suddenly consuming more power, the voltage briefly drops until
compensated. This leads to a negative peak in the trace due to the decelerated
sensor. This case is marked using an arrow pointing downwards.
• If the state of the multiplication does not change, i.e., either staying enabled or
staying off, the power consumption remains identical. Thus, the voltage level is
constant, causing a steady sensor value. This is indicated by a dash.
These three cases are marked in the magnified view of Figure 5.17. Indeed, the se-
cret exponent can be read out easily even though the RSA and the TDC sensor are
implemented on separate FPGAs sharing the same source of supply voltage.
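The resulting classification rule can be written as a small decoder (our own illustration; the encoding of the per-step peak classification is an assumption of this sketch):

```python
def bits_from_peaks(peaks, first_bit):
    """Recover exponent bits from per-step TDC peak classifications.

    `first_bit` is the multiplier state during the first observed step;
    peaks[i] classifies the boundary before step i+1:
      +1 = upward peak   (multiplier switched on -> off)
      -1 = downward peak (multiplier switched off -> on)
       0 = no peak       (state unchanged)
    A set bit means the conditional multiplication ran in that step."""
    bits = [first_bit]
    for p in peaks:
        if p == +1:
            bits.append(0)          # multiplier just switched off
        elif p == -1:
            bits.append(1)          # multiplier just switched on
        else:
            bits.append(bits[-1])   # state unchanged
    return bits
```

For instance, bits_from_peaks([-1, 0, +1, 0], first_bit=0) yields [0, 1, 1, 0, 0].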
5.2.3.3. Discussion
Our results prove that board-level power analysis side-channel attack threats exist, even
in the presence of decoupling capacitors. Of course, the same or even better results could
be achieved by adding a dedicated ADC directly to the power rails, but obviously not
without raising questions about its purpose. Instead, a seemingly disconnected FPGA
would not raise any alarm even in a fully trusted supply chain. The malicious behavior
can be enabled later on by a firmware update that measures the supply voltage with
the sensors.
Note that this threat is not limited to FPGAs in a board-level integrated system
when full supply chain trust is not ensured. Instead, such a threat exists for any
untrusted chip on the board. For example, an attacker could use an undocumented
internal ADC connected to the shared power supply as a power analysis attack vector.
The same is true for any other chip on the board that can be used as a measurement
device through maliciously altered firmware. An increasing number of sensors are
integrated into all kinds of chips for increased reliability and monitoring purposes,
even for voltage fluctuations [56, 142], elevating the risk of maliciously measuring them
for power analysis attacks. In a system where only remote access is considered to be
Figure 5.16.: Binary exponentiation for RSA captured with the TDC sensor on separate
FPGAs, sampled at 24 MHz (raw trace).
Figure 5.17.: Detail of the binary exponentiation captured with the TDC sensor after applying
a 900 kHz low-pass filter. Dotted lines mark the time span of an individual
step in the binary exponentiation. Arrows indicate whether the state of the
multiplication module changed (on to off: arrow upwards, off to on: arrow
downwards, no change: dash). The bits above are a part of the (correctly)
recovered secret exponent according to this classification.
an attack vector, integrated cryptographic accelerators are often not protected against
power analysis side-channels (i.e., only timing side-channels are avoided). Such systems
could thus be attacked remotely if proper electrical isolation on the board integration
level does not exist.
5.3. Leaky Noise: New Side-Channel Attack Vectors in Mixed-Signal IoT Devices
on the electrical level, as even a remote attacker might get hold of sufficient sensors for
inter-chip board-level power analysis attacks.
In the remainder of this section, we first explain preliminaries in Section 5.3.1 regarding
our adversarial model and the essential background information, including related work.
We then explain our experimental setup in Section 5.3.2. Our results are presented
in Section 5.3.3 and discussed in Section 5.3.4. Finally, the section is concluded in
Section 5.3.5.
(a) Variant of the adversarial model in which a malicious task (Task B) could gain
knowledge of secret information processed in the victim (Task A), circumventing any
access restrictions.
(b) Variant of the adversarial model where side-channel leakage is embedded in sensor
data that leaves the system. An external attacker can then use the sensor data to
retrieve secrets from Task A.
Figure 5.18.: Basic principle of the two variants of our adversarial model considered in this
section. In both cases, an ADC in the Analog Subsystem is biased from Task A
in the digital subsystem. This bias can contain secret information that Task A
processes.
for 32-bit microcontroller systems that can be used in IoT applications. These are the
ESP32-devkitC, the STM32L475 IoT Node, and two copies of the STM32F407VG
Discovery, which were bought separately. The two identical boards were checked to
see how sample variation affects the results.
In the two STM32 microcontrollers, a Memory Protection Unit (MPU) is integrated
to prevent operating system tasks from reading memory outside their allowed range.
We did not use that unit, but it shows that these systems actually support a certain level
of isolation, which could potentially be broken through the ADC noise side-channel.
All platforms run in an operating frequency range of 80–168 MHz, and the respective
ADC sampling frequencies were chosen such that a whole trace of one cryptographic
operation can be saved in internal SRAM memory. The ADCs of these platforms all
support a 12-bit operation mode, which we selected unless noted otherwise.
The power supply on all the microcontroller boards uses the 5 V USB power as
input, which we supplied from a standard PC USB output. All boards use a volt-
age converter to produce a 3.3 V voltage for the Vdd of the respective controller. In
the STM32F407VG Discovery and STM32L475 IoT Node platform, the manufacturer
added a compensation network of capacitors and inductors through which the 3.3 V is
connected to the ADC reference pin. We did not do any modifications on any of the
boards, and thus also kept this compensation network. In the ESP32-devkitC platform,
only an internal ADC reference exists. The ADC of this platform can also be internally
connected to Vdd, which we used throughout the results in this section for ’Vdd’,
instead of an external connection. The ESP32-devkitC contains three CPU cores: two
Xtensa 32-bit CPUs and an ultra-low-power (ULP) core that can run independently
and also collect ADC samples. The STM32F407VG Discovery and STM32L475 IoT
Node are single-core platforms with Cortex-M4 CPUs, such that DMA is required to
Figure 5.19.: Overview of our common experimental setup, shared among the used platforms.
sample the ADC in parallel to a running CPU. This information is listed together with
details on sampling in Section 5.3.2.3, Table 5.3.
Table 5.2.: Used vendor toolchain versions and respective library and compiler versions

Platform                 Framework          mbedTLS   FreeRTOS          Compiler(s)
Espressif                ESP-IDF 3.1^1      2.12.0    8.2.0             xtensa gcc 5.2.0^3,
ESP32-devkitC                                         Xtensa Port^2     esp32ulp 2.28.51^4
ST Microelectronics      STM32CubeMX^5      2.6.1     9.0.0             arm gcc 7.3.1^6
STM32F407VG Discovery    4.26.1, 5.0.1
ST Microelectronics      STM32CubeMX^5      2.6.1^7   9.0.0             arm gcc 7.3.1^6
STM32L475 IoT Node       4.26.1

^1 Espressif IoT Development Framework, https://github.com/espressif/esp-idf/
^2 Espressif explains the Xtensa Port in https://docs.espressif.com/projects/esp-idf/en/v3.1/api-reference/system/freertos_additions.html, which mainly adds multicore support
^3 crosstool-ng-1.22.0-80-g6c4433a-5.2.0, as linked in https://docs.espressif.com/projects/esp-idf/en/v3.1/get-started/linux-setup.html
^4 v2.28.51-esp32ulp-20180809, as linked in https://docs.espressif.com/projects/esp-idf/en/v3.1/api-guides/ulp.html
^5 STM32CubeMX Eclipse plug-in, https://www.st.com/en/development-tools/stsw-stm32095.html; 4.26.1 was used for leakage assessment, 5.0.1 was used for the CPA attack in Section 5.3.3.5.
^6 GNU MCU Eclipse, based on arm-none-eabi-gcc 7.3.1-1.1-20180724-0637 from https://gnu-mcu-eclipse.github.io/blog/2018/07/24/arm-none-eabi-gcc-v7-3-1-1-1-released/
^7 For this platform, none was provided in CubeMX, but the version from STM32F407VG worked directly
principal information leakage. Please note that advanced operating modes of AES, like
counter mode, are in principle also vulnerable to power analysis [156], but require more
effort; a knowledgeable attacker could deploy such attacks. For modular exponentia-
tion, we use mbedtls_mpi_exp_mod, which is also used in the RSA implementation of
mbedTLS; leakage in this function alone, however, does not prove the overall vulnera-
bility of that implementation. Please note that we only use mbedTLS to have crypto-
graphically relevant code for the leakage assessment, not to show any new attack on
this specific library.
the performance of various tasks. Other controllers offer programmable state machines
or specific low-power cores that can control the ADC and other peripherals in parallel
to normal task execution on the main CPU(s).
In Figure 5.20 we show a more detailed description of how the two tasks execute in
parallel in our experiments. More detailed example code can be found in Section B.3.
At the beginning, Task A receives a message to be encrypted through UART and
notifies a sleeping/waiting Task B using a FreeRTOS notification (as the helper signal).
Task B starts collecting ADC data in parallel to Task A, either in dual-core operation
or using DMA operation of the ADC. Task A then encrypts the message using a
previously stored key, while the ADC is collecting data. After the encryption, it waits
for a notification. Upon finishing the fixed number of ADC samples, Task B sends
them to the workstation, and notifies Task A so everything can start afresh for another
message. Due to differences in the systems, we use DMA transfer in the two STM32
systems, and dual-core operation in the ESP32. However, after we had performed all
experiments, we found the ESP32 also has a sampling mode that does not require the
second core, by using its i2s-module.
There is still a fundamental difference between using the ADC with DMA and using
software on a CPU to acquire individual ADC samples in a loop. This difference
is visualized in Figure 5.21. Running the ADC in a continuous mode with DMA
at lower speeds essentially spends more time in the ADC conversion itself, but does
not reduce the total time range in which the measurement is influenced by noise.
In contrast, single-conversion software-based sampling spends a certain time between
ADC samples running software to store data and prepare the next acquisition. To
change the sampling rate, delay must be added in software. Any analog noise during
that time will not affect the ADC result, and thus some side-channel leakage might
not be captured in the acquired ADC data. In our experiments we also introduced
delays in CPU-based sampling, because of internal memory size limitations. Since
CPU-based sampling is usually slower as well, it can additionally reduce the achievable
sampling rate.
Usually a sampling frequency above the CPU or circuit frequency is recommended
for power analysis attacks, but it is not strictly required for a successful attack [157].
For the platforms we use, an additional limitation is the amount of internal memory
available to store all samples. For RSA, we typically had to sample rather slowly to not
fill the memory (2000-4000 samples per encryption). For AES we had to sample almost
Figure 5.20.: Description of one loop iteration of the two FreeRTOS tasks. [Figure: Task A
executes uart_read(message), xTaskNotifyGive(..), then mbedtls_mpi_exp_mod(msg) or
mbedtls_internal_aes_encrypt(msg); Task B performs the ADC sample collection (DMA
or Task).]
Figure 5.21.: Principle of how different ADC sampling styles cover more or less of the
voltage noise affecting the ADC result. DMA needs to be used for continuous
sampling, while software-based sampling (in the multi-core scenario) will always
introduce some gaps.
Table 5.3.: Overview of Leakage Assessment Experiments, repeated for ADC Pin =
{Vdd, GND, N/C}

Platform                          Sampling Style   Filter   Algorithm   ADC Samplerate / #Samples
ESP32-devkitC @80MHz              ULP-CPU          No       AES-128     104 kHz / 16
                                  CPU              No       Exp-512     20.4 kHz / 2600
STM32F407VG Discovery @168MHz     DMA              Yes      AES-128     980 kHz / 32
                                  DMA              Yes      Exp-512     88 kHz / 4096
STM32L475 IoT Node @80MHz         DMA              Yes      AES-128     684 kHz / 64
                                  DMA              Yes      Exp-512     40 kHz / 4096
as fast as possible, to achieve a reasonably high sampling rate (100-1000 kHz). For the
two STM32 devices this could be achieved by different ADC-DMA setups. For the
ESP32-devkitC we had to use the ULP-CPU to sample the ADC fast, and a normal
CPU task to sample it slowly. That is because the ULP only has access to less than
2048 bytes of memory in total, but it has a dedicated ADC instruction for fast
sampling. The final sampling rates, depending on the board and algorithm used,
are shown in Table 5.3. The shown values were estimated by measuring on an external
pin, since the actual sampling rate was not always apparent from the software.
It might be feasible to achieve higher sampling rates on the ESP32 with its i2s-module.
source of randomness is provided to the library. The RSA private key function of
mbedTLS (mbedtls_rsa_private) uses bigint arithmetic and, among other functions,
calls mbedtls_mpi_exp_mod, which performs a sliding window modular exponentia-
tion. That function is where we perform some of our leakage assessments, with a
512-bit exponent and modulus. We use the message transmitted by UART as the base
of this exponentiation. The other function we analyze is from the AES algorithm.
The mbedTLS library implements the AES using a T-table lookup based approach.
Originally, the AES is a round-based block cipher, where four different operations Sub-
Bytes, ShiftRows, MixColumns and AddRoundKey are repeatedly applied to a 128-bit
data block. A popular optimization is to implement those operations as a combination
of multiple table lookups and a subsequent XOR operation. This optimization requires
the precomputed T-tables, which take an input byte and output a 32-bit word. Besides
the addition of the first round key and the last round, the remaining rounds are exe-
cuted in pairs inside each loop iteration. Apart from these optimizations, the
mbedTLS AES implementation does not diverge from the textbook AES encryption
algorithm and, in particular, does not include any side-channel countermeasures.
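The table-lookup-and-XOR combination described above can be sketched as follows. This is an illustrative reconstruction, not the mbedTLS source: the table T0 is built here from the standard AES S-box (FIPS-197), and the other three tables are byte rotations of T0.

```python
# Illustrative sketch of the T-table optimization (not the mbedTLS code):
# one output column of an AES round is computed as four table lookups
# XORed together with a round-key word.

# AES S-box from FIPS-197
SBOX = [
    0x63, 0x7C, 0x77, 0x7B, 0xF2, 0x6B, 0x6F, 0xC5, 0x30, 0x01, 0x67, 0x2B, 0xFE, 0xD7, 0xAB, 0x76,
    0xCA, 0x82, 0xC9, 0x7D, 0xFA, 0x59, 0x47, 0xF0, 0xAD, 0xD4, 0xA2, 0xAF, 0x9C, 0xA4, 0x72, 0xC0,
    0xB7, 0xFD, 0x93, 0x26, 0x36, 0x3F, 0xF7, 0xCC, 0x34, 0xA5, 0xE5, 0xF1, 0x71, 0xD8, 0x31, 0x15,
    0x04, 0xC7, 0x23, 0xC3, 0x18, 0x96, 0x05, 0x9A, 0x07, 0x12, 0x80, 0xE2, 0xEB, 0x27, 0xB2, 0x75,
    0x09, 0x83, 0x2C, 0x1A, 0x1B, 0x6E, 0x5A, 0xA0, 0x52, 0x3B, 0xD6, 0xB3, 0x29, 0xE3, 0x2F, 0x84,
    0x53, 0xD1, 0x00, 0xED, 0x20, 0xFC, 0xB1, 0x5B, 0x6A, 0xCB, 0xBE, 0x39, 0x4A, 0x4C, 0x58, 0xCF,
    0xD0, 0xEF, 0xAA, 0xFB, 0x43, 0x4D, 0x33, 0x85, 0x45, 0xF9, 0x02, 0x7F, 0x50, 0x3C, 0x9F, 0xA8,
    0x51, 0xA3, 0x40, 0x8F, 0x92, 0x9D, 0x38, 0xF5, 0xBC, 0xB6, 0xDA, 0x21, 0x10, 0xFF, 0xF3, 0xD2,
    0xCD, 0x0C, 0x13, 0xEC, 0x5F, 0x97, 0x44, 0x17, 0xC4, 0xA7, 0x7E, 0x3D, 0x64, 0x5D, 0x19, 0x73,
    0x60, 0x81, 0x4F, 0xDC, 0x22, 0x2A, 0x90, 0x88, 0x46, 0xEE, 0xB8, 0x14, 0xDE, 0x5E, 0x0B, 0xDB,
    0xE0, 0x32, 0x3A, 0x0A, 0x49, 0x06, 0x24, 0x5C, 0xC2, 0xD3, 0xAC, 0x62, 0x91, 0x95, 0xE4, 0x79,
    0xE7, 0xC8, 0x37, 0x6D, 0x8D, 0xD5, 0x4E, 0xA9, 0x6C, 0x56, 0xF4, 0xEA, 0x65, 0x7A, 0xAE, 0x08,
    0xBA, 0x78, 0x25, 0x2E, 0x1C, 0xA6, 0xB4, 0xC6, 0xE8, 0xDD, 0x74, 0x1F, 0x4B, 0xBD, 0x8B, 0x8A,
    0x70, 0x3E, 0xB5, 0x66, 0x48, 0x03, 0xF6, 0x0E, 0x61, 0x35, 0x57, 0xB9, 0x86, 0xC1, 0x1D, 0x9E,
    0xE1, 0xF8, 0x98, 0x11, 0x69, 0xD9, 0x8E, 0x94, 0x9B, 0x1E, 0x87, 0xE9, 0xCE, 0x55, 0x28, 0xDF,
    0x8C, 0xA1, 0x89, 0x0D, 0xBF, 0xE6, 0x42, 0x68, 0x41, 0x99, 0x2D, 0x0F, 0xB0, 0x54, 0xBB, 0x16,
]

def xtime(b):
    """Multiplication by 2 in GF(2^8) with the AES reduction polynomial."""
    b <<= 1
    return (b ^ 0x1B) & 0xFF if b & 0x100 else b

# T0 maps an input byte a to the 32-bit word (2*S[a], S[a], S[a], 3*S[a]).
T0 = [(xtime(s) << 24) | (s << 16) | (s << 8) | (xtime(s) ^ s)
      for s in (SBOX[a] for a in range(256))]

def ror(w, r):
    """Rotate a 32-bit word right by r bits."""
    return ((w >> r) | (w << (32 - r))) & 0xFFFFFFFF

def round_column(a0, a1, a2, a3, rk):
    """One round-output column (SubBytes+ShiftRows+MixColumns+AddRoundKey)
    from the four state bytes on its ShiftRows diagonal and key word rk."""
    return T0[a0] ^ ror(T0[a1], 8) ^ ror(T0[a2], 16) ^ ror(T0[a3], 24) ^ rk
```

A quick consistency check: T0[0x00], built from S[0x00] = 0x63, is (0xC6, 0x63, 0x63, 0xA5), i.e. 0xC66363A5, matching the first entry of the precomputed tables commonly found in T-table implementations.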
5.3.3. Results
Before doing leakage assessment or CPA, we first show that ADC noise correlates with
the power consumption of the board. Subsequently, leakage assessment is performed
on AES and modular exponentiation of the mbedTLS library.
In all of the tested platforms, side-channel leakage was found in at least one of the
tested cryptographic algorithms and ADC operation modes (configurations). In many
setups, generic noise was observable on the ADC, even when its pin was pulled to Vdd
or GND. Only in a few setups do we actually observe zero variance in the ADC output,
such that information leakage is impossible.
For this experiment, we use the STM32F407VG Discovery #1, and run the CPU in
high and low activity phases that should be easily distinguishable. In high activity
phases, we perform floating point operations, whereas during the intermediate low
activity phases, we issue nop-commands.
In Figure 5.22, we show the average of 1000 traces of the supply voltage Vdd and
supply current Idd measured with an oscilloscope. Concurrently, we show the acquired
samples of 6-bit ADC values at maximum sampling rate using DMA mode. Both were
recorded during the same activity phases of the CPU on the STM32F407VG Discovery
board. The ADC is connected to a floating pin, and neither to GND nor Vdd. The
different workload activity phases can easily be distinguished visually in both of
the traces. Our activity pattern can be identified in the externally collected traces for
both Vdd and Idd in the timeframe from 0µs to about 160µs, whereas the data transfer
of ADC traces to a workstation occurs after 160µs. Likewise, the ADC average values
in the bottom diagram reflect the activity in a clearly distinguishable way, albeit not
with linear correlation.
Figure 5.22.: Average over 1000 traces for oscilloscope measurements on Vdd and Idd in the
first two plots and average of concurrently measured ADC values when the
ADC was set to 6-bit and the pin was not connected (N/C) on the bottom.
High activity phases are marked grey.
The preliminary experiment to compare CPU activity phases already shows promising
results. However, this experiment is a synthetic test case in which extremely high and
low activity phases were chosen on purpose. Subsequently, we prove that even minor
differences in the data processed by cryptographic algorithms affect the ADC noise in
a systematic way. For that proof, we can already use the data required as a prerequisite
for the leakage assessment explained in Section 2.7. We collect two sets of traces
while performing encryption operations. One of the sets contains the fixed traces, while
the other set contains the random traces. For this example, we connect the pin used for
the ADC to GND and again use the same board as in Section 5.3.3.1, the STM32F407VG
Discovery #1.
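The fixed-vs-random comparison just described reduces to Welch's t-test per sample point. A minimal sketch of that computation follows; the trace arrays and the injected difference are synthetic stand-ins for the collected ADC data:

```python
import numpy as np

def welch_t(fixed, rand):
    """Welch's t-statistic per sample point for two trace sets of shape
    (n_traces, n_samples), as used in fixed-vs-random leakage assessment."""
    mf, mr = fixed.mean(axis=0), rand.mean(axis=0)
    vf, vr = fixed.var(axis=0, ddof=1), rand.var(axis=0, ddof=1)
    return (mf - mr) / np.sqrt(vf / fixed.shape[0] + vr / rand.shape[0])

# Synthetic example: identical noise in both sets, plus a small
# data-dependent offset at sample index 40 in the "random" set.
rng = np.random.default_rng(0)
fixed = rng.normal(0.0, 1.0, size=(5000, 100))
rand = rng.normal(0.0, 1.0, size=(5000, 100))
rand[:, 40] += 0.5

t = welch_t(fixed, rand)
# |t| > 4.5 at any point is the usual threshold for detectable leakage.
leaky_points = np.flatnonzero(np.abs(t) > 4.5)
```

With 5000 traces per set, the 0.5 offset drives |t| far beyond the 4.5 threshold at sample 40, while the unmodified points stay near the noise floor.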
In Figure 5.23, we compare the ADC noise that occurs during sliding window expo-
nentiation with a 512-bit secret exponent and modulus. In the first plot, an average
over 100,000 (100k) ADC traces is shown, where an exponentiation is performed on
the same fixed base. The second plot also shows the average, but with exponentiation
on 100k different random bases. The red lines show the averages, while the grey back-
ground contains all the single traces. The differences between these plots are indeed
distinguishable without further processing. Even more, a pattern is already visible in
Figure 5.23.: Average over 100k traces for a fixed base exponentiated in a mbedTLS sliding
window exponentiation, and 100k traces, each with a random base exponentiated
with the same secret exponent, on the STM32F407VG Discovery #1. ADC
connected to GND.
Figure 5.24.: Average over 1M traces for a fixed message encrypted with mbedTLS AES and
the FIPS key, and 1M traces, each with a random message encrypted with the
same key, on the STM32F407VG Discovery #1. ADC connected to GND.
the average of the fixed traces, which is smoothed out in the case of random traces.
This example already shows that a later power analysis attack for secret key extraction
might be feasible.
In addition to modular exponentiation (Mod-Exp), we also show an average of
AES-128 for fixed and random messages in Figure 5.24. Since AES executes in a much
shorter time relative to the ADC speed, we can only acquire a few samples, at best
2-3 samples per AES round for this specific board. For AES, we collect
1,000,000 (1M) traces for each of the two sets of fixed and random messages. Similar to
Mod-Exp, the differences between the traces are visible, with the most distinguishable
Figure 5.25.: First order leakage assessment results based on a fixed-vs-random t-test for
100k traces collected during modular exponentiation on the STM32F407VG
Discovery #1.
Figure 5.26.: First order leakage assessment results based on a fixed-vs-random t-test for 1M
traces collected during AES encryptions on the STM32F407VG Discovery #1.
Figure 5.27.: Leakage Assessment on mbedTLS AES and modular exponentiation (Mod-Exp)
with {Vdd, GND, N/C} connected to the ADC on all platforms. Flat lines on
the bottom are constant zero, shifted for visibility.
higher frequency than 104 kHz is reachable, which leads to less than 16 samples over
the complete AES runtime. For modular exponentiation, the runtime is longer. Thus,
we sample slower (20-88 kHz) to collect only as many samples as the memory capacity
of the internal SRAM allows.
In Figure 5.27, we show six plots of first order leakage assessments on AES and mod-
ular exponentiation (marked as 'Mod-Exp'), separated by algorithm and by the connection
of the ADC being Vdd, GND, or N/C, respectively. These plots show the change of the
highest |t|-value inside the leakage assessment interval (cf. Figure 5.25, Figure 5.26)
over an increasing number of traces used for the evaluation.
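Curves of this kind can be reproduced from raw trace sets as sketched below; the per-sample statistic is a standard Welch's t-test, and the trace arrays here are synthetic placeholders:

```python
import numpy as np

def welch_t(a, b):
    """Per-sample Welch's t-statistic for two trace sets (n_traces, n_samples)."""
    ma, mb = a.mean(axis=0), b.mean(axis=0)
    va, vb = a.var(axis=0, ddof=1), b.var(axis=0, ddof=1)
    return (ma - mb) / np.sqrt(va / a.shape[0] + vb / b.shape[0])

def max_t_progress(fixed, rand, steps):
    """Highest |t| over the assessment interval, for increasing trace counts."""
    return np.array([np.abs(welch_t(fixed[:n], rand[:n])).max() for n in steps])

# Synthetic sets with a constant small difference at one sample point:
rng = np.random.default_rng(3)
fixed = rng.normal(0.0, 1.0, size=(20000, 50))
rand = rng.normal(0.0, 1.0, size=(20000, 50))
rand[:, 10] += 0.2

steps = [1000, 5000, 10000, 20000]
curve = max_t_progress(fixed, rand, steps)
```

For a genuine difference, the maximum |t| grows roughly with the square root of the trace count, which is why leaking configurations keep rising in such plots while non-leaking ones stay near the noise floor.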
We start with the AES algorithm in the left column of Figure 5.27. We compare the
platforms when the ADC is connected to Vdd. In this configuration, |t| reaches values
clearly beyond the confidence threshold of 4.5, suggesting that all platforms leak
the information processed in the AES algorithm. For the case of a connection to GND,
both samples of the STM32F407VG Discovery leak, but not the other boards. For the
other boards, the ADCs actually output a constant value, such that Ground-Noise does
not occur. In the case of no connection (N/C) on the ADC, i.e. when the pin
is in a so-called floating state, all of the boards exhibit leakage (|t| > 4.5). We also
looked into second order leakage. However, there were no changes with respect to the
Table 5.4.: Overview of Leakage Assessment over all tested Platforms and Configurations;
ADC connected to {Vdd, GND, not connected (N/C)}. The amounts of collected traces
are 100k for modular exponentiation, and 1M for AES. The ADC was noise-free
when σ=0.

                                      Leakage detected? (t > 4.5)
                                      AES-128 (ADC fast)     Mod-Exp-512 (ADC slow)
Platform                              Vdd    GND    N/C      Vdd     GND    N/C
ESP32-devkitC @80MHz                  yes    σ=0    yes      no[1]   σ=0    no[1]
STM32L475 IoT Node @80MHz             yes    σ=0    yes      yes     σ=0    σ=0
STM32F407VG Discovery #1 @168MHz      yes    yes    yes      yes     yes    yes
STM32F407VG Discovery #2 @168MHz      yes    yes    yes      yes     yes    yes

[1] For the center 1/3 of the trace. For the beginning and/or end of the cryptographic function, |t| was above
4.5. For more details check Section B.1, Figure B.2.
In summary, the ADC settings, such as the sampling frequency and the connection to Vdd,
GND or N/C levels, affect the observable side-channel leakage. This relation exists
because the sampling frequency also changes the inherent noise characteristics. The
connection of the ADC affects how it is coupled to the remaining parts of the system,
particularly the digital (CPU and memory) subsystem. In most cases when the ADC
shows any noise (sample data with σ > 0), the t-test detects leakage in the ADC data.
In the other cases, when the ADC is completely noise-free (σ = 0), no information
leakage is possible. The two boards of the same type that we used lead to almost the
same results, suggesting that sample variation has only a minor effect.
We summarize these results in Table 5.4. For reference, Section B.1 shows all leakage
assessments across all the boards in detail, including the assessment over the complete
time window when ADC samples were acquired (cf. Section 5.3.2.2).
(a) Total correlation after 10M traces for all 256 key byte candidates; The correlation
with the correct key byte is marked red
(b) Correlation progress over 10M traces for all 256 key byte candidates; The correla-
tion with the correct key byte is marked red
Figure 5.28.: Results of a CPA attack on the 6th byte of the last secret round key of AES on
the STM32F407VG Discovery #1 @168MHz with the ADC connected to GND
and the program compiled with the -Os optimization option (like for leakage
assessment).
In this subsection, we present results of CPA attacks on secret AES round keys in
different setups. We perform a ciphertext-based CPA on the last round of AES over
10M ADC traces and show both the final correlations for each key byte candidate over
the entire set of traces, as well as the correlation progress over the amount of traces.
We evaluate different preprocessing and power model variants, which are explained
in Section 2.8. Pre-aligning traces with a shift of ±2 using normalized cross-correlation
and performing CPA with the standard S-box Hamming distance model is the most
successful variant. The CPA experiments are performed on the STM32F407VG Dis-
covery, which showed the most promising results during leakage assessment. Although
we attacked all 16 bytes of the AES round key, only the best results for the respective
setup are presented here. We state the total amount of recovered bytes for each setup
and display the correlation plots for all key bytes in the appendix.
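The core of such a CPA can be sketched as follows. For brevity, this sketch uses a toy Hamming-weight leakage model on a single byte instead of the last-round S-box Hamming distance model named above; the traces and data are synthetic placeholders for the collected ADC traces and ciphertext bytes.

```python
import numpy as np

# Hamming-weight lookup for all byte values
HW = np.array([bin(v).count("1") for v in range(256)])

def cpa(traces, data, model):
    """Pearson correlation of each key guess's hypothesis against every
    trace sample. traces: (n, n_samples); data: (n,) known bytes;
    model(data, guess) -> (n,) hypothetical power values."""
    tc = traces - traces.mean(axis=0)
    corr = np.empty((256, traces.shape[1]))
    for guess in range(256):
        h = model(data, guess).astype(float)
        hc = h - h.mean()
        corr[guess] = (hc @ tc) / np.sqrt((hc @ hc) * (tc * tc).sum(axis=0))
    return corr

# Synthetic traces: every sample leaks HW(data XOR key) plus noise.
rng = np.random.default_rng(1)
true_key = 0x5A
data = rng.integers(0, 256, size=20000)
traces = HW[data ^ true_key][:, None] + rng.normal(0.0, 2.0, size=(20000, 8))

corr = cpa(traces, data, lambda d, g: HW[d ^ g])
recovered = int(corr.max(axis=1).argmax())  # guess with the highest peak
```

Taking the signed maximum avoids the complementary-key ambiguity of the Hamming-weight model (HW(x ^ 0xFF) = 8 - HW(x) anti-correlates perfectly); in the real attack, the model function would instead compute the Hamming distance across the final-round S-box for each ciphertext byte pair.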
We first evaluate a CPA attack on 10M traces using the same parameters as for
leakage assessment, when the ADC pin is connected to GND. Here, AES takes about
13µs, leading to an estimated 17 samples during the complete AES function call to
(a) Total correlation after 10M traces for all 256 key byte candidates; The correlation
with the correct key byte is marked red
(b) Correlation progress over 10M traces for all 256 key byte candidates; The correla-
tion with the correct key byte is marked red
Figure 5.29.: Results of a CPA attack on the 12th byte of the last secret round key of AES
on the STM32F407VG Discovery #2 @56MHz with the ADC connected to Vdd
and the program compiled with the -O0 optimization option.
best correlation appears when attacking the 12th byte. The results can be seen in
Figure 5.29. A peak during the last part of the encryption is clearly visible, again
indicating the last AES round. Furthermore, we see that the correct key byte, which
is marked red, correlates much more clearly with the collected traces than the incorrect
key bytes.
We conclude that key recovery attacks on the AES with data from ADC traces are
generally possible, although the success depends on the overall system parameters, such
as the clock speed, the ADC pin connection, and possibly even the code optimization at
compile time. For data collected with the ADC pin left unconnected, we were unable
to recover secret key bytes successfully.
5.3.4. Discussion
Our results prove the existence of a correlation between the data processed in a micro-
controller and the noise that can be observed in its integrated ADC. The correlation
is strong enough to distinguish the data processed in cryptographic algorithms running
on a CPU in the system. By leakage assessment it was shown that this observation
is valid in most cases. Furthermore, we proved that the leakage can be sufficient to
perform a CPA-based key recovery attack on AES. Due to the protection mechanism
that can be enabled for the full RSA implementation of the used mbedTLS library,
we assume that more advanced attacks are required there, but they are generally
feasible [158-160].
The performed experiments reveal an underlying problem of highly integrated mixed-
signal systems that consist of analog and digital components in a single chip. Such a
level of SoC integration is an increasing trend in many hardware platforms for IoT
applications and beyond. In these systems, both power supply based coupling effects
as well as crosstalk can cause the noise in the analog part, which can then be exploited
by anyone with access to the data measured in the analog circuit.
manipulated by any user that accesses the website. In effect, if this reference application
is used as the basis of a product, it introduces a threat. On the other hand, 3rd party
smartphone applications often need to use various analog sensors of the system, which
could contain the required side-channel information to perform an attack. Even a
typical audio sampling rate is already fast enough to distinguish differences in modular
exponentiation on the tested platforms, and it was already reported that such a low
sampling rate can be sufficient to attack RSA [161].
Our experimental setups were chosen for comparability across different vendor sys-
tems, and additionally to allow leakage assessment methods to be performed easily.
Yet, these setups are sufficient to prove that ADC noise is generally a possible source
of side-channel information. Following the aforementioned example, this could actually
be exploitable in some existing products, and it adds a dangerous new way to acquire
power side-channel information completely remotely.
Whether an attack on a real system can succeed depends on additional aspects. Besides
the basic requirement of access to ADC data, it is also required that the data can
be sampled during cryptographic operations, and that the collected traces can be aligned
properly. However, it was shown that even in full commercial implementations, many
of such obstacles can still be circumvented, and complex attacks can be performed by
considering more aspects of a full system [11, 137, 162, 163]. For instance, it is often
still possible to find a way of synchronization. At least an estimate of the time at which
an encryption starts can often be obtained from the behavior of the overall application.
A remote user can thus estimate which part of the sensor data might contain exploitable
side-channel leakage.
Another aspect of real systems is that the side-channel data may be modulated on top of
other sensor measurement data, or limited by the available sampling rate. However, this
should only increase the number of traces required for differential attacks, which are
specifically suited to cope with such situations. Often the sensor data is not much of a
problem if it changes rather slowly, for instance with a temperature sensor. It was also
shown that a sampling rate below the expected side-channel leakage can still be sufficient
to perform a full attack [157, 161].
Our results imply that even sensor data that a microcontroller measures and sends
over a digital connection made available online can leak sensitive information which is
accessible from anywhere in the world. In many scenarios, leaking the secret keys of
a sensor node might be just a small issue, but in other scenarios a more sophisticated
attacker could use such information as part of a larger scale attack, for instance on
SCADA systems [164, 165].
When designers are aware of this possible threat, proper mitigations can be designed,
depending on the application. Mitigation can, for instance, be considered in the
hardware design of the ADC module, to reduce or completely remove the noise-related
leakage. In small IoT systems, unless all executed code can be trusted, information
leakage and possible attacks require a certain level of care, and we suggest either of
the following options as a possible isolation practice:
• Guarantee that any analog measurements can only take place mutually exclusively
to security-related computations.
• Filter the ADC data in a way that the leakage can no longer be observed. For
instance, a filter for noise or specific frequencies might make attacks infeasible,
or reduce effective sampling rates.
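As a toy illustration of the second option, a simple moving-average (low-pass) filter suppresses fast supply-noise components while preserving a slowly changing sensor signal. The signal shapes and window length here are arbitrary assumptions; a real design must be matched to the sensor's frequency band.

```python
import numpy as np

def moving_average(x, k):
    """k-tap moving average; attenuates components faster than ~rate/k."""
    return np.convolve(x, np.ones(k) / k, mode="valid")

rng = np.random.default_rng(2)
n = 4096
slow_signal = np.sin(2 * np.pi * np.arange(n) / 1024)  # legitimate sensor data
fast_noise = rng.normal(0.0, 0.2, size=n)              # leakage-carrying noise
filtered = moving_average(slow_signal + fast_noise, 16)
```

Averaging 16 samples reduces the power of white noise by roughly a factor of 16 while the slow sensor component passes almost unchanged; as a side effect, it also reduces the effective sampling rate available to an attacker.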
Part III.
6. Related Work and Countermeasures
The sections in this chapter were partially adopted from previously published works
included in this thesis, which were co-authored with (in no particular order): Falk
Schellenberg, Jonas Krautter, Fabian Oboril, Saman Kiamehr, Amir Moradi and
Mehdi B. Tahoori.
Besides the work carried out in this thesis, fault or side-channel attacks that can
be exploited even remotely through software have recently been increasing. Probably the
most impactful seminal findings have been the Spectre and Meltdown speculative ex-
ecution attacks that use cache timing side-channels [166, 167] as covert information
channels [11, 12] to exfiltrate information from a speculative execution context. These
impact a wide range of mainstream CPUs in use, and have thus motivated further re-
search in the direction of microarchitectural attacks. In contrast to that, this
thesis focuses on a lower level, at which the electrical characteristics of the system can
be exploited and impacted from the software side. Thus, power analysis and timing-
or voltage-based fault attacks are the main focus.
on the ARM CPU, affecting trusted applications or hardware in the SoC which other-
wise should not be accessible. Using CLKSCREW, the RSA signature scheme used in
TrustZone can also be subverted, leading to the execution of self-signed applications.
More recently, control over power management was also used to attack Intel SGX
in [176-178].
When looking into FPGAs, this thesis shows the first results on injecting tran-
sient faults merely through reconfiguration in Chapter 4. Others have extended these
attacks. In [179], timing faults are caused that make a random number generator mis-
behave. Other works have shown alternate ways to cause voltage drops or inject timing
faults. By specific memory access patterns to FPGA-internal BRAM, sufficient voltage
drop to cause faults can be generated [180]. On the other hand, alternate ring oscillator
designs, which do not require a combinational loop, have been shown in [109]. In [181],
it is further characterized what exact on-chip spatial and timing conditions are needed
to inject faults successfully.
leakage could also be observed on digital port pins not connected to the power supply.
Thus, it is also an indication that an ADC could observe such leakage if it is connected
to a port pin from the inside. Extending the work presented in Section 5.3, O'Flynn
et al. [186] use the noise of an ADC to perform power analysis attacks targeting the
secrets of an ARM TrustZone-M implementation.
6.3. Countermeasures
Both categories of attacks, fault and side-channel, require pragmatic solutions. For
FPGAs, some initial solutions have already been published in collaboration with the
author of this thesis, while minor suggestions have already been given in the respective
earlier chapters. In general, power analysis side-channel attacks could be tackled with
traditional countermeasures, such as hiding and masking schemes. However, given the
specific nature of FPGA on-chip attacks, other types of countermeasures can also be
developed.
One of these countermeasures is Active Fencing, which takes chip-wide floorplanning
and the spatial relation of attacker and victim circuits into consideration [187]. Earlier
isolation techniques for FPGA security already suggest putting unused logic slices be-
tween victim and attacker circuits inside the FPGA [98], which should be able to
protect against attacks that are based on crosstalk between adjacent wires of victim
and attacker. Active fencing goes one step further, and instead puts active logic in
those slices. That logic can cancel out a significant degree of the side-channel leakage
that travels through the on-chip PDN. In [187] it is demonstrated that a CPA attack
can require 166× more traces when Active Fencing is applied.
Another possible countermeasure is to perform a check on the bitstream, which was
briefly mentioned for DoS in Section 4.1. A supervisor is integrated that reverse engi-
neers and checks bitstreams before they are loaded into the FPGA, as a sort of FPGA
Anti-Virus. By analyzing the bitstreams for malicious signatures, circuit configurations
used for attacks can be detected and prevented from executing on the FPGA. With
perfect detection accuracy, all types of fault and side-channel attacks presented here
could be prevented. Like all detection approaches, the practical difficulty lies in sug-
gesting signatures that allow detecting all malicious bitstreams without rejecting
actually benign bitstreams. In [188] and [189] various signatures and detection methods
are suggested. Fundamental circuit properties are formulated that can be used for
indirect sensing of voltage fluctuations or to cause faults through voltage drop, as was
presented in the attacks in Chapter 4 and Chapter 5. In [189], these properties are then
evaluated on a broad range of benchmark circuits.
from different designs are routed through the same switch matrix. As an isolation
mechanism, so-called moats provide what is considered physical isolation by adding
unused FPGA slices between logical blocks, while drawbridges are used for more re-
stricted communication [98]. Major FPGA vendors such as Xilinx and Intel (formerly
Altera) picked this up to suggest similar design flows for secure isolation [124, 190].
This countermeasure is considered a physical separation; however, it does not cover the
shared PDN, and thus cannot prevent the attacks shown in Chapter 4 and Chapter 5.
However, these isolation mechanisms protect against another type of attack that has
just recently been shown. It was first shown in [125] that long wires inside a single
switch matrix can influence each other. Later on, it was shown in [153] that this is
sufficient for a side-channel attack to extract information. Some studies have since
followed to analyze the exact behavior of these long-wire coupling effects [191, 192].
Since moats add spacing, this cross-influence can be prevented, at the cost of unused
slices.
6.5. Results of this Thesis and similar Related Work
Table 6.1.: Overview of experimental results on side-channel or fault attacks in FPGAs, sys-
tems containing FPGAs, or FPGA-based SoCs. [Table body with 'Attack successful?'
entries omitted.]
7. Conclusion and Perspectives
7.1. Conclusion
The experiments conducted for this thesis show that new security threats emerge with
new use-cases of FPGA hardware, such as multi-tenancy, or as accelerators in a multi-
user system. Although FPGAs are intended as digital circuits, the methods used in this
thesis allow voltage fluctuations to be generated or observed inside FPGA designs,
in a way sufficient to perform attacks. Thus, with software-based reconfiguration
of an FPGA, physical attacks such as power analysis and fault attacks are feasible in-
side the FPGA, but also against other chips that share the same power supply. Previously,
such attacks were believed to require local access to the device under attack, and ded-
icated test and measurement equipment. This thesis lifts these threats to a potentially
remote attacker, exemplifying that they should not be disregarded in threat modeling.
The results for FPGAs also indicate that the electrical level of semiconductor chips
in general should not be disregarded for a system-wide security analysis. Thus, to
demonstrate this generality, the thesis also analyzes a similar threat for mixed-signal
IoT devices. In that case, it reveals that power side-channel information is indirectly
available as the noisy part of on-chip ADCs.
In conclusion, this thesis has shown that the electrical level of semiconductor chips,
foremost FPGAs, is a feasible attack vector, even when the attacker has no direct
physical access to the device. The results of this thesis suggest that solutions have
to be found for multi-tenant FPGA security, before they can be used responsibly on
a wider scale. Considering other semiconductor devices, system-wide security threat
modeling needs to include the electrical level, and not only when a local attacker
is considered.
tive system-level approach in which an FPGA design can only be loaded after it gets
checked.
More generally, these countermeasures are so far very specific to the application, and
do not yet solve the underlying problem of sharing a power distribution network among
components with different security privileges. For that, the underlying electrical level
needs to be analyzed more carefully in each new chip generation. This remains an open
research problem, which has recently been introduced into the research community.
Finally, as an interesting byproduct, the results of this thesis are also valuable for
education. Since this thesis showed that low-cost FPGA boards alone are sufficient to
perform practical fault or side-channel attacks, expensive test and measurement equipment
is no longer required. Each student can thus be provided with their own board on
which all necessary experiments can be performed, even at home. Together with an
accompanying lecture, a course Practical Introduction into Hardware Security is already
established at the Karlsruhe Institute of Technology, in which we use these benefits to
teach fault and power analysis attacks.
Part IV.
Appendix
Acronyms
ADC Analog-to-Digital Converter. 10, 119, 123
AES Advanced Encryption Standard. 15, 16, 53–55, 57, 60, 61, 63, 65–72, 81, 82, 84,
85, 147, 149, 150
ALM Adaptive Logic Module; an advanced configurable LUT, used in most Intel
FPGAs. 64, 65
BRAM Block RAM; typically SRAM memory, available in user-configurable FPGA
logic. 80, 82, 118
CPA Correlation Power Analysis. 13, 14, 82, 84, 85, 92, 100, 119
DCM Digital Clock Manager. 81
DFA Differential Fault Analysis. 6, 15, 53–55, 57, 60, 63, 66–71, 149, 150
DoS Denial-of-Service Attack; compromises Availability of an asset. 5, 6, 12, 53, 72,
119
DPA Differential Power Analysis. 75
FPGA Field Programmable Gate Array. 3, 4, 7, 8, 11, 12, 15, 53–60, 62–66, 69–72,
118, 123, 149
GPU Graphics Processing Unit. 3
HDL Hardware Description Language; class of computer languages used to design
electronic circuits, major contenders are VHDL and Verilog. 57
IC Integrated Circuit. 75
ILA Integrated Logic Analyzer; available Debug Core in Xilinx FPGAs. 80, 150
LAB Logic Array Block; block containing various configurable elements and local
interconnect in Intel FPGAs. 65
LSB Least Significant Bit. 59
LUT Look-Up Table; used in most FPGAs to implement custom logic. 57, 58, 64–67,
69, 72, 145, 149
MSB Most Significant Bit. 59
PCB Printed Circuit Board. 5, 7, 86, 88, 118
PDN Power Distribution Network. 7, 56, 57, 71, 75–78, 85, 86, 119, 120, 150
RO Ring Oscillator; combinational logic with a feedback loop that leads to oscillation.
11, 56–67, 70–73, 118, 149
RSA Rivest-Shamir-Adleman; a common public-key cryptosystem. 75
SCA Side-Channel Analysis. 75–77, 86, 87, 150
SoC System-on-Chip. 54, 71, 76, 77, 150
SPA Simple Power Analysis. 92
SPN Substitution-Permutation Network. 15
STA Static Timing Analysis. 64, 67, 69, 70, 155
TCB Trusted Computing Base. 88
TDC Time-to-Digital Converter. 11, 12, 78, 79, 92, 118, 147, 150
List of Figures
4.19. Total measured fault injection rates Ftot and measured injection rates FDFA of faults usable in DFA, as well as setup slacks reported by Quartus STA, for different AES operating clock frequencies fop with preserved design placement but remaining routing randomization on the DE0-Nano-SoC . . . 69
4.20. Range of frequencies to toggle ROs on/off that cause timing faults or crashes in the Xilinx VCU108 Virtex Ultrascale Board . . . 73
5.1. Two scenarios of SCA attacks, where the circuits are logically separated, but share the same PDN. a) In a shared FPGA, one user (A) can attack another (B). b) In an FPGA SoC, a user with current access to the FPGA accelerator can attack any software or operating system on the CPU. . . . 77
5.2. Floorplan (rotated right) of one TDC Sensor with 18×(LUT, Latch) as part of the Initial Delay. . . . 79
5.3. Architecture of the underlying AES encryption core (ShiftRows and KeySchedule not shown) . . . 79
5.4. Experimental setup showing the Sakura-G Board connected to our measurement PC, with Chipscope ILA used for data acquisition. . . . 80
5.5. Floorplans showing the Experimental Setup with all the relevant parts. Left: the internal sensor is placed close to the AES module. Right: the internal sensor is placed far away from the AES. . . . 81
5.6. Single traces measured using an oscilloscope (top) and using our developed sensor at different sampling frequencies (below). Time samples refer to the individual samples captured at the respective sampling rate. . . . 83
5.7. Results using the oscilloscope (top row) and using the internal sensor at different sampling frequencies (rows below); for each, the correlation by means of 5 000 traces (left) and the progressive curves over the number of traces (right). The correct key hypothesis is marked in black. Time samples refer to the individual samples captured at the respective sampling rate. . . . 84
5.8. Correlation using 5 000 traces (left) and progress of the maximum correlation over the number of traces (right) using the internal sensor at 96 MHz sampling frequency, placed far away from the AES module. . . . 85
5.9. CPA attack through on-chip sensors on the Lattice ECP5 FPGA; correlation progress over 10000 samples for all 256 secret AES key byte candidates with the correct key byte marked red. . . . 86
5.10. Scenario of a shared power supply leading to a risk of side-channel attacks. In this example Chip C tries to deduce information from the power supply on Chip A. . . . 88
5.11. Experimental setup showing the SAKURA-G Board connected to our measurement PC. . . . 89
5.12. Setup showing the configuration of the two FPGAs on the SAKURA-G board. The sensor to attack is in the main FPGA, while the cryptographic module (AES or RSA) runs on the auxiliary FPGA. . . . 90
5.13. Averaged traces measured during AES using the TDC sensors at different sampling frequencies. . . . 93
5.14. CPA attack on AES: results to estimate the sensor quality at different sampling rates, with a board where all relevant capacitors are removed. Each row shows the correlation using 500 000 traces (left) and the progressive curves over the number of traces (right). The correct key hypothesis is marked in black. Time samples refer to the individual samples captured at the respective sampling rate. . . . 94
5.15. CPA attack on AES: progressive curves over the number of traces with a board where all relevant capacitors are removed (left) and with the default capacitor configuration (right), both sampled at 24 MHz. The correct key hypothesis is marked in black. Time samples refer to the individual samples captured at the respective sampling rate. . . . 95
5.16. Binary exponentiation for RSA captured with the TDC sensor on separate FPGAs, sampled at 24 MHz (raw trace). . . . 96
5.17. Detail of the binary exponentiation captured with the TDC sensor after applying a 900 kHz low-pass filter. Dotted lines mark the time-span of an individual step in the binary exponentiation. Arrows indicate whether the state of the multiplication module changed (on to off: arrow upwards, off to on: arrow downwards, no change: dash). The bits above are a part of the (correctly) recovered secret exponent according to this classification. . . . 96
5.18. Basic principle of the two variants of our adversarial model considered in this section. In both cases an ADC in the Analog Subsystem is biased from Task A in the digital subsystem. This bias can contain secret information that Task A processes. . . . 99
5.19. Overview of our common experimental setup, shared among the used platforms. . . . 100
5.20. Description of one loop iteration of the two FreeRTOS tasks. . . . 102
5.21. Principle of how different ADC sampling styles cover less or more of the voltage noise affecting the ADC result. DMA needs to be used for continuous sampling, while software-based sampling (in the multi-core scenario) will always introduce some gaps. . . . 103
5.22. Average over 1000 traces for oscilloscope measurements on Vdd and Idd in the first two plots, and average of concurrently measured ADC values when the ADC was set to 6-bit and the pin was not connected (N/C) on the bottom. High activity phases are marked grey. . . . 105
5.23. Average over 100k traces for a fixed base exponentiated in an mbedTLS sliding window exponentiation, and 100k traces each with a random base exponentiated with the same secret exponent, on the STM32F407VG Discovery #1. ADC connected to GND. . . . 106
5.24. Average over 1M traces for a fixed message encrypted with mbedTLS AES and the FIPS key, and 1M traces each with a random message encrypted with the same key, on the STM32F407VG Discovery #1. ADC connected to GND. . . . 106
List of Tables
3.1. Resource use for one sensor, encoder, and registers between . . . 22
3.2. Calibration data for each sensor in a layout with eight horizontal sensors: time to bit 0 (i.e. initial delay) and average required delay per bit (linear estimate). . . . 24
3.3. Notations describing workloads w and activity patterns W . . . 26
3.4. Overview of workloads w used in W in this chapter. . . . 26
4.1. Different conditions for the tested boards. Power consumption measured at the wall plug, with the default power supplies. . . . 49
4.2. First results about the general vulnerability of different platforms . . . 60
4.3. STA corner timing models available in the Quartus STA, from the fastest to the slowest model for a given device speed grade at 1100 mV supply voltage . . . 64
A. Appendix on FPGAhammer
A.1. Single Ring Oscillator Verilog Source Code
module osc (enablein, dummyout);
  input  enablein;
  output dummyout;

  wire enablein_lut;
  wire loop_lut, loop;

  lut_input  enable_lutin (enablein, enablein_lut);
  lut_input  loop_lutin   (loop, loop_lut);
  lut_output loop_lutout  (~loop_lut & enablein_lut, loop);

  assign dummyout = loop;
endmodule
Listing A.1: Single RO module Verilog source code using low-level primitives to implement a
two-input NAND gate
Listing A.3: Instantiation of ROs with virtual pins in the top-level module; to generate bare
ROs without virtual pins, the dummyout output declaration is removed and the
respective port of the osc_array module instance is left unconnected
B. Appendix on Leaky Noise
B.1.1. ESP32-devkitC
B.1.1.1. AES
(a) First order leakage assessment results based on a fixed-vs-random t-test for 1k traces collected during AES encryptions on the ESP32-devkitC with the ADC pin connected to GND.
(b) First order leakage assessment results based on a fixed-vs-random t-test for 1M traces collected during AES encryptions on the ESP32-devkitC with the ADC pin disconnected (N/C).
(c) First order leakage assessment results based on a fixed-vs-random t-test for 1M traces collected during AES encryptions on the ESP32-devkitC with the ADC pin connected to Vdd.
(d) Leakage Assessment progress on mbedTLS AES with {Vdd, GND, N/C} connected to the ADC on the ESP32-devkitC.
(a) First order leakage assessment results based on a fixed-vs-random t-test for 1k traces collected during modular exponentiation on the ESP32-devkitC with the ADC pin connected to GND.
(b) First order leakage assessment results based on a fixed-vs-random t-test for 100k traces collected during modular exponentiation on the ESP32-devkitC with the ADC pin disconnected (N/C).
(c) First order leakage assessment results based on a fixed-vs-random t-test for 100k traces collected during modular exponentiation on the ESP32-devkitC with the ADC connected to Vdd.
(d) Leakage Assessment progress on mbedTLS modular exponentiation with {Vdd, GND, N/C} connected to the ADC on the ESP32-devkitC.
Figure B.2.: Results of Leakage Assessments on ESP32-devkitC for mbedTLS modular exponentiation.
B.1. Results of all Leakage Assessments
B.1.2.1. AES
(a) First order leakage assessment results based on a fixed-vs-random t-test for 1M traces collected during AES encryptions on the STM32F407VG Discovery #1 with the ADC pin connected to GND.
(b) First order leakage assessment results based on a fixed-vs-random t-test for 1M traces collected during AES encryptions on the STM32F407VG Discovery #1 with the ADC pin disconnected (N/C).
(c) First order leakage assessment results based on a fixed-vs-random t-test for 1M traces collected during AES encryptions on the STM32F407VG Discovery #1 with the ADC pin connected to Vdd.
(d) Leakage Assessment progress on mbedTLS AES with {Vdd, GND, N/C} connected to the ADC on the STM32F407VG Discovery #1.
(a) First order leakage assessment results based on a fixed-vs-random t-test for 100k traces collected during modular exponentiation on the STM32F407VG Discovery #1 with the ADC pin connected to GND.
(b) First order leakage assessment results based on a fixed-vs-random t-test for 100k traces collected during modular exponentiation on the STM32F407VG Discovery #1 with the ADC pin disconnected (N/C).
(c) First order leakage assessment results based on a fixed-vs-random t-test for 100k traces collected during modular exponentiation on the STM32F407VG Discovery #1 with the ADC pin connected to Vdd.
(d) Leakage Assessment progress on mbedTLS modular exponentiation with {Vdd, GND, N/C} connected to the ADC on the STM32F407VG Discovery #1.
B.1.3.1. AES
(a) First order leakage assessment results based on a fixed-vs-random t-test for 1M traces collected during AES encryptions on the STM32F407VG Discovery #2 with the ADC pin connected to GND.
(b) First order leakage assessment results based on a fixed-vs-random t-test for 1M traces collected during AES encryptions on the STM32F407VG Discovery #2 with the ADC pin disconnected (N/C).
(c) First order leakage assessment results based on a fixed-vs-random t-test for 1M traces collected during AES encryptions on the STM32F407VG Discovery #2 with the ADC pin connected to Vdd.
(d) Leakage Assessment progress on mbedTLS AES with {Vdd, GND, N/C} connected to the ADC on the STM32F407VG Discovery #2.
(a) First order leakage assessment results based on a fixed-vs-random t-test for 100k traces collected during modular exponentiation on the STM32F407VG Discovery #2 with the ADC pin connected to GND.
(b) First order leakage assessment results based on a fixed-vs-random t-test for 100k traces collected during modular exponentiation on the STM32F407VG Discovery #2 with the ADC pin disconnected (N/C).
(c) First order leakage assessment results based on a fixed-vs-random t-test for 100k traces collected during modular exponentiation on the STM32F407VG Discovery #2 with the ADC pin connected to Vdd.
(d) Leakage Assessment progress on mbedTLS modular exponentiation with {Vdd, GND, N/C} connected to the ADC on the STM32F407VG Discovery #2.
B.1.4.1. AES
(a) First order leakage assessment results based on a fixed-vs-random t-test for 1M traces collected during AES encryptions on the STM32L475 IoT Node with the ADC pin connected to GND.
(b) First order leakage assessment results based on a fixed-vs-random t-test for 1M traces collected during AES encryptions on the STM32L475 IoT Node with the ADC pin disconnected (N/C).
(c) First order leakage assessment results based on a fixed-vs-random t-test for 1M traces collected during AES encryptions on the STM32L475 IoT Node with the ADC pin connected to Vdd.
(d) Leakage Assessment progress on mbedTLS AES with {Vdd, GND, N/C} connected to the ADC on the STM32L475 IoT Node.
Figure B.7.: Results of Leakage Assessments on STM32L475 IoT Node for mbedTLS AES.
(a) First order leakage assessment results based on a fixed-vs-random t-test for 1k traces collected during modular exponentiation on the STM32L475 IoT Node with the ADC pin connected to GND.
(b) First order leakage assessment results based on a fixed-vs-random t-test for 1k traces collected during modular exponentiation on the STM32L475 IoT Node with the ADC pin disconnected (N/C).
(c) First order leakage assessment results based on a fixed-vs-random t-test for 100k traces collected during modular exponentiation on the STM32L475 IoT Node with the ADC pin connected to Vdd.
(d) Leakage Assessment progress on mbedTLS modular exponentiation with {Vdd, GND, N/C} connected to the ADC on the STM32L475 IoT Node.
Figure B.8.: Results of Leakage Assessments on STM32L475 IoT Node for mbedTLS modular
exponentiation.
B.2. Results of CPA for all secret AES key bytes on Vdd and GND
(a) CPA progress for the 0th secret key byte
(b) CPA progress for the 1st secret key byte
(c) CPA progress for the 2nd secret key byte
(d) CPA progress for the 3rd secret key byte
(e) CPA progress for the 4th secret key byte
(f) CPA progress for the 5th secret key byte
(g) CPA progress for the 6th secret key byte
(h) CPA progress for the 7th secret key byte
Figure B.9.: Results of a CPA attack on the last secret round key (bytes 0 to 7) of AES on the
STM32F407VG Discovery #1 @168MHz with the ADC connected to GND and
the program compiled with the -Os optimization option. Each plot shows the
correlation progress of all 256 key candidates for a specific key byte over 10M
traces and the respective correct key candidate is marked red.
(a) CPA progress for the 8th secret key byte
(b) CPA progress for the 9th secret key byte
(c) CPA progress for the 10th secret key byte
(d) CPA progress for the 11th secret key byte
(e) CPA progress for the 12th secret key byte
(f) CPA progress for the 13th secret key byte
(g) CPA progress for the 14th secret key byte
(h) CPA progress for the 15th secret key byte
Figure B.10.: Results of a CPA attack on the last secret round key (bytes 8 to 15) of AES on
the STM32F407VG Discovery #1 @168MHz with the ADC connected to GND
and the program compiled with the -Os optimization option. Each plot shows
the correlation progress of all 256 key candidates for a specific key byte over
10M traces and the respective correct key candidate is marked red.
(a) CPA progress for the 0th secret key byte
(b) CPA progress for the 1st secret key byte
(c) CPA progress for the 2nd secret key byte
(d) CPA progress for the 3rd secret key byte
(e) CPA progress for the 4th secret key byte
(f) CPA progress for the 5th secret key byte
(g) CPA progress for the 6th secret key byte
(h) CPA progress for the 7th secret key byte
Figure B.11.: Results of a CPA attack on the last secret round key (bytes 0 to 7) of AES on
the STM32F407VG Discovery #2 @56MHz with the ADC connected to Vdd
and the program compiled with the -O0 optimization option. Each plot shows
the correlation progress of all 256 key candidates for a specific key byte over
10M traces and the respective correct key candidate is marked red.
(a) CPA progress for the 8th secret key byte
(b) CPA progress for the 9th secret key byte
(c) CPA progress for the 10th secret key byte
(d) CPA progress for the 11th secret key byte
(e) CPA progress for the 12th secret key byte
(f) CPA progress for the 13th secret key byte
(g) CPA progress for the 14th secret key byte
(h) CPA progress for the 15th secret key byte
Figure B.12.: Results of a CPA attack on the last secret round key (bytes 8 to 15) of AES on
the STM32F407VG Discovery #2 @168MHz with the ADC connected to Vdd
and the program compiled with the -O0 optimization option. Each plot shows
the correlation progress of all 256 key candidates for a specific key byte over
10M traces and the respective correct key candidate is marked red.
B.3. Simplified Source Code of the Experiments
Listing B.4: adc_get_samples for ESP32 ULP-based, plus ULP assembly code (adc.S). Please
note that in the actual implementation we directly used the ULP from the mbedTask
instead of a separate adcTask.

static inline void adc_get_samples() {
    adc1_ulp_enable();
    ulp_load_binary(0, ulp_main_bin_start, ulp_bin_size);
    ulp_set_wakeup_period(0, 1000);
    while (((volatile typeof(ulp_sync_back)) ulp_sync_back) == 0);
    *((volatile typeof(ulp_sync_back)*) &ulp_sync_back) = 0;
    uint32_t *p_ulp_adc_data = &ulp_adc_data;
    for (int i = 0; i < ADC_WORDS; i++) {
        adc_data[i] = (uint16_t) (*p_ulp_adc_data++) & 0xffff;
    }
}

adc.S:
        move  r0, adc_data
measure:
        adc   r2, adc_nr, adc_channel + 1
        st    r2, r0, 0
        add   r0, r0, 1
        jumpr measure, adc_data + ADC_WORDS, lt
        // sync back to main cpu, which spinlocks:
        move  r1, sync_back
        move  r2, 0x0001
        st    r2, r1, 0
        halt
void HAL_ADC_ConvCpltCallback(ADC_HandleTypeDef *hadc) {
    osSignalSet(adcHandle, 0x0001);
}