The University of Manchester Research

Invited Tutorial: FPGA Hardware Security for Datacenters and Beyond

Document Version
Proof

Link to publication record in Manchester Research Explorer

Citation for published version (APA):


Mätas, K., La, T., Grunchevski, N., Pham, K., & Koch, D. (2020). Invited Tutorial: FPGA Hardware Security for
Datacenters and Beyond. In 28th ACM/SIGDA International Symposium on Field-Programmable Gate Arrays
(FPGA)

Published in:
28th ACM/SIGDA International Symposium on Field-Programmable Gate Arrays (FPGA)



Invited Tutorial: FPGA Hardware Security for Datacenters
and Beyond
Kaspar Matas, Tuan La, Nikola Grunchevski, Khoa Pham, and Dirk Koch
Department of Computer Science,
The University of Manchester, UK
{firstname.lastname}@manchester.ac.uk
ABSTRACT
Since FPGAs are now available in datacenters to accelerate applications, providing FPGA hardware security is a high priority. FPGA security is becoming more serious with the transition to FPGA-as-a-Service, where users can upload their own bitstreams. Full control over FPGA hardware through the bitstream enables attacks that weaken an FPGA-based system. These include physically damaging the FPGA equipment and leaking sensitive information such as the secret keys of crypto algorithms. While there are no known attacks in commercial settings so far, it is not so much a question of if but more of when. The tutorial will show concrete attacks applicable to datacenter FPGAs.

The goal of this tutorial is to prepare the FPGA community for impending security issues in order to pave the way for proactive security. First, we will give a tour through the FPGA hardware security jungle, surveying practical attacks and potential threats. We will reinforce this with live demos of denial-of-service attacks. Less than 10% of the logic resources on an FPGA can draw enough dynamic power to crash a datacenter FPGA card. In the second part of the tutorial, we will show different mitigations that are either vendor supported or proposed by the academic community. In summary, the tutorial will communicate that while FPGA hardware security is complicated to bring about, there are acceptable solutions for known FPGA security problems.

KEYWORDS
FPGA security, hardware security, datacenter, cloud computing

ACM Reference Format:
Kaspar Matas, Tuan La, Nikola Grunchevski, Khoa Pham, and Dirk Koch. 2020. Invited Tutorial: FPGA Hardware Security for Datacenters and Beyond. In 2020 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays (FPGA'20), February 23–25, 2020, Seaside, CA, USA. ACM, New York, NY, USA, 10 pages. https://doi.org/10.1145/3373087.3375390

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
FPGA '20, February 23–25, 2020, Seaside, CA, USA
© 2020 Association for Computing Machinery.
ACM ISBN 978-1-4503-7099-8/20/02...$15.00
https://doi.org/10.1145/3373087.3375390

1 INTRODUCTION
Traditionally, the FPGA industry held the position that hardware security of an FPGA was primarily about protecting designs in terms of intellectual property (IP) in configuration data (i.e., the configuration bitstream) against cloning/overbuilding, reverse engineering, tampering, and spoofing, as summarized in [24]. This view has changed with FPGAs now being integrated into data centers and cloud computing infrastructures at large scale [4, 10, 19]. One principle commonly used in cloud computing is resource pooling, which allows sharing resources across different tenants such that the overall utilization of cloud hardware resources improves. Resource pooling is currently not offered by any major FPGA cloud service provider, but multi-tenant scenarios are expected to provide better utilization and consequently better overall power efficiency at lower cost compared to the current one-user-per-fabric scheme [25]. It should also be mentioned that the commonly used scenario consisting of a shell (i.e., the static system infrastructure that a data center FPGA provides to allow a user circuit to communicate with the server) and the user accelerator design can already be considered multi-tenant. This is because the shell and a tenant are implemented individually, and protection mechanisms are needed to ensure the system integrity of both (shell and user accelerator). For instance, a user accelerator should not be able to gain access to the shell, which in turn may compromise other parts of the cloud infrastructure.

The tutorial continues as follows: the next section provides a survey on FPGA hardware security, followed in Section 3 by a tutorial on how to research optimized ring oscillators for power hammering and for side-channel attacks. This section serves as a template to create other kinds of malicious circuits. Section 4 provides a tutorial on installing and using the open-source FPGA bitstream virus scanner FPGADefender. Some virus scan results are finally provided in Section 5.

2 HARDWARE THREATS FOR DATACENTER FPGAS
Due to their deep low-level programmability, FPGAs comprise new threat models far beyond what is commonly known from conventional CPU/GPU systems. For instance, modules running on an FPGA may include circuits able to measure system states at high accuracy, which may open physical side-channels to leak sensitive data from other users [5, 21]. These kinds of attacks are not available in known software threat models, but have been shown for FPGAs.

In the remainder of this section, we give a brief literature review on potential threats against multi-tenant FPGAs, which can be categorized into:
(1) attacks on the system availability (DoS-like attacks)
(2) attacks on the user confidentiality (via physical side-channel analysis)
This section will also provide published state-of-the-art countermeasures.
Figure 1: Denial-of-service-like (DoS-like) threat model. A user may try to shut down an FPGA service in a data center by sending malicious circuits such that legitimate requests from other users cannot use the FPGA resources. Short-circuits and power hammering designs can be utilized for such attacks on the system availability. This kind of attack may potentially age or damage the equipment.

Figure 2: An illustration of the eavesdropping threat model of user confidentiality in a multi-tenant computing environment.

2.1 Attacks on the system availability
Denial-of-service-like (DoS-like) attacks are used to bring down active infrastructures and/or to compromise states in other system components which stay outside the scope of an attacking module, as illustrated in Figure 1. At the electrical level, two means for DoS-like attacks have been utilized: short-circuits and power hammering.

Short-circuits on modern FPGAs have been demonstrated in [1] within the multiplexers inside a switch matrix using a manipulated configuration bitstream, resulting in a huge current increase (with several mA extra current for a single multiplexer). While the FPGA vendor tools ensure that generated bitstreams are short-circuit free, an attacker can create shorts relatively easily. In fact, in [8], short-circuits have been used for obfuscating power traces from an AES core to make power analysis attacks much harder to perform.

Power hammering is another mechanism to carry out DoS-like attacks. All current power hammering attacks [7] are based on fast toggling circuits in order to draw a substantial amount of dynamic power. We will show in Section 3 that it is possible to implement ring oscillators running in the GHz frequency domain with a corresponding dynamic power footprint. In [7], a grid of ring oscillators was activated at an adjustable rate (to stimulate resonance effects in the power supply regulation circuit). With this, several FPGA platforms such as Xilinx Virtex 6, Kintex 7, and Zynq-7000 FPGAs have been crashed (in some cases requiring power-cycling to bring the boards back into service). In this tutorial, we will examine the potential for power hammering in more detail in Section 3.3. Although ring-oscillators are usually flagged with a warning by the vendor design tool flows and hence not allowed to be deployed on any cloud or data center infrastructure, recent research [6] has reported new ring-oscillator designs which can bypass such Design Rule Checking (DRC). A simple trick to bypass DRC is implementing a ring oscillator passing through an enabled transparent latch. With this, ROs could be deployed, for example, on Amazon F1 instances.

Although all current power hammering attacks leverage self-oscillating circuits, glitch amplification can potentially be utilized for this purpose, as shown in Figure 3.

2.2 Attacks on the user confidentiality
Side-channel attacks on FPGAs can be either active (e.g., timing fault injection) or passive (e.g., power analysis, crosstalk coupling, electromagnetic analysis, and thermal channel leakage [16]). In [14], timing faults have been injected through a large number of ring oscillators to cause voltage drops, followed by analyzing the resulting faulty cipher text using Differential Fault Analysis (DFA) to successfully reveal the secret key of a crypto-core. The idea of most timing fault injection attacks is to temporarily create a huge power demand (e.g., by starting a large number of ring oscillators). This will reduce the FPGA's supply voltage and may in turn slow down a path in a victim circuit such that it may fail timing.

Power analysis attacks have been demonstrated to leak the secret key of a cryptographic function that was running on the same FPGA [21], running on a CPU embedded on the same FPGA die [27], and running on a different FPGA on the same FPGA board [22]. All these attacks have in common that they use ring-oscillators to measure key-dependent fluctuations on the voltage. In addition to sensing voltage, self-oscillators can be used to monitor crosstalk effects [5, 6, 20]. In these studies, it was found that a long wire carrying a logical 1 will slow down a ring-oscillator that is implemented using an adjacent wire. Therefore, by taking advantage of the sensitivity of self-oscillators, attackers can leak the current state of a signal, which is a concern in shared FPGA infrastructures.

2.3 State-of-the-art countermeasures
The main schemes to prevent side-channel power analysis attacks are based on masking and hiding strategies. In the masking strategy, an implementation of a cryptographic algorithm is transformed to another (typically larger) variant which is functionally equivalent, but where the new circuit is able to remain secure although the attacker can observe some details of the operation through a side-channel, as proposed in [11]. This makes power analysis attacks much harder as the data leaked has also to be correlated with the
implementation changing scheme used inside the secured core. On the other hand, the hiding strategy aims at lowering the Signal-to-Noise Ratio (SNR) during the operation by either adding more sources of noise or lessening the strength of the signal, as suggested in [3, 8, 12, 26].

Figure 3: Illustration of a glitch generator. The glitch signal can be used to drive a large network of wires and combinatorial logic to drain excessive power.

Ring-oscillators can be used to monitor the health of an FPGA fabric [28] and can also detect voltage drop attacks (e.g., power hammering, power analysis) [18, 29]. A recent work has suggested using ring-oscillators not only to monitor a power analysis attack but also to respond to the attack by triggering more power noise [13].

In a recent related work [15], LUT-based ring-oscillator designs are detected directly from configuration bitstreams. While that work fundamentally showed that oscillator circuits can be detected from bitstreams, it was only shown for basic LUT-based oscillators. This leaves an attacker the chance to deploy alternative oscillator designs (e.g., based on glitch amplification). Furthermore, [15] was implemented on a Lattice FPGA and those FPGAs are relatively small for building a multi-tenancy system. However, the vast majority of systems that would benefit from an FPGA virus scanner are based on modern FPGA architectures that are substantially more complex (e.g., fracturable LUTs, complex DSP blocks with ALU functionality, complex clock networks, a hierarchical routing fabric, etc.).

The following section will show how we use GoAhead to research the threat potential of ring-oscillators, while in Section 4 we show how viruses¹ can be detected with our new tool FPGADefender.

¹ This tutorial uses the term virus scanning in its figurative meaning for detecting all kinds of malicious threats rather than in its original meaning of infecting a program with malicious code to spread out in a virus-like manner.

3 OPTIMIZING AND EVALUATING RING OSCILLATOR DESIGNS FOR POWER HAMMERING AND SPEED
For power hammering, an attacker obviously wants to maximize the amount of power a malicious circuit can waste per unit of resources. For a side-channel attack, in turn, the highest sensitivity is the most important objective, which correlates with the speed of an oscillator. In this section, we will use the tool GoAhead to design and tune ring oscillators to waste as much power as possible or to run as fast as possible. We use a setup on an Ultra96 board where we precisely measured supply power. We use a Time-to-Digital Converter, as illustrated in Figure 4, to measure the actual clock frequency of an oscillator. The FPGA on the Ultra96 board uses the same manufacturing process and the same UltraScale+ FPGA fabric architecture as provided in current Xilinx datacenter FPGA boards, like the popular Alveo U200/250 FPGA boards. The best design found for Ultra96 will then be tested at scale on an Alveo U200 board.

3.1 Time-to-Digital Converters on Xilinx UltraScale FPGAs
Ring oscillators on an FPGA can run in the GHz regime, which is substantially faster than any user logic design can normally sustain. This makes simple counters unsuitable for measuring the fastest possible oscillators. We therefore used a Time-to-Digital Converter (TDC) to measure speeds of oscillators. TDCs basically use propagation delay to measure a waveform. The idea of TDCs is to use different propagation delays from the probe to a set of flip-flops such that the parallel sampled flip-flops reveal the state of the probe at different points in time. See Figure 4 for an illustration of the operation of a TDC.

Figure 4: Illustration of a Time-to-Digital Converter (TDC). The speed of the ring oscillator is measured by a chain of flip-flops that sample a delay line in parallel.

The flip-flop sample clock speed can be selected mostly arbitrarily (we used 100 MHz) as the variance in propagation delay is the key property that determines the characteristics of a TDC. Traditionally, TDCs have been implemented using carry chains to implement the delay chain. However, Xilinx UltraScale+ FPGAs do not have traditional carry chains, but use carry-look-ahead (CLA) circuits instead. For implementing a TDC on Xilinx UltraScale+ FPGAs, we consequently use local routing for the delay chain. Using this strategy, it is not important to find the fastest path (which we are normally interested in when implementing a module for performance), but instead to find paths that reach the different flops such that we form a TDC delay chain that has a linear characteristic (i.e., the variance in time between any pair of two consecutive flip-flops should be about the same) and high resolution (i.e., the absolute delay between any pair of two consecutive flip-flops should be small). This physical implementation problem is non-trivial and not directly supported by the Xilinx vendor tools.

We solved that problem using the path search function in the tool GoAhead [2]. GoAhead is a tool originally designed for implementing partially reconfigurable FPGAs. In its latest version, it uses Xilinx Vivado to report a device description that includes the entire architecture graph (including all possible switch matrix settings) as well as a detailed timing model. This device description is parsed by GoAhead, which allows this tool to report latencies for any path found from any arbitrary primitive port or a port inside a switch matrix. To automate processes in GoAhead, the tool supports TCL scripts. The TCL script in Figure 5 provides an example for searching for all possible paths starting from startPort on tile startTile to reach a set of flip-flops specified in flopPins located on targetTile.
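The kind of search described above can be illustrated with a minimal Python sketch. This is not GoAhead's implementation or its TCL interface: the function `find_paths`, the toy routing graph, and all node names and per-hop delays are hypothetical and chosen only for illustration. It enumerates loop-free paths breadth-first from one start port to a set of flip-flop pins and accumulates a latency per path.

```python
from collections import deque

def find_paths(graph, start, targets, max_hops=6):
    """Enumerate loop-free paths from start to any target, breadth-first.

    graph maps a node name to a list of (successor, hop_latency_ps) pairs.
    Returns (target, path, latency_ps) tuples, fewest hops first.
    """
    results = []
    queue = deque([([start], 0)])
    while queue:
        path, latency = queue.popleft()
        node = path[-1]
        if node in targets and len(path) > 1:
            results.append((node, path, latency))
            continue  # a flip-flop pin terminates the path
        if len(path) - 1 >= max_hops:
            continue  # bound the search depth
        for succ, delay in graph.get(node, []):
            if succ not in path:  # never revisit a node on the same path
                queue.append((path + [succ], latency + delay))
    return results

# Hypothetical toy routing graph; names and delays (in ps) are made up.
routing = {
    "LUT_O": [("SW_A", 40), ("SW_B", 55)],
    "SW_A": [("FF0_D", 30), ("SW_C", 35)],
    "SW_B": [("FF1_D", 50)],
    "SW_C": [("FF1_D", 25), ("FF2_D", 60)],
}
paths = find_paths(routing, "LUT_O", {"FF0_D", "FF1_D", "FF2_D"})
for target, path, latency in paths:
    print(target, len(path) - 1, latency, " -> ".join(path))
```

Because the queue is processed in first-in, first-out order, paths come out sorted by hop count, and the accumulated latency can then be used to pick taps with roughly equal spacing for a delay chain.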

Figure 5: GoAhead TCL script example to find paths from one primitive pin to a set of flip-flops.

Figure 6: Output created by the TCL script in Figure 5.

The result of the script is a set of paths found by a breadth-first search, sorted for each path in the for-loop of the script by the number of hops (which correlates to the routing resources used), as shown in Figure 6. Please note that the names used in GoAhead correspond to exactly the same naming scheme used by Xilinx in their Vivado tool suite. This holds for names used in scripts as well as for names used in results. Most importantly, GoAhead annotates the latency for each path. As can be seen in Figure 6, GoAhead reports the time as it gets incremented along the path. With the PrintLatency switch in the GoAhead PathSearchOnFPGA function, a user can select between any SLOW_MIN, SLOW_MAX, FAST_MIN, or FAST_MAX timing corner to be considered in the timing analysis.

For building the TDC delay chain, the result paths are sorted by their latency (i.e., the latency reported for the last hop in each path) and we manually chose a set of paths that result in good linearity (≈ ±10 ps) and a reasonably fine resolution (≈ 70 ps). With $N_{HIGH}$ being the number of samples in a TDC for the measured high values, $N_{LOW}$ being the number of low samples, and $t_{delay}$ denoting the TDC resolution (the delay between consecutive flip-flops), the speed of an RO is:

\[ f_{RO} \approx \frac{1}{t_{delay} \times (N_{HIGH} + N_{LOW})} \qquad (1) \]

With a ≈ 70 ps resolution, this allows measuring RO speeds up to about 7 GHz.² To obtain more accurate results, we report all values as the median of at least 10 000 runs.

² We would like to highlight that a relatively cheap FPGA board allows experimenting at such high speeds.

3.2 Optimizing Ring Oscillators for Power Hammering and Speed
With an FPGA system instrumented for accurate measurement of power and frequency (through our TDCs), we explored and evaluated various FPGA ring-oscillator designs for
their suitability for power hammering and for side-channel attacks (i.e., fastest oscillator speed). We used the GoAhead tool to find all possible ring oscillator designs. This again uses the GoAhead PathSearchOnFPGA command (which we used for designing the TDC in the previous section) by simply specifying the output of the LUT intended for the ring oscillator implementation for both the beginPort and the targetPort. An output of such a search is shown in Figure 7. For each LUT, the path search will sort the result paths found in an order reporting the paths with the least number of hops first. These paths are typically the fastest ones and the reported latency serves as a sanity check. We used a GoAhead script (in the same way used for finding the TDC delay path) to find all fastest ring-oscillator designs over all LUTs in a CLB. We then implemented those paths for 2000 LUTs on the Ultra96 board and measured speed and power consumption.

Figure 7: Output created by a GoAhead path search using the same port for begin and target (for finding ring oscillator variants).

Our experiments found the fastest oscillator speed to be 5.8 GHz and an increase in power of 4.2 W for the most malicious oscillator design found (see Figure 9). The experiments with the poorest results achieved only 1.1 GHz speed and 1.7 W waste power. This means that a single LUT has a waste power potential of 2.1 mW when considering the most malicious oscillator design.

To put this into perspective: an Alveo U200 data center card featuring a VU9P FPGA providing 1.182 million LUTs would have a waste power potential of over 2 kW using the optimized power ring oscillator design. Consequently, even a fraction of that logic would by far exceed the thermal and electrical specifications of any FPGA/FPGA board.

3.3 Xilinx Alveo U200 Power Hammering Experiment
We deployed the optimized ring-oscillator design from the previous section on an Alveo U200 data center card. This board has the same specifications as the FPGA boards available with Amazon's F1 instances. We deployed 384000 ROs (≈ 32% of the available LUTs, as shown in Figure 8) and gradually enabled them to evaluate the critical point when the board crashes. Surprisingly, when reaching only 15% of the total LUT resources (1182240 LUT6 primitives), the design caused a strong drop in the internal core voltage VCCINT and eventually crashed the board when VCCINT reached 0.74 V. At that point, we had already exceeded the 225 W maximum power budget of the board. Figure 10 and Figure 11 show the power consumption, internal FPGA core voltage VCCINT, and core temperature in relation to the activated ROs. Please note that the power was measured on the power supply grid with the help of an ammeter (True RMS). The power supply used was a Silverstone Strider 600W Modular SFX 80+ Gold Power Supply. The figures illustrate how dangerous malicious circuits could be in a data center setup. Therefore, it is necessary to prevent loading any bitstream onto an FPGA board that may include such malicious circuits.

Figure 8: Enhanced ROs grid for power hammering: a) schematic; b) implementation with 384000 ROs.

3.4 Further GoAhead Use Cases
So far, we showed how the timing-driven path search in GoAhead can be used to find and optimize ring-oscillators. There are several other use cases that can benefit from this ability, in particular in the field of hardware security. For instance, in Figure 3 we showed how different routing latencies can cause glitches. This, however, depends on the exact routing delays; by balancing latencies for all paths to the XOR gate shown in the example in Figure 3, glitches can actually be canceled out. This is relevant for implementing DPA-resistant circuits of cryptographic algorithms that often heavily use XORs. In such applications, balancing routing latencies may dramatically reduce power signatures that can be measured by a potential attacker (see also Figure 2).

Vice versa, carefully imbalanced routing can be used for amplifying glitches (as needed for power-hammering). Other use cases include the design of asynchronous circuits and wave pipelining that rely on the implementation of exact (routing) latencies to function correctly.

In many cases, only a few signals are critical and they can be easily found by a path search in GoAhead together with a ranking of the results by latency. The paths selected can be directly implemented in the Xilinx Vivado tool through guided routing constraints (using the TCL command set_property ROUTE). All remaining routing can then be added by Vivado automatically. By default, GoAhead uses a breadth-first search, which means that the search essentially enumerates the entire search space. In practice, this is often acceptable because the depth of the search is rather limited (typically less than 10 hops in practical systems) and the adjacency of switch matrices is rather sparse. For longer paths, the GoAhead path search also supports a variant of A*.
Figure 9: ROs Frequency versus Waste Power Gain (measured for 2000 ROs) for all 8 LUT6 primitives inside a CLB for all
corresponding different cases that implement the fastest possible loop from output O6 to an input of the same LUT (resulting
in 8 × 6 = 48 individual experiments).

Figure 10: Power Consumption and Core Voltage versus LUTs Utilization on Alveo U200.

Figure 11: Power Consumption and Temperature versus LUTs Utilization on Alveo U200.

4 FPGA VIRUS-SCANNING WITH FPGADEFENDER
Having examined the threat of ring-oscillators in previous sections, we are now looking closer into threat mitigation strategies. We will now introduce the tool FPGADefender which detects malicious constructs in bitstreams such that a system can reject a threat before it could even materialize on an FPGA.

4.1 Overview
FPGADefender³ is built entirely in Python, which provides a bundle of supportive packages such as NetworkX [9] to represent and analyze an implementation graph from a bitstream. As a first step, an implementation graph is created by a netlist generator which contains node and edge information. This graph reassembles the netlist encoded inside the bitstream. The netlist generator is implemented as an enhancement to the tool BitMan.⁴ The implementation graph is encoded in JSON format as shown in Figure 13.

³ Available online at: https://github.com/KasparMatas/FPGAVirusScanner.git
⁴ BitMan [17] is available under: https://github.com/khoapham/bitman.git
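To make the idea of a JSON-encoded implementation graph concrete, the following dependency-free sketch loads a list of edge records into successor lists. The JSON layout is an assumption: the field names `source`, `target`, and `attributes`, as well as the node names, are hypothetical and are not taken from the FPGADefender format in Figure 13. FPGADefender itself uses NetworkX according to the text; plain dictionaries are used here only to keep the example self-contained.

```python
import json
from collections import defaultdict

# Hypothetical edge records; the field names are illustrative only.
EXAMPLE_JSON = """
[
  {"source": "LUT_A.O6", "target": "SW_1.IN3", "attributes": ["routing"]},
  {"source": "SW_1.IN3", "target": "LUT_A.A1", "attributes": ["routing"]},
  {"source": "LUT_A.A1", "target": "LUT_A.O6", "attributes": ["lut"]}
]
"""

def load_graph(text):
    """Build successor lists and per-edge attributes from JSON edge records."""
    successors = defaultdict(list)
    edge_attrs = {}
    for edge in json.loads(text):
        src, dst = edge["source"], edge["target"]
        successors[src].append(dst)
        edge_attrs[(src, dst)] = edge.get("attributes", [])
    return dict(successors), edge_attrs

graph, attrs = load_graph(EXAMPLE_JSON)
print(len(graph), "nodes with outgoing edges,", len(attrs), "edges")
```

Keeping the attributes per edge mirrors the text's description of node and edge information and lets later scanning stages filter connections by attribute.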
static system (shell) • Short circuit detector: Detect short circuits caused by bit-
partial partial partial stream manipulation. In general, we detect any bitstream
region 1 region 2 region 3 encoding that is invalid for routing. In Xilinx UltraScale+
FPGA FPGAs, this means in practice that all switch matrix multi-
plexers have to be one-hot encoded.
configuration manager
• Fanout detector: Detect and report maximum fanout. This
netlist generator virus scanner
is an indicator for a malicious design as power-hammering
bitstream netlist GN
(BitMan) (FPGADefender) needs some kind of high fan-out control in order to activate
a larger number of ROs. However, this is just an indicator as
architecture virus pos./neg. an attacker could easily hide high fan-out signals. This is an
graph GA signatures filters interesting field for further research.
Figure 12: Envisioned system with a virus scanner for detect- A score is given in each scanning stage and summed up to deliver a
ing malicious configuration bitstreams. total score. Currently, FPGADefender is leaving the evaluation of
the scores and the report to the user. However, our virus scanner
performs already all the heavy-lifting scanning work. Based on
the reported result, the configuration manager will ultimately be
able to decide whether a bitstream is safe to be deployed or not, as
shown in Figure 12.

4.2 How to use FPGADefender


FPGADefender is a command-line program for scanning imple-
mented FPGA designs (i.e. bitstreams) for malicious circuits and
constructs. This section will describe installing and using FPGADe-
fender as well as showing some scan examples. All of the examples
below use the short option flags. For more details about the options
use the --help flag.
Given design in file input_design.json, a scan can be per-
formed on the command line using:

v i r u s s c a n n e r − i i n p u t _ d e s i g n . j s o n −c c o n f i g .
Figure 13: A snippet of a single edge of the implementation i n i −o o u t p u t . t x t
graph. The above command runs FPGADefender on the implemented
graph given by the input_design.json file based on the options set in the config.ini file and outputs the results to the output.txt file.

After parsing the implementation graph, scanning options are parsed to provide inputs for the virus detector engine as well as filters. FPGADefender allows specifying a positive filter to describe configurations that must exist in the original bitstream (e.g., a specific connection through which a partially reconfigurable module communicates with the surrounding shell infrastructure). Correspondingly, a negative filter allows describing primitives and routing resources that are prohibited in a bitstream. In detail, the scanning process executes the following set of virus detector engines:
• Combinational cycle detector: Detect combinatorial cycles. This includes detecting cycles that use transparent latches in order to prevent the attack revealed in [23].
• Attribute detector: Detect asynchronous design elements such as latches.
• Port detector: Detect prohibited ports. For example, this allows detecting if a partial module tries leaking to a port not belonging to its allocated partial region.
• Path detector: Detect prohibited paths. For example, detect if a partial module tries accessing a static route that is crossing a partial region (note that we explicitly allow static routes, which are commonly used in complex designs).
• Antenna detector: Detect dangling paths. In most cases, this is rather a warning that a module may have an interface wire that is not properly connected.

The config file is used to configure FPGADefender and the tools it uses. The configuration file is parsed using Python's ConfigParser package and therefore consists of sections and options. The configuration file should have the following items specified:
• virus_signatures: Names of the virus signature packages to be executed
  – Specific virus_signature options described in the next section
• connection_attributes: Optional section for adding attributes to connections
  – attributes_file: Path to the CSV file describing which connections get which attributes.
• removables
  – connections_file: Path to a text file describing which connections should be removed from the implementation graph before the scans.

The different available virus signatures can be set up in the config file by adding the name of the virus signature class under the virus_signatures section, as shown in Table 1.
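To make the configuration layout described above more concrete, the following is a minimal sketch of how such a file could be read with Python's standard configparser module. The section names (virus_signatures, connection_attributes, removables, ring_oscillator_detection) and option names are taken from the text and Table 1; the file paths, the bare-key layout of the virus_signatures section, and the helper name load_scan_options are assumptions made for illustration and are not confirmed to match FPGADefender's actual parser.

```python
import configparser

# Hypothetical config.ini contents. Section and option names follow the text;
# the file paths and the bare-key layout of [virus_signatures] are assumptions.
EXAMPLE_CONFIG = """
[virus_signatures]
CombinatorialLoopDetector
FanOutDetector

[ring_oscillator_detection]
ignored_attributes_file = ignored_attributes.txt

[connection_attributes]
attributes_file = attributes.csv

[removables]
connections_file = removable_connections.txt
"""


def load_scan_options(text):
    """Parse config text and return the enabled signatures and file options."""
    # allow_no_value permits bare names (no "= value") inside a section.
    parser = configparser.ConfigParser(allow_no_value=True)
    # Keep option names case-sensitive so class names are not lowercased.
    parser.optionxform = str
    parser.read_string(text)

    signatures = []
    if parser.has_section("virus_signatures"):
        signatures = list(parser["virus_signatures"])

    return {
        "signatures": signatures,
        # Both sections are treated as optional here, hence the fallbacks.
        "attributes_file": parser.get(
            "connection_attributes", "attributes_file", fallback=None
        ),
        "connections_file": parser.get(
            "removables", "connections_file", fallback=None
        ),
    }


if __name__ == "__main__":
    print(load_scan_options(EXAMPLE_CONFIG))
```

In this sketch, a missing optional section simply yields None, so a scan could proceed without connection attributes or removable connections.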
Figure 14: FPGADefender flowchart. The virus scanner flow takes the netlist GN as input and then (1) parses the implementation graph, (2) parses the scanning options (signature options, positive filter lists, connection attribute adding), and (3) executes the virus scanning using the virus detector engines (combinational loop, attribute, port, path, antenna, short circuit, unspecified path, and fan-out detectors). The result is represented in a text file.
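As a hedged illustration of what a combinational loop detector has to do, the sketch below searches a directed graph for a cycle using a plain depth-first search. It is not FPGADefender's engine: the adjacency-dictionary representation, the node names, and the function name find_cycle are assumptions. Edges through transparent latches would simply be included as ordinary edges so that latch-based oscillators are also caught.

```python
def find_cycle(graph):
    """Return one cycle as a node list whose first and last entries match.

    `graph` maps each node name to a list of successor node names.
    Returns None if the graph contains no cycle.
    """
    WHITE, GREY, BLACK = 0, 1, 2      # unvisited / on current path / finished
    colour = {node: WHITE for node in graph}
    path = []                          # nodes on the current DFS path

    def visit(node):
        colour[node] = GREY
        path.append(node)
        for succ in graph.get(node, ()):
            state = colour.get(succ, WHITE)
            if state == GREY:
                # Back edge: the cycle is the path segment from succ onwards.
                return path[path.index(succ):] + [succ]
            if state == WHITE:
                found = visit(succ)
                if found:
                    return found
        path.pop()
        colour[node] = BLACK
        return None

    for node in list(graph):
        if colour[node] == WHITE:
            found = visit(node)
            if found:
                return found
    return None


if __name__ == "__main__":
    # Hypothetical three-LUT ring oscillator fed from an input pin.
    netlist = {
        "LUT_A": ["LUT_B"],
        "LUT_B": ["LUT_C"],
        "LUT_C": ["LUT_A"],
        "IN": ["LUT_A"],
    }
    print(find_cycle(netlist))
```

The recursive form keeps the sketch short; for graphs as large as a full implementation graph, an iterative traversal or a graph library would be the more practical choice because of Python's recursion limit.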

Table 1: Virus signatures in FPGADefender.

Feature | Signature | Options | Description
Ring oscillator detection | CombinatorialLoopDetector | ring_oscillator_detection - ignored_attributes_file | This detects loops in the given implementation
Disallowed port detection | PortDetector | node_detection - disallowed_nodes_file | Can detect disallowed port usages, like snooping on neighbouring designs
Disallowed path detection | PathDetector | path_detection - disallowed_begin_nodes_file & disallowed_destination_nodes_file | Can detect disallowed path usages like paths next to leaky long wires
Short circuit detection | ShortCircuitDetector | short_detection - short_location_file | This detects outputs with multiple used inputs which can cause short circuits
Antenna detection | AntennaDetector | antenna_detection - allowed_input_antennas_file & allowed_output_antennas_file | Can detect undesired dangling input and output wires
Unspecified path detection | UnspecifiedPathDetector | unspecified_path_detection - specified_begin_nodes_file & specified_end_nodes_file & specified_routing_nodes_file | Can detect paths which start or end at specified ports but use disallowed routing ports. The detected paths will be from the end ports which don't start at the specified start ports
Fan-out detection | FanOutDetector | fan_out_begin_nodes_file - fan_out_begin_nodes_file & fan_out_end_nodes_file | Can detect all nodes which are connected to too many end nodes. The threshold is set to 100 currently
Attribute detection | AttributeDetector | - | Can detect all nodes with the attribute "LATCH"

To build the executable, the requirements file first has to be installed into the venv environment. To do so, we can run:

    pip install -r requirements.txt

This installs the dependencies, including the PyInstaller tool obtained from pip, which builds the executable. When building the executable, we have to make sure to add the virus scanner packages given in the config file as hidden imports, as shown in the following example:

    pyinstaller virusscanner/__main__.py -n virusscanner -F --hidden-import=virusscanner.parsing.signatures.ring_oscillator_detection

To add more than one signature, the .spec file can be modified.

5 SCAN RESULTS
We ran FPGADefender on a benchmark of malicious bitstreams, and this section briefly presents the results. As a sanity check, we also ran scans on bitstreams that do not contain malicious circuits, and FPGADefender did not report any issue, except for one case: a true random number generator that actually uses ring oscillators as a source of randomness. In detail, we provide the following reports:
• Combinatorial loop and transparent latch detection are reported in Figure 15. The file lists a couple of detected cycles. Each cycle starts with a status line stating the specific class of ring oscillator. FPGADefender supports detecting ROs through LUTs, cascading multiplexers (MUX7/MUX8 in Xilinx notation), CLA carry logic, DSP blocks, and latches. After this, the entire first cycle of each class is reported. A cycle can be identified by its first and last entry pointing to the same node.
• Short circuits are reported in Figure 16. This section first reports the number of short-circuit situations found and then lists, for the first detected switch matrix multiplexer, the activated input ports. Each switch matrix multiplexer can only connect to no port (if not used) or to at most one of its available inputs.
• Latches are reported in Figure 17. This section reports latches used in cycles but also all other latches, which are not malicious but indicate that the bitstream was not implemented following good RTL design principles.
• Antennas are reported in Figure 18. The report lists the last port of an antenna, which allows investigating the antenna issue using the Vivado tool suite.
• Fan-outs are reported in Figure 19. The fan-out report lists the nets with the highest fan-out in the design. The number of nets reported is specified in the config file.
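The short-circuit and fan-out reports described above can be approximated with two small checks over the list of active connections. This is only a sketch under stated assumptions: the connection list format (pairs of source and sink port names), the port names, and the function names find_short_circuits and highest_fan_out are hypothetical, and fan-out is counted here as the number of distinct direct sinks of a net.

```python
from collections import defaultdict


def find_short_circuits(active_connections):
    """Return multiplexer outputs driven by more than one active input.

    `active_connections` is an iterable of (mux_input, mux_output) pairs.
    A switch matrix multiplexer may legally select at most one input.
    """
    drivers = defaultdict(set)
    for mux_input, mux_output in active_connections:
        drivers[mux_output].add(mux_input)
    return {out: sorted(ins) for out, ins in drivers.items() if len(ins) > 1}


def highest_fan_out(active_connections, count):
    """Return the `count` nets with the most distinct direct sinks."""
    sinks = defaultdict(set)
    for source, sink in active_connections:
        sinks[source].add(sink)
    # Sort by descending fan-out, then by name for a deterministic report.
    ranked = sorted(sinks.items(), key=lambda item: (-len(item[1]), item[0]))
    return [(net, len(ends)) for net, ends in ranked[:count]]


if __name__ == "__main__":
    # Hypothetical connections; MUX_OUT3 has two active inputs (a short).
    connections = [
        ("WEST_IN0", "MUX_OUT3"),
        ("NORTH_IN2", "MUX_OUT3"),
        ("WEST_IN0", "MUX_OUT5"),
        ("CLK", "MUX_OUT7"),
    ]
    print(find_short_circuits(connections))
    print(highest_fan_out(connections, count=2))
```

The count argument plays the role of the number of reported nets that the text says is set in the config file.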

Figure 15: Report sample for combinatorial loop detection.

Figure 16: Report sample for short-circuit detection.

Figure 17: Report sample for transparent latch detection.

Figure 18: Report sample for antenna detection.

Figure 19: Report sample for fan-out detection.

6 CONCLUSIONS AND DISCUSSION
In this tutorial, we provided a small survey on recent FPGA hardware security research and showed that ring oscillators in particular pose a real-world threat. With this, we described how the academic tool GoAhead can be used to build a Time-to-Digital Converter for UltraScale+ FPGAs, which was used for evaluating a larger number of ring-oscillator designs. This resulted in one design that has the enormous waste power potential of over 2 kW on an Alveo U200 data center card, and experiments on that board resulted in a power-induced crash using just 15% of the available LUT resources of the VU9P FPGA. In the remainder of this tutorial, we showed how the open-source tool FPGADefender can detect (probably all kinds of) ring oscillator designs for mitigating this threat.
The huge waste power potential shows that hardware Trojans and other malicious circuits are a real threat and that only very little logic is required to crash a system. We would like to stress that this is not a vendor-specific problem and that the threats discussed in this tutorial apply to any FPGA from any vendor. However, we also showed that malicious circuits can be detected automatically and that this is even possible at the bitstream level. We believe that security through some level of virus scanning is inevitably needed as part of an FPGA ecosystem. We also believe that such security tools can reliably solve any security issue and that even multi-tenancy in datacenters is entirely possible. For industry, the best way to address security challenges is by opening architectures, bitstreams, and tools in order to give the research community the best possibilities to develop mitigation strategies.
With this tutorial, we want to create awareness for FPGA security and stimulate research to ensure that FPGA security will be treated in a proactive manner.

7 ACKNOWLEDGMENTS
This work is kindly supported by the National Cyber Security Centre of the UK through the project rFAS - reconfigurable FPGA Accelerator Sandboxing (grant agreement 4212204/RFA 15971) and by the European Commission through the H2020 project EuroEXA (grant 754337).
We also thank the Xilinx University Program for tools and boards.

REFERENCES
[1] C. Beckhoff, D. Koch, and J. Torresen. 2010. Short-Circuits on FPGAs Caused by Partial Runtime Reconfiguration. In 2010 International Conference on Field Programmable Logic and Applications. IEEE, 596–601.
[2] C. Beckhoff, D. Koch, and J. Torresen. 2012. Go Ahead: A Partial Reconfiguration Framework. In 2012 IEEE 20th International Symposium on Field-Programmable Custom Computing Machines. 37–44.
[3] J. Danger, S. Guilley, S. Bhasin, and M. Nassar. 2009. Overview of Dual Rail with Precharge Logic Styles to Thwart Implementation-level Attacks on Hardware Cryptoprocessors. In 2009 3rd International Conference on Signals, Circuits and Systems (SCS). 1–8.
[4] K. Georgopoulos, K. Bakanov, I. Mavroidis, I. Papaefstathiou, A. Ioannou, P. Malakonakis, K. D. Pham, D. Koch, and L. Lavagno. 2019. A Novel Framework for Utilising Multi-FPGAs in HPC Systems. 153–189.
[5] I. Giechaskiel, K. Rasmussen, and K. Eguro. 2016. Leaky Wires: Information Leakage and Covert Communication Between FPGA Long Wires. (2016), 15–27.
[6] I. Giechaskiel, K. Rasmussen, and J. Szefer. 2019. Measuring Long Wire Leakage with Ring Oscillators in Cloud FPGAs. In Proceedings of the International Conference on Field-Programmable Logic and Applications (FPL).
[7] D. Gnad, F. Oboril, and M. Tahoori. 2017. Voltage Drop-based Fault Attacks on FPGAs Using Valid Bitstreams. In 2017 27th International Conference on Field Programmable Logic and Applications (FPL). IEEE, 1–7.
[8] T. Güneysu and A. Moradi. 2011. Generic Side-Channel Countermeasures for Reconfigurable Devices. In Cryptographic Hardware and Embedded Systems – CHES 2011, Bart Preneel and Tsuyoshi Takagi (Eds.). Springer Berlin Heidelberg, Berlin, Heidelberg, 33–48.
[9] A. Hagberg, P. Swart, and D. Schult. 2014. NetworkX - Software for Complex Networks. Retrieved Oct 29, 2019 from https://networkx.github.io/
[10] Amazon Inc. 2019. Amazon EC2 F1 Instances. Retrieved Jun 27, 2019 from https://aws.amazon.com/ec2/instance-types/f1/
[11] Y. Ishai, A. Sahai, and D. Wagner. 2003. Private Circuits: Securing Hardware against Probing Attacks. In Advances in Cryptology - CRYPTO 2003, Dan Boneh (Ed.). Springer Berlin Heidelberg, Berlin, Heidelberg, 463–481.
[12] N. Kamoun, L. Bossuet, and A. Ghazel. 2009. Correlated Power Noise Generator as a Low Cost DPA Countermeasures to Secure Hardware AES Cipher. In 2009 3rd International Conference on Signals, Circuits and Systems (SCS). 1–6.
[13] J. Krautter, D. Gnad, F. Schellenberg, A. Moradi, and M. Tahoori. 2019. Active Fences against Voltage-based Side Channels in Multi-Tenant FPGAs. Retrieved Oct 29, 2019 from https://eprint.iacr.org/2019/1152.pdf
[14] J. Krautter, D. Gnad, and M. Tahoori. 2018. FPGAhammer: Remote Voltage Fault Attacks on Shared FPGAs, suitable for DFA on AES. IACR Transactions on Cryptographic Hardware and Embedded Systems (2018), 44–68.
[15] J. Krautter, D. Gnad, and M. Tahoori. 2019. Mitigating Electrical-level Attacks Towards Secure Multi-Tenant FPGAs in the Cloud. ACM Trans. Reconfigurable Technol. Syst. 12, 3, Article 12 (Aug. 2019), 26 pages.
[16] S. S. Mirzargar and M. Stojilovic. 2019. Physical Side-Channel Attacks and Covert Communication on FPGAs: A Survey. In Proceedings of the 29th International Conference on Field-Programmable Logic and Applications (FPL).
[17] K. D. Pham, E. Horta, and D. Koch. 2017. BITMAN: A Tool and API for FPGA Bitstream Manipulations. In Design, Automation & Test in Europe Conference & Exhibition (DATE), 2017. IEEE, 894–897.
[18] G. Provelengios, D. Holcomb, and R. Tessier. 2019. Characterizing Power Distribution Attacks in Multi-User FPGA Environments. In Proceedings of the 29th International Conference on Field-Programmable Logic and Applications (FPL).
[19] A. Putnam, A. Caulfield, E. Chung, D. Chiou, K. Constantinides, J. Demme, H. Esmaeilzadeh, J. Fowers, G. Gopal, J. Gray, M. Haselman, S. Hauck, S. Heil, A. Hormati, J. Kim, S. Lanka, J. Larus, E. Peterson, S. Pope, A. Smith, J. Thong, P. Xiao, and D. Burger. 2014. A Reconfigurable Fabric for Accelerating Large-scale Datacenter Services. In Proceeding of the 41st Annual International Symposium on Computer Architecture (ISCA '14). 13–24.
[20] C. Ramesh, S. Patil, S. Dhanuskodi, G. Provelengios, S. Pillement, D. Holcomb, and R. Tessier. 2018. FPGA Side Channel Attacks without Physical Access. In 2018 IEEE 26th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM). IEEE, 45–52.
[21] F. Schellenberg, D. Gnad, A. Moradi, and M. Tahoori. 2018. An Inside Job: Remote Power Analysis Attacks on FPGAs. In 2018 Design, Automation & Test in Europe Conference & Exhibition (DATE). IEEE, 1111–1116.
[22] F. Schellenberg, D. R. E. Gnad, A. Moradi, and M. B. Tahoori. 2018. Remote Inter-Chip Power Analysis Side-Channel Attacks at Board-Level. In 2018 IEEE/ACM International Conference on Computer-Aided Design (ICCAD). 1–7.
[23] T. Sugawara, K. Sakiyama, S. Nashimoto, D. Suzuki, and T. Nagatsuka. 2019. Oscillator without a Combinatorial Loop and Its Threat to FPGA in Data Centre. Electronics Letters 55, 11 (2019), 640–642.
[24] S. Trimberger and J. Moore. 2014. FPGA Security: Motivations, Features, and Applications. Proc. IEEE 102, 8 (2014), 1248–1265.
[25] A. Vaishnav, K. D. Pham, D. Koch, and J. Garside. 2018. Resource Elastic Virtualization for FPGAs Using OpenCL. In 2018 28th International Conference on Field Programmable Logic and Applications (FPL). 111–1117.
[26] A. Wild, A. Moradi, and T. Güneysu. 2018. GliFreD: Glitch-Free Duplication Towards Power-Equalized Circuits on FPGAs. IEEE Trans. Comput. 67, 3 (2018), 375–387.
[27] M. Zhao and G. Suh. 2018. FPGA-based Remote Power Side-channel Attacks. In 2018 IEEE Symposium on Security and Privacy (SP). IEEE, 229–244.
[28] K. Zick and J. Hayes. 2012. Low-cost Sensing with Ring Oscillator Arrays for Healthier Reconfigurable Systems. ACM Transactions on Reconfigurable Technology and Systems (TRETS) 5, 1 (2012), 1–26.
[29] K. Zick, M. Srivastav, W. Zhang, and M. French. 2013. Sensing Nanosecond-scale Voltage Attacks and Natural Transients in FPGAs. In Proceedings of the ACM/SIGDA International Symposium on Field Programmable Gate Arrays. ACM, 101–104.
