
This article has been accepted for publication in IEEE Design & Test.

This is the author's version which has not been fully edited and
content may change prior to final publication. Citation information: DOI 10.1109/MDAT.2024.3448371

Remote Power Side-Channel Attacks on FPGAs


Mark Zhao and G. Edward Suh

Keywords: field programmable gate array, ring oscillator, power analysis attack, side-channel attack, system-on-chip

I. INTRODUCTION AND BACKGROUND

Field Programmable Gate Arrays (FPGAs) are widely used to accelerate sensitive applications in datacenters and beyond. For example, Microsoft heavily utilizes FPGAs in its datacenters for tasks ranging from network security to machine learning [1]. Amazon offers FPGA instances in its EC2 service, allowing customers to rent FPGAs and accelerator designs for applications like genomics sequencing. Furthermore, hardware vendors such as Intel and AMD have introduced heterogeneous System-on-Chip (SoC) designs which integrate both processing cores and FPGA fabric in one silicon die. These FPGA-SoCs allow applications to leverage both the programmability of CPUs and efficiency of FPGAs in one device, showing utility in diverse deployments ranging from medicine to defense.

As these applications become increasingly reliant on FPGAs to improve their performance and energy efficiency, systems are deploying both trusted and untrusted logic within the same FPGA device. For example, in cloud FPGAs, untrusted user logic is co-resident with privileged OS-like “shell” control logic. Meanwhile, recent works have proposed FPGA virtualization and enclave-like mechanisms to share FPGAs and compartmentalize trusted logic [2]. Understanding the security implications of co-resident untrusted FPGA logic is essential.

To this end, this article explores a key security vulnerability we discovered in 2018 that can be exploited to perform power side-channel attacks in software, without requiring physical access or proximity to the target system. Power side-channel attacks infer confidential information based on the data-dependent variations in a target system’s power consumption. In order to obtain power traces, attackers typically insert a low-impedance resistor in series with the power supply and use an oscilloscope to measure the power consumption as the voltage drop across the resistor. Thus, these power side-channel attacks historically required physical access to the target system.

In contrast, we assume a threat model where an adversary can program a part of an integrated FPGA (e.g., as a cloud FPGA tenant or via third-party IP core) and implement a circuit of their choice. However, they have no physical access or proximity to the target system itself and therefore cannot directly measure physical properties such as power consumption. Instead, the adversary can leverage the inherent properties of FPGA devices to create an on-chip power monitor using the programmable logic of an FPGA (i.e., software-only), allowing them to measure dynamic power consumption with sufficient resolution to enable power analysis attacks.

We explore two types of attacks in this article. First, an FPGA-to-FPGA attack uses malicious FPGA logic to attack sensitive logic that is co-resident on the same FPGA fabric, targeting deployments where an FPGA is shared among multiple users or logic regions for efficiency (e.g., in a cloud datacenter environment). We assume that the shared FPGA has common security mechanisms to ensure isolation between different users; the attack and victim circuits are both logically partitioned (i.e., there are no illicit connections between the two modules) and physically partitioned (i.e., with a ‘fence’ of unused configurable logic blocks) [3].

Secondly, an FPGA-to-CPU attack uses malicious FPGA logic to extract a secret from a software process (including the kernel itself) running on a CPU within the same SoC. Here, we assume an SoC architecture that contains both FPGA fabric and traditional processing elements such as CPUs and GPUs. We assume that the system has proper protection mechanisms (e.g., CPU privilege modes) to prevent direct accesses from the FPGA fabric to the rest of a system used by another user or process.

Demonstrating and understanding the mechanisms and limitations of the FPGA remote power side channel is the first step to mitigating this powerful class of attacks.

II. THE FPGA-BASED POWER MONITOR

A. Operating Principle

We first explore the principles behind how an on-chip power monitor can be built via software-programmed logic on a modern FPGA.

The power demanded by a CMOS circuit can be modeled as the sum of the static and dynamic components of power consumption. As power side-channel attacks often leverage data-dependent changes in the power consumption, we only focus on monitoring the dynamic power consumption, $P_{dyn}$. For one CMOS cell, the average dynamic power consumption can be modeled as the sum of charging and short-circuit power consumption, $P_{dyn} = P_{chrg} + P_{sc}$, where $P_{chrg} = \alpha f C_L V_{DD}^2$ and $P_{sc} = \alpha f V_{DD} I_{peak} t_{sc}$. $\alpha$ is the activity factor, $f$ is the clock frequency, $V_{DD}$ is the supply voltage, $C_L$ is the load capacitance, $I_{peak}$ is the current peak caused by the switching event, and $t_{sc}$ is the short-circuit time. The dynamic power increases proportionally with the activity of the circuit, $\alpha$.

A power distribution network (PDN) converts and distributes power from the power supply to individual circuit components. The goal of the PDN is to provide a clean voltage supply resistant to varying current demands. To maintain a constant voltage, a PDN uses a voltage regulator to adjust the amount of supplied current and uses decoupling capacitors as a buffer to handle current variations. However, the voltage regulator and the decoupling capacitors cannot completely hide current variations, and high switching activities often lead to transient voltage drops in the PDN of an FPGA. In other words, the voltage drop on the PDN reflects the power consumption.

Fig. 1. A Ring Oscillator (RO) based on-chip power monitor design. (Schematic labels: enable, RO, and a chain of T flip-flops with outputs count[0], count[1], ..., count[n].)
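As a worked illustration of the dynamic power model above, the following minimal sketch evaluates $P_{dyn} = P_{chrg} + P_{sc}$ for one cell at two activity factors. The function name and every cell parameter are hypothetical values chosen only for illustration; they are not measurements from this article.

```python
def dynamic_power(alpha, f_hz, c_load, v_dd, i_peak, t_sc):
    """Average dynamic power of one CMOS cell: charging plus short-circuit power."""
    p_chrg = alpha * f_hz * c_load * v_dd ** 2      # P_chrg = alpha * f * C_L * V_DD^2
    p_sc = alpha * f_hz * v_dd * i_peak * t_sc      # P_sc = alpha * f * V_DD * I_peak * t_sc
    return p_chrg + p_sc


# HYPOTHETICAL cell parameters (assumptions, not values from the article).
CELL = dict(f_hz=100e6, c_load=10e-15, v_dd=1.0, i_peak=20e-6, t_sc=50e-12)

p_low = dynamic_power(alpha=0.1, **CELL)
p_high = dynamic_power(alpha=0.2, **CELL)
print(f"alpha=0.1: {p_low * 1e9:.1f} nW, alpha=0.2: {p_high * 1e9:.1f} nW")
```

Because both terms scale with $\alpha$, doubling the activity factor doubles the modeled dynamic power, which is the data-dependent component a power monitor tries to observe.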
The PDN can be modeled as an equivalent RLC matrix. Thus, the transient voltage drop seen by a circuit can be approximated by the following equation.

$$V_{drop} = IR + L \frac{di}{dt} \tag{1}$$

The voltage drop depends on both steady-state current consumption ($IR$ drop) as well as short transients ($\frac{di}{dt}$ drop) caused by switching logic on the FPGA. In typical CMOS circuits, combinational logic delays can be modeled to be inversely proportional to the voltage supplied to each gate. Therefore, a change in the combinational logic delay reflects the voltage drop, which reflects the power consumption and the switching activity of circuits programmed on the FPGA.

This correlation between combinational logic delay and the power consumption can be leveraged to build an on-chip power monitor on an FPGA. In essence, we can program an FPGA with a specific circuit, a ring oscillator (RO), that allows us to measure a combinational path delay. We can then use this delay to estimate the power consumption of other modules that share the PDN.

An RO consists of an odd number of inverters connected in series along with an AND gate such that the output of the last inverter is combinationally fed back into the input of the AND gate. The other input of the AND gate is connected to an enable signal. The oscillation frequency of the RO is thus inversely proportional to the time that a signal takes to propagate twice around the circuit. In addition to voltage variations, the propagation delay is dependent on process and temperature variations. However, by reducing the number of stages in the RO, one can both reduce the temperature variation dependency as well as increase the resolution of the ring oscillator. The oscillation frequency can then be approximated by Equation 2, where $k$ and $f_0$ are constants, and $V(x, y, t)$ is the transient supply voltage at the RO’s location.

$$f_{RO} \approx k \cdot V(x, y, t) + f_0 \tag{2}$$

The above equation suggests that the RO can be used as a voltmeter if we can measure its frequency. To do so, we construct the RO counter circuit shown in Figure 1. We use only one inverter stage to reduce temperature dependency and improve resolution. We feed the output of the RO circuit to the clock input of a counter that increments every oscillation period. As the RO is oscillating much faster than the system clock, we construct the counter as a chain of T-flip-flops (TFF) to eliminate slow carry chains. Alongside the TFF counter, we use a second counter with a reference clock (i.e., the FPGA system clock) running with frequency $f_{Ref}$. We enable both the RO circuit and the reference counter at the same time. The RO is allowed to run until the reference counter reaches a pre-determined sampling cycle count $C_{Ref}$. Then, we disable the RO and read the TFF counter $C_{RO}$. We can then calculate the RO frequency using Equation 3, where $\varepsilon \in [0, \frac{f_{Ref}}{C_{Ref}}]$ is the quantization error introduced by the phase difference between the two clock pulses. We use 16-bit counters for our experiments, but the bitwidth may be increased via additional TFFs.

$$f_{RO} = C_{RO} \cdot \frac{f_{Ref}}{C_{Ref}} + \varepsilon \tag{3}$$

There exists an inherent trade-off between time and power resolution of the RO-based power monitor; distinguishing a small difference in power consumption requires running ROs long enough for that difference to show up as a sufficient change in the oscillation count, but at the cost of reduced time resolution. Furthermore, RO frequency drops are affected by their spatial location on the FPGA fabric. Thus, we instantiate multiple ROs throughout the FPGA and average their counters for the final power monitor value. We provide further analysis of the power monitor resolution in our full paper [4].

B. Power Virus Demonstration

To demonstrate the efficacy of the RO-based power monitor, we used a 28 nm Zynq-7020 SoC, which integrates a hardened dual-core ARM Cortex-A9 with an Artix-7 equivalent FPGA with 53,200 LUTs. We instantiated a network of 20 RO circuits as our power monitor, distributed throughout the FPGA fabric. We then created a “power virus” that consists of one flip-flop whose output is fed through an inverter followed by an AND gate and back into its own input port, creating a circuit that can be enabled to switch each clock cycle. We instantiated 16,000 instances of the power virus on the FPGA fabric so as to cover the majority of the FPGA. We then performed successive experiments, increasing the number of enabled power viruses by 500 with each step. For each step, we recorded 1,000,000 power measurements with a sampling period of 10 µs (i.e., $f_{Ref}$ = 100 MHz and $C_{Ref}$ = 1000 in Equation 3).

Fig. 2. The average RO frequency versus the power consumption level. A linear regression is shown to fit the data. (Plot axes: average oscillation frequency in Hz versus number of power virus instances; series: RO data and linear regression.)

Figure 2 shows how the RO can accurately measure dynamic power consumption. The baseline average frequency across all 20 ROs, with no additional switching activity, was 914.7 MHz. A linear correlation which very closely models the correlation


between switching activity and oscillation frequency can be constructed as $f(x) = -1800x + 9.147 \times 10^8$ with an R-squared value of 0.9966, where $x$ is the number of power virus instances actively switching. The RO can effectively resolve minute power variations using a simple linear model, demonstrating its utility as a remote power monitor.

III. FPGA-TO-FPGA POWER ANALYSIS ATTACK

A. RSA Cryptomodule

We next demonstrate how the RO-based power monitor can successfully perform an FPGA-to-FPGA power side-channel attack, targeting an FPGA-based RSA cryptomodule.

RSA decryption requires the computation of a large modular exponentiation $M = C^d \bmod N$, where $C$ is the ciphertext message, $d$ is the private key exponent, $N$ is the RSA modulus, and $M$ is the decrypted plaintext. The private key exponent and modulus are typically 1,024 to 4,096 bits. Because modular exponentiation underpins much of modern communication, many devices rely on dedicated cryptomodules for its computation. One simple algorithm that many of these cryptomodules use is square-and-multiply, which decomposes the exponent $d$ into the sum of successive powers of 2 (i.e., binary notation). The cryptomodule can then easily perform the modular exponentiation as shown in Listing 1.

    modexp(M, d, N)
    {
        R = 1
        S = M
        for (i = 0 to n-1)
        {
            if (d mod 2 == 1)
                R = R * S mod N
            S = S * S mod N
            d = d >> 1
        }
        return R
    }

Listing 1. Pseudo-code for a square-and-multiply modular exponentiation.

To study if the RO-based power monitor is accurate and fast enough to perform a power analysis attack in practice, we implemented a 1,024-bit square-and-multiply circuit on the aforementioned Zynq FPGA. The circuit consists of two dedicated modular multiplication modules (using shift-and-add), as well as a state machine that iterates through each bit of the 1,024-bit input exponent, starting at the least-significant bit. One multiply module is dedicated to computing the square term $S = (S \cdot S) \bmod N$, while the other module is used to compute the multiplication $R = (R \cdot S) \bmod N$. However, if the current least-significant bit of the iteratively-shifted exponent is 0, the second multiplier instead computes $R = (R \cdot 1) \bmod N$. Both multipliers are synchronized to start computation at the same cycle for each iteration of the loop. The cryptomodule runs at 20 MHz on the relatively low-end Zynq-7020, performing a modular exponentiation every 52.4 ms. The module requires 15,561 LUTs and 12,843 FFs.

Importantly, we observe that if the current bit of the exponent is 1, then both multipliers will be active, resulting in high switching activity in the FFs and LUTs that store and compute the multiply’s intermediate results. However, if the exponent bit is 0, then only the squaring multiplier’s logic will switch, while the other multiplier’s logic will largely be idle. Thus, the power consumption will be different between an iteration with an exponent bit of 1 and an iteration with an exponent bit of 0. As a result, the RSA cryptomodule is vulnerable to a Simple Power Analysis (SPA) attack [5]. While SPA has been demonstrated on RSA numerous times since the technique’s introduction by Kocher et al. in 1999, no prior work has demonstrated a successful attack without physical access to the victim device.

B. Remote SPA Attack

In contrast, we demonstrate a remote attack by instantiating the RSA cryptomodule alongside the RO-based power monitor. We consider three separate cases, ranging from least to most restrictive from the attacker’s perspective. In the PR case, the attacker can define place and route constraints for the RO circuits, placing them close to the victim logic. In the ISO case, we enforce a physical isolation of unused logic between the power monitor and the RSA cryptomodule. Finally, NoPR does not allow any user-defined placement constraints, allowing the design tool to automatically place-and-route the RO circuits.

We generated ten 1,024-bit private keys using OpenSSL and used the same keys within the cryptomodule across all three configurations. We then used a control program running on the ARM core to instruct the cryptomodule to decrypt a message using one of the ten keys, randomly choosing a private key and a decryption start time. Concurrently, we enabled the RO-based power monitor and continuously recorded its counter using a separate process on the ARM core at a sampling rate of approximately 2 MHz. We exported the power monitor trace to a desktop for further analysis.

Fig. 3. A zoomed-in view of the RSA power trace showing 16 key bits.

Figure 3 presents a zoomed-in view on a portion of a power trace recorded in the PR configuration. The vertical bars delineate the boundary between successive modular multiplications. From one raw power trace, it is possible to visually identify the difference between one and two switching modular multipliers, which correspond to a key bit of 0 and 1, respectively. It is thus possible to perform an SPA attack to recover the full 1,024-bit decryption key.
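To make the leakage concrete, the following minimal sketch models how the number of active multipliers in each square-and-multiply iteration could show up in a monitor trace, and how a per-iteration average separates key bits. It is a toy under stated assumptions, not the experimental setup: the counter levels (`idle_count`, `drop`), the ripple, the samples per bit, the 16-bit exponent, and all function names are hypothetical.

```python
def modexp_with_activity(M, d, N, n):
    """LSB-first square-and-multiply that also logs multiplier activity.

    Returns (M**d mod N, activity), where activity[i] is the number of
    multipliers doing real work in iteration i: 2 for a key bit of 1
    (multiply and square), 1 for a key bit of 0 (square only).
    """
    R, S = 1, M % N
    activity = []
    for _ in range(n):
        bit = d & 1
        if bit:
            R = (R * S) % N
        S = (S * S) % N
        activity.append(2 if bit else 1)
        d >>= 1
    return R, activity


def simulate_trace(activity, samples_per_bit=8, idle_count=9600, drop=20):
    """HYPOTHETICAL monitor output: each active multiplier lowers the RO count."""
    trace = []
    for a in activity:
        for _ in range(samples_per_bit):
            ripple = (len(trace) * 7) % 5 - 2   # small deterministic ripple in -2..2
            trace.append(idle_count - drop * a + ripple)
    return trace


def classify_bits(trace, n):
    """Split the trace into n equal segments and threshold the segment means."""
    seg = len(trace) // n
    means = [sum(trace[i * seg:(i + 1) * seg]) / seg for i in range(n)]
    threshold = (max(means) + min(means)) / 2   # assumes both bit values occur
    # A lower RO count means more switching activity, i.e. a key bit of 1.
    return [1 if m < threshold else 0 for m in means]


def bits_to_int(bits):
    """Assemble an integer from bits listed least-significant first."""
    return sum(b << i for i, b in enumerate(bits))


KEY = 0b1011001110001011          # toy 16-bit exponent, not a real key
result, activity = modexp_with_activity(M=12345, d=KEY, N=1000003, n=16)
recovered = bits_to_int(classify_bits(simulate_trace(activity), 16))
print(result == pow(12345, KEY, 1000003), recovered == KEY)
```

The sketch assumes the trace is already aligned to the start of the exponentiation and that every iteration takes the same number of samples; a real trace would also need noise filtering before thresholding.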


We automated the attack with a MATLAB script. As shown in Equation 4, we observe that the recorded power consumption is the sum of the dynamic power consumption of an RSA module, the dynamic power consumption of other activities, the constant static power consumption, and electronic noise, which we consider to be all independent. To identify the section of a power trace that corresponds to the RSA computation (i.e., $P_{RSA}$), we first record a power trace that only captures background noise (i.e., $P_{other} + P_{static} + P_{noise}$) when the RSA module is idle. We then filter the trace through a low-pass filter to remove high-frequency electronic noise. We can then create a threshold that corresponds to the maximum and minimum background power consumption.

$$P_{total} = P_{RSA} + P_{other} + P_{static} + P_{noise} \tag{4}$$

Next, we perform the same low-pass filtering on a power trace that includes an RSA computation. We can easily identify the portion of the trace corresponding to the RSA computation by observing the first and last samples that go beyond our baseline power consumption threshold. We subtract $P_{static}$ from the power trace, leaving us with low-frequency noise and $P_{RSA}$. Then, we can divide the section of the power trace corresponding to the RSA computation into 1,024 segments, one for each key bit, and simply classify each segment as a 0 or 1 based on the average of the power trace values over the segment. We found that this algorithm is sufficient to recover most of the key bits accurately. The accuracy can be further improved by performing the attack on multiple traces for the same key and taking the majority vote for each bit. Identifying the exact trigger to capture a power trace is an orthogonal and well-studied problem, often dependent on the unique characteristics of each device (e.g., known command sequences).

In fact, we successfully recovered all ten private keys across the three configurations using the automated algorithm, requiring 3.7, 8.9, and 11.4 traces on average to fully recover all 10 keys in the PR, ISO, and NoPR configurations, respectively. The RO-based power monitor enables accurate, automated, and remote power analysis attacks. Importantly, the ISO and NoPR results further show two important takeaways. First, physically separating sensitive logic from malicious logic using a ‘fence’ of CLBs [3] (as in ISO) is not an effective countermeasure. While the attack required more traces, the power side-channel leak leverages the FPGA’s PDN, which spans across the fence. Secondly, forbidding user-defined placement constraints (as in NoPR) is a similarly ineffective countermeasure. While the frequencies of individual ring oscillators can vary significantly due to the design tool’s automatic placements, our attack was still successful as it relies on relative frequency differences.

IV. FPGA-TO-CPU ATTACK

The remote power analysis attack is not limited to FPGA logic. In SoCs, modules on the same die commonly share the same power supply. Thus, switching activity from the CPU or other modules will cause a voltage drop in the FPGA logic. Observing CPU activity via the power side channel can introduce timing-based attacks.

Specifically, timing channel attacks infer confidential information by observing data-dependent timing variations from sources such as caches, program execution, and network latencies. Common timing attack countermeasures wrap computations in a timing mitigator that delays the return time of the computation, reducing or removing the data-dependent timing channel [6].

The FPGA-based power side channel breaks the basic assumption in these timing-channel protection schemes, which assume that without physical access, the power (and thus activity) of the CPU itself cannot be observed. We show that this assumption does not hold by demonstrating that the underlying activity of a CPU process implementing a delay-based mitigation technique can be observed via the remote power side channel.

We implemented a software program that performs RSA modular exponentiation using the square-and-multiply algorithm on a 4,096-bit key. Since each square-and-multiply iteration requires more CPU cycles (and thus leaks timing information) if a key bit is 1, we leveraged a delay-based mitigation to make each iteration constant-time regardless of the values of the RSA key. We allowed the RSA process to run on an ARM core on the Zynq device. Simultaneously, we ran the power monitor on the SoC’s FPGA (using a sampling period of 50 cycles), and we used a separate attacker process to continuously record the power monitor’s trace.

Fig. 4. A zoomed-in view of the power trace for constant-time modular exponentiation in software. (Plot axes: power monitor output versus sample count, ×10^5.)

Figure 4 shows a zoomed-in view of the power trace on 32 iterations of the square-and-multiply loop. The first 16 bits (16 LSBs) of the secret key are all set to 0, while the next 16 bits are all set to 1. The power trace clearly shows that the first half of the trace, corresponding to 16 bits of 0, has periods where the power consumption is significantly lower (i.e., the power monitor output is higher) than the latter half of the power trace. These periods correspond to the delays inserted for timing channel protection. On the other hand, periods with low power consumption are much shorter in the second half of the trace, suggesting that there is no significant idle time. Thus, an attacker can simply identify the corresponding key bit for each iteration by looking at the trace even if the execution time of the program is fixed. Importantly, this attack bypasses traditional process isolation mechanisms including kernel-mode privilege.
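The visual inspection described above could be automated along the following lines. This is a minimal sketch under stated assumptions, not the procedure used in the experiments: it assumes the trace is already aligned to iteration boundaries, that every iteration spans the same number of samples (plausible because the mitigation makes iterations constant-time), and that an idle threshold is known. The counter levels, the threshold, `min_idle_fraction`, and the function name are all hypothetical.

```python
def recover_bits(trace, n_iterations, idle_threshold, min_idle_fraction=0.3):
    """Classify each constant-time iteration by how long the CPU looks idle.

    A higher monitor output means lower power. Iterations that spend a large
    fraction of their samples above idle_threshold contain padding delay and
    are taken to be key bits of 0; the rest are taken to be key bits of 1.
    """
    seg = len(trace) // n_iterations
    bits = []
    for i in range(n_iterations):
        segment = trace[i * seg:(i + 1) * seg]
        idle = sum(1 for s in segment if s > idle_threshold) / len(segment)
        bits.append(0 if idle >= min_idle_fraction else 1)
    return bits


# HYPOTHETICAL counter levels for a busy and an idle CPU.
BUSY, IDLE = 9550, 9590
zero_iter = [BUSY] * 5 + [IDLE] * 5   # short computation, long padding delay
one_iter = [BUSY] * 9 + [IDLE] * 1    # long computation, almost no padding

# Same layout as the trace described above: 16 zero bits, then 16 one bits.
trace = zero_iter * 16 + one_iter * 16
bits = recover_bits(trace, n_iterations=32, idle_threshold=9570)
print(bits)
```

Measuring the idle fraction rather than the total iteration time is the point of the sketch: the padding hides the timing difference but not the difference in activity within each iteration.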


V. DISCUSSION

The very nature of FPGAs provides users with fine-grained control over reconfigurable fabric. As heterogeneous FPGA-based systems become more widely deployed in systems ranging from cloud datacenters to mobile platforms, the security implication of allowing untrusted parties to remotely implement custom circuits on an FPGA becomes more significant. We demonstrate that the FPGA introduces a new security vulnerability by allowing untrusted users to remotely implement a power monitor that is capable of inferring the level of dynamic switching activities. This power monitor circuit can not only measure the power consumption of circuits on the reconfigurable fabric, but also observe the power consumption of other modules such as a CPU (or a GPU) on a heterogeneous SoC device. While this article explores a few possible attacks enabled by these circuits, our full paper [4] explores additional attacks, power monitor circuits, and potential countermeasures.

Importantly, alternative circuits to ROs can also be used to implement a remote power monitor. A delay line, composed of a long chain of logic elements, can measure the propagation delay of a signal as it passes through the chain. This delay may then be digitized by latching various points along the chain at fixed periods, generating a thermometer-coded power signal.

Since its publication, our paper has motivated a new field of research into remote power side channels, applying our core idea across a myriad of FPGA-enabled devices and beyond. Recent work (e.g., [7]) has shown how the power side channel is visible in Intel FPGAs, directly on AWS EC2 F1 instances, across SLR dies, and even across discrete ICs sharing the same power supply. Other follow-up studies (e.g., [8]) also explored different remote power monitors such as long wires, CPU interfaces, and microphones. Recent research (e.g., [9]) also expanded the targets beyond cryptomodules, including neural networks, multitenant FPGAs, and voltage attacks. Finally, a line of work (e.g., [10]) has further pursued our proposed countermeasures and beyond, including masking, active fences, and bitstream scanning. We believe that our work will continue to motivate future research, as new attack variants leverage the power side channel and new countermeasures are created in response.

REFERENCES

[1] J. Fowers, K. Ovtcharov, M. Papamichael, T. Massengill, M. Liu, D. Lo, S. Alkalay, M. Haselman, L. Adams, M. Ghandi, S. Heil, P. Patel, A. Sapek, G. Weisz, L. Woods, S. Lanka, S. K. Reinhardt, A. M. Caulfield, E. S. Chung, and D. Burger, “A configurable cloud-scale DNN processor for real-time AI,” in Proceedings of the 45th Annual International Symposium on Computer Architecture, ser. ISCA ’18. Piscataway, NJ, USA: IEEE Press, 2018, pp. 1–14. [Online]. Available: https://doi.org/10.1109/ISCA.2018.00012
[2] M. Zhao, M. Gao, and C. Kozyrakis, “Shef: Shielded enclaves for cloud FPGAs,” in Proceedings of the 27th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, ser. ASPLOS ’22. New York, NY, USA: Association for Computing Machinery, 2022, pp. 1070–1085. [Online]. Available: https://doi-org.stanford.idm.oclc.org/10.1145/3503222.3507733
[3] T. Huffmire, B. Brotherton, G. Wang, T. Sherwood, R. Kastner, T. Levin, T. Nguyen, and C. Irvine, “Moats and drawbridges: An isolation primitive for reconfigurable hardware based systems,” in 2007 IEEE Symposium on Security and Privacy (S&P), 2007, pp. 281–295.
[4] M. Zhao and G. E. Suh, “FPGA-based remote power side-channel attacks,” in 2018 IEEE Symposium on Security and Privacy (SP), May 2018, pp. 229–244.
[5] P. C. Kocher, J. Jaffe, and B. Jun, “Differential power analysis,” in 19th Annual International Cryptology Conference on Advances in Cryptology (CRYPTO), 1999, pp. 388–397.
[6] A. Askarov, D. Zhang, and A. C. Myers, “Predictive black-box mitigation of timing channels,” in 17th ACM Conference on Computer and Communications Security (CCS), 2010, pp. 297–307.
[7] I. Giechaskiel, K. B. Rasmussen, and J. Szefer, “C3apsule: Cross-FPGA covert-channel attacks through power supply unit leakage,” in 2020 IEEE Symposium on Security and Privacy (SP), 2020, pp. 1728–1741.
[8] M. Lipp, A. Kogler, D. Oswald, M. Schwarz, C. Easdon, C. Canella, and D. Gruss, “Platypus: Software-based power side-channel attacks on x86,” in 2021 IEEE Symposium on Security and Privacy (SP), 2021, pp. 355–371.
[9] S. Moini, S. Tian, D. Holcomb, J. Szefer, and R. Tessier, “Remote power side-channel attacks on BNN accelerators in FPGAs,” in 2021 Design, Automation & Test in Europe Conference & Exhibition (DATE), 2021, pp. 1639–1644.
[10] T. M. La, K. Matas, N. Grunchevski, K. D. Pham, and D. Koch, “FPGADefender: Malicious self-oscillator scanning for Xilinx UltraScale+ FPGAs,” ACM Trans. Reconfigurable Technol. Syst., vol. 13, no. 3, Sep. 2020. [Online]. Available: https://doi.org/10.1145/3402937


Mark Zhao is a Ph.D. student in electrical engineering at Stanford University. His research interests center on building performant, scalable, and secure systems for datacenter-scale applications such as machine learning. He is a student member of the ACM. Contact him at [email protected].


G. Edward Suh is a Senior Director of Research at NVIDIA, where he leads a group in security and privacy research. He is also an Adjunct Professor in electrical and computer engineering at Cornell University. He is a Fellow of the IEEE. Contact him at [email protected].

Authorized licensed use limited to: NATIONAL INSTITUTE OF TECHNOLOGY SILCHAR. Downloaded on August 26,2024 at 11:37:38 UTC from IEEE Xplore. Restrictions apply.
© 2024 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See https://www.ieee.org/publications/rights/index.html for more information.
