Redactor
arXiv:2501.18740v1 [cs.CR] 30 Jan 2025

Abstract—With the ever-increasing integration of artificial intelligence into daily life and the growing importance of well-trained models, the security of hardware accelerators supporting Deep Neural Networks (DNNs) has become paramount. As a promising solution to prevent hardware intellectual property theft, eFPGA redaction has emerged. This technique selectively conceals critical components of the design, allowing authorized users to restore functionality post-fabrication by inserting the correct bitstream. In this paper, we explore the redaction of DNN accelerators using eFPGAs, from specification to physical design implementation. Specifically, we investigate the selection of critical DNN modules for redaction using both regular and fracturable look-up tables. We perform synthesis, timing verification, and place & route on redacted DNN accelerators. Furthermore, we evaluate the overhead of incorporating eFPGAs into DNN accelerators in terms of power, area, and delay, finding it reasonable given the security benefits.

Index Terms—Hardware Security; Deep Neural Networks; Hardware Accelerators; eFPGA Redaction; Physical Design

I. INTRODUCTION

Deep Neural Network (DNN) technologies continue to evolve, propelled by the quest for enhanced accuracy. Achieving superior accuracy in DNNs necessitates high-performance computing resources, such as hardware accelerators. As accuracy and throughput become increasingly critical, the importance of security for DNN models and accelerators also escalates.

Attackers have the capability to reverse engineer the Intellectual Property (IP) of hardware accelerators, enabling them to create unauthorized substitute models. Additionally, they can tamper with the weights of the DNN, resulting in alterations to accuracy and misclassification of inputs [1]. To address these security concerns, logic locking [2]–[7] has emerged, using various types of key gates such that only authorized users can unlock the circuit. However, logic locking faces challenges from SAT-based attacks [8]–[13] that exploit an activated Integrated Circuit (IC) as a reference point along with the leaked locked netlist, allowing them to extract the correct key. In response to these threats, programmable fabrics exemplified by embedded Field-Programmable Gate Arrays (eFPGAs) offer robust defenses against reverse engineering and hardware IP theft [14]–[17]. The configuration bitstream remains accessible solely to the designer or authorized user, even if an untrusted foundry manufactures the design. The challenges in eFPGA redaction encompass identifying which modules within the IP should be redacted and minimizing the delay and area overheads associated with replacing these specific modules with eFPGAs.

To the best of our knowledge, there is a gap in the existing literature on how eFPGA redaction affects the security of DNN accelerators. Specifically, there is limited exploration of the implications on timing, area, and power overheads when integrating an eFPGA to replace a segment of a hardware accelerator. The contributions of this paper are as follows:
• Proposing an eFPGA redaction flow to hide sensitive components of a DNN accelerator with low overhead;
• Providing guidance on selecting a critical module to be replaced with regular or fracturable LUTs;
• Evaluating the overhead of the proposed redaction method in terms of area, power, and delay, as well as the security against oracle-guided SAT-based attacks.

The rest of the paper is organized as follows: Section II discusses the background, preliminaries, and related works. Section III proposes our contributions to redact critical IPs of a DNN accelerator via eFPGAs, followed by verification, synthesis, and place & route steps. Section IV depicts the comprehensive experimental results on the overhead and security of the redacted accelerator. Finally, conclusions are given in Section V.

II. BACKGROUND AND PRELIMINARIES

In this section, we discuss the background on DNNs, accelerators, eFPGAs, and existing work on securing hardware accelerators.

A. Deep Neural Networks

DNNs are employed in various tasks such as image classification [18] and speech recognition [19]. The input layer of a DNN receives inputs, and outputs are generated through weighted sums followed by non-linear activation functions. These outputs then proceed to subsequent layers. Initially set randomly, weights and biases are adjusted during the training phase to minimize disparities between expected and actual outputs.
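The layer computation just described (weighted sums followed by a non-linear activation) can be sketched in a few lines of Python; the layer sizes and values below are purely illustrative.

```python
# Minimal sketch of one fully connected DNN layer: weighted sums
# followed by a ReLU activation. Sizes and values are illustrative.
def relu(x):
    return [v if v > 0.0 else 0.0 for v in x]

def dense_layer(inputs, weights, biases):
    # weights[i] holds the weight vector of output neuron i
    sums = [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]
    return relu(sums)

# Two inputs feeding two output neurons
out = dense_layer([1.0, -2.0],
                  [[0.5, 0.25], [-1.0, 0.75]],
                  [0.1, 0.2])
```

During training, the weight and bias values above would be adjusted iteratively to minimize the disparity between expected and actual outputs.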
A Convolutional Neural Network (CNN) is a type of DNN comprising four main layers: convolutional, normalization, pooling, and fully connected layers. In the convolutional layer, inputs and outputs are processed as 2D or 3D arrays, with dimensions represented by height, width, and number of channels. The convolutional layer uses 3D filters to calculate the inner product with sub-arrays of the input, moving a sliding window of the filter size across the input with a fixed stride. Normalization layers adjust activation levels within each feature map, aiding in stabilizing the learning process by keeping inputs within a reasonable range. Pooling layers in CNNs down-sample input feature maps to create lower-resolution versions, retaining essential information while excluding irrelevant details. Operations such as average pooling and max pooling are applied independently to each feature map, reducing dimensions while maintaining the number of channels. Fully connected layers in CNNs determine class scores by processing outputs from preceding layers. Activation functions like the Rectified Linear Unit (ReLU) introduce non-linearity.

B. Hardware Accelerators

DNN models demand substantial computational resources, with tasks such as image classification requiring millions of weights and over half a billion Multiply And Accumulate (MAC) operations, necessitating specialized accelerators for real-time execution. DNN architectures require Arithmetic Logical Units (ALUs) [20], [21] that focus on MAC units for computational resources and flexibility, dataflow optimization [22], [23] to minimize energy consumption, and sparsity [24] to skip zero multiplications and reduce power consumption. Eyeriss [25] is a Row-Stationary (RS) dataflow-based accelerator designed to minimize data movement energy in spatial architectures. In the RS dataflow, a row of operands is stored in the Register File (RF). Eyeriss employs diagonal connections of Processing Elements (PEs) for input reuse and vertical accumulation of partial sums. Each PE includes local registers for storing at least one row of weights and activations, along with a MAC unit and a controller responsible for the temporal reuse of MAC units in 1D convolution. A significant portion of the RF is allocated to weights, and input vectors are reused to calculate partial sums for multiple output feature maps.

C. Embedded Field Programmable Gate Arrays

Modern eFPGA architectures adopt a tile-based structure, with each tile containing configurable logic resources. Surrounding these tiles are the I/O blocks, which facilitate communication between the eFPGA and external devices or systems. Each tile within the eFPGA architecture comprises two fundamental elements that collectively enable its programmability and functionality. The first component is the Configurable Logic Block (CLB), which serves as the primary unit for implementing logic functions and user-defined designs. The second component is programmable routing, which facilitates the interconnection of various CLBs and enables the flow of data between them. Within each CLB, there is a Basic Logic Element (BLE) that incorporates a K-input Look-Up Table (LUT) capable of mapping a K-input single-output Boolean function, alongside a flip-flop and a 2-1 routing multiplexer used to toggle between sequential and combinational logic.

Fig. 1: (a) 4x4 fabric; (b) tile components; (c) BLE components

Fig. 1 provides an example of a 4x4 eFPGA architecture, the components inside each tile, and a visual representation of a basic BLE setup with a 4-input LUT. Studies have identified the optimal LUT size for balancing area and delay to be between 4 and 6, with 6-LUTs exhibiting superior performance and 4-LUTs occupying the smallest area [26]. To address these trade-offs and enhance LUT utilization, a refined version known as fracturable LUTs (FLUTs) has been introduced. FLUTs have more than one mode of operation: they can function as a K-input LUT or be fractured into smaller (K-1)-input LUTs with shared inputs [27]. Within the CLB, BLEs can be organized into logic clusters to optimize eFPGA speed and area efficiency. The ideal cluster size (N) is typically determined to be between 4 and 10 [28]. Crossbar routing interconnects the inputs and outputs of CLBs and BLEs using a series of programmable multiplexers, ensuring connectivity between all BLEs and every CLB pin. Switch Blocks (SBs) and Connection Blocks (CBs) serve as the global routing that connects CLBs together.

D. Related Works

With the recent boom in Artificial Intelligence (AI), developing new approaches to safeguard DNN models and accelerators is of the utmost importance. In [29], fault injection attacks are examined that are capable of inducing misclassification in a DNN by altering the bias in the output layer to favor the desired adversarial class.
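The LUT and FLUT behavior described in Section II-C can be modeled at toy scale in Python. The truth-table encoding below is a simplification for illustration, not OpenFPGA's actual bitstream format.

```python
# Toy model of a K-input LUT: the configuration is a truth table of
# 2^K bits indexed by the input bits. A fracturable LUT (FLUT) can
# instead serve two (K-1)-input functions that share the same inputs.
# This encoding is illustrative, not OpenFPGA's real bitstream format.
def lut_eval(config_bits, inputs):
    assert len(config_bits) == 2 ** len(inputs)
    index = 0
    for bit in inputs:                    # inputs are 0/1, MSB first
        index = (index << 1) | bit
    return config_bits[index]

# A 4-input LUT configured as a 4-input AND (only the last entry is 1)
and4 = [0] * 15 + [1]

# The same storage fractured into two 3-input LUTs with shared inputs
def flut_eval(config_bits, inputs):
    half = len(config_bits) // 2
    lo, hi = config_bits[:half], config_bits[half:]
    return lut_eval(lo, inputs), lut_eval(hi, inputs)
```

Fracturing reuses the same configuration storage for two smaller functions, which is why FLUT-based fabrics can reach full utilization with fewer BLEs per CLB.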
As a countermeasure, in [30] the fault-sensitivity of individual neurons within a given DNN is measured by effectively leveraging both external and internal redundancy within DNN models to balance system robustness against hardware overhead.

In addition, a Hardware Protected Neural Network (HPNN) framework is proposed [31] to safeguard DNNs from attackers with extensive knowledge. It conceals the weight space through a confidential HPNN key, controlling each neuron's functionality. In [32], a hardware key is introduced to secure the accelerator such that a wrong key increases memory access, inference time, and energy consumption, making it unsuitable for inference. This method requires modifying the activation function to prevent bypassing the hardware-key-controlled block. Moreover, a model key is used to obscure the model without the need for retraining. Although this approach resulted in decreased accuracy and higher memory access when incorrect keys were applied, it necessitates modifications in the DNN accelerator.

Initial logic locking solutions utilize XOR-based and MUX-based mechanisms [33], [34]. However, the oracle-guided SAT attack [8] exposes vulnerabilities in these methods, prompting the development of more robust techniques [3], [5], [6], [35]–[40], which increase the time complexity of SAT attacks. Furthermore, a soft embedded FPGA redaction method is introduced [17] to hide critical IP functionality and routing within RTL designs. A critical IP in the Verilog file is identified and synthesized using Yosys [41], then place & route is performed to determine the smallest eFPGA fabric needed. The resulting fabric, devoid of critical IP information, is loaded with a bitstream to maintain functionality. In [42], the effects of varying parameters (K and N) on the area, power, delay, and security of eFPGA architectures are compared. Increasing K influences CLB inputs, impacting LUT sizes and routing, while increasing N adds BLEs, affecting area and routing complexities. The findings challenge the assumption that fabric size directly correlates with security strength.

In [11], the assumption that eFPGA-based designs are inherently secure against oracle-guided attacks has been challenged. The researchers conducted two attacks: CycSAT [9], which aims to break cycles within the circuit, and IcySAT [10]. IcySAT took longer due to the unrolling process. However, CycSAT struggles to break hard loops, leaving at least one loop intact and causing the SAT solver to repeat iterations. To address this, the paper developed a two-phase attack called Break & Unroll. The first phase breaks cycles sequentially, creating a new non-cyclic constraint. If hard cycles persist, the second phase unrolls the circuit, duplicating gates and breaking feedback connections to prevent infinite loops.

III. REDACTOR

In this section, we propose REDACTOR, an eFPGA REDaction framework for CNN ACceleraTORs, shown in Fig. 2. Without loss of generality, we utilize the accelerator outlined in [43], which is a modification of the Broadcast, Stay, Migration (BSM) dataflow initially proposed in [44]. Although the procedure is independent of the chosen accelerator, this adaptation is considered energy-efficient as it effectively minimizes redundant access to off-chip memory.

A. Critical IP Selection and Fabric Generation

The goal of module selection is to choose a module that plays a crucial role in the overall functionality of the accelerator. In this regard, three options can be considered:
• Opting for a module that significantly impacts the outputs proves beneficial for causing output corruption when an unauthorized user loads the wrong bitstream.
• Selecting a module that causes error propagation throughout the system can amplify the impact of using incorrect bitstreams and decrease the accuracy of the DNN model.
• Choosing a module that can be implemented using a complex fabric with a large unroll factor can be beneficial for securing against existing attacks that target eFPGAs, while ensuring it remains within the power, area, and delay budget.

The first critical IP selected is the On-Chip Weight Memory Controller (OWMC), which has 12 outputs, some of which directly and indirectly impact the loading of DNN weights from the weight memory to the dataflow block comprising numerous Processing Elements (PEs). Protecting these weights ensures that sensitive information remains secure from unauthorized access or corruption. Subsequently, we iterate through the redaction process to redact several IPs in the accelerator using eFPGAs of varying sizes and architectures. These IPs include the Multiplexers Dataflow Controller (MUXDC), the On-Chip Memory Dataflow Controller (OMDC), and the Processing Elements Dataflow Controller (PEDC). Table I displays the characteristics of each critical IP.

TABLE I: Critical IPs redacted

Critical IP | # of Modules | # of I/Os | Description
OWMC        |  5           | 20        | Controls the flow of DNN weights
MUXDC       |  4           | 15        | Controls dataflow between PEs
OMDC        | 20           | 26        | Controls the convolution process
PEDC        |  1           |  5        | Sets/resets output register of PEs

To generate the eFPGA fabrics, we use OpenFPGA [45]. We input three files into OpenFPGA: the benchmark file, which contains the critical RTL Verilog portion of the selected module for redaction, and two architecture files. One is the OpenFPGA architecture file [46], and the other is the VPR architecture file [47]. Both files are in Extensible Markup Language (XML) format and together specify the architecture of the eFPGA.

We select a fabric architecture to achieve 100% block utilization and over 90% I/O utilization. This is accomplished by manually adjusting the number of BLEs (N) per CLB and the number of I/O pins per tile until the target utilization is reached, while maintaining a constant 4-input LUT/FLUT (K=4).
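The manual tuning loop described above can be approximated programmatically. The sketch below is a simplified model and not part of OpenFPGA: `num_luts` stands for the LUT count of the redacted module, and the CLB-input relation quoted from [48] is included as a helper.

```python
# Simplified sketch of choosing N (BLEs per CLB) for a WxH fabric so
# that block utilization reaches 100%. This is a toy model, not the
# actual OpenFPGA sizing flow; num_luts is a hypothetical input.
def clb_inputs(K, N):
    # CLB input count recommended for good PPA: I = K(N + 1) / 2
    return K * (N + 1) // 2

def pick_n(num_luts, width, height, K=4, n_max=9):
    clbs = width * height
    for N in range(1, n_max + 1):
        if clbs * N >= num_luts:          # fabric now fits the module
            util = num_luts / (clbs * N)  # block (LUT) utilization
            return N, clb_inputs(K, N), util
    raise ValueError("module does not fit in this fabric")

# e.g. a module needing 8 LUTs on a 2x2 fabric
N, I, util = pick_n(num_luts=8, width=2, height=2)
```

In practice the I/O pin count per tile is tuned in the same trial-and-error fashion until the target I/O utilization is also met.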
Fig. 2: The proposed eFPGA redaction flow for DNN accelerators

The number of CLB inputs can be determined using the following formula, known to provide favorable Power, Performance, and Area (PPA) characteristics [48]:

I = K(N + 1) / 2    (1)

Next, we acquire the timing constraints (i.e., SDC files) for the fabric intended for use in the physical design phase. Additionally, we obtain the Verilog files for the fabric, along with a testbench designed to verify the functionality of the fabric and the bitstream necessary to program the fabric to operate as the chosen module for redaction. Table II presents the eFPGA parameters used.

TABLE II: eFPGA parameters used

Parameter | Value | Description
K         | 4     | Input size of a LUT/FLUT
N         | [1,9] | Number of BLEs per CLB
W         | Auto  | Number of routing tracks in a channel
Fcin      | 0.15  | Fraction of the routing tracks that a CLB input can connect
Fcout     | 0.1   | Fraction of the routing tracks that a CLB output can connect
Fs        | 3     | Number of connections per incoming routing track in a switch block
L         | 4     | Length of a routing track (number of CLBs spanned)

B. Verification

We utilize ModelSim to simulate the functionality of the fabric. Both the original design and the eFPGA fabric receive shared input stimuli, and the correct bitstream is loaded into the fabric for evaluation. Subsequently, the output vectors of the eFPGA and the original design are compared. When the correct bitstream is applied, the output vectors should match, ensuring consistency and functionality between the eFPGA and the original design. In Fig. 3, the waveforms depict the output vectors. The eFPGA output signals are visualized in green, while the original module's signals are represented in yellow. The verification process is repeated for the synthesized fabric to verify functionality after synthesis.

Fig. 3: eFPGA fabric and original module output vectors

C. eFPGA Integration and Synthesis

We utilize Cadence Genus for synthesizing each redacted module. The redacted module serves as the top module in Cadence Genus for synthesis, where we replace the critical Verilog portion with the eFPGA at the RTL level. To map the Verilog netlist to a gate-level netlist, we employ the standard cell library file available on the Cadence website (i.e., slow_vdd1v2_basicCells.lib), which is a 45nm technology library characterized by slower operating speeds, i.e., slower process corners.

First, we read the Verilog and library files and elaborate the top module. Then, we analyze all the SDC files generated by OpenFPGA and command Genus to report timing. At this stage, Genus addresses any combinational loops that legally exist within the eFPGA fabric by inserting a buffer from the technology library onto the feedback loop (i.e., a cdn_loop_breaker cell). Additionally, it disables the timing arc from the input to the output, effectively breaking the timing loop. Fig. 4 shows an example of a loop that is broken; the buffer highlighted in yellow represents the feedback loop.

Fig. 4: Loop breaker cell example

We choose to use a flattened netlist to simplify the place and route stage. However, during the process of flattening the netlists, Cadence Genus modifies the names of hierarchical instances, which results in the disregard of some initial SDC constraints. To address this issue, we instruct Genus to parse an additional SDC file, which contains instructions to disable timing for specific paths using the flattened names.

Finally, optimization is performed on the flattened netlist, where we instruct the tool to attempt optimization on all paths with negative slack, including the critical path. Reports are then generated to provide details on area, power, and timing. Additionally, the gate-level netlist Verilog file and a single SDC file containing all constraints are generated. These two files serve as inputs for the subsequent place & route stage. Genus does not incorporate the constraints necessary to break the timing loops in the final SDC file; therefore, we manually modify the file to include these constraints.

D. Place & Route

The place & route stage in IC design is a critical process where logic gates, flip-flops, and other components undergo positioning (i.e., placement) and interconnection (i.e., routing). We utilize Cadence Innovus for the place & route of the redacted modules. Cadence Innovus takes as inputs the gate-level netlist and the SDC constraint file generated by Cadence Genus, along with technology files such as timing library files (.lib) and library exchange format (.lef) files.

Fig. 5: Redacted module layout without metal layers
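The manual SDC fix-up described in Section III-C, re-inserting loop-breaking constraints that Genus omits from its final SDC file, lends itself to a small script. The `set_disable_timing` syntax is standard SDC; the instance names below are hypothetical placeholders, not names from our design.

```python
# Sketch: append loop-breaking directives to the SDC file written by
# the synthesis tool, since those constraints are omitted from it.
# set_disable_timing is standard SDC; the cell names are hypothetical.
def add_loop_breaks(sdc_text, loop_cells):
    lines = [sdc_text.rstrip(), "",
             "# re-inserted loop-breaking constraints"]
    for cell in loop_cells:
        lines.append(f"set_disable_timing [get_cells {cell}]")
    return "\n".join(lines) + "\n"

fixed = add_loop_breaks(
    "create_clock -period 10 [get_ports clk]",
    ["fpga_top/clb_1_1/ble_0/loop_brk"])
```

The amended file can then be handed to the place & route tool together with the gate-level netlist.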
The process begins with specifying the floor plan, followed by power planning, which involves inserting VDD and VSS rings, and power and ground stripes to connect them to the standard cells. Standard cells are then placed within the specified floor plan. Additionally, we instruct Innovus to place the design with I/O pins, eliminating the need for manual I/O file creation. Subsequently, pre-Clock Tree Synthesis (CTS) optimization is performed to meet all timing constraints before CTS. We then perform CTS, followed by post-CTS optimization to further refine the design to meet timing constraints. Finally, routing and post-route optimization are carried out to complete the process, and timing, area, and power reports are generated. Fig. 5 shows the layout of the redacted module without metal layers. The eFPGA is noticeable in the middle, where it exhibits a slightly different pattern.

IV. EXPERIMENTAL RESULTS

In this section, we evaluate the overhead and security of REDACTOR. The source codes, along with the created eFPGA fabrics, are available on our GitHub repository¹.

¹https://github.com/cars-lab-repo/REDACTOR

A. Overhead Analysis

As mentioned earlier, our goal is to find the simplest fabric using the parameters shown in Table II. We achieve this by increasing or decreasing the number of BLEs in each CLB and the I/O pins in the I/O tiles. The characteristics of the eFPGA fabrics utilized in the experiments are summarized in Table III. Fig. 6 compares the area, power, and delay overheads of LUT-based and FLUT-based eFPGAs for the redacted modules, normalized to the original designs.

TABLE III: eFPGA fabric characteristics

eFPGA Fabric   | Block Utilization | I/O Utilization | Bitstream Size | Channel Width
2x2 K4N2       | 100%              |  94%            |  614           | 18
2x2 K4_frac_N1 | 100%              |  94%            |  458           | 18
1x1 K4N6       | 100%              | 100%            |  440           | 26
1x1 K4_frac_N3 | 100%              | 100%            |  256           | 18
2x2 K4N4       | 100%              |  95%            | 1160           | 30
2x2 K4_frac_N3 | 100%              |  95%            | 1059           | 30
1x1 K4N1       | 100%              | 100%            |   66           |  6
1x1 K4_frac_N1 | 100%              | 100%            |   79           | 14

OWMC: We redact the OWMC IP using two fabrics. The first one is a LUT-based fabric with 2 BLEs/CLB (2x2 K4N2). The second one is a FLUT-based fabric with 1 BLE/CLB (2x2 K4_frac_N1). Despite the expected larger bitstream for the FLUT-based fabric, it actually has a smaller size because of the fewer BLEs/CLB. Both fabrics fully utilize blocks and I/O but have different bitstream sizes: 614 bits for the LUT-based and 458 for the FLUT-based. Despite FLUT's anticipated impact on channel width, it remains the same at 18 for both fabrics due to the fewer BLEs. The LUT-based fabric introduces higher area, power, and delay overheads (33%, 16%, and 27%, respectively) compared to the FLUT-based fabric (30%, 15%, and 24.5%, respectively). Opting for the FLUT-based fabric is preferable when integrating an eFPGA for the OWMC IP.

MUXDC: We test the MUXDC IP using two fabric configurations: a LUT-based fabric with 6 BLEs/CLB (1x1 K4N6) and a FLUT-based fabric with 3 BLEs/CLB (1x1 K4_frac_N3). Both fabrics achieve full block and I/O utilization. The LUT-based fabric has a larger bitstream size of 440 bits compared to the FLUT-based, which has 256 bits. Routing widths differ, with the LUT-based at 26 and the FLUT-based at 18 tracks per channel. Due to this variance, the area difference between the fabrics is notable. The FLUT-based fabric shows a significant improvement in power consumption and a slightly better critical path delay. The first fabric introduces higher area, power, and delay overheads (192%, 139%, and 2%, respectively) compared to the second fabric (107%, 49%, and 1.6%, respectively). For the MUXDC IP, opting for FLUT proves preferable for redaction.

OMDC: In addition, the OMDC IP is tested using two fabrics: a LUT-based (2x2 K4N4) and a FLUT-based (2x2 K4_frac_N3), both achieving full block utilization and 95% I/O utilization. Despite the LUT-based fabric having a larger bitstream size of 1160 bits compared to the second fabric's 1059, the fabric utilizing regular LUTs shows slightly better performance in area, power, and delay, even though both fabrics routed with a channel width of 30. The LUT-based fabric introduces an area overhead of 145%, a power overhead of 105%, and a critical path delay overhead of 23%, while the FLUT-based one introduces an area overhead of 154%, a power overhead of 114%, and a critical path delay overhead of 23.6%. For the OMDC IP, choosing a regular LUT is preferable.

PEDC: Finally, the PEDC IP is evaluated with two fabrics: a basic LUT-based fabric with one CLB and one BLE (1x1 K4N1), and an identical fabric with a FLUT instead (1x1 K4_frac_N1). Both fabrics utilize I/Os and resources fully. The LUT-based fabric needs 66 bits for programming, while the FLUT-based one requires 79. The LUT-based fabric has a minimum channel width of 6, while the FLUT-based one needs a wider channel width of 14 due to its more complex routing. Because the original PEDC IP is a very small finite state machine, both the LUT-based and FLUT-based fabrics show huge area and power overheads as well as high delay compared with the original IP. In the case of the PEDC IP, choosing the fabric with regular LUTs is preferable. However, it is not efficient to replace such a small IP with an eFPGA due to the overheads introduced.

B. Security Analysis

We conduct a separate synthesis of the eFPGA fabric, this time constraining Genus to utilize only basic logic cells. Using Python, we convert the Verilog gate-level netlist from Genus to a .bench file. To establish an oracle, we program the fabric with the bitstream generated by OpenFPGA, assigning the outputs of the DFFs found in the scan chain with their corresponding bits.
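The oracle-guided setup just described can be illustrated at toy scale: treat the configuration bits of a single LUT as the key, query an oracle (the correctly programmed fabric), and discard candidate keys that disagree. Real attacks such as Break & Unroll [11] use SAT solvers rather than the brute-force enumeration below; this sketch is conceptual only.

```python
from itertools import product

# Oracle: a correctly programmed 2-input LUT implementing XOR.
SECRET = (0, 1, 1, 0)
def oracle(a, b):
    return SECRET[(a << 1) | b]

# Treat the 4 configuration bits as key inputs and prune every
# candidate key that disagrees with the oracle on some input pattern
# (brute force stands in for the SAT solver of a real attack).
candidates = list(product([0, 1], repeat=4))
for a, b in product([0, 1], repeat=2):
    expected = oracle(a, b)
    candidates = [k for k in candidates if k[(a << 1) | b] == expected]
```

For a combinational LUT every configuration bit is directly observable this way; the cyclic routing of a real fabric is what forces the unrolling whose cost Table IV reports.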
For creating a key-controlled netlist, we expose the outputs of the DFFs in the scan chain as key inputs. Then, we utilize the Break & Unroll attack from [11], feeding the key-controlled netlist to the NEOS tool [49] to extract the bitstream.

The security results are presented in Table IV. We were able to extract the bitstream of all eFPGAs that used FLUTs in a shorter time compared to the eFPGAs that used regular LUTs. This is because the LUT-based fabrics have a higher unroll factor. For instance, our 2x2 K4N4 fabric, which has an unroll factor of 112, timed out after 6 hours of running the attack.

TABLE IV: Security results

Fabric         | Unroll | # Clauses | Time (s) | Key Reported?
2x2 K4N2       |  59    | 1610318   | 99       | yes
2x2 K4_frac_N1 |  30    |  559536   | 7        | yes
1x1 K4N6       |  36    |  875500   | 51       | yes
1x1 K4_frac_N3 |   2    |    9291   | 0.26     | yes
2x2 K4N4       | 112    |     N/A   | Time out | no
2x2 K4_frac_N3 |  59    | 3299375   | 81       | yes
1x1 K4N1       |   5    |    4189   | 0.1      | yes
1x1 K4_frac_N1 |   2    |    2384   | 0.08     | yes

To maintain the reduced overheads of FLUT-based eFPGA fabrics, we propose two methods for future research:
• Introducing more cycles within the eFPGA fabric, thereby increasing the unroll factor to the point where the attack times out.
• Introducing non-unrollable cycles within the eFPGA fabric, which consist of oscillating and stateful cycles [11] interlaced in a manner that ensures at least one cycle remains unbroken during a cyclic attack, causing it to fail by entering an infinite loop.

Fig. 6: Normalized area, power, and delay overhead of LUT-based vs. FLUT-based eFPGA redacted modules: (a) redacted OWMC; (b) redacted MUXDC; (c) redacted OMDC

V. CONCLUSION

In this paper, we explored the importance of securing DNN accelerators and proposed an approach for redacting critical IPs with eFPGAs, from specification to physical design. Specifically, we focused on evaluating the impact of eFPGA fabrics with high I/O and block utilization and assessed the integration of these nearly fully utilized fabrics with regular LUTs and FLUTs. While using FLUT-based eFPGAs can complicate routing, they offer an advantage by reducing the number of BLEs or CLBs, which generally translates to lower power, delay, and area overheads. As demonstrated by the experiments, the choice between LUTs and FLUTs depends on the specific IP to be redacted, and no general rule can be established. Additionally, we observed that FLUT-based eFPGAs have lower unroll factors, making it easier for attackers to extract the bitstream. Conversely, when we redacted the OWMC IP with a regular LUT-based eFPGA, the attack timed out, and we were able to maintain reasonable overhead while enhancing security.

For future research, systematically increasing the unrolling factor or adding non-unrollable cycles to eFPGA fabrics can be pursued to maintain low overhead while improving security.

ACKNOWLEDGMENT

This material is based upon work supported by the National Science Foundation under Award No. 2245247.

REFERENCES

[1] Sparsh Mittal, Himanshi Gupta, and Srishti Srivastava. A survey on hardware security of DNN models and accelerators. Journal of Systems Architecture, 117:102163, 2021.
[2] Jeyavijayan Rajendran, Youngok Pino, Ozgur Sinanoglu, and Ramesh Karri. Security analysis of logic obfuscation. In Design Automation Conference (DAC), pages 83–89, 2012.
[3] Amin Rezaei and Hai Zhou. Sequential logic encryption against model checking attack. In Design, Automation & Test in Europe Conference & Exhibition (DATE), pages 1178–1181, 2021.
[4] Jordan Maynard and Amin Rezaei. DK Lock: Dual key logic locking against oracle-guided attacks. In International Symposium on Quality Electronic Design (ISQED), pages 1–7, 2023.
[5] Raheel Afsharmazayejani, Hossein Sayadi, and Amin Rezaei. Distributed logic encryption: Essential security requirements and low-overhead implementation. In Great Lakes Symposium on VLSI (GLSVLSI), pages 127–131, 2022.
[6] Yeganeh Aghamohammadi and Amin Rezaei. CoLA: Convolutional neural network model for secure low overhead logic locking assignment. In Great Lakes Symposium on VLSI (GLSVLSI), pages 339–344, 2023.
[7] Kevin Lopez and Amin Rezaei. K-Gate Lock: Multi-key logic locking using input encoding against oracle-guided attacks. In 30th Asia and South Pacific Design Automation Conference (ASP-DAC), 2025.
[8] Pramod Subramanyan, Sayak Ray, and Sharad Malik. Evaluating the security of logic encryption algorithms. In IEEE International Symposium on Hardware Oriented Security and Trust (HOST), pages 137–143, 2015.
[9] Hai Zhou, Ruifeng Jiang, and Shuyu Kong. CycSAT: SAT-based attack on cyclic logic encryptions. In IEEE/ACM International Conference on Computer-Aided Design (ICCAD), pages 49–56, 2017.
[10] Kaveh Shamsi, David Z. Pan, and Yier Jin. IcySAT: Improved SAT-based attacks on cyclic locked circuits. In IEEE/ACM International Conference on Computer-Aided Design (ICCAD), pages 1–7, 2019.
[11] Amin Rezaei, Raheel Afsharmazayejani, and Jordan Maynard. Evaluating the security of eFPGA-based redaction algorithms. In IEEE/ACM International Conference on Computer-Aided Design (ICCAD), pages 1–7, 2022.
[12] Yuanqi Shen, You Li, Shuyu Kong, Amin Rezaei, and Hai Zhou. SigAttack: New high-level SAT-based attack on logic encryptions. In Design, Automation & Test in Europe Conference & Exhibition (DATE), pages 940–943, 2019.
[13] Amin Rezaei, Ava Hedayatipour, Hossein Sayadi, Mehrdad Aliasgari, and Hai Zhou. Global attack and remedy on IC-specific logic encryption. In IEEE International Symposium on Hardware Oriented Security and Trust (HOST), pages 145–148, 2022.
[14] Mustafa M. Shihab, Jingxiang Tian, Gaurav Rajavendra Reddy, Bo Hu, William Swartz, Benjamin Carrion Schaefer, Carl Sechen, and Yiorgos Makris. Design obfuscation through selective post-fabrication transistor-level programming. In Design, Automation & Test in Europe Conference & Exhibition (DATE), pages 528–533, 2019.
[15] Jianqi Chen and Benjamin Carrion Schafer. Area efficient functional locking through coarse grained runtime reconfigurable architectures. In Asia and South Pacific Design Automation Conference (ASP-DAC), pages 542–547, 2021.
[16] Jitendra Bhandari, Abdul Khader Thalakkattu Moosa, Benjamin Tan, Christian Pilato, Ganesh Gore, Xifan Tang, Scott Temple, Pierre-Emmanuel Gaillardon, and Ramesh Karri. Exploring eFPGA-based redaction for IP protection. In IEEE/ACM International Conference on Computer Aided Design (ICCAD), pages 1–9, 2021.
[17] Prashanth Mohan, Oguz Atli, Joseph Sweeney, Onur Kibar, Larry Pileggi, and Ken Mai. Hardware redaction via designer-directed fine-grained eFPGA insertion. In Design, Automation & Test in Europe Conference & Exhibition (DATE).
[28] Vaughn Betz and Jonathan Rose. Cluster-based logic blocks for FPGAs: Area-efficiency vs. input sharing and size. In Custom Integrated Circuits Conference (CICC), pages 551–554, 1997.
[29] Yannan Liu, Lingxiao Wei, Bo Luo, and Qiang Xu. Fault injection attack on deep neural network. In International Conference on Computer-Aided Design (ICCAD), pages 131–138, 2017.
[30] Yu Li, Yannan Liu, Min Li, Ye Tian, Bo Luo, and Qiang Xu. D2NN: A fine-grained dual modular redundancy framework for deep neural networks. In Annual Computer Security Applications Conference (ACSAC), pages 138–147, 2019.
[31] Abhishek Chakraborty, Ankit Mondai, and Ankur Srivastava. Hardware-assisted intellectual property protection of deep learning models. In ACM/IEEE Design Automation Conference (DAC), pages 1–6, 2020.
[32] Jingbo Zhou and Xinmiao Zhang. Joint protection scheme for deep neural network hardware accelerators and models. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 42(12):4518–4527, 2023.
[33] Jarrod A. Roy, Farinaz Koushanfar, and Igor L. Markov. Ending piracy of integrated circuits. Computer, 43(10):30–38, 2010.
[34] Jeyavijayan Rajendran, Huan Zhang, Cheng Zhang, Garrett S. Rose, Yier Pino, Ozgur Sinanoglu, et al. Fault analysis-based logic encryption. IEEE Transactions on Computers, pages 410–424, 2013.
[35] Yang Xie and Ankur Srivastava. Anti-SAT: Mitigating SAT attack on logic locking. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 38(2):199–207, 2019.
[36] Muhammad Yasin, Bodhisatwa Mazumdar, Jeyavijayan J. V. Rajendran, and Ozgur Sinanoglu. SARLock: SAT attack resistant logic locking. In IEEE International Symposium on Hardware Oriented Security and Trust (HOST), pages 236–241, 2016.
[37] Muhammad Yasin, Ozgur Sinanoglu, and Ramesh Karri. Provably-secure logic locking: From theory to practice. In ACM SIGSAC Conference on Computer and Communications Security, pages 1601–1618, 2017.
Conference & Exhibition (DATE), pages 1186–1191, 2021.
[38] Hai Zhou, Yuanqi Shen, and Amin Rezaei. Vulnerability and remedy
[18] Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei- of stripped function logic locking. Cryptology ePrint Archive, Paper
Fei. Imagenet: A large-scale hierarchical image database. In IEEE 2019/139, 2019.
Conference on Computer Vision and Pattern Recognition (CVPR), pages [39] Yeganeh Aghamohammadi and Amin Rezaei. Machine learning-based
248–255, 2009. security evaluation and overhead analysis of logic locking. Journal of
[19] Liang Lu and Steve Renals. Small-footprint highway deep neural Hardware and Systems Security, 8:25–43, 2024.
networks for speech recognition. IEEE/ACM Transactions on Audio, [40] Amin Rezaei, Yuanqi Shen, Shuyu Kong, Jie Gu, and Hai Zhou.
Speech, and Language Processing, 25(7):1502–1511, 2017. Cyclic locking and memristor-based obfuscation against cycsat and
[20] Zidong Du, Robert Fasthuber, Tianshi Chen, Paolo Ienne, Ling Li, Tao inside foundry attacks. In 2018 Design, Automation & Test in Europe
Luo, Xiaobing Feng, Yunji Chen, and Olivier Temam. Shidiannao: Conference & Exhibition (DATE), pages 85–90, 2018.
Shifting vision processing closer to the sensor. In ACM/IEEE 42nd [41] Yosys open synthesis suite. https://yosyshq.net/yosys/, 2013.
Annual International Symposium on Computer Architecture (ISCA), page [42] Jitendra Bhandari, Abdul Khader Thalakkattu Moosa, Benjamin Tan,
92–104, 2015. Christian Pilato, Ganesh Gore, Xifan Tang, Scott Temple, Pierre-
[21] Daofu Liu, Tianshi Chen, Shaoli Liu, Jinhong Zhou, Shengyuan Zhou, Emmanuel Gaillardon, and Ramesh Karri. Not all fabrics are created
Olivier Teman, Xiaobing Feng, Xuehai Zhou, and Yunji Chen. Pu- equal: Exploring efpga parameters for ip redaction. IEEE Transactions
diannao: A polyvalent machine learning accelerator. In Proceedings on Very Large Scale Integration (VLSI) Systems, 31(10):1459–1471,
International Conference on Architectural Support for Programming 2023.
Languages and Operating Systems (ASPLOS), page 369–381, 2015. [43] Cnn-accelerator. https://github.com/8krisv/CNN-ACCELERATOR/,
[22] Norman Jouppi, Cliff Young, Nishant Patil, and David Patterson. Moti- 2021.
vation for and evaluation of the first tensor processing unit. IEEE Micro, [44] Jihyuck Jo, Suchang Kim, and In-Cheol Park. Energy-efficient convo-
38(3):10–19, 2018. lution architecture based on eescheduled dataflow. IEEE Transactions
[23] Angshuman Parashar, Minsoo Rhu, Anurag Mukkara, Antonio Puglielli, on Circuits and Systems I: Regular Papers, 65(12):4196–4207, 2018.
Rangharajan Venkatesan, Brucek Khailany, Joel Emer, Stephen W. Keck- [45] Xifan Tang, Edouard Giacomin, Baudouin Chauviere, Aurelien Alacchi,
ler, and William J. Dally. Scnn: An accelerator for compressed-sparse and Pierre-Emmanuel Gaillardon. Openfpga: An open-source framework
convolutional neural networks. In ACM/IEEE Annual International for agile prototyping customizable fpgas. IEEE Micro, 40(4):41–48,
Symposium on Computer Architecture (ISCA), pages 27–40, 2017. 2020.
[24] Jorge Albericio, Patrick Judd, Tayler Hetherington, Tor Aamodt, Na- [46] Xifan Tang, Edouard Giacomin, Giovanni De Micheli, and Pierre-
talie Enright Jerger, and Andreas Moshovos. Cnvlutin: Ineffectual- Emmanuel Gaillardon. Fpga-spice: A simulation-based architecture
neuron-free deep neural network computing. In Proceedings of Interna- evaluation framework for fpgas. IEEE Transactions on Very Large Scale
tional Symposium on Computer Architecture (ISCA), page 1–13. IEEE Integration (VLSI) Systems, 27(3):637–650, 2019.
Press, 2016. [47] Jason Luu, Jason Helge Anderson, and Jonathan Scott Rose. Architec-
[25] Yu-Hsin Chen, Joel Emer, and Vivienne Sze. Eyeriss: A spatial architec- ture description and packing for logic blocks with hierarchy, modes
ture for energy-efficient dataflow for convolutional neural networks. In and complex interconnect. In Proceedings of ACM/SIGDA Interna-
ACM/IEEE Annual International Symposium on Computer Architecture tional Symposium on Field Programmable Gate Arrays (ISFPGA), page
(ISCA), pages 367–379, 2016. 227–236, 2011.
[26] Satwant Singh, Jonathan Rose, Paul Chow, and David Lewis. The effect [48] Elias Ahmed and Jonathan Rose. The effect of lut and cluster size on
of logic block architecture on fpga performance. IEEE Journal of Solid- deep-submicron fpga performance and density. IEEE Transactions on
State Circuits, 27(3):281–287, 1992. Very Large Scale Integration (VLSI) Systems, 12(3):288–298, 2004.
[27] David Dickin and Lesley Shannon. Exploring fpga technology mapping [49] Kaveh Shamsi. Bitbucket — bitbucket.org. https://bitbucket.org/
for fracturable lut minimization. In International Conference on Field- kavehshm/neos/src/master/.
Programmable Technology (FPT), pages 1–8, 2011.