Securing Network-on-Chip Using Incremental Cryptography
Abstract—Network-on-chip (NoC) has become the standard communication fabric for on-chip components in modern System-on-chip (SoC) designs. Since NoC has visibility to all communications in the SoC, it has been one of the primary targets for security attacks. While packet encryption can provide secure communication, it can introduce unacceptable energy and performance overhead due to the resource-constrained nature of SoC designs. In this paper, we propose a lightweight encryption scheme that is implemented on the network interface. Our approach improves the performance of encryption without compromising security using incremental cryptography, which exploits the unique NoC traffic characteristics. Experimental results demonstrate that our proposed approach significantly (up to 57%, 30% on average) reduces the encryption time compared to traditional approaches with negligible (less than 2%) impact on area overhead.

Keywords-system-on-chip; network-on-chip; security

I. INTRODUCTION

Threat Model: Figure 1 shows a typical NoC-based many-core architecture which encrypts packets transferred between IP cores. When packets are sent through the NoC, a router infected by a hardware Trojan can copy or re-route packets and send them to a malicious IP sitting on the same NoC to leak sensitive information. Therefore, our model assumes that some of the IPs, as well as the routers, can be malicious. The IPs that we can trust to be non-malicious are referred to as secure IPs. The goal is to ensure secure communication between these secure IPs. We assume that network interfaces (NI) that connect IPs with routers are secure. This assumption is valid since the NIs are used to integrate components of an SoC and are typically built in house. A similar threat model and assumptions have been used in previous work on NoC security, which supports the validity of the model [10; 11].
packets are required to be stored for the two different packet types (control and data) at the sender’s end. Similarly, the receiver’s side also stores the most recent packet for each packet type. In addition, the key (K) and initialization vector (IV) for the encryption scheme are stored by both sender and receiver IPs. Once the block differences are computed, they are sent to the encryption scheme, which encrypts only the different blocks (line 4). The final ciphertext is derived from the encrypted blocks and the block comparison results (line 5). Additional header bits are also computed in this step to be used by the decryption process. Finally, the header and encrypted payload are concatenated to create the final packet, which is injected into the network (line 6). At the destination node, the inverse process takes place. The receiver also stores the previous packet for each packet type and can therefore construct the next packet using the stored packet and the incoming packet data. Since we store the previous packets in special registers, we don’t have to encrypt/decrypt the full packet. We send only the changed blocks, and the receiver replaces the changed blocks with their modifications to construct the new packet.

Algorithm 1 - Encryption Process
Inputs: current packet packet_i, previous payload P_{i-1}, key K, initialization vector IV
Output: encrypted packet consisting of header H_i and encrypted payload C_i
Procedure: encryptPackets
1: P_i ← packet_i.payload
2: H_i ← packet_i.header
3: M_i, δ_i ← compareBlocks(P_i, P_{i-1})
4: C' ← E(IV, K, M_i)
5: C_i ← constructCipherText(C', δ_i)
6: return H_i ∥ C_i

The remainder of this section elaborates the major components of our NoC security framework. Section IV-B explains the compareBlocks function, which is implemented in the incremental crypto engine. Section IV-C presents our encryption scheme E and the constructCipherText function in Algorithm 3 and Algorithm 4, respectively.

B. Incremental Crypto Engine
The operation of the incremental crypto engine is outlined in Algorithm 2. The payload (P_i) sent from the IP core is compared with the previous payload of that type (P_{i-1}) to identify the blocks that are different (M_i). This can be implemented with a simple XOR operation in hardware (line 1). Once the bitwise differences are obtained, we split the payload into blocks (line 2) to see which blocks are different (lines 3-6). Only different blocks are sent for encryption. The incremental crypto engine also sends the different block numbers (δ_i) to build the complete ciphertext as well as to set the header bits indicating the different blocks to be used by the decryption algorithm.

Algorithm 2 - Finding Block-wise Packet Differences
Inputs: current payload P_i, previous payload P_{i-1}
Output: different blocks M_i, different block indices δ_i
Procedure: compareBlocks
1: bitDiff ← P_i ⊕ P_{i-1}
2: B[1], ..., B[k] ← split(bitDiff, blockSize)
3: for all x = 1, ..., size(B) do
4:   if B[x] > 0 then
5:     M_i.append(B[x])
6:     δ_i[x] = 1
7: return M_i, δ_i

C. Encryption Scheme
We use the counter mode for encryption, which takes an initialization vector (IV), a key and the message to be encrypted as inputs and produces the ciphertext. The IV ∥ {q}_d string, which is the standard format of the input nonce to counter mode, is used to give per-message and per-block variability. In our framework, it is calculated using the sequence number of the packet (let seq_j be the sequence number of packet P_j), a counter, and the IV as IV ∥ seq_j ∥ q to identify different blocks. The block cipher ID (q ∈ {1, 2, 3, 4}) changes with each block cipher and the sequence number seq_j varies from packet to packet. As discussed before, the performance improvement is gained by encrypting multiple blocks in parallel. For example, if
two consecutive control packets have differences in two blocks each, we can achieve twice the speedup by encrypting both at the same time compared to the traditional (non-incremental) approach where all four block ciphers will be used to encrypt each packet. Algorithm 3 shows the major steps of the encryption scheme.

Algorithm 3 - Encrypt Selected Blocks
Inputs: initialization vector IV, key K, different blocks M_i
Output: encrypted blocks C'
Procedure: E
1: for all q = 1, ..., 4 do
2:   seq_j ← getSequenceNumber(P_j)
3:   r_q ← E_K(IV ∥ seq_j ∥ q)
4:   C'.append(r_q ⊕ M_i[q])
5: return C'

C' is stored in a buffer. The final ciphertext is constructed using δ_i and C' as shown in Algorithm 4. Algorithm 4 takes the encrypted value from the buffer for the changed blocks (lines 2-3) and appends n (block size) zeros for blocks that are identical to those of the previous packet (lines 4-5). This ensures that the packet size stays the same and, as a result, every other functionality from fliticization to NoC traversal remains the same.

Algorithm 4 - Construct the Encrypted Payload
Inputs: encrypted blocks C', different block indices δ_i
Output: Encrypted payload C_i
Procedure: constructCipherText
1: for all x = 1, ..., size(δ) do
2:   if δ_i[x] > 0 then
3:     C_i.append(C'[x])
4:   else
5:     C_i.append({0}^n)
6: return C_i

To ensure the secure implementation of our approach, the generation and management of keys and nonces need to be addressed. However, this is beyond the scope of this paper, and many previous studies have addressed this problem in several ways [29; 30].

V. EXPERIMENTS
In this section, we first describe the experimental setup used to evaluate our approach. Then, results are presented to show the performance gain achieved through incremental encryption by comparing it with traditional encryption. Next, we discuss the security of the proposed framework and associated overhead.

A. Experimental Setup
We validated our framework using five benchmarks chosen from the SPLASH-2 benchmark suite. Traffic traces were generated by the cycle-accurate full-system simulator gem5 [27]. The 4x4 Mesh NoC was built on top of the “GARNET2.0” model that is integrated with gem5 [31]. We modified the network interface (NI) to simulate the proposed security framework. We selected the following options to simulate architectural choices in a resource-constrained NoC.

Packet format: For control and data packet formats, we used the default GARNET2.0 implementations, which allocate 128 bits for a flit. This value results in control messages fitting in 1 flit, and data packets in 5 flits. Out of the 128 bits, 64 bits are allocated for the payload (address) in a control packet, and data packets have a payload of 576 bits (64-bit address and 512-bit data). This motivated the use of 16-bit blocks to evaluate the performance of our proposed incremental encryption scheme.

Block cipher: We use an ultra-lightweight block cipher, Hummingbird-2, as the block cipher of our encryption scheme [25]. Hummingbird-2 was chosen in our experiments mainly because it is lightweight and because, with a block size of 16 bits, other encryption schemes can be broken using brute-force attacks at such small block sizes. However, it has been shown in [25] that Hummingbird-2 is resilient against attacks that try to recover the plaintext from ciphertext. It uses a 128-bit key and a 128-bit internal state, which provides adequate security for on-chip communication. Considering the payload and block sizes, we used four block ciphers in counter mode for our encryption scheme. Each block cipher is assumed to take 20 cycles to encrypt a 16-bit block, and each comparison of two bit strings incurs a 1-cycle delay [25]. Our framework is flexible enough to accommodate different packet formats, packet sizes and block ciphers depending on the design requirements. For example, if a certain architecture requires 128-bit blocks, AES can be used while keeping our incremental encryption approach intact.

B. Performance Evaluation
We present the performance improvement achieved by our approach in two steps: (i) time taken for encryption (Figure 7) and (ii) execution time (Figure 8). We measured the cycles spent for encryption alone (encryption time) and total cycles executed to run the benchmark (execution time) including encryption time, using our approach as well as traditional encryption. Figure 7 shows the encryption time comparison. Our approach improves the performance of encryption by up to 57% (30% on average) compared to the traditional encryption schemes. The locality in data and the differences in operand values affect the number of changed blocks between consecutive packets. This is reflected in the encryption time. For example, if an application is doing an image processing operation on an image stored in memory, accessing pixel data stored in consecutive memory locations provides an opportunity for performance gain using our approach.
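To make the source of the encryption-time savings concrete, the following is a minimal Python sketch of the incremental flow of Algorithms 1-4 for a 64-bit control payload split into four 16-bit blocks. Several things here are assumptions for illustration and not part of the paper: the keystream function is a truncated HMAC-SHA256 standing in for Hummingbird-2, the byte layout of IV ∥ seq_j ∥ q is arbitrary, the counter q is taken to be the position of a block among the changed blocks, and all function names, keys and addresses are hypothetical.

```python
import hashlib
import hmac

BLOCK_BITS = 16
BLOCK_MASK = (1 << BLOCK_BITS) - 1


def split_blocks(value, n_blocks):
    """Split an integer payload into 16-bit blocks, most significant first."""
    return [(value >> (BLOCK_BITS * (n_blocks - 1 - x))) & BLOCK_MASK
            for x in range(n_blocks)]


def join_blocks(blocks):
    """Inverse of split_blocks."""
    out = 0
    for b in blocks:
        out = (out << BLOCK_BITS) | b
    return out


def compare_blocks(p_cur, p_prev, n_blocks):
    """Algorithm 2: XOR the payloads, keep the non-zero difference blocks."""
    diff_blocks = split_blocks(p_cur ^ p_prev, n_blocks)
    delta = [1 if b else 0 for b in diff_blocks]
    changed = [b for b in diff_blocks if b]
    return changed, delta


def keystream(key, iv, seq, q):
    """Assumed stand-in for E_K(IV || seq_j || q); this is NOT Hummingbird-2."""
    msg = iv + seq.to_bytes(4, "big") + bytes([q])
    return int.from_bytes(hmac.new(key, msg, hashlib.sha256).digest()[:2], "big")


def encrypt_blocks(key, iv, seq, changed):
    """Algorithm 3: counter-mode XOR, one keystream block per changed block."""
    return [keystream(key, iv, seq, q) ^ m for q, m in enumerate(changed, start=1)]


def construct_ciphertext(enc, delta):
    """Algorithm 4: encrypted blocks where delta is set, zero blocks elsewhere."""
    remaining = iter(enc)
    return [next(remaining) if d else 0 for d in delta]


def encrypt_payload(key, iv, seq, p_cur, p_prev, n_blocks=4):
    """Payload part of Algorithm 1; also reports how many blocks were encrypted."""
    changed, delta = compare_blocks(p_cur, p_prev, n_blocks)
    enc = encrypt_blocks(key, iv, seq, changed)
    return construct_ciphertext(enc, delta), delta, len(changed)


def decrypt_payload(key, iv, seq, cipher_blocks, delta, p_prev, n_blocks=4):
    """Receiver side: rebuild the payload from the stored previous payload."""
    prev_blocks = split_blocks(p_prev, n_blocks)
    out, q = [], 0
    for x, d in enumerate(delta):
        if d:
            q += 1
            out.append(prev_blocks[x] ^ cipher_blocks[x] ^ keystream(key, iv, seq, q))
        else:
            out.append(prev_blocks[x])
    return join_blocks(out)


key, iv = b"\x01" * 16, b"\x02" * 8    # hypothetical key and IV
prev_addr = 0x0000_7FFF_1234_0040      # hypothetical consecutive addresses;
cur_addr = 0x0000_7FFF_1234_0080       # only the lowest 16-bit block differs
cipher_blocks, delta, n_encrypted = encrypt_payload(key, iv, 7, cur_addr, prev_addr)
recovered = decrypt_payload(key, iv, 7, cipher_blocks, delta, prev_addr)
print(delta, n_encrypted, recovered == cur_addr)
```

In this example the two addresses differ only in their lowest block, so a single cipher invocation is needed instead of four. Under the parallel scheme described in the text, the three unused cipher units would remain free for blocks belonging to other packets.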
Figure 7: Encryption time comparison using traditional encryption and incremental encryption (our approach).

We also compare the total execution time using traditional encryption as well as incremental encryption. Figure 8 presents these results. When the overall system including CPU cycles, memory load/store delays and delays traversing the NoC is considered, the total execution time improves by up to 10% (5% on average). Benchmarks that have significant NoC traversals, such as RADIX and OCEAN, show higher performance improvement (10%).

Figure 8: Execution time comparison using traditional encryption and incremental encryption (our approach).

C. Security Analysis
When discussing the security of our approach, three main components have to be considered: (i) incremental encryption, (ii) encryption scheme that uses counter mode, and (iii) block cipher.

Incremental encryption: Due to the inherent characteristics of incremental encryption, our approach reveals the amount of differences between consecutive packets. Studies on incremental encryption have shown that even though hiding the amount of differences is not possible, it is possible to hide “everything else” by using secure block ciphers and secure operation modes [19]. Attacks on incremental encryption using this vulnerability rely on the adversary having many capabilities in addition to the ones defined in the threat model. When using incremental encryption to encrypt documents undergoing frequent, small modifications as explained in Section II, it is reasonable to assume that the adversary not only has access to the previously encrypted versions of documents but is also able to modify documents and obtain encrypted versions of the modified ones. This attack model allows the adversary to launch chosen plaintext attacks [19]. Discussing the security of our approach against known plaintext, chosen plaintext and chosen ciphertext attacks is irrelevant in our design since the adversary doesn’t have access to an oracle that implements the design, nor access to known plaintext/ciphertext pairs. In other words, as long as the block cipher and operation mode are secure, incremental encryption doesn’t allow recovery of plaintext from the ciphertext. The same argument has been proven to hold true in previous work on incremental encryption [19; 32].

Counter mode encryption: Using our approach, each block is treated independently while encrypting, and blocks belonging to multiple packets can be encrypted in parallel. In such a setup, using the same IV ∥ {q}_d string with the same key K can cause the “two time pad” situation. This is solved by setting the string to IV ∥ seq_j ∥ q as shown in Algorithm 3. It gives per-message and per-block variability and ensures that the value is a nonce. Our proposed usage of counter mode adheres to the security recommendations outlined in [28].

Block cipher: As discussed above, the security of the proposed framework depends on the security of the block cipher. The security of the block cipher used in our framework, Hummingbird-2, has been discussed extensively in [25]. The first version of the Hummingbird scheme was shown to be insecure [33] and Hummingbird-2 was developed to address the security flaws. After thousands of hours of cryptanalysis, no significant flaws or sub-exhaustive attacks against Hummingbird-2 have been found [25]. The Hummingbird-2 approach has been shown to be resilient against birthday attacks on the initialization, differential cryptanalysis, linear cryptanalysis and algebraic attacks. Zhang et al. presented a related-key chosen-IV attack against Hummingbird-2 that recovered the 128-bit secret key [34]. However, the attack requires 2^28 pairs of plaintext to recover the first 4 bits of the key, adding up to a data complexity of O(2^32.6) [34]. As discussed before, launching such chosen plaintext attacks is not possible in the NoC setting. A brute-force key recovery takes 2^128 attempts, which is not computationally feasible according to modern computing standards as well as for computing power in the foreseeable future.

Our proposed approach allows easy plug-and-play of security primitives. Any block size/key size/block cipher can be combined with our proposed incremental encryption approach. Note that stronger security comes at the expense of performance. Therefore, security parameters can be decided depending on the desired security and performance requirements.

D. Overhead Analysis
We implemented our proposed incremental encryption approach using Verilog to show the area overhead in comparison with the original Hummingbird-2 implementation. Our implementation is capable of assigning blocks to idle block ciphers and encrypting up to four payloads in parallel. Merger and scheduler units were implemented to ensure the correctness of final encrypted/decrypted payloads. We conducted our experiments using the Synopsys Design Compiler with the 90nm Synopsys library (saed90nm). Based on our results, our proposed approach introduces less than 2% overall area overhead with respect to the entire NoC. When only the encryption unit is considered, the overhead is 15%. This overhead is caused by the components responsible for buffering and scheduling of modified blocks to idle block cipher units as well as computations related to the construction of the final result. Therefore, our proposed encryption approach has a negligible area overhead and it can be efficiently implemented as a lightweight security mechanism for NoCs. While there is a minor increase in power overhead due to the additional components, there is no penalty on overall energy consumption due to the reduction in execution time.

VI. CONCLUSIONS
In this paper, we proposed a lightweight security mechanism that improves the performance of traditional encryption schemes used in NoC while incurring negligible area and power overhead. The security framework consists of an encryption/decryption scheme that provides secure communication on the NoC. We used incremental encryption to improve performance by utilizing the unique traffic characteristics of packets observed in an NoC. We validated our framework in terms of security to show that the performance gain is not achieved at the expense of security. Experimental results show a performance improvement of up to 57% (30% on average) in encryption time and up to 10% (5% on average) in total execution time compared to traditional encryption while introducing less than 2% overall area overhead. In the future, we plan to explore the development of an incremental authentication scheme that can be seamlessly integrated with the incremental encryption scheme to ensure data integrity.

ACKNOWLEDGMENT
This work was partially supported by the National Science Foundation (NSF) grant SaTC-1936040.

REFERENCES
[1] S. Charles et al., “Proactive thermal management using memory-based computing in multicore architectures,” in IGSC, 2018.
[2] U. Gupta et al., “DyPO: Dynamic Pareto-optimal configuration selection for heterogeneous MPSoCs,” TECS, vol. 16, no. 5s, pp. 1–20, 2017.
[3] S. Charles et al., “Exploration of memory and cluster modes in directory-based many-core CMPs,” in NOCS, 2018.
[4] A. Sodani et al., “Knights Landing: Second-generation Intel Xeon Phi product,” IEEE MICRO, 2016.
[5] S. Charles et al., “Efficient cache reconfiguration using machine learning in NoC-based many-core CMPs,” TODAES, vol. 24, no. 6, pp. 1–23, 2019.
[6] J.-P. Diguet et al., “NOC-centric security of reconfigurable SoC,” in NOCS, 2007.
[7] F. Farahmandi, Y. Huang, and P. Mishra, System-on-Chip Security: Validation and Verification. Springer Nature, 2019.
[8] P. Mishra, S. Bhunia, and M. Tehranipoor, Hardware IP Security and Trust. Springer, 2017.
[9] S. Charles et al., “Lightweight anonymous routing in NoC based SoCs,” in DATE, 2020.
[10] D. M. Ancajas et al., “Fort-NoCs: Mitigating the threat of a compromised NoC,” in DAC, 2014.
[11] J. Sepúlveda et al., “Towards protected MPSoC communication for information protection against a malicious NoC,” Procedia Computer Science, 2017.
[12] J. Winter, “Trusted computing building blocks for embedded Linux-based ARM TrustZone platforms,” in STC, 2008.
[13] S. Charles and P. Mishra, “Lightweight and trust-aware routing in NoC based SoCs,” in ISVLSI, 2020.
[14] “Using TinyCrypt Library, Intel Developer Zone, Intel, 2016.” https://software.intel.com/en-us/node/734330, [Online].
[15] S. Charles et al., “Real-time detection and localization of DoS attacks in NoC based SoCs,” in DATE, 2019.
[16] S. Charles et al., “Real-time detection and localization of distributed DoS attacks in NoC based SoCs,” TCAD, 2020.
[17] Y. Huang et al., “Scalable test generation for Trojan detection using side channel analysis,” TIFS, vol. 13, no. 11, pp. 2746–2760, 2018.
[18] Y. Lyu and P. Mishra, “A survey of side-channel attacks on caches and countermeasures,” Journal of Hardware and Systems Security, vol. 2, no. 1, pp. 33–50, 2018.
[19] M. Bellare et al., “Incremental cryptography and application to virus protection,” in STOC, 1995.
[20] S. Garg and O. Pandey, “Incremental program obfuscation,” in CRYPTO, 2017.
[21] L. Fiorin et al., “A security monitoring service for NoCs,” in CODES+ISSS, 2008.
[22] M. Bellare et al., “Incremental cryptography: The case of hashing and signing,” in CRYPTO, 1994.
[23] K. Sajeesh and H. Kapoor, “An authenticated encryption based security framework for NoC architectures,” in ISED, 2011.
[24] E. R. Naru et al., “A recent review on lightweight cryptography in IoT,” in I-SMAC, 2017.
[25] D. Engels et al., “The Hummingbird-2 lightweight authenticated encryption algorithm,” in RFIDSec. Springer, 2011.
[26] W. Itani et al., “Energy-efficient incremental integrity for securing storage in mobile cloud computing,” in ICEAC, 2010.
[27] N. Binkert et al., “The gem5 simulator,” SIGARCH Computer Architecture News, 2011.
[28] D. A. McGrew, “Counter mode security: Analysis and recommendations,” Cisco Systems, November, 2002.
[29] B. Lebiednik et al., “Architecting a secure wireless network-on-chip,” in NOCS, 2018.
[30] J. Sepulveda et al., “Efficient security zones implementation through hierarchical group key management at NoC-based MPSoCs,” Microprocessors and Microsystems, 2017.
[31] N. Agarwal et al., “Garnet: A detailed on-chip network model inside a full-system simulator,” in ISPASS, 2009.
[32] I. Mironov et al., “Incremental deterministic public-key encryption,” in EUROCRYPT. Springer-Verlag, 2012.
[33] M. J. O. Saarinen, “Cryptanalysis of Hummingbird-1,” in FSE. Springer, 2011.
[34] K. Zhang, L. Ding, and J. Guan, “Cryptanalysis of Hummingbird-2,” Cryptology ePrint Archive, Tech. Rep., 2012.