Fundamentals of IP and SoC Security
Design, Verification, and Debug

Editors
Swarup Bhunia, Department of Electrical and Computer Engineering, University of Florida, Gainesville, FL, USA
Sandip Ray, NXP Semiconductors, Austin, TX, USA
Susmita Sur-Kolay, Advanced Computing and Microelectronics Unit, Indian Statistical Institute, Kolkata, India
Chapter 1
The Landscape of SoC and IP Security
1.1 Introduction
It has been almost a decade since the number of smart, connected computing devices
exceeded the human population, ushering in the regime of the Internet of
Things [1]. Today, we live in an environment containing tens of billions of computing
devices of a wide variety of types and form factors, running applications that often
involve some of our most private and intimate data. These devices include smart-
phones, tablets, consumer items (e.g., refrigerators, light bulbs, and thermostats),
wearables, etc. This proliferation is expected to increase exponentially in the
coming decades, with estimates reaching trillions of devices as early as 2030,
representing by a large measure the fastest growth of any industrial sector in the
history of human civilization.
Security and trustworthiness of computing systems constitute a critical and gating
factor in the realization of this new regime. With computing devices being employed
for a large number of highly personalized activities (e.g., shopping, banking, fit-
ness tracking, providing driving directions, etc.), these devices have access to a large
amount of sensitive, personal information which must be protected from unautho-
rized or malicious access. On the other hand, communication of this information to
other peer devices, gateways, and datacenters is in fact crucial to providing the kind
of adaptive, “smart” behavior that the user expects from the device. For example,
a smart fitness tracker must detect from its sensory data (e.g., pulse rate, location,
speed, etc.) the kind of activity being performed, the terrain on which the activity is
performed, and even the motivation for the activity in order to provide the anticipated
feedback and response to the user; this requires a high degree of data processing and
analysis, much of which is performed by datacenters or even gateways with higher
computing power than the tracker device itself. The communication and process-
ing of one's intimate personal information over the network and in the cloud exposes
it to the risk of being compromised by some malicious agent along the way. In
addition to personalized information, computing devices contain highly confidential
collateral from architecture, design, and manufacturing, such as cryptographic and
digital rights management (DRM) keys, programmable fuses, on-chip debug instru-
mentation, defeature bits, etc. Malicious or unauthorized access to secure assets in
a computing device can result in identity theft, leakage of company trade secrets, or
even loss of human life. Consequently, a crucial component of a modern computing
system architecture is the set of authentication mechanisms that protect these assets.
Most computing systems are developed today using the system-on-chip (SoC) design
architecture. An SoC design is architected by a composition of a number of pre-
designed hardware and software blocks, often referred to as design intellectual prop-
erties or design IPs (IPs for short). Figure 1.1 shows a simple toy SoC design, includ-
ing some “obvious” IPs, e.g., CPU, memory controller, DRAM, various controllers
for peripherals, etc. In general, an IP can refer to any design unit that can be viewed
as a standalone sub-component of a complete system. An SoC design architecture
then entails connecting these IPs together to implement the overall system function-
ality. To achieve this connection among IPs, an SoC design includes a network-on-
chip (NoC) that provides a standardized message infrastructure for the IPs to coordi-
nate and cooperate to define the complete system functionality. In industrial practice
today, an SoC design is realized by procuring many third-party IPs. These IPs are
then integrated and connected by the SoC design integration house which is respon-
sible for the final system design. The design includes both hardware components
(written in a hardware description language such as Verilog or VHDL) and
software and firmware components. The hardware design is sent to a foundry
or fabrication house to create the silicon implementation. The fabricated design is
transferred to platform developers or Original Equipment Manufacturers (OEMs),
who create computing platforms such as smartphones, tablets, or wearable devices,
which are shipped to the end customer.
The description above already points to a key aspect of complexity in SoC design
and fabrication, namely, a complex supply chain with many stakeholders. This includes various
IP providers, the SoC integration house, the foundry, and the OEMs. Furthermore, with
increasing globalization, this supply chain is typically long and globally distributed.
Chapter 2 discusses some ramifications of this infrastructure, e.g., the possibility
of security threats being introduced at various stages along this supply chain.
Fig. 1.1 A representative SoC design. SoC designs are created by putting together intellectual
property (IP) blocks of well-defined functionality
A second dimension of challenge in secure SoC design is the sheer com-
plexity. Modern computing systems are inordinately complex. Note from Fig. 1.1
that the CPU represents "merely" one of a large number of IPs in an SoC design.
The CPU in a modern SoC design is arguably more complex than many of the high-
performance microprocessors of a decade ago. Multiply this complexity by
the large number of IPs in the system (many of which include custom microcon-
trollers of commensurate complexity, in addition to custom hardware and firmware),
and one gets some sense of the scale involved. Add other cross-design fea-
tures, e.g., power management, performance optimization, multiple voltage islands,
clocking logic, etc., and the complexity grows beyond imagination. The num-
ber of different design states that such a system can reach far exceeds the
number of atoms in the universe. It is challenging to ensure that such a system
functions as desired even under normal operating conditions, much less in the pres-
ence of millions of adversaries looking to identify vulnerabilities for exploitation.
Why is this complexity a bottleneck for security in particular? For starters, secure
assets are sprinkled across the design, in various IPs and their communication
infrastructure. It is difficult to envisage all the different conditions under which these
assets are accessed and to insert appropriate protection and mitigation mechanisms to
prevent unauthorized access. Furthermore, security cross-cuts different IPs of the sys-
tem, in some cases breaking the abstraction of IPs as coherent, distinct blocks of
well-defined functionality. Consider an IP communicating with another one through
the communication fabric. Several IPs are involved in this process, including the
source and destination IPs, the routers involved in the communication, etc. Ensuring
the communication is secure would require an understanding of this overall architec-
ture, identifying trusted and untrusted components, analyzing the consequences of a
Trojan in one of the constituent blocks leaking information, and much more. To exac-
erbate the issue, design functionality today is hardly contained entirely in hardware.
Most modern SoC design functionality includes significant firmware and software
components which are concurrently designed together with hardware (potentially by
different players across the supply chain). Consequently, security design and vali-
dation become a complex hardware/software co-design and co-validation problem
distributed across multiple players with potentially untrusted participants. Finally,
the security requirements themselves vary depending on how an IP or even the SoC
design is used in a specific product. For example, the same IP when used in a wear-
able device will have a different security requirement than when it is used in a gam-
ing system. The security requirements also vary depending on the stage of the life
cycle of the product, e.g., when it is with a manufacturer, OEM, or end customer. This
makes it hard to compositionally design security features without a global view.
There has been significant research in recent years to address the challenges outlined
above. There have been techniques to define security requirements [2, 3], architec-
tures to facilitate such implementation [4–7], testing technologies to define and emu-
late security attacks [8], and tools to validate diverse protection and mitigation strate-
gies [9–12]. There has also been cross-cutting research on understanding trade-offs
between security and functionality, energy requirements, validation, and architec-
tural constraints [13, 14].
In spite of these advances, the state of the industrial practice is still quite primi-
tive. We still depend on security architects, designers, and validators painstakingly
mapping out various security requirements, architecting and designing various tai-
lored and customized protection mechanisms, and coming up with attack scenarios to
break the system by way of validation. There is a severe lack of a disciplined method-
ology for developing security, on the same scale as the methodologies that exist for defining
and refining architectures and micro-architectures for system functionality or perfor-
mance. Unsurprisingly, security vulnerabilities abound in modern SoC designs,
as evidenced by the frequency and ease with which activities like identity theft, DRM
override, device jailbreaking, etc. are performed.
This book is an attempt to bridge across the research and practice in SoC security.
It is conceived as an authoritative reference on all aspects of security issues in SoC
designs. It discusses research issues and progress in topics ranging from secu-
rity requirements in SoC designs, to the definition of architectures and design choices to
enforce and validate security policies, to trade-offs and conflicts involving secu-
rity, functionality, and debug requirements, as well as experience reports from the
trenches in the design, implementation, and validation of security-critical embedded sys-
tems.
In addition to providing an extensive reference to the current state of the art,
the book is intended to serve as a conduit for communication between the different
stakeholders of security in SoC designs. Security is one of the unique areas of
SoC design that cross-cuts a variety of concerns, including architecture, design,
implementation, validation, and software/hardware interfaces, in many cases with
conflicting requirements from each domain. With a unified treatment documenting
the various concerns side by side, we hope this book will help each stakeholder
better understand and appreciate the others' points of view and ultimately foster an
overall understanding of the trade-offs necessary to achieve truly secure systems.
The book includes eleven chapters focusing on diverse aspects of system-level
security in modern SoC designs. The book is intended for researchers, students, and
industry practitioners.
Authentication based on physically unclonable functions (PUFs) has
become a topic of significant research over the last decade, with various novel PUF-
based authentication protocols emerging in recent years. This chapter discusses this
exciting and rapidly evolving area, compares PUF-based authentication with other
standard approaches, and identifies several open research problems.
Chapter 7 discusses security with IP and SoC designs based on field-programmable
gate arrays (FPGA). FPGAs have been the focus of attention because they permit
dynamic reconfigurability while still providing energy efficiency and perfor-
mance comparable to a custom hardware implementation for many applications.
Unfortunately, FPGA-based IP implementations introduce a number of significant secu-
rity challenges of their own. An FPGA-based IP is essentially a design implemen-
tation in a low-level hardware description language (also referred to as an FPGA
bitstream) which is loaded on a generic FPGA architecture. To ensure authentica-
tion and prevent unauthorized access, the bitstream needs to be encrypted, and must
thereafter be decrypted on the fly during load or update. However, bitstreams are
often updated in the field, and the encryption may be attacked through side channels or
other means. If the entire SoC is implemented in an FPGA, IP management and coordi-
nation may become even more challenging. This chapter discusses the various facets
of security techniques for FPGA-based IPs and SoCs, open problems, and areas of research.
Chapter 8 discusses PUFs and IP protection techniques. IP protection techniques
are techniques to ensure robustness of IPs against various threats, including supply
chain challenges, Trojans, and counterfeiting. The chapter provides a broad overview
of the use of PUFs and IP protection techniques in modern SoC designs, and various
conflicts, cooperations, and trade-offs involved.
Chapter 9 discusses SoC design techniques that are resistant to fault injection attacks.
Fault injection is a complex, powerful, and versatile approach to subverting SoC
design protections, particularly cryptographic implementations. The chapter pro-
vides an overview of fault injection attacks and describes a broad class of design tech-
niques for developing systems that are robust against such attacks.
Chapter 10 looks closely into one of the core problems of SoC security, namely, hard-
ware Trojans. With increasing globalization of the SoC design supply chain, there
has been an increasing threat of such Trojans, i.e., hardware circuitry that may per-
form intentionally malicious activity including subverting communications, leaking
secrets, etc. The chapter looks closely at various facets of this problem from IP secu-
rity perspective, the countermeasures taken in the current state of practice, their defi-
ciencies, and directions for research in this area.
Chapter 11 discusses logic obfuscation techniques, in particular for FPGA designs
based on nonvolatile technologies. Logic obfuscation is an important technique to
provide robustness of an IP against a large class of adversaries. It is particularly criti-
cal for FPGA-based designs which need to be updated in the field. The chapter proposes
a scheme for loading obfuscated configurations into nonvolatile memories to protect
design data from physical attacks.
Chapter 12 presents a discussion on security standards in embedded SoCs used in
diverse applications, including automotive systems. It notes that security is a critical
consideration in the design cycle of every embedded SoC, but the level of secu-
rity and the resulting cost–benefit trade-off depend on the target application. This chapter
provides a valuable industry perspective to this critical problem. It describes two lay-
ers of a hierarchical security model that a system designer typically uses to achieve
application-specific security needs: (1) foundation level security targeted at basic
security services, and (2) security protocols like TLS or SSL.
It has been a pleasure and honor for the editors to edit this material, and we hope
the broad coverage of system-level security challenges provided here will bridge
a key gap in our understanding of the current and emergent security challenges.
We believe the content of the book will provide a valuable reference for SoC secu-
rity issues and solutions to a diverse readership including students, researchers, and
industry practitioners. Of course, it is impossible for any book to be
exhaustive on this topic: it is too broad, too detailed, and touches too many areas of
computer science and engineering. Nevertheless, we hope that the book will provide
a flavor of the current and needed research in this area, and of the cross-cutting
challenges across different areas that must be addressed to achieve the goal of trustwor-
thy computing systems.
References
1. Evans, D.: The Internet of Things—How the Next Evolution of the Internet is Changing Every-
thing. White Paper, Cisco Internet Business Solutions Group (IBSG) (2011)
2. Li, X., Oberg, J.V.K., Tiwari, M., Rajarathinam, V., Kastner, R., Sherwood, T., Hardekopf, B.,
Chong, F.T.: Sapper: a language for hardware-level security policy enforcement. In: Interna-
tional Conference on Architectural Support for Programming Languages and Operating Sys-
tems (2014)
3. Srivatanakul, T., Clark, J.A., Polack, F.: Effective security requirements analysis: HAZOPs and
use cases. In: 7th International Conference on Information Security, pp. 416–427 (2004)
4. ARM: Building a secure system using trustzone technology. ARM Limited (2009)
5. Basak, A., Bhunia, S., Ray, S.: A flexible architecture for systematic implementation of SoC
security policies. In: Proceedings of the 34th International Conference on Computer-Aided
Design (2015)
6. Intel: Intel® Software Guard Extensions Programming Reference. https://software.intel.com/
sites/default/files/managed/48/88/329298-002.pdf
7. Samsung: Samsung KNOX. www.samsungknox.com
8. Microsoft Threat Modeling & Analysis Tool version 3.0 (2009)
9. JasperGold Security Path verification App. https://www.cadence.com/tools/system-design-
and-verification/formal-and-static-verification/jasper-gold-verification-platform/security-
path-verification-app.html
10. Bazhaniuk, O., Loucaides, J., Rosenbaum, L., Tuttle, M.R., Zimmer, V.: Excite: symbolic exe-
cution for BIOS security. In: Workshop on Offensive Technologies (2015)
11. Kannavara, R., Havlicek, C.J., Chen, B., Tuttle, M.R., Cong, K., Ray, S., Xie, F.: Challenges
and opportunities with concolic testing. In: NAECON 2015 (2015)
12. Takanen, A., DeMott, J.D., Miller, C.: Fuzzing for software security testing and quality assur-
ance. Artech House (2008)
13. Ray, S., Hoque, T., Basak, A., Bhunia, S.: The power play: trade-offs between energy and
security in IoT. In: ICCD (2016)
14. Ray, S., Yang, J., Basak, A., Bhunia, S.: Correctness and security at odds: post-silicon valida-
tion of modern SoC designs. In: Proceedings of the 52nd Annual Design Automation Confer-
ence (2015)
Chapter 2
Security Validation in Modern SoC Designs
S. Ray (✉), Strategic CAD Labs, Intel Corporation, Hillsboro, OR 97124, USA
S. Bhunia ⋅ P. Mishra, Department of ECE, University of Florida, Gainesville, FL 32611, USA
Fig. 2.1 Some typical smartphone applications and corresponding private end user information
Fig. 2.2 Some potential attacks on a modern SoC design. a Potential attack areas for a smartphone
after production and deployment. b Potential threats from untrusted supply chain during the design
life cycle of an SoC design
A fundamental security requirement is that these assets must not leak
out to unauthorized sources. This includes cryptographic and DRM keys, premium
content locks, firmware execution flows, debug modes, etc. Note that the notion of
"unauthorized source" changes based on the asset in question: the end user
may be an unauthorized source for DRM keys, while the manufacturer/OEM may be an
unauthorized source for the end user's private information.
In addition to criticality of the assets involved, another factor that makes SoC
security both critical and challenging is the high diversity of attacks possible.
Figure 2.2 provides a flavor of potential attacks on a modern SoC design. Of par-
ticular concern are the following two observations:
∙ Because of the untrusted nature of the supply chain, there are security threats at
most stages of the design development, even before deployment and production.
∙ A deployed SoC design inside a computing device (e.g., smartphone) in the hand
of the end user is prone to a large number of potential attacker entry points, includ-
ing applications, software, network, browser, and sensors. Security assurance
must permit protection against this large attack surface.
We discuss security validation for the continuum of attacks from design to deploy-
ment. Given that the attacks are diverse, the protection mechanisms are also varied,
and so are the corresponding validation approaches.
The life cycle of an SoC from concept to deployment involves a number of security
threats at all stages, involving various parties. Figure 2.2b shows the SoC life cycle
and the security threats that span it. These threats are increasing
with the rapid globalization of the SoC design, fabrication, validation, and distribu-
tion steps, driven by global economic trends.
This growing reliance on reusable pre-verified hardware IPs during SoC design,
often gathered from untrusted third-party vendors, severely affects the security and
trustworthiness of SoC computing platforms. Statistics show that the global market
for third-party semiconductor IPs grew by more than 10% to reach more than $2.1
billion in late 2012 [1]. The design, fabrication, and supply chains for these IP cores are
generally distributed across the globe, involving the USA, Europe, and Asia. Figure 2.3
illustrates the scenario for an example SoC that includes processor, memory con-
troller, security, graphics, and analog cores. Due to the growing complexity of the IPs
as well as of the SoC integration process, SoC designers increasingly tend to treat these
IPs as black boxes and rely on the IP vendors for the structural/functional integrity of
these IPs. However, such design practices greatly increase the number of untrusted
components in an SoC design and make the overall system security a pressing con-
cern.
Hardware IPs acquired from untrusted third-party vendors can have diverse secu-
rity and integrity issues. An adversary inside an IP design house involved in the
IP design process can deliberately insert a malicious implant or design modifi-
cation to incorporate hidden/undesired functionality. In addition, since many of
the IP providers are small vendors working under highly aggressive schedules,
it is difficult to ensure a stringent IP validation requirement in this ecosystem.
Design features may also introduce unintentional vulnerabilities, e.g., information
leakage through hidden test/debug interfaces or side channels through
power/performance profiles. Similarly, IPs can have uncharacterized parametric
behavior (e.g., power/thermal) which can be exploited by an attacker to cause irrecov-
erable damage to an electronic system. There are documented instances of such
attacks. For example, in 2012, a study by a group of researchers in Cambridge
revealed an undocumented silicon-level backdoor in a highly secure military-grade
ProASIC3 FPGA device from Microsemi (formerly Actel) [2], which was later
described as a vulnerability induced unintentionally by the on-chip debug infrastruc-
ture. In a more recent report, researchers demonstrated an attack in which a mali-
cious upgrade of firmware destroys the processor it is controlling by affecting
the power management system [3]. This manifests a new attack mode for IPs, where
a firmware or software update can be used to damage the hardware it controls.
Fig. 2.3 An SoC would often contain hardware IP blocks obtained from entities distributed across
the globe
In addition to supply-chain threats, the design itself may have exploitable vulnerabil-
ities. Vulnerabilities in system design, in fact, form the quintessential object of
security study, and have been the focus of research for over three decades. At a high
level, the definition of security requirements for assets in an SoC design follows the
well-known “CIA” paradigm, developed as part of information security research [6].
In this paradigm, accesses and updates to secure assets are subject to the following
three requirements:
∙ Confidentiality: An asset cannot be accessed by an agent unless authorized to do
so.
∙ Integrity: An asset can be mutated (e.g., the data in a secure memory location can
be modified) only by an agent authorized to do so.
∙ Availability: An asset must be accessible to an agent that requires such access as
part of correct system functionality.
Of course, mapping these high-level requirements to constraints on individual assets
in a system is nontrivial. This is achieved by defining a collection of security poli-
cies that specify which agent can access a specific asset and under what conditions.
Following are two examples of representative security policies. Note that while illus-
trative, these examples are made up and do not represent security policy of a specific
company or system.
Example 1 During boot time, data transmitted by the cryptographic engine cannot
be observed by any IP in the SoC other than its intended target.
Example 2 A programmable fuse containing a secure key can be updated during
manufacturing but not after production.
Policies such as these are defined based on the assets involved, the threat model,
and product needs. Following are some representative policy classes. They are not
complete, but illustrate the diversity of policies employed.
Access Control. This is the most common class of policies, and specifies how differ-
ent agents in an SoC can access an asset at different points of the execution. Here an
“agent” can be a hardware or software component in any IP of the SoC. Examples 1
and 2 above represent such policies. Furthermore, access control forms the basis of
many other policies, including information flow, integrity, and secure boot.
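As a minimal illustration of how such a policy can be captured for validation purposes, the
following Python sketch models an access-control table keyed by (agent, asset, operation) and
the life-cycle phases in which the access is permitted. The agent, asset, and phase names are
hypothetical and do not describe any specific product.

```python
# Minimal sketch of an access-control policy table (hypothetical agents/assets).
ACCESS_POLICY = {
    # (agent, asset, operation): set of life-cycle phases in which access is allowed
    ("crypto_engine", "drm_key", "read"): {"boot", "normal"},
    ("fuse_controller", "secure_fuse", "write"): {"manufacturing"},
}

def is_access_allowed(agent, asset, operation, phase):
    """Return True only if the policy explicitly permits this access."""
    allowed_phases = ACCESS_POLICY.get((agent, asset, operation), set())
    return phase in allowed_phases

# Example 2 above: the fuse can be written during manufacturing...
assert is_access_allowed("fuse_controller", "secure_fuse", "write", "manufacturing")
# ...but not after production.
assert not is_access_allowed("fuse_controller", "secure_fuse", "write", "normal")
```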
Information Flow. Values of secure assets can sometimes be inferred without direct
access, through indirect observation or “snooping” of intermediate computation or
communications of IPs. Information flow policies restrict such indirect inference.
An example of information flow policy might be the following.
∙ Key Obliviousness: A low-security IP cannot infer the cryptographic keys by
snooping only the data from crypto engine on a low-security communication fab-
ric.
Information flow policies are difficult to analyze. They often require highly sophisti-
cated protection mechanisms and advanced mathematical arguments for correctness,
typically involving hardness or complexity results from information security. Con-
sequently they are employed only on critical assets with very high confidentiality
requirements.
Liveness. These policies ensure that the system performs its functionality without
“stagnation” throughout its execution. A typical liveness policy is that a request for
a resource by an IP is followed by an eventual response or grant. Deviation from
such a policy can result in system deadlock or livelock, consequently compromising
system availability requirements.
Time-of-Check Versus Time of Use (TOCTOU). This refers to the requirement
that any agent accessing a resource requiring authorization is indeed the agent that
has been authorized. A critical example of a TOCTOU requirement is in firmware
update; the policy requires that the firmware eventually installed during an update is the same
firmware that has been authenticated as legitimate by the security or crypto engine.
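A minimal sketch of this TOCTOU requirement for firmware update is shown below: the digest
recorded at authentication time (time of check) must match the digest of the image actually
installed (time of use). The function names and the use of SHA-256 are illustrative
assumptions, not a description of any particular product's update flow.

```python
import hashlib

def digest(image: bytes) -> bytes:
    # Illustrative: a real flow would use an authenticated digest or signature scheme.
    return hashlib.sha256(image).digest()

def authenticate(image: bytes) -> bytes:
    """Time of check: the security/crypto engine authenticates the image, recording its digest."""
    return digest(image)

def install_and_verify(flash_image: bytes, authenticated_digest: bytes) -> bool:
    """Time of use: verify the image being installed is the one that was checked."""
    return digest(flash_image) == authenticated_digest

update = b"firmware v2.1"
checked = authenticate(update)

tampered = b"firmware v2.1 + malicious patch"   # image swapped between check and use
assert install_and_verify(update, checked)       # same image: installation allowed
assert not install_and_verify(tampered, checked) # swapped image: rejected (TOCTOU violation)
```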
Secure Boot. Booting a system entails communication of significant security assets,
e.g., fuse configurations, access control priorities, cryptographic keys, firmware
updates, debug and post-silicon observability information, etc. Consequently, boot
imposes more stringent security requirements on IP internals and communications
than normal execution. Individual policies during boot can be access control, infor-
mation flow, and TOCTOU requirements; however, it is often convenient to coalesce
them into a unified set of boot policies.
To discuss security validation, one of the first steps is to identify how a security
policy can be subverted. Doing so is tantamount to identifying potential adversaries
and characterizing the power of the adversaries. Indeed, the effectiveness of virtually all
security mechanisms in SoC designs today is critically dependent on how realistic
the adversary model is against which the protection schemes are considered.
Conversely, most security attacks rely on breaking some of the assumptions made
regarding constraints on the adversary while defining protection mechanisms. When
discussing adversary and threat models, it is worth noting that the notion of adversary
can vary depending on the asset being considered: in the context of protecting DRM
keys, the end user would be considered an adversary, while the content provider (and
even the system manufacturer) may be included among adversaries in the context of
protecting private information of the end user. Consequently, rather than focusing on
a specific class of users as adversaries, it is more convenient to model adversaries
corresponding to each policy and define protection and mitigation strategies with
respect to that model.
Defining and classifying the potential adversary is a highly creative process. It
requires consideration of factors such as whether the adversary has physical access to the sys-
tem, which components they can observe, control, modify, or reverse-engineer, etc.
Recently, there have been some attempts at developing a disciplined, clean catego-
rization of adversarial powers. One potential categorization, based on the interfaces
through which the adversary can gain access to the system assets, classifies
adversaries into the following six broad categories (in order of increasing sophis-
tication). Note that there has been significant research into specific attacks in different
categories, and a comprehensive treatment of different attacks is beyond the scope
of this chapter; the interested reader is encouraged to look up some of the references
for a thorough description of specific details.
Unprivileged Software Adversary: This form of adversary models the most com-
mon type of attack on SoC designs. Here the adversary is assumed to not have access
to any privileged information about the design or architecture beyond what is avail-
able for the end user, but is assumed to be smart enough to identify or “reverse-
engineer” possible hardware and software bugs from observed anomalies. The under-
lying hardware is also assumed to be trustworthy, and the user is assumed to have no
physical access to the underlying IPs. The importance of this naïve adversarial model
is that any attack possible by such an adversary can be potentially executed by any
user, and can therefore be easily and quickly replicated on-field on a large number of
system instances. For these types of attacks, the common “entry point” of the attack
is assumed to be user-level application software, which can be installed or run on the
system without additional privileges. The attacks then rely on design errors (both in
hardware and software) to bypass protection mechanisms and typically get a higher
privilege access to the system. Examples of these attacks include buffer overflow,
code injection, BIOS infection, return-oriented programming attacks, etc. [7, 8].
System Software Adversary: This provides the next level of sophistication to the
adversarial model. Here we assume that in addition to the applications, potentially
the operating system itself may be malicious. Note that the difference between the
system software adversary and unprivileged software adversary can be blurred, in
the presence of bugs in the operating system implementation leading to security vul-
nerabilities: such vulnerabilities can be seen as unprivileged software adversaries
exploiting an operating system bug, or a malicious operating system itself. Nev-
ertheless, the distinction facilitates defining the root of trust for protecting system
assets. If the operating system is assumed untrusted, then protection and mitigation
mechanisms must rely on lower level (typically hardware) primitives to ensure pol-
icy adherence. Note that system software adversary model can have a highly subtle
and complex impact on how a policy can be implemented, e.g., recall from the mas-
querade prevention example above that it can affect the definition of communication
fabric architecture, communication protocol among IPs, etc.
Software Covert-Channel Adversary: In this model, in addition to system and
application software, a side-channel or covert-channel adversary is assumed to have
access to nonfunctional characteristics of the system, e.g., power consumption, wall-
clock time taken to service a specific user request, processor performance counters,
etc., which can be used in subtle ways to identify how assets are stored, accessed, and
communicated by IPs (and consequently subvert protection mechanisms) [9, 10].
Naïve Hardware Adversary: A naïve hardware adversary refers to an attacker who
may gain access to the hardware device. While such attackers may not have
advanced reverse-engineering tools, they may be equipped with basic testing tools.
Common targets for these types of attacks include exposed debug interfaces and
glitching of control or data lines [11]. Embedded systems are often equipped with
multiple debugging ports for quick prototype validation and these ports often lack
proper protection mechanisms, mainly because of the limited on-board resources.
These ports are often left in place on purpose to facilitate firmware patching or bug
fixes for errors and malfunctions detected in the field. Consequently, these ports also
provide a potential weakness which can be exploited to violate security policies.
Indeed, some of the “celebrated” attacks in recent times make use of available hard-
ware interfaces including the XBOX 360 Hack [12], Nest Thermostat Hack [13], and
several smartphone jailbreaking techniques.
Hardware Reverse-Engineering Adversary: In this model, the adversary is
assumed to be able to reverse-engineer the silicon implementation to identify on-chip
secrets. In practice, such reverse-engineering may depend on sniffing inter-
faces as discussed for naïve hardware adversaries. In addition, they can depend
on advanced techniques such as laser-assisted device alteration [14] and advanced
chip-probing techniques [15]. Hardware reverse engineering can be further divided
into two categories: (1) chip-level and (2) IP core functionality reconstruction. Both
attack vectors bring security threats into the hardware systems, and permit extrac-
tion of secret information (e.g., cryptographic and DRM keys coded into hardware),
which cannot be otherwise accessed through software or debugging interfaces.
One may wonder why it is not possible to reuse traditional functional verification
techniques for this problem. The reason is that IP trust validation focuses
on identifying malicious modifications such as hardware Trojans. Hardware Trojans
typically consist of two parts: (1) a trigger, and (2) a payload. The trigger is a set of con-
ditions whose activation deviates the functionality from the specification,
and the effect is propagated to observable behavior through the payload. An adversary designs trigger
conditions such that they are satisfied in very rare situations and usually after long
hours of operation [16]. Consequently, it is extremely hard for a naïve functional vali-
dation technique to activate the trigger condition. Below we discuss a few approaches
based on simulation-based validation as well as formal methods. A detailed descrip-
tion of various IP trust validation techniques is available in [17, 18].
Simulation-Based Validation: There are significant research efforts on hardware
Trojan detection using random and constrained-random test vectors. The goal of
logic testing is to generate efficient tests to activate a Trojan and to propagate its
effects to the primary output. These approaches are beneficial in detecting the pres-
ence of a Trojan. Recent approaches based on structural/functional analysis [19–
21] are useful to identify/localize the malicious logic. Unused Circuit Identification
(UCI) [19] approaches look for unused portions in the circuit and flag them as mali-
cious. The FANCI approach [21] was proposed to flag suspicious nodes based on
the concept of control values. Oya et al. [20] utilized well-crafted templates to iden-
tify Trojans in TrustHUB benchmarks [22]. These methods assume that the attacker
uses rarely occurring events as Trojan triggers. Using “less-rare” events as triggers
will circumvent these approaches. This was demonstrated in [23], where hardware Trojans
were designed to defeat UCI [19].
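The intuition behind these statistical approaches can be illustrated with a small sketch: estimate,
by random simulation, how rarely each internal signal of a netlist takes a given value, and flag
the rarest nodes as candidate trigger sites. The toy netlist, signal names, and threshold below are
purely illustrative and not taken from any published benchmark.

```python
import random

# Toy combinational netlist: signal name -> function over the primary input vector.
NETLIST = {
    "n1": lambda i: i["a"] & i["b"],
    "n2": lambda i: i["c"] | i["d"],
    "n3": lambda i: (i["a"] & i["b"]) & (i["c"] & i["d"]) & i["e"],  # rarely evaluates to 1
}

def rare_nodes(num_vectors=10000, threshold=0.1):
    """Estimate signal probabilities with random vectors; return rarely active nodes."""
    counts = {name: 0 for name in NETLIST}
    for _ in range(num_vectors):
        inputs = {x: random.randint(0, 1) for x in "abcde"}
        for name, fn in NETLIST.items():
            counts[name] += fn(inputs)
    return {n: c / num_vectors for n, c in counts.items() if c / num_vectors < threshold}

# A node like n3 (probability ~0.03 of being 1) would be reported as a candidate trigger site.
print(rare_nodes())
```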
Side-Channel Analysis: Based on the fact that a trigger condition usually has
extremely low probability, the traditional ATPG-based method for functional testing
cannot fulfill the task of Trojan activation and detection. Bhunia et al. [16] proposed
the multiple excitation of rare occurrence (MERO) approach to generate more effec-
tive tests that increase the probability of triggering the Trojan. More recent work by
Saha et al. [24] improves on MERO to achieve higher detection coverage by identify-
ing possible payload nodes. Side-channel analysis focuses on the side-channel sig-
natures (e.g., delay, transient power, and leakage power) of the circuit [25], which avoids
the limitations (low trigger probability and propagation of payload) of logic testing.
Narasimhan et al. [26] proposed the temporal self-referencing approach for large
sequential circuits, which compares the signature of a chip at two different
time windows. This approach can completely eliminate the effect of process noise,
and it uses optimized logic test sets to maximize the activity of the Trojan.
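As a simple illustration of side-channel-based detection under idealized assumptions, the sketch
below compares a measured power trace against a golden (trusted) reference trace and flags the
chip if any residual exceeds an empirical noise margin. The traces, sample count, and margin are
synthetic; real flows must account for process variation, e.g., via self-referencing.

```python
# Idealized sketch of side-channel (power signature) comparison against a golden model.
golden_trace  = [1.00, 1.20, 0.90, 1.10, 1.05]   # trusted reference (synthetic values)
suspect_trace = [1.01, 1.22, 0.91, 1.45, 1.06]   # extra switching activity at sample 3

NOISE_MARGIN = 0.10  # empirically chosen bound on measurement/process noise

def trojan_suspected(golden, measured, margin=NOISE_MARGIN):
    """Flag the chip if any per-sample residual exceeds the noise margin."""
    residuals = [abs(g - m) for g, m in zip(golden, measured)]
    return max(residuals) > margin

print(trojan_suspected(golden_trace, suspect_trace))  # True: anomalous power consumption
```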
Equivalence Checking: In order to trust an IP block, it is necessary to make sure
that the IP is performing the expected functionality—nothing more and nothing less.
From a security point of view, verification of correct functionality is not enough. The
verification engineer has to confirm that there are no other activities besides the
desired functionality. Equivalence checking ensures that the specification and imple-
mentation are equivalent. Traditional equivalence checking techniques can lead to
state space explosion when large IP blocks are involved with significantly different
specification and implementation. One promising direction is to use Gröbner basis
theory to verify arithmetic circuits [27]. Similar to [28], the reduction of the specifica-
tion polynomial with respect to the Gröbner basis polynomials is performed by Gaussian
elimination to reduce verification time. In all of these methods, a nonzero remainder
shows that the specification is not exactly equivalent to the imple-
mentation. Thus, the nonzero remainder can be analyzed to identify the hidden mal-
functions or Trojans in the system.
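The flavor of this algebraic approach can be seen on a one-bit half adder: the implementation
gates are modeled as polynomials, a Gröbner basis is computed, and the specification polynomial
2*carry + sum - a - b is reduced against it; a zero remainder indicates equivalence. The sketch
below uses SymPy and is only a toy instance of the technique, not the optimized flows of [27, 28].

```python
from sympy import symbols, groebner, reduced

a, b, s, c = symbols('a b s c')

# Half-adder implementation as polynomials over the Boolean variety:
# XOR(a,b) = a + b - 2ab, AND(a,b) = ab; x**2 - x constrains each variable to 0/1.
impl = [s - (a + b - 2*a*b), c - a*b, a**2 - a, b**2 - b]
G = groebner(impl, s, c, a, b, order='lex')

# Specification: the two output bits encode the arithmetic sum a + b.
spec = 2*c + s - a - b

_, remainder = reduced(spec, list(G), s, c, a, b, order='lex')
print(remainder)  # 0 => implementation matches the specification (no hidden functionality)
```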
Model Checking: Model checking is the process of analyzing a design for the
validity of properties stated in temporal logic. A model checker takes the Regis-
ter Transfer Level (RTL) (e.g., Verilog) code along with the property written as a
Verilog assertion and derives a Boolean satisfiability (SAT) formulation for validat-
ing/invalidating the property. This SAT formulation is fed to a SAT engine, which
then searches for an input assignment that violates the property [29]. In practice,
designers know the bounds on the number of steps (clock cycles) within which a
property should hold. In Bounded Model Checking (BMC), a property is determined
to hold for at least a finite sequence of state transitions. The Boolean formula for val-
idating/invalidating the target property is given to a SAT engine, and if a satisfying
assignment is found within the specified number of clock cycles, that assignment is a witness
against the target property [30]. Properties can be developed to detect Trojans
that corrupt critical data, and the target design can be verified for satisfaction of these proper-
ties using a bounded model checker.
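A minimal sketch of the BMC idea, using the Z3 SMT solver's Python API on a toy two-bit
counter, is shown below: the transition relation is unrolled for a fixed number of steps and the
solver searches for an assignment violating the property "the counter never reaches 3." The
design and property are purely illustrative, not an RTL-level flow.

```python
from z3 import Solver, BitVec, BitVecVal, Or, sat

BOUND = 4  # number of clock cycles to unroll

solver = Solver()
# One copy of the 2-bit counter state per time step.
state = [BitVec(f"cnt_{t}", 2) for t in range(BOUND + 1)]

solver.add(state[0] == BitVecVal(0, 2))          # initial state: counter = 0
for t in range(BOUND):
    solver.add(state[t + 1] == state[t] + 1)     # transition relation: increment each cycle

# Negation of the property "the counter never reaches 3" within the bound.
solver.add(Or([state[t] == BitVecVal(3, 2) for t in range(BOUND + 1)]))

if solver.check() == sat:
    # The satisfying assignment is a witness (counterexample) against the property.
    print("Property violated:", solver.model())
else:
    print("Property holds up to bound", BOUND)
```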
Theorem Proving: Theorem provers are used to prove or disprove properties of sys-
tems expressed as logical statements. However, verifying large and complex systems
using theorem provers requires excessive effort and time. Despite these limitations,
theorem provers have recently drawn a lot of interest for the verification of security prop-
erties on hardware. In [31–33], the Proof-Carrying Hardware (PCH) framework was
used to verify security properties on soft IP cores. Supported by the Coq proof assis-
tant [34], security properties can be formalized and proved to ensure the trust-
worthiness of IP cores. The PCH method is inspired by proof-carrying code
(PCC), which was proposed by Necula [35]. The central idea is that untrusted devel-
opers/vendors certify their IP. During the certification process, the vendor devel-
ops a safety proof for the safety policies provided by IP customers. The vendor then
provides the user with the IP design, which includes the formal proof of the safety
properties. The customer becomes assured of the safety of the IP by validating the
design using a proof checker. A recent approach presented a scalable trust validation
framework using a combination of theorem proving and model checking [36].
We now turn to the problem of system-level security validation for SoC designs.
This process takes place in the SoC design house and continues across the system
design life cycle. When performing system-level validation, the constituent IPs are
assumed to have undergone a level of standalone trust validation before integration.
Figure 2.4 provides a high-level overview of the SoC design life cycle. Each com-
ponent of the life cycle, of course, involves a large number of design, development,
and validation activities. Here, we summarize the key activities involved along the
life cycle that pertain to security. Subsequent sections will elaborate on the individ-
ual activities.
Risk Assessment. Security requirements definition is a key part of product plan-
ning, and happens concurrently with (and in close collaboration with) the definition
of architectural features of the product. This process involves identifying the secu-
rity assets in the system, their ownership, and protection requirements, collectively
defined as security policies (see below). The result of this process is typically the
generation of a set of documents, often referred to as product security specification
(PSS), which provides the requirements for downstream architecture, design, and
validation activities.
Security Architecture. The goal of a security architecture is to design mecha-
nisms for protection of system assets as specified by the PSS. It includes several
components, as follows: (1) identifying and classifying potential adversaries for each
asset; (2) determining attacker entry points, also referred to as threat modeling; and
(3) developing protection and mitigation strategies. The process can identify addi-
tional security policies—typically at a lower level than those identified during risk
assessment (see below)—which are added to the PSS. The security definition typi-
cally proceeds in collaboration with architecture and design of other system features,
including speed, power management, thermal characteristics, etc., with each compo-
nent potentially influencing the others.
Security Validation. Security validation represents one of the longest and most crit-
ical parts of security assurance for industrial SoC designs, spanning the architecture,
design, and post-silicon components of the system life cycle. The actual validation
target and the properties validated at any phase, of course, depend on the collateral
available in that phase. For example, we target, respectively, architecture, design,
implementation, and silicon artifacts as the system development matures. Below
we will discuss some of the key validation activities and associated technologies.
One key component of security validation is to develop techniques to subvert the
advertised security requirements of the system, and identify mitigation measures.
Mitigation measures for early-stage validation targeting architecture and early sys-
tem design often include significant refinement of the security architecture itself. At
later stages of the system life cycle, when architectural changes are no longer feasi-
ble due to product maturity, mitigation measures can include software or firmware
patches, product defeature, etc.
Unfortunately, the role of security validation is different from most other kinds of val-
idation (such as functional or power-performance or timing) since the requirements
are typically less precise. In particular, the goal of security validation is to “validate
conditions related to security and privacy of the system that are not covered by other
validation activities.” The requirement that security validation focuses on targets not
covered by other validation is important given the strict time-to-market constraints,
which preclude duplication of resources for the same (or similar) validation tasks;
however, it puts onus on the security validation organization to understand activi-
ties performed across the spectrum of the SoC design validation and identify holes
that pertain to security. To exacerbate the problem, a significant amount of security
objectives are not clearly specified, making it difficult to (1) identify validation tasks
to be performed, and (2) develop clear coverage/success criteria for the validation.
Consequently, the validation plan includes a large number of diverse activities that
range from the science to the art and sometimes even “black magic.”
At a high level, security validation activities can be divided roughly among the
following four categories.
Functional Validation of Security-sensitive Design Features. This is essentially
an extension of functional validation, but pertains to design elements involved in crit-
ical security feature implementations. An example is the cryptographic engine IP.
A critical functional requirement for the cryptographic engine is that it encrypts and
decrypts data correctly in all modes. As with any other design block, the crypto-
graphic engine is also a target of functional validation. However, given that it is a
critical component of a number of security-critical design features, security valida-
tion planning may determine correctness of cryptographic functionality to be
crucial enough to justify further validation beyond the coverage provided by vanilla
functional validation activities. Consequently, such an IP may undergo more rigorous
testing, or even formal analysis in some cases. Other such critical IPs may include
IPs involved in secure boot, on-field firmware patching, etc.
Validation of Deterministic Security Requirements. Deterministic security
requirements are validation objectives that can be directly derived from security
policies. Such objectives typically encompass access control restrictions, address
translations, etc. Consider an access control restriction that specifies a certain range
of memory to be protected from Direct Memory Access (DMA) access; this may
be done to ensure protection against code-injection attacks, or protect a key that is
stored in such a location, etc. An obvious derived validation objective is to ensure that
all DMA calls for access to a memory whose address translates to an address in the
protected range must be aborted. Note that validation of such properties may not
be included as part of functional validation, since DMA access requests for DMA-
protected addresses are unlikely to arise for “normal” test cases or usage scenarios.
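A sketch of how such a deterministic requirement can be turned into an executable check is
shown below: every DMA transaction observed during simulation or emulation is tested against
the protected address ranges, and any access that is not aborted is reported as a violation. The
address ranges and transaction format are hypothetical.

```python
# Hypothetical protected ranges (inclusive start, exclusive end).
PROTECTED_RANGES = [(0x4000_0000, 0x4000_2000)]  # e.g., region holding a secret key

def violates_dma_policy(txn):
    """txn: dict with the translated physical address and whether hardware aborted it."""
    in_protected = any(lo <= txn["addr"] < hi for lo, hi in PROTECTED_RANGES)
    return in_protected and not txn["aborted"]

# Transactions as they might be logged by a testbench monitor (illustrative).
trace = [
    {"addr": 0x3FFF_FF00, "aborted": False},  # outside protected range: fine
    {"addr": 0x4000_0010, "aborted": True},   # protected, correctly aborted: fine
    {"addr": 0x4000_0100, "aborted": False},  # protected, not aborted: violation
]

violations = [t for t in trace if violates_dma_policy(t)]
print(f"{len(violations)} policy violation(s):", violations)
```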
Negative Testing. Negative testing looks beyond the functional specification of
designs to identify if security objectives can be subverted or are underspecified.
Continuing with the DMA-protection example above, negative testing may extend
the deterministic security requirement (i.e., abortion of DMA access for protected
memory ranges) to identify if there are any other paths to protected memory in addi-
tion to address translation activated by a DMA access request, and if so, potential
input stimulus to activate such paths.
Recall from above that focused functional validation of security-critical design com-
ponents forms a key constituent of security validation. From that perspective, secu-
rity validation includes and supersedes all functional validation tools, flows, and
methodologies. Functional validation of SoC designs is a mature and established
area, with a number of comprehensive surveys covering different aspects [37, 38].
In this section, we instead consider validation technologies to support other vali-
dation activities, e.g., negative testing, white-box hacking, etc. As discussed above,
these activities inherently depend on human creativity; tools, methodologies, and
infrastructures around them primarily act as assistants, filling in gaps in human rea-
soning and providing recommendations.
Security validation today primarily uses three key technologies: fuzzing, pene-
tration testing, and formal or static analysis. Here we provide a brief description of
these technologies. Note that fuzzing and static analysis are very generic techniques
with applications beyond security validation; our description will be confined to their
applications only on security.
Fuzzing. Fuzzing, or fuzz testing [39], is a testing technique for hardware or soft-
ware that involves providing invalid, unexpected, or random inputs and monitoring
the result for exceptions such as crashes, failing built-in code assertions, or mem-
ory leaks. Figure 2.5 shows a standard fuzzing framework. The technique was developed
as a software testing approach, and has since been adapted to hardware/software
systems. It is currently a common practice in industry for system-level validation.
In the context of security, it is effective for exposing a number of potential attacker
entry points, including through buffer or integer overflows, unhandled exceptions,
race conditions, access violations, and denial of service. Traditionally, fuzzing uses
either random inputs or random mutations of valid inputs. A key attraction of this
approach is its high automation compared to other validation technologies such as
penetration testing and formal analysis. Nevertheless, since it relies on randomness,
fuzzing may miss security violations that rely on unique corner-case scenarios. To
address that deficiency, there has been recent work on “smart” input generation for
fuzzing, based on domain-specific knowledge of the target system. Smart fuzzing
Fig. 2.5 A pictorial representation of fuzzing framework used in post-silicon SoC security vali-
dation
may provide a greater coverage of security attack entry points, at the cost of more
up-front investment in design understanding.
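A minimal sketch of the mutation-based fuzzing loop is shown below: a valid seed input is
randomly mutated and fed to the target, and any crash (unhandled exception) is recorded along
with the input that produced it. The toy parser standing in for the target interface is hypothetical.

```python
import random

def target_parser(data: bytes):
    """Stand-in for the interface under test (e.g., a firmware message handler)."""
    if len(data) > 4 and data[0] == 0xFF:
        # Deliberately planted corner-case bug for the demonstration.
        raise MemoryError("buffer overflow reached")
    return len(data)

def mutate(seed: bytes) -> bytes:
    """Randomly corrupt a few bytes of a valid seed input."""
    data = bytearray(seed)
    for _ in range(random.randint(1, 3)):
        data[random.randrange(len(data))] = random.randrange(256)
    return bytes(data)

seed = b"\x01HELLO"
crashes = []
for _ in range(5000):                      # fuzzing loop
    test_input = mutate(seed)
    try:
        target_parser(test_input)
    except Exception as exc:               # crash, assertion failure, etc.
        crashes.append((test_input, exc))

print(f"{len(crashes)} crashing inputs found")
```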
Penetration Testing. A penetration test is an attack on a computer system with the
intention to find security weakness, potentially gaining access to it, its functionality,
and data. It is typically performed by expert hackers often with deep knowledge
of system architecture, design, and implementation characteristics. Roughly, penetration
testing involves iterative application of the following three phases:
1. Attack Surface Enumeration. The first task is to identify the features or aspects
of the system that are vulnerable to attack. This is typically a creative process
involving a smorgasbord of activities, including documentation review, network
service scanning, and even fuzzing or random testing (see below).
2. Vulnerability Exploitation. Once the potential attacker entry points are discov-
ered, applicable attacks and exploits are attempted against target areas. This may
require research into known vulnerabilities, looking up applicable vulnerability
class attacks, engaging in vulnerability research specific to the target, and writ-
ing/creating the necessary exploits.
3. Result Analysis. If the attack is successful, then in this phase the resulting state of
the target is compared against security objectives and policy definitions to deter-
mine if the system was indeed compromised. Note that even if a security objective
is not directly compromised, a successful attack may identify additional attack
surface which must then be accounted for with further penetration testing.
Note that while there are commonalities between penetration testing and testing done
in functional validation, there are several important differences. In particular, the goal
of functional testing is to simulate benign user behavior and (perhaps) accidental
failures under normal environmental conditions of operation of the design as defined
by its specification. Penetration testing goes outside the specification to the limits set
by the security objective, and simulates deliberate attacker behavior.
Clearly, the efficacy of penetration testing critically depends on the ability to iden-
tify the attack surface in the first phase above. Unfortunately, rigorous methodologies
for achieving this are lacking. Following are some of the typical activities in current
industrial practice to identify attacks and vulnerabilities. We classify them below
as “easy,” “medium,” and “hard” depending on the creativity necessary. Note that
there are tools to assist the human in many of the activities below [40, 41]. How-
ever, determining the relevancy of the activity, identifying the degree to which each
activity should be explored, and inferring a potential attack from the result of the
activity involve significant creativity.
∙ Easy Approaches. These include review of available documentation (e.g., speci-
fication, architectural materials, etc.), known vulnerabilities or misconfigurations
of IPs, software, or integration tools, missing patches, use of obsolete or out-of-
date software versions, etc.
∙ Medium Approaches. These include inferring potential vulnerabilities in the
target of interest from information about misconfigurations, vulnerabilities, and
attacks in related or analogous products, e.g., a competitor product, a previous
software version, etc. Other activities of similar complexity involve executing rel-
evant public security tools or published attack scenarios against the target.
∙ Hard Approaches. This includes full security evaluation of any utilized third-
party components, integration testing of the whole platform, and identification of
vulnerabilities involving communications among multiple IPs or design compo-
nents. Finally, vulnerability research involves identifying new classes of vulnera-
bilities for the target which have never been seen before. The latter is particularly
relevant for new IPs or SoC designs for completely new market segments.
Static or Formal Reasoning. This involves making use of mathematical logic to
either formally derive a security assurance requirement or identify flaws in the
target system (architecture, design, or implementation). Application of formal meth-
ods typically involves significant effort, either in the manual exercise of performing
deductive reasoning or in developing abstractions of the security objective which are
amenable to analysis by automated formal tools [38, 42]. In spite of the cost, how-
ever, the effort is justified for highly critical security objectives, e.g., cryptographic
algorithm implementation. Furthermore, for some critical properties, automated for-
mal methods can be used in a light-weight manner as effective state exploration tools.
For example, TOCTOU property violations often involve scenarios of overlapping
execution of different instances of the same protocol, which are effectively exposed
by formal methods tools [43]. Finally, formal proofs have also been used as certifi-
cation mechanisms for third-party IP vendors to convey security assurance to SoC
system integration teams [33].
2.9 Summary
References
1. Ramamoorthy, G.: Market share analysis: semiconductor design intellectual property, world-
wide (2012). https://www.gartner.com/doc/2403015/market-share-analysis-semiconductor-
design
2. Skorobogatov, S., Woods, C.: Breakthrough silicon scanning discovers backdoor in military
chip. In: CHES, pp. 23–40 (2012)
3. Messmer, E.: RSA security attack demo deep-fries Apple Mac components (2014). http://www.
networkworld.com/news/2014/022614-rsa-apple-attack-279212.html
4. Nahiyan, A., Xiao, K., Forte, D., Jin, Y., Tehranipoor, M.: AVFSM: a framework for identifying
and mitigating vulnerabilities in FSMs. In: Design Automation Conference (DAC) (2016)
5. Tehranipoor, M., Guin, U., Forte, D.: Counterfeit Integrated Circuits: Detection and Avoidance.
Springer (2014)
6. Greenwald, S.J.: Discussion topic: what is the old security paradigm. In: Workshop on New
Security Paradigms, pp. 107–118 (1998)
7. Davi, L., Sadeghi, A.R., Winandy, M.: Dynamic integrity measurement and attestation:
towards defense against return-oriented programming attacks. In: Proceedings of the 2009
ACM workshop on Scalable trusted computing, STC’09 (2009)
8. Schuster, F., Tendyck, T., Liebchen, C., Davi, L., Sadeghi, A.R., Holz, T.: Counterfeit object-
oriented programming: On the difficulty of preventing code reuse attacks in C++ applications.
In: Proceedings of the 36th IEEE Symposium on Security and Privacy (2015)
9. Kocher, P.C.: Timing attacks on implementations of Diffie-Hellman, RSA, DSS, and other
systems. In: 16th Annual International Cryptology Conference, pp. 104–113 (1996)
10. Kocher, P.C., Jaffe, J., Jun, B.: Differential power analysis. In: 19th Annual International Cryp-
tology Conference, pp. 398–412 (1999)
11. Ray, S., Yang, J., Basak, A., Bhunia, S.: Correctness and security at odds: post-silicon valida-
tion of modern SoC designs. In: Proceedings of the 52nd Annual Design Automation Confer-
ence (2015)
12. Homebrew Development Wiki: JTAG-Hack. http://dev360.wikia.com/wiki/JTAG-Hack
13. Hernandez, G., Arias, O., Buentello, D., Jin, Y.: Smart nest thermostat: a smart spy in your
home. In: Black Hat USA (2014)
14. Rowlette, R., Eiles, T.: Critical timing analysis in microprocessors using near-IR laser assisted
device alteration (LADA). In: IEEE International Test Conference, pp. 264–273 (2003)
15. http://www.chipworks.com/
16. Chakraborty, R.S., Wolff, F., Paul, S., Papachristou, C., Bhunia, S.: MERO: A statistical
approach for hardware trojan detection. In: Workshop on Cryptographic Hardware and Embed-
ded Systems (2009)
17. Mishra, P., Bhunia, S., Tehranipoor, M.: Hardware IP Security and Trust. Springer (2016)
18. Guo, X., Dutta, R.G., Jin, Y., Farahmandi, F., Mishra, P.: Pre-silicon security verification and
validation: a formal perspective. In: ACM/IEEE Design Automation Conference (DAC) (2015)
19. Hicks, M., Finnicum, M., King, S., Martin, M., Smith, J.: Overcoming an untrusted comput-
ing base: detecting and removing malicious hardware automatically. In: IEEE Symposium on
Security and Privacy (SP), pp. 159–172 (2010)
20. Oya, M., Shi, Y., Yanagisawa, M., Togawa, N.: A score-based classification method for iden-
tifying hardware-trojans at gate-level netlists. In: Design Automation and Test in Europe
(DATE), pp. 465–470 (2015)
21. Waksman, A., Suozzo, M., Sethumadhavan, S.: Fanci: identification of stealthy malicious logic
using boolean functional analysis. In: ACM SIGSAC Conference on Computer and Commu-
nications Security, pp. 697–708 (2013)
22. Trust-HUB. https://www.trust-hub.org/
23. Sturton, C., Hicks, M., Wagner, D., King, S.: Defeating UCI: building stealthy and malicious
hardware. In: 2011 IEEE Symposium on Security and Privacy (SP), pp. 64–77 (2011)
24. Saha, S., Chakraborty, R., Nuthakki, S., Anshul, Mukhopadhyay, D.: Improved test pattern
generation for hardware trojan detection using genetic algorithm and boolean satisfiability. In:
Cryptographic Hardware and Embedded Systems (CHES), pp. 577–596 (2015)
25. Aarestad, J., Acharyya, D., Rad, R., Plusquellic, J.: Detecting trojans through leakage current analysis using multiple supply pad IDDQs. IEEE Transactions on Information Forensics and Security, pp. 893–904 (2010)
26. Narasimhan, S., Wang, X., Du, D., Chakraborty, R., Bhunia, S.: Tesr: a robust temporal self-
referencing approach for hardware trojan detection. In: Hardware-Oriented Security and Trust
(HOST), pp. 71–74 (2011)
27. Farahmandi, F., Mishra, P.: Automated test generation for debugging arithmetic circuits. In:
Design Automation and Test in Europe (DATE) (2016)
28. Lv, J., Kalla, P., Enescu, F.: Efficient groebner basis reductions for formal verification of galois
field arithmetic circuits. IEEE Trans. CAD (TCAD) 32, 1409–1420 (2013)
29. Cadence Berkeley Lab: The cadence SMV model checker. http://www.kenmcmil.com
30. Biere, A., Cimatti, A., Clarke, E., Zhu, Y.: Symbolic model checking without BDDs. In: Tools and Algorithms for the Construction and Analysis of Systems, pp. 193–207 (1999)
31. Jin, Y.: Design-for-security vs. design-for-testability: A case study on dft chain in cryptographic
circuits. In: IEEE Computer Society Annual Symposium on VLSI (ISVLSI) (2014)
32. Jin, Y., Yang, B., Makris, Y.: Cycle-accurate information assurance by proof-carrying based
signal sensitivity tracing. In: IEEE International Symposium on Hardware-Oriented Security
and Trust (HOST), pp. 99–106 (2013)
33. Love, E., Jin, Y., Makris, Y.: Proof-carrying hardware intellectual property: a pathway to
trusted module acquisition. IEEE Trans. Inf. Forensics Secur. 7(1), 25–40 (2012)
34. INRIA: The coq proof assistant (2010). http://coq.inria.fr/
35. Necula, G.C.: Proof-carrying code. In: POPL ’97: Proceedings of the 24th ACM SIGPLAN-
SIGACT Symposium on Principles of Programming Languages, pp. 106–119 (1997)
36. Guo, X., Dutta, R., Mishra, P., Jin, Y.: Scalable SoC trust verification using integrated theo-
rem proving and model checking. In: IEEE International Symposium on Hardware-Oriented
Security and Trust (HOST) (2016)
37. Bhadra, J., Abadir, M.S., Wang, L., Ray, S.: A survey of hybrid techniques for functional ver-
ification. IEEE Des. Test Comput. 24(2), 112–122 (2007)
38. Gupta, A.: Formal hardware verification methods: a survey. Formal Methods Syst. Des. 2(3),
151–238 (1992)
39. Takanen, A., DeMott, J.D., Mille, C.: Fuzzing for Software Security Testing and Quality Assur-
ance. Artech House (2008)
40. Microsoft Corporation: Microsoft free security tools: Microsoft Baseline Security Analyzer (2015).
https://blogs.microsoft.com/cybertrust/2012/10/22/microsoft-free-security-tools-microsoft-
baseline-security-analyzer/
41. Software, F.: (2012). http://secunia.com
42. Clarke, E.M., Grumberg, O., Peled, D.A.: Model-Checking. The MIT Press, Cambridge, MA
(2000)
43. Krstic, S., Yang, J., Palmer, D.W., Osborne, R.B., Talmor, E.: Security of SoC firmware load
protocol. In: IEEE HOST (2014)
Chapter 3
SoC Security and Debug
3.1 Introduction
Post-silicon debug includes a diverse range of activities performed after chip man-
ufacturing to diagnose issues on a chip. The debugging activities are performed
at several post-silicon stages. One such stage is post-silicon validation. Due to
the increasing complexity of hardware implementation, SoC development nowadays
often requires multiple tapeouts. Post-silicon validation has become a necessary step
for validating the functionality and performance of an SoC. Post-silicon validation
offers the benefit of running tests at the chip's actual operating frequency, compared with an effective rate of only thousands of cycles per second in pre-silicon simulation. It accelerates discov-
ery of issues in both hardware and software and thus reduces validation time. Once
an issue is found, validation engineers must be able to get access to the internal states
and signals of the SoC in order to localize the bug and resolve the issue. Therefore,
comprehensive support of post-silicon debug capabilities is mandatory for modern
SoC development. Often, such debugging capabilities need to be retained even after post-silicon validation and manufacturing test. For example, authorized application software developers need to diagnose why an application crashes on a specific SoC. Moreover, when a chip fails in the field and is sent back to the manufacturer for
hardware evaluation, the analyst needs the debugging capabilities to find out the root
cause of the failure. The debugging capabilities of an SoC can be needed during its
entire product life cycle, spanning from post-silicon validation and chip bring-up to
platform software development and field return evaluation.
One of the most notable challenges in post-silicon debug is the reduced observability and controllability compared with pre-silicon debug. Therefore, a variety of Design-for-Debug (DfD) structures have been developed and instrumented on-chip to increase observability and controllability of the internal states of an SoC. Such instrumentation circuitry enables access to the processor regis-
ters and memory from external test and debug interfaces. While the DfD circuitry
facilitates post-silicon debug, it introduces the security risk of exposing the secrets or intellectual property (IP) stored on-chip to attack. It can be exploited as a backdoor by attackers to steal on-chip secrets or make unauthorized modifications to the IP. The need for observability and controllability during debugging inherently conflicts with the security requirements of an SoC. Careful consideration must be given during SoC design to balance the requirements of post-silicon debug against those of system security. Ideally, the goal is to prevent unauthorized entities from accessing confidential or critical information while still allowing trusted entities to perform debugging functions. Toward this goal, many solutions have been proposed by academia and industry, and it remains an open research area.
In this chapter, we introduce the basics of SoC debug circuitry and discuss the
security risks it imposes. We also review the countermeasures proposed by both academia and industry to address these issues, along with their virtues and limitations.
The rest of the chapter is outlined as follows. Section 3.2 reviews the requirements
of SoC post-silicon debug and major components of an SoC debug architecture.
Section 3.3 discusses the security hazards induced by the DfD circuitry. Counter-
measures protecting the SoC against security hazards are reviewed in Sect. 3.4.
Section 3.5 summarizes the chapter.
Post-silicon debug can be performed at several different stages in the product life
cycle of an SoC: post-silicon validation, laboratory bring-up, application software
debugging by authorized developers, and field return evaluation. It aims to uncover a variety of issues, including functional bugs, electrical errors and perfor-
mance issues in hardware design, application software bugs, and defects that escaped
manufacturing test. Compared with pre-silicon debug, the observability and control-
lability of SoC internal signals in a post-silicon debug environment are quite limited.
Therefore, the ultimate goal of DfD techniques is to allow the observation and manip-
ulation of internal circuit states via externally accessible interfaces. An SoC debug
architecture is a system comprising protocols for such observation and manipulation
and the supporting DfD circuitry. The architecture should be able to provide debug
capabilities for different debugging scenarios at different stages of SoC life cycle.
Some general requirements for such an SoC debug architecture are listed below:
∙ Observability of system registers and processor states combined with the capabil-
ity to modify their contents out of the program execution flow.
∙ Ability to halt and run the processors as per need.
∙ Ability to obtain information about multiple software threads running on an SoC so
as to debug and tune the software for better performance. Provision for triggering
the collection of such information upon occurrence of a particular run-time event.
∙ A mechanism for securing the SoC against unauthorized access through the DfD circuitry.
In a typical debugging environment, a user connects a host computer to the SoC
under debug. The debugger software running on the host sends debugging com-
mands to the SoC via debug interfaces following a certain protocol. The commands
trigger debugging events of the on-chip DfD instrumentation such as halting the
processors. The information gathered from the debugging events can be sent back
to the host as responses to the commands. On the SoC side, the components of the
debug architecture include the debug interface and on-chip DfD instrumentation.
The debug interface is the port on the SoC that is used to communicate with the
external debugger. It consists of the physical interface (external pins of the SoC)
and hardware implementation of the standard communication protocol for receiving
debug commands and sending the required response. We introduce three commonly
used debug interfaces as follows.
BDM
SWD
SWD (Serial Wire Debug) is a two-pin debug interface in which each transfer is divided into header, response, and data phases, with the data phase being skipped if the interface is not
ready. SWD provides full access to the debug and trace functionality on an SoC. It
provides the communication channel to get access to the internal debug bus in an
ARM CoreSight compliant system. SWD also provides simple parity checking for
error detection. SWD is present in most ARM-based SoCs.
JTAG
IEEE Std. 1149.1, Standard Test Access Port and Boundary-Scan Architecture [26],
which originated from the recommendations of the Joint Test Action Group (JTAG), was
originally proposed to allow effective testing of the interconnections between chips
on a board. Mostly referred to as JTAG, it defines a device architecture comprising
the following components as illustrated in Fig. 3.1:
∙ A Test Access Port (TAP) that includes four mandatory pins, namely Test Data In (TDI), Test Data Out (TDO), Test Mode Select (TMS), and Test Clock (TCK), and one optional asynchronous Test Reset (TRST) pin.
∙ A series of boundary-scan cells on the device primary input and primary out-
put pins, connected internally to form a serial boundary-scan register (Boundary
Scan).
∙ An n-bit (n ≥ 2) instruction register (IR), holding the current instruction.
∙ A TAP controller, which allows instructions to be shifted into the IR and data to
be shifted into the Boundary Scan (test data register). State transitions of the TAP
controller are controlled by the value of TMS on the rising edge of TCK.
∙ A 1-bit bypass register (Bypass).
∙ An optional 32-bit identification register (Ident) that can be loaded with a perma-
nent device identification code.
At any time, only one register can be connected from TDI to TDO (e.g., IR,
Bypass, Boundary-scan, Ident, or even some appropriate register inside the core
logic). The selected register is determined by the decoded output of the IR. IEEE
Std. 1149.1 defines several mandatory instructions including BYPASS, EXTEST,
and SAMPLE, and optional instructions such as RUNBIST and INTEST. It also
allows adding custom instructions to the controller to aid in configuration, test, or
debug.
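To make the register-selection mechanism concrete, the following Python sketch models a highly simplified TAP: the decoded instruction register selects which data register sits between TDI and TDO, and a shift operation moves bits serially through that register. The instruction encoding, register widths, and class name are illustrative assumptions rather than part of IEEE Std. 1149.1, and the sketch omits the TAP controller state machine entirely.

class SimpleTap:
    """Behavioral sketch of IEEE 1149.1 register selection and shifting.
    The instruction encoding and register widths are illustrative only."""

    def __init__(self, ident=0x12345678, boundary_width=8):
        self.ir = 0b00                                  # instruction register
        self.regs = {
            "BYPASS": [0],                              # mandatory 1-bit bypass register
            "IDENT": self._to_bits(ident, 32),          # optional 32-bit identification register
            "BSCAN": [0] * boundary_width,              # serial boundary-scan register
        }
        self.decode = {0b00: "BYPASS", 0b01: "IDENT", 0b10: "BSCAN"}   # hypothetical IR decode

    @staticmethod
    def _to_bits(value, width):
        return [(value >> i) & 1 for i in range(width)]  # index 0 is nearest TDO (LSB first)

    def shift_ir(self, instruction):
        """Load a new instruction (in hardware this is itself a serial shift)."""
        self.ir = instruction & 0b11

    def shift_dr(self, tdi_bits):
        """Shift bits in from TDI through the selected register; return the bits seen on TDO."""
        reg = self.regs[self.decode[self.ir]]
        tdo_bits = []
        for bit in tdi_bits:
            tdo_bits.append(reg[0])                     # flip-flop nearest TDO drives the output
            reg.pop(0)                                  # register shifts one position toward TDO
            reg.append(bit)                             # new bit enters from the TDI side
        return tdo_bits


tap = SimpleTap()
tap.shift_ir(0b01)                                      # select the identification register
device_id = tap.shift_dr([0] * 32)                      # shifts out the device ID, LSB first
print(device_id)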
Although JTAG was originally proposed for board test, it has since been exploited for other purposes such as post-silicon debug. In the early days, it was used to support access to chips for in-circuit emulation (ICE), albeit often with additional pins for proprietary signals [30]. Chip designers have been creative in leveraging JTAG capabilities for debug. A few examples of JTAG debug capabilities are listed
as follows [39]:
∙ Loading an internal counter used as a breakpoint
∙ Shadow capturing key registers (with a SAMPLE-like function)
∙ Masking or overwriting key registers (with an EXTEST-like function)
∙ Replacing data in key registers (with an UPDATE-like function)
∙ Selection of scan dump mode (enabling scan-out)
The use of the JTAG TAP as a debug interface was first standardized by NEXUS
5001 (although still requiring additional signaling for many cases) [30]. Today,
thanks to its ubiquity and extensibility, the JTAG TAP is one of the most widely
used debug interfaces. For example, ARM Coresight debug architecture supports
JTAG as the physical interface to get access to its Debug Access Port (DAP).
The principal purpose of on-chip DfD instrumentation is to fulfill all the post-silicon
debug requirements mentioned earlier without noticeable performance impact on
the SoC. Those requirements can be fulfilled by different types of instrumenta-
tion circuitry. For example, observation of system registers and modification of
their contents can be realized by inserting scan chains, which was originally a
design-for-test (DfT) technique. Scan chain insertion replaces the normal flip-flops
with scan flip-flops at the SoC design phase. The scan flip-flops act like normal
flip-flops in the functional mode and can be connected as a shift register chain (scan
chain) in the test mode. By scan chain insertion targeting the key registers on the
SoC, the important system states can be controlled and observed.
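As a rough behavioral illustration of this, the following sketch (hypothetical, not tied to any particular DfT tool or cell library) models a group of scan flip-flops that capture functional values in functional mode and act as one shift register in test mode; shifting out exposes the captured state (observability) and shifting in sets an arbitrary state (controllability).

class ScanChain:
    """Behavioral sketch of a scan chain: functional capture versus test-mode shifting."""

    def __init__(self, length):
        self.state = [0] * length                  # contents of the scan flip-flops

    def capture(self, functional_values):
        """Functional mode: every flip-flop latches its normal D input."""
        assert len(functional_values) == len(self.state)
        self.state = list(functional_values)

    def shift(self, scan_in_bits):
        """Test mode: the flip-flops form a shift register from scan-in to scan-out."""
        scan_out = []
        for bit in scan_in_bits:
            scan_out.append(self.state[-1])        # last flip-flop drives the scan-out pin
            self.state = [bit] + self.state[:-1]   # new bit enters at the scan-in end
        return scan_out


chain = ScanChain(4)
chain.capture([1, 0, 1, 1])                        # internal state latched during functional operation
print(chain.shift([0, 0, 0, 0]))                   # observability: prints [1, 1, 0, 1]
chain.shift([1, 1, 1, 1])                          # controllability: loads a chosen state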
Another example is halting the processors using the hardware watchpoint/
breakpoint. One of the most common methods of debugging is halting the proces-
sors or getting system states at a particular point of code execution. One approach
to realize this is using software breakpoints, where an instruction is inserted in the
code stored in RAM so that the processor will halt when it executes the inserted
instruction. However, the software breakpoint cannot work when debugging the code
from ROM, which does not allow modifications of the code. In this case, hardware
watchpoint/breakpoint is essential for supporting the debugging functionality. Hard-
ware watchpoint support is implemented in the form of watchpoint registers, which
are programmed with values of address, control and data signals at which a watch-
point should occur. Comparison and mask circuitry compares the current values of
the signals with that programmed in the watchpoint register and generates an out-
put in case of a match, indicating the occurrence of a watchpoint. When a processor
encounters a watchpoint, usually a trace message is emitted or the system state at that
point gets reflected on the debug software. Watchpoints can be programmed to act
as breakpoints, which halt the processor when the program counter reaches a certain
address.
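The compare-and-mask logic can be pictured with the following sketch; the field names and widths are invented for illustration and do not correspond to any specific processor's watchpoint registers.

class Watchpoint:
    """Sketch of watchpoint compare-and-mask logic; field names are illustrative."""

    def __init__(self, address, addr_mask=0xFFFFFFFF, data=None, data_mask=0):
        self.address = address
        self.addr_mask = addr_mask                 # only masked-in address bits are compared
        self.data = data                           # None means "match on address alone"
        self.data_mask = data_mask

    def match(self, bus_address, bus_data):
        addr_hit = (bus_address & self.addr_mask) == (self.address & self.addr_mask)
        data_hit = (self.data is None or
                    (bus_data & self.data_mask) == (self.data & self.data_mask))
        return addr_hit and data_hit


# Used as a breakpoint: halt when the program counter reaches a hypothetical code address.
bp = Watchpoint(address=0x08001234)
for pc in (0x08001230, 0x08001234, 0x08001238):
    if bp.match(pc, bus_data=0):
        print(f"halt request: program counter reached {pc:#010x}")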
The two example features mentioned so far are primarily concerned with the
observation and control of system state at a single point in time. For complex debugging scenarios such as those involving multiple threads, the user needs to obtain information on part of the system state over a contiguous period of time. Such infor-
mation is referred to as traces and the collection of traces is realized by the tracing
mechanism. Tracing instrumentation in the SoC captures and compresses the state
data in real time upon triggering, to form traces. Then the traces can be made avail-
able to external debuggers either via a trace port or by being stored in an embedded
trace memory that can be read offline.
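A minimal software analogue of a triggered, fixed-depth trace store (an ETB-like circular buffer) is sketched below; the depth, trigger handling, and recorded fields are illustrative assumptions rather than details of any particular trace macrocell.

class EmbeddedTraceBuffer:
    """Sketch of a triggered, fixed-depth circular trace buffer."""

    def __init__(self, depth=8):
        self.depth = depth
        self.entries = []
        self.triggered = False

    def arm(self):
        self.triggered = True                      # in hardware, set by a trigger event

    def capture(self, cycle, snapshot):
        if not self.triggered:
            return                                 # nothing is recorded until the trigger fires
        self.entries.append((cycle, snapshot))
        if len(self.entries) > self.depth:
            self.entries.pop(0)                    # oldest entry is overwritten, circular-buffer style


etb = EmbeddedTraceBuffer(depth=4)
for cycle in range(20):
    if cycle == 10:
        etb.arm()                                  # trigger condition met at cycle 10
    etb.capture(cycle, {"pc": 0x1000 + 4 * cycle})
print(etb.entries)                                 # only the last four post-trigger entries remain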
The implementation of on-chip DfD instrumentation varies from one SoC design
to another. There have been industrial efforts to establish standards for common
DfD components that can be reused across different SoC implementations. These
efforts have resulted in several popular hardware debug architectures and ecosystems, such
as Nexus and ARM Coresight, which standardize the DfD components and their
communication protocols. We will take the ARM Coresight Debug Architecture as
an example to illustrate DfD components commonly implemented in today’s SoC
debug architecture.
∙ Debug Access Port (DAP): The debug access port (DAP) acts as a bridge between
the external debug interface and multiple core domains and memory mapped
peripherals on the SoC. Each DAP has a debug port (DP), which serves as a mas-
ter device on the DAP bus. External debuggers send debug commands to DP via
interfaces such as a full-fledged JTAG port or a reduced-pin-count SWD port. The
debug commands are then translated as read or write transactions sent to access
ports (AP), which act as slave devices on the DAP bus. As shown in Fig. 3.2, APs
can be connected to system buses (e.g., AXI) or peripheral buses (e.g., Debug
APB), acting as bus masters, thus providing memory-based access to SoC com-
ponents. In addition, an AP can also be connected to an on-chip JTAG scan chain.
∙ ROM Table: The ROM table, as part of DAP, lists the memory mapped addresses
of all CoreSight components present in an SoC. It is to be noted that one ROM
table can point to another ROM table. The ROM table is used for discovery of
on-chip Coresight debug components by the external debugger.
∙ EmbeddedICE: The processor debug and monitor features can vary on differ-
ent processors. Watchpoints and breakpoints are among the most typical ones.
EmbeddedICE is a Coresight macrocell containing watchpoint control and status
registers to facilitate watchpoint functionality on ARM cores which can also act
as breakpoints when debugging from ROM or Flash.
∙ Cross Triggering: Cross triggering refers to triggering a particular operation in
one debug component from a debug event that happened in another debug compo-
nent. It is essential when debugging complex interactions among multiple cores
[42]. In Coresight, wherever there are signals to sample or drive, a cross trigger
interface (CTI) is used to control the selection of signals that are trigger sources
or trigger targets. Most systems will implement a CTI per processor, and at least
one CTI for system-level components. The CTIs in the system are interconnected
using a cross trigger matrix (CTM) which broadcasts the trigger from the CTI to
all other CTIs, to synchronize the operations among different components.
∙ Trace Sources: The trace data can be collected from different sources. One impor-
tant source is the processor traces, which consist of mainly program flow traces
and sometimes data traces. Processor trace capturing is implemented by embed-
ded trace macrocells (ETM) or program trace macrocells (PTM). Another source
is the instrumentation traces or system traces, which are driven by instrumented
messaging in the application software, for capturing the active contexts in the sys-
tem. It is implemented with system trace macrocells (STM).
∙ Trace Interconnect: The AMBA trace bus (ATB) protocol is defined for carrying
the trace around the SoC. One advantage of using a standard trace bus protocol
is that a small set of modular components can be used to form sophisticated trace
infrastructure. These components include replicators and funnels for manipulating
data streams, and trace buffers. For example, in Fig. 3.2 the trace funnel combines
the trace data from ETM of two cores and the replicator replicates the trace from
a single STM and sends them to two trace sinks (TPIU and ETB).
∙ Trace Sinks: A trace sink is the terminal CoreSight component in a trace intercon-
nect. A system can have more than one trace sink, configured to collect overlapping
or distinct sets of trace data. Trace streams are stored in on-chip trace buffers called
embedded trace buffers (ETB), sent through an interface called trace port interface
unit (TPIU) to be stored in an off-chip buffer to be read by external debuggers, or
routed to shared system memory.
root of system security, e.g., configuration fuses, chip unique ID, and the secure
boot firmware.
In some circumstances, when integrity cannot be fully ensured, an alternative property called authenticity must be guaranteed, which requires the ability to detect whether the integrity of an asset has been compromised. In this case the content of the asset can be altered
by an attacker, but the defender will be able to detect the alteration before the asset
is used and thus mitigate the risks of further loss.
To satisfy the security requirements, chip architects and designers propose various
security policies to defend against the possible risks of information leak or compro-
mised integrity. However, the enforcement of security policies requires diverse com-
ponents across the entire system to coordinate with each other in a seamless manner,
which is very difficult to implement correctly. Moreover, security risks are aggra-
vated by the existence of debug components mentioned in Sect. 3.2. The increased
observability and controllability of the internal states of the SoC enabled by the
debug circuitry inherently conflict with the requirements of preserving confidential-
ity and integrity. Attackers can exploit the debug access to obtain and affect the inter-
nal states of the SoC, which might reveal the secrets stored on-chip or compromise
integrity of the IPs.
The security hazards induced by SoC debug circuitry fall into two categories. The
first is related to the extensive use of external debug interfaces as a mechanism to
transport configuration data to the SoC. This capability can be abused by a mali-
cious entity to reprogram the firmware or the system configuration. This type of
hazard mostly violates the integrity requirement. The most well-known example is
JTAG, which can be used by attackers to upload corrupted firmware in flash mem-
ories. The second can be attributed to the enhanced controllability and observabil-
ity of the on-chip DfD instrumentation, often accompanied by the communication
channel provided by external debug interfaces. The selected signals to be controlled
and observed by DfD instrumentation are usually related to important system states.
Simply snooping those signals themselves sometimes reveals secret information on
the SoC. Moreover, attackers can manipulate the control and data flow at will
to reach system states that might leak confidential information. This type of haz-
ard usually violates the confidentiality requirement. A notable example is a category
of attacks called scan-based attacks, which exploit the internal scan chain to derive
secret keys used in cryptographic engines.
Of course, the two types of hazards can often be jointly exploited. For example,
attackers might first try to hack the secret key for verifying signed firmware and
then use the key to sign malicious code and overwrite the embedded flash with the
malicious code. In this example, not only confidentiality and integrity but also the
authenticity requirement is violated. Also note that exploits of on-chip DfD instru-
mentation do not always need to go through external debug interfaces. Attack vectors
injected from the supply chain such as hardware Trojans and malicious third-party
IPs can also leverage the on-chip instrumentation to aggravate their attacks.
In this chapter, we will review two exemplary security hazards induced by DfD
circuitry: firmware hazards and scan hazards. These two are the best documented in the literature. The review by no means covers all DfD-induced security hazards, but it suffices to illustrate how such hazards arise.
Firmware Hazards
Scan Hazards
Scan chain insertion is a widely used DfT technique for SoC testing as it provides
full observability and controllability of state elements included in the scan chain
and tremendously reduces the complexity of test generation for sequential circuits.
Furthermore, the scan chains can be connected to the JTAG interface to provide on-
chip debug capability in the field [27]. Security hazards induced by the scan-based DfT technique fall into two categories: observability hazards and controllability hazards [22].
Fig. 3.4 Illustration of scan-based attacks on symmetric-key cryptography and public-key cryp-
tography [14]
It is generally difficult to figure out the key by observing the plain-text and the
cipher-text since there are many iterations in between. However, it is relatively easy to infer the key by observing the input data and the intermediate results of one iteration. Therefore, a scan-based attack on symmetric-key cryptographic engines focuses on retrieving the intermediate value resulting from the first iteration. The
attacker chooses a plain-text and uses the flow shown in Fig. 3.3 to collect the inter-
mediate value after the first iteration. The attacker repeats this process X times so that
enough information can be collected to reveal the key. Attacking public-key crypto-
graphic engines is slightly different because the attacker has to collect intermediate
values in different iterations to figure out different bits of the key. X pairs of plain-
text and intermediate result are collected for deriving the first bit of the key. Then
the other bits of the key are revealed by shifting out intermediate states of further
iterations.
With knowledge of the cryptographic algorithms, the attackers can simulate the
intermediate results based on hypothetical keys. By comparing the retrieved interme-
diate results with the simulated ones, the attackers can confirm the correct hypothesis
and thus figure out the key. For example, in public-key cryptography, the intermedi-
ate result of the first iteration depends on the first bit of the key, which can be 0 or 1.
The attacker can simply simulate the intermediate results of a plain-text based on
both hypotheses and see which one matches the actual intermediate result. If both
match the intermediate result, then the attacker cannot tell the key bit based on this
(plain-text, intermediate result) pair and needs to use other pairs to determine the key
bit. The attack on symmetric-key cryptography is similar though the attacker needs
to simulate more hypotheses at once since the intermediate result depends on more
key bits.
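The hypothesis-testing step can be summarized with the sketch below. Here round_function stands in for the first iteration of the targeted cipher and scan_out_intermediate for the value retrieved through the scan chain; both are replaced by a self-contained toy function so that the example runs, and none of this mirrors the exact procedures of the cited attacks.

def recover_key_fragment(plaintexts, scan_out_intermediate, round_function, key_space):
    """Keep only the key hypotheses whose simulated first-iteration output matches
    every intermediate value shifted out of the scan chain."""
    candidates = set(key_space)
    for pt in plaintexts:
        observed = scan_out_intermediate(pt)               # value retrieved via the scan chain
        candidates = {k for k in candidates if round_function(pt, k) == observed}
        if len(candidates) == 1:
            break                                          # key fragment uniquely determined
    return candidates


# Toy 4-bit "round" so the example is self-contained; real attacks target DES/AES/RSA rounds.
def toy_round(pt, key):
    return ((pt ^ key) + (key << 1)) & 0xF

secret = 0b1011
leakage = lambda pt: toy_round(pt, secret)                 # models what the scan chain reveals
print(recover_key_fragment(range(16), leakage, toy_round, key_space=range(16)))   # -> {11}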
The first scan-based attack in the literature [43] was proposed to break a DES
block cipher. By loading 64 pairs of known plain-texts with one-bit difference in
the functional mode and then scanning out the internal states in the test mode, the attackers first determine the positions of all scan elements in the scan chain. Then only three chosen plain-texts are applied to recover the first-round key (48 bits). Similar attacks can be performed at the second and third rounds to recover the rest of the key bits.
Later, the same authors proposed a differential scan-based attack on the AES engine
[44]. After that, scan-based attacks have been found effective on stream ciphers [31],
RSA [41], and ECC [32]. Publications also show that the scan-based attacks can
be performed in the presence of the advanced DfT structures such as partial scan,
X-masking, and X-tolerant architecture thus making the attacks rather practical to
perform [12, 13, 15, 16, 18, 28].
Most state-of-the-art scan-based attacks rely on the ability to switch from the functional mode to the test mode under the assumption that the data in the scan flip-flops is preserved intact. Therefore, designers can deploy a countermeasure which injects random noise into the scan chain whenever there is a switch from functional mode to test mode, thwarting all these attacks. One simple form of this
countermeasure is resetting the data in the scan elements whenever there is a switch
from the functional mode to the test mode [22]. This solution can defend against most scan-based attacks, but it also compromises the debugging capability that authorized users sometimes need. Moreover,
a recent work proposed a new scan-based attack using only the test mode [1]. The
initial attack analysis shows that only 375 test vectors are sufficient to reveal the 128-
bit AES secret key with negligible time complexity. Whether this type of attack can
succeed in the presence of advanced DfT structures is still under investigation.
To prevent the security hazards induced by SoC debug components, one might sim-
ply attempt to disable the debug access after manufacturing tests and validation or
customer configuration. This can be accomplished by blowing a fuse associated with
the debug interface so as to disable access via the debug interface. The TI MSP430
Microcontroller is one example where the JTAG interface can be disabled [11]. The
problem with this approach is that it compromises the capability to debug the SoC
for purposes such as field return evaluation. To regain access to the debug interface
after the fuse is blown, one must resort to complex and expensive techniques. Typically, focused ion beam (FIB) modification is used to blow a counterpart fuse in the SoC to regain access to the debug interface. However, this approach has several problems. First,
the equipment for performing FIB modification is expensive and complicated and the
modification is often unsuccessful. Second, FIB modification often requires destruc-
tive de-encapsulation of the IC device and thus can prevent future evaluation. Third,
FIB modification also results in an override of the customer configuration, therefore
preventing subsequent access to such configuration information for further analysis.
Finally, FIB modification is also relatively temporary due to metal migration, which
at some point reconnects the blown counterpart fuse and returns the IC device to a
state where the debug interface is inaccessible [9].
Even if the access to debug interfaces is disabled, much of the on-chip DfD instru-
mentation still remains after production. Though it is possible to disable the on-chip instrumentation, doing so would change the power/performance/energy profile of the production system from the one used for validation. And again, the instrumentation is critical for field return evaluation and for making in-field patches.
Though we should not permanently disable all debug access after production,
restrictions should be in place in accordance with the life stage of the SoC. Design
trade-offs must be made between the debugging capabilities and the security of an SoC
along its product life cycle. After an SoC is designed, it has to be manufactured,
tested, assembled, built into a product and shipped to customers. As it progresses
toward a later stage of its product life cycle, the needs for protection gradually out-
weigh the needs for debug access, and therefore the SoC should be configured in
a way that more restrictions are imposed on the debug access and fewer debug-
ging features are allowed. Such configurations should be irreversible under most
circumstances. In some situations such as field return evaluation, the protection of
the device needs to be lowered temporarily. However, this feature should not rede-
fine the default protection. Furthermore, this feature ought to be available to trusted
entities only with restrictive authentication. Following these principles, a pragmatic
designer would identify the security and debug requirements for each stage or mode
and implement corresponding access control of the debug components (most likely
the debug interface) based on authentication. We will give a historical review of the
solutions proposed by both academia and industry along this line.
One of the first approaches utilizes a key-based locking/unlocking mechanism for controlling access to JTAG functionality [33]. The user can unlock JTAG by shifting in the correct key that matches the secret key stored on the chip boundary. Otherwise, JTAG bypasses all data from TDI to TDO. The process of shifting in the key is unencrypted, which makes it vulnerable to eavesdropping on the JTAG communication. In addition, it is often too restrictive to have only two levels of access. In [10], the authors proposed to reuse the flip-flops in the boundary cells as a linear feedback shift register to generate the key, reducing the area overhead of key storage.
The methods proposed in [23, 29] are similar in that they use a key-based locking/unlocking mechanism to secure the JTAG scan chains. They allow the user to
freely shift the contents of the scan chains without the correct key; however, the
bits themselves are in a random order. The order is generated by a random number
generator, and the correct order can only be restored using a secret key. Instead of
restricting the access to the debug interface, this approach restricts the debug access
by giving incorrect outputs. It still allows users to supply data to the scan chain,
which could be exploited for fault injection. It is also a binary security mechanism
since it either gives full access to the scan chains or scrambles the order.
To prevent eavesdropping attacks, a simple authentication scheme based on a cryptographic hash and a shared key can suffice. The SoC device and the authorized user
shall share a private key. For each test instruction (or debug command) to be issued,
the user can use a cryptographic hash algorithm such as Secure Hash Algorithm
(SHA) to generate a signature based on the test instruction and the shared key alto-
gether. Then the test instruction and signature are sent to the SoC. The SoC can run
the same hash on the received test instruction and the shared key to generate a signa-
ture and verify whether it matches the received signature. This approach eliminates
the eavesdropping attack; however, it is still vulnerable to a replay attack, in which
an attacker monitors the message communication, duplicates, and replays the whole
message (test instruction + signature) to spoof the SoC.
This replay hazard can be thwarted by the challenge/response authentication scheme proposed in [11], which utilizes a SHA-256 cryptographic hash engine, a shared secret key, and a random number generator to generate a challenge per commu-
nication session and verify the response. The SoC first generates a random number
and sends it to the user as a challenge. The user uses the cryptographic hash to gen-
erate the signature based on the test instruction, shared key and challenge altogether,
and then sends the instruction and signature as the response to the challenge. The SoC
device can verify whether the received signature was generated based on the same random
number and the correct shared key, and thus verify the authenticity of the user. In
addition, the authors proposed that a set of instructions should be public while others
that can get access to sensitive information should be private. The public instructions
do not require keys while each private instruction must have an independent secret
key, thus creating different security levels of debug access.
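A software model of this challenge/response exchange is sketched below. The key value, nonce size, and message framing are illustrative assumptions rather than details of the cited design, and a production implementation would typically use a keyed MAC construction such as HMAC instead of a bare hash.

import hashlib
import hmac
import os

SHARED_KEY = bytes.fromhex("00112233445566778899aabbccddeeff")   # provisioned, e.g., in fuses

def device_issue_challenge():
    return os.urandom(16)                              # fresh random nonce per debug session

def user_sign_command(command: bytes, challenge: bytes) -> bytes:
    # The signature binds the command to the shared key and to this session's challenge,
    # so a captured (command, signature) pair cannot be replayed in a later session.
    return hashlib.sha256(command + SHARED_KEY + challenge).digest()

def device_verify(command: bytes, signature: bytes, challenge: bytes) -> bool:
    expected = hashlib.sha256(command + SHARED_KEY + challenge).digest()
    return hmac.compare_digest(expected, signature)    # constant-time comparison


challenge = device_issue_challenge()
cmd = b"UNLOCK_SCAN_DUMP"                              # hypothetical private debug instruction
sig = user_sign_command(cmd, challenge)
print(device_verify(cmd, sig, challenge))              # True for the holder of the shared key
print(device_verify(cmd, sig, device_issue_challenge()))   # False: replay fails under a new challenge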
The security of authentication based on shared secret keys largely relies on the
confidentiality of the key. Reliable key management is a challenging problem. If
each device of an SoC product family shares the same key, compromising the key of
a single device would render the protection of all devices useless. If each device is
assigned a unique key, the device provider needs to maintain a database recording the
key corresponding to each device unique ID, which is costly and thus undesirable.
In [8], the authors proposed a three-entity authentication scheme, utilizing a sep-
arate secure server to authenticate the user for improved security. The device will
generate a challenge every time upon the user’s request to access the debug port.
The user is connected to the secure server and relays the challenge to the server. The
secure server’s role is user authorization and verification upon reception of a chal-
lenge and generation of response to a given challenge upon successful verification.
The challenge/response algorithm is based on ECC. The device owns a public-key
and the secure server holds the private key. This approach offers a higher level of
security by hiding the key from the user, and supports various security policies for
user authentication and authorization on the server. However, the need for a server to
authenticate the user on each debug interface access requires continuous communi-
cation with the server, disabling debug access and lowering overall availability when the network is unavailable.
To improve the availability over [8], the authors of [34] proposed a user authentication scheme in which the server issues to the authenticated user a credential with which to authenticate to a particular device. The device verifies the user-submitted credentials and opens the debug port to users with valid credentials. This approach eliminates the need for networking with the server after a credential has been issued. The same authors later proposed an improved solution by incorporating a limit on the maximum number of authentications allowed per credential and mechanisms to deal with expired credentials [35].
The authors of [36, 37] formalized the concept of multilevel secure JTAG archi-
tecture and provided detailed hardware implementation specifications for enforcing
the multilevel policy. Each debug instruction is assigned an access level and each user
will be assigned a permission level after authentication. A user with a permission
level Pi can execute any instruction with an access level A if A ≤ Pi . The hardware
implementation of such an architecture is composed of two primary components,
the secure authentication module (SAM) and the access monitor (AM). SAM’s func-
tions are to provide an unlocking communication protocol, to set the user level, and
to allow modification of access levels. The AM prevents potentially harmful data
from being loaded into the scan chain. This work depicts a practical implementation
scheme for multilevel secure debug access, where SAM serves as the security policy
decision point and the AM acts as the security policy enforcement point.
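The access rule A ≤ Pi reduces to a simple comparison, as the sketch below shows; the level assignments and instruction names are hypothetical and are not taken from the cited implementation, which additionally specifies the unlocking protocol and the hardware structure of the SAM and AM.

# Hypothetical access levels for debug instructions (higher means more sensitive).
ACCESS_LEVELS = {"BYPASS": 0, "IDCODE": 0, "TRACE_READ": 1, "SCAN_DUMP": 2, "FUSE_WRITE": 3}

class AccessMonitor:
    """Enforcement point: permit an instruction only if its access level A <= permission P."""

    def __init__(self, user_permission_level):
        self.permission = user_permission_level        # set by the authentication module (SAM)

    def allow(self, instruction):
        level = ACCESS_LEVELS.get(instruction, float("inf"))   # unknown instructions are denied
        return level <= self.permission


am = AccessMonitor(user_permission_level=1)            # e.g., an authenticated field technician
print(am.allow("TRACE_READ"))                          # True
print(am.allow("SCAN_DUMP"))                           # False: requires a higher permission level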
Besides the academic proposals, industrial solutions for securing the debug access
have also been offered. The ARM TrustZone architecture extension [2] is a system-
wide approach to SoC security. The JTAG port is a security-related block and thus is part of the architecture extension. TrustZone imposes hardware-based restrictions
on JTAG operations, such as restricted or nonrestricted debug access, which can be
configured by blowing fuses. Nonrestricted debug access is only used in the devel-
opment phase of the product. The device delivered to customers is in the restricted
mode, in which only the basic, noninvasive debugging functions of JTAG are avail-
able. However, the noninvasive debugging functions of JTAG might still be exploited
as a backdoor to on-chip secrets. Moreover, TrustZone does not address the need for
offering different levels of debug access to different users at different phases.
Freescale Semiconductor introduced the secure JTAG controller with the i.MX31 and i.MX31L product families [5]. The Freescale secure JTAG controller provides several
configurations determined by a set of fuses. The secure JTAG controller in the latest
i.MX6 processor allows four different JTAG security modes [20]:
∙ Mode 1: No JTAG—Maximum security. All JTAG features are permanently
blocked.
∙ Mode 2: No Debug—High security. All security sensitive JTAG features are per-
manently disabled.
The existing solutions are at best pragmatic workarounds to combat the security haz-
ards induced by DfD components. A more fundamental approach is to incorporate
the debug access as part of the security requirement and architecture definition.
Besides confidentiality and integrity, there is another less frequently emphasized
security requirement called availability. It requires that an asset must be accessi-
ble to an entity that requires such access per correct system functionality. The debug
requirement can be viewed as an availability requirement [38]. Viewing the security and debug requirements as an integral whole from this perspective would be more helpful in developing a comprehensive solution that addresses the trade-off between them.
The processes for defining the security and debug requirements and architectures are complex and involve multiple stake-holders. An effective solution for addressing the security and debug trade-off should address a comprehensive set of aspects. An incomplete set of key aspects highlighted in [38] includes:
∙ Centralized Architecture: The current security and debug architectures are largely
decentralized, which makes it difficult to implement them correctly. The policy
decision making should be centralized so that it can be effectively introspected for
possible violations of the requirements.
∙ Late Variability: The debug requirements are often subject to late changes, which
can happen during SoC integration or even after a silicon stepping. An effective
solution should allow easy adaptation to the changing requirements and quick val-
idation of the security impacts of the DfD changes.
∙ Reusability: The current solutions are ad-hoc and thus error-prone. A systematic
design methodology with reusable components is essential for a viable solution.
The authors of [38] made a preliminary proposal for a centralized, firmware-controlled framework to fulfill the requirements of secure post-silicon debug. However, this still
remains an open research area that requires extensive efforts.
3.5 Summary
The capabilities to debug an SoC at post-silicon stages are essential for SoC devel-
opment. SoC debug circuitry, while offering increased observability and control-
lability of the internal states of the circuit, can be a backdoor for security attacks.
Design decisions must be made regarding the trade-off of the debugging capabili-
ties and the security protection at different stages of the SoC product life cycle. In
this chapter, we review the common SoC debug architectures and give a compre-
hensive analysis of the known security hazards induced by SoC debug access. We
review the published solutions for preventing debug access from untrusted entities
while preserving debugging functionality, most of which implement access control
mechanisms based on authentication of trusted entities.1
References
1. Ali, S., Sinanoglu, O., Saeed, S., Karri, R.: New scan-based attack using only the test mode. In:
2013 IFIP/IEEE 21st International Conference on Very Large Scale Integration (VLSI-SoC),
pp. 234–239 (2013)
2. ARM: Designing with trustzone hardware requirements. ARM whitepaper (2005)
3. ARM: Coresight technical Introduction. ARM whitepaper (2013)
1 Freescale and the Freescale logo are trademarks of Freescale Semiconductor, Inc., Reg. U.S. Pat.
& Tm. Off. All other product or service names are the property of their respective owners. ARM and
Cortex are trademark(s) or registered trademarks of ARM Ltd or its subsidiaries. 2014 Freescale
Semiconductor, Inc.
4. Ashfield, E., Field, I., Harrod, P., Houlihane, S., Orme, W., Woodhouse, S.: Serial wire debug and the CoreSight debug and trace architecture (2006)
5. Ashkenazi, A.: Security features in the i.mx31 and i.mx31l multimedia applications processors.
Freescale Semiconductor Inc. (2006)
6. Bennetts, B.: IEEE 1149.1 JTAG and boundary scan tutorial. http://www.asset-intertech.com/
Products/Boundary-Scan-Test/e-Book-JTAG-Tutorial (2012)
7. Biham, E., Shamir, A.: Differential fault analysis of secret key cryptosystems. In: Proceed-
ings of the 17th Annual International Cryptology Conference on Advances in Cryptology.
CRYPTO ’97, pp. 513–525. Springer, London (1997)
8. Buskey, R., Frosik, B.: Protected JTAG. In: 2006 International Conference on Parallel Process-
ing Workshops. ICPP 2006 Workshops, pp. 8–414 (2006)
9. Case, L., Ashkenazi, A., Chhabra, R., Covey, C., Hartley, D., Mackie, T., Muir, A., Redman,
M., Tkacik, T., Vaglica, J., et al.: Authenticated debug access for field returns. https://www.
google.com.ar/patents/US20100199077 (2010). US Patent App. 12/363,259
10. Chiu, G.M., Li, J.M.: A secure test wrapper design against internal and boundary scan attacks
for embedded cores. IEEE Trans. Very Large Scale Integr. (VLSI) Syst. 20(1), 126–134 (2012)
11. Clark, C.: Anti-tamper JTAG TAP design enables DRM to JTAG registers and P1687 on-
chip instruments. In: 2010 IEEE International Symposium on Hardware-Oriented Security and
Trust (HOST), pp. 19–24 (2010)
12. Da Rolt, J., Das, A., Di Natale, G., Flottes, M., Rouzeyre, B., Verbauwhede, I.: A scan-based
attack on elliptic curve cryptosystems in presence of industrial design-for-testability structures.
In: 2012 IEEE International Symposium on Defect and Fault Tolerance in VLSI and Nanotech-
nology Systems (DFT), pp. 43–48 (2012)
13. Da Rolt, J., Das, A., Di Natale, G., Flottes, M.L., Rouzeyre, B., Verbauwhede, I.: A new scan
attack on RSA in presence of industrial countermeasures. In: Proceedings of the Third Interna-
tional Conference on Constructive Side-Channel Analysis and Secure Design. COSADE’12,
pp. 89–104. Springer, Berlin (2012)
14. Da Rolt, J., Das, A., Di Natale, G., Flottes, M.L., Rouzeyre, B., Verbauwhede, I.: Test versus
security: past and present. IEEE Trans. Emerg. Top. Comput. 2(1), 50–62 (2014). doi:10.1109/
TETC.2014.2304492
15. Da Rolt, J., Di Natale, G., Flottes, M.L., Rouzeyre, B.: Are advanced DFT structures sufficient
for preventing scan-attacks? In: VLSI Test Symposium (VTS), 2012 IEEE 30th, pp. 246–251
(2012)
16. DaRolt, J., Di Natale, G., Flottes, M.L., Rouzeyre, B.: Scan attacks and countermeasures in
presence of scan response compactors. In: European Test Symposium (ETS), 2011 16th IEEE,
pp. 19–24 (2011)
17. Dishnet: In house made with locking script. http://www.satcardsrus.com/dish_net%203m.htm
(2012)
18. Ege, B., Das, A., Gosh, S., Verbauwhede, I.: Differential scan attack on AES with x-tolerant
and x-masked test response compactor. In: 2012 15th Euromicro Conference on Digital System
Design (DSD), pp. 545–552 (2012)
19. Freescale: Introduction to HCS08 background debug mode (2006)
20. Freescale: i.mx 6solox applications processor reference manual (2014)
21. Greenemeier, L.: iphone hacks annoy AT&T but are unlikely to bruise apple. Scientific Amer-
ican (2007)
22. Hely, D., Bancel, F., Flottes, M.L., Rouzeyre, B.: Test control for secure scan designs. In: Test
Symposium, 2005. European, pp. 190–195 (2005)
23. Hely, D., Flottes, M.L., Bancel, F., Rouzeyre, B., Berard, N., Renovell, M.: Scan design and
secure chip [secure IC testing]. In: On-Line Testing Symposium, 2004. IOLTS 2004. Proceed-
ings. 10th IEEE International, pp. 219–224 (2004)
24. Homebrew development wiki JTAG-hack. http://dev360.wikia.com/wiki/JTAG-Hack (2012)
25. IEEE standard for in-system configuration of programmable devices: IEEE Std 1532–2001,
pp. 1–130 (2001)
26. IEEE standard test access port and boundary scan architecture. IEEE Std 1149.1-2001,
pp. 1–212 (2001)
27. Josephson, D., Poehhnan, S., Govan, V.: Debug methodology for the Mckinley processor. In:
Test Conference, 2001. Proceedings. International, pp. 451–460 (2001)
28. Kapur, R.: Security vs. test quality: are they mutually exclusive? In: Test Conference, 2004.
Proceedings. ITC 2004. International, pp. 1414– (2004)
29. Lee, J., Tehranipoor, M., Patel, C., Plusquellic, J.: Securing scan design using lock and key
technique. In: 20th IEEE International Symposium on Defect and Fault Tolerance in VLSI
Systems, 2005. DFT 2005, pp. 51–62 (2005)
30. Ley, A.: Doing more with less—an IEEE 1149.7 embedded tutorial: standard for reduced-pin
and enhanced-functionality test access port and boundary-scan architecture. In: Test Confer-
ence, 2009. ITC 2009. International, pp. 1–10 (2009). doi:10.1109/TEST.2009.5355572
31. Liu, Y., Wu, K., Karri, R.: Scan-based attacks on linear feedback shift register based stream
ciphers. ACM Trans. Des. Autom. Electron. Syst. 16(2), 20:1–20:15 (2011)
32. Nara, R., Togawa, N., Yanagisawa, M., Ohtsuki, T.: Scan-based attack against elliptic curve
cryptosystems. In: Design Automation Conference (ASP-DAC), 2010 15th Asia and South
Pacific, pp. 407–412 (2010)
33. Novak, F., Biasizzo, A.: Security extension for IEEE Std 1149.1. J. Electron. Test. 22(3), 301–
303 (2006)
34. Park, K., Yoo, S.G., Kim, T., Kim, J.: JTAG security system based on credentials. J. Electron.
Test. 26(5), 549–557 (2010)
35. Park, K.Y., Yoo, S.G., Kim, J.: Debug port protection mechanism for secure embedded devices.
J. Semicond. Technol. Sci. 12(2), 241 (2012)
36. Pierce, L., Tragoudas, S.: Multi-level secure JTAG architecture. In: 2011 IEEE 17th Interna-
tional On-Line Testing Symposium (IOLTS), pp. 208–209 (2011)
37. Pierce, L., Tragoudas, S.: Enhanced secure architecture for joint action test group systems.
IEEE Trans. Very Large Scale Integr. (VLSI) Syst. 21(7), 1342–1345 (2013)
38. Ray, S., Yang, J., Basak, A., Bhunia, S.: Correctness and security at odds: post-silicon
validation of modern SoC designs. In: Design Automation Conference (DAC), 2015 52nd
ACM/EDAC/IEEE, pp. 1–6 (2015)
39. Rearick, J., Eklow, B., Posse, K., Crouch, A., Bennetts, B.: IJTAG (internal JTAG): a step
toward a DFT standard. In: Test Conference, 2005. Proceedings. ITC 2005. IEEE International,
pp. 8–815 (2005)
40. Rolt, J.D., Natale, G.D., Flottes, M.L., Rouzeyre, B.: A novel differential scan attack on
advanced DFT structures. ACM Trans. Des. Autom. Electron. Syst. 18(4), 58:1–58:22 (2013)
41. Ryuta, N., Satoh, K., Yanagisawa, M., Ohtsuki, T., Togawa, N.: Scan-based side-channel attack
against RSA cryptosystems using scan signatures. IEICE Trans. Fundam. Electron. Commun.
Comput. Sci. 93(12), 2481–2489 (2010)
42. Tang, S., Xu, Q.: In-band cross-trigger event transmission for transaction-based debug. In:
Design, Automation and Test in Europe, 2008. DATE ’08, pp. 414–419 (2008)
43. Yang, B., Wu, K., Karri, R.: Scan based side channel attack on dedicated hardware imple-
mentations of data encryption standard. In: Test Conference, 2004. Proceedings. ITC 2004.
International, pp. 339–344 (2004)
44. Yang, B., Wu, K., Karri, R.: Secure scan: A design-for-test architecture for crypto chips. IEEE
Trans. Comput.-Aided Des. Integr. Circuits Syst. 25(10), 2287–2293 (2006)
Chapter 4
IP Trust: The Problem
and Design/Validation-Based Solution
4.1 Introduction
In response to the demand for trusted IP cores, various IP protection and certification methods at the pre-silicon stage have recently been developed. In this chapter, most of
these approaches will be introduced including hardware locking/encryption, FPGA
bitstream protection, theorem proving, and equivalence checking.
The rest of the chapter is organized as follows: Sect. 4.2 discusses various hard-
ware locking/encryption methods for preventing various threats to IP cores. For a
better explanation, we divide these methods into three categories (i) combinational
logic locking/encryption, (ii) finite state machine locking/encryption, and (iii) lock-
ing using reconfigurable components. In this section, we also discuss FPGA bit-
stream protection methods. Section 4.3 primarily discusses the existing equivalence
checking and theorem proving methods for ensuring trustworthiness of soft IP cores.
Finally, Sect. 4.4 concludes the chapter.
Fig. 4.2 Different stages at which hardware locking/encryption methods are applied
As noted in the introduction, hardware locking/encryption methods fall into three categories: (i) combinational logic locking [14–18], (ii) finite state machine (FSM) locking [19–28], and (iii) locking using reconfigurable components [29–31]. The combinational logic locking methods include cryptographic algorithms for locking and logic encryption. The FSM lock-
ing methods include hardware obfuscation techniques and active hardware metering.
The FPGA protection methods focus on securing the bitstream. In the rest of this
section we will describe the threat model and each prevention method in details.
Security threats to an IP/IC vary depending on the location of the adversary in the
supply chain (see Fig. 4.3). Below we briefly explain these threats to an IP/IC:
∙ Cloning: An adversary creates an exact copy or clone of the original product and sells it under a different label. To carry out cloning of ICs, an attacker should be
either a manufacturer, system integrator, or a competing company equipped with
necessary tools.
∙ Counterfeiting: When cloned products are sold under the label of the original ven-
dor, without their authorization, it is called counterfeiting. This can be performed
by an attacker in a manufacturing facility or by companies having capability to
manufacture replicas of the original chip.
∙ IC Overbuilding: Another threat to an IC designer is overbuilding. In overbuilding,
the manufacturer or system integrator fabricates more ICs than authorized.
∙ IP Piracy: In IP piracy, a system integrator steals the IP to claim its ownership or
sell it illegally.
∙ Reverse Engineering: By analyzing an existing IC, manufacturers, system integra-
tors, or companies having reverse engineering capabilities, can bypass security
The combinational logic locking method of [15] inserted XOR/XNOR key gates into the netlist in such a way that attackers needed an exponential number of brute force attempts to decipher the key. Compared to random insertion, this procedure incurred lower area overhead, as it required fewer XOR/XNOR key gates. Another limitation of the combinational logic locking method of [14] was that an incorrect key input did not always affect the output of the circuit.
Logic encryption was proposed in [16, 17]; it used conventional fault simulation techniques and tools to guide the XOR/XNOR key gate insertions so that, for an invalid key, the circuit produced wrong outputs with a 50 % Hamming distance from the correct outputs. This method masked both the functionality and the implementation of a design by inserting key gates into the original design. To prevent collusion attacks, physical unclonable functions (PUFs) were used to produce unique user keys for encrypting each IC. Instead of encrypting the design file with a cryptographic algorithm, the logic encryption method encrypted the hardware functionality itself. The performance overhead of this method was smaller than that of the random key gate insertion method, as it used fewer XOR/XNOR gates to achieve the 50 % Hamming distance.
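To make the role of key gates and the Hamming distance criterion concrete, the following minimal Python sketch, which is not taken from [16, 17], locks a small invented three-output function with XOR key gates and measures how many output bits a wrong key corrupts; the circuit, key-gate positions, and key value are all illustrative assumptions:

```python
# Toy illustration of XOR key-gate locking and the Hamming-distance criterion.
# The three-output "netlist" below, the key-gate positions, and the key value
# are illustrative assumptions, not the construction of [16, 17].
from itertools import product

KEY = (1, 0, 1)  # correct unlocking key (hypothetical)

def original(a, b, c, d):
    # Unlocked combinational function (arbitrary example with three outputs).
    return ((a & b) ^ c, (c | d) & a, a ^ b ^ d)

def locked(a, b, c, d, key):
    # XOR key gates on two internal nets and one output net; the circuit is
    # built so that the correct key restores the original functionality.
    k1, k2, k3 = (k ^ g for k, g in zip(key, KEY))  # 0 wherever a key bit is correct
    o1 = ((a & b) ^ k1) ^ c      # key gate on the internal net (a & b)
    o2 = ((c | d) ^ k2) & a      # key gate on the internal net (c | d)
    o3 = (a ^ b ^ d) ^ k3        # key gate directly on an output net
    return (o1, o2, o3)

def avg_hamming_distance(key):
    flipped = 0
    for a, b, c, d in product((0, 1), repeat=4):
        ref, out = original(a, b, c, d), locked(a, b, c, d, key)
        flipped += sum(r != o for r, o in zip(ref, out))
    return flipped / (16 * 3)    # fraction of corrupted output bits

print("correct key:", avg_hamming_distance(KEY))        # 0.0
print("wrong key  :", avg_hamming_distance((0, 1, 0)))  # non-zero; fault-guided insertion aims for ~0.5
```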
Another combinational locking method was proposed in [18], which protected ICs from illegal overproduction and potential hardware Trojan insertion by minimizing the rare signals of a circuit. The method made it harder for an attacker to exploit rare signals of a circuit to incorporate a hardware Trojan. An encryption algorithm modified the circuit while preserving its functionality. The algorithm used a probability-based method [32] to identify signals with low controllability. Among the identified signals, the candidates for encryption were those with an unbalanced probability (signals with probability below 0.1 or above 0.9). For encryption, AND/OR gates were inserted in paths with large slack time and unbalanced probability. The type of gate to be inserted depended on the probability of the signal: when the probability of the signal was close to 0, an OR gate was inserted and the corresponding key value was 0; when the probability was close to 1, an AND gate was inserted and the corresponding key value was 1. However, this method could not create multiple encryption keys for the same design and, hence, all IP consumers of the design used the same key. Due to this limitation, it was not effective for preventing IP piracy.
Finite state machine (FSM) locking obfuscates a design by augmenting its state machine with a set of additional states. The modified FSM transitions from the obfuscated states to the normal operating states only after a specific input sequence (the obfuscation key) is applied. The obfuscation transforms the hardware design while preserving its functionality.
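A minimal sketch of this idea is given below; the states, key sequence, and transition behavior are invented for illustration and do not correspond to any specific scheme from [19–28]:

```python
# Toy FSM-locking sketch: the design powers up in obfuscated states and reaches
# the functional state machine only after the correct key sequence is applied.
KEY_SEQUENCE = [0b1011, 0b0110, 0b1110]   # hypothetical obfuscation key

class LockedFSM:
    def __init__(self):
        self.progress = 0          # progress through the obfuscated state space
        self.unlocked = False
        self.state = "OBF0"

    def step(self, inp):
        if not self.unlocked:
            if inp == KEY_SEQUENCE[self.progress]:
                self.progress += 1
                self.state = f"OBF{self.progress}"
                if self.progress == len(KEY_SEQUENCE):
                    self.unlocked = True
                    self.state = "RESET"   # entry point of the original FSM
            else:
                self.progress = 0          # wrong input: fall back into obfuscated space
                self.state = "OBF0"
            return None                    # outputs are meaningless while locked
        return self.functional_step(inp)

    def functional_step(self, inp):
        # Placeholder for the original, unmodified state machine.
        return inp ^ 0b1111

fsm = LockedFSM()
for word in KEY_SEQUENCE + [0b0001]:
    out = fsm.step(word)
print(fsm.state, out)   # "RESET 14": unlocked and producing functional outputs
```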
The FSM-based obfuscation method protects an IP/IC from reverse engineering,
IP piracy, IC overproduction, and hardware Trojan insertion. Several variations of
this method have been proposed in the literature [14–31]. In this chapter, we discuss
three of its variations: (i) obfuscation by modifying gate-level netlist, (ii) obfuscation
by modifying RTL code, and (iii) obfuscation using reconfigurable logic.
Obfuscation by Modifying Gate-Level Netlist
One of the methods for obfuscating a hardware design is to insert an FSM in the gate-
level netlist [19–23]. The method in [19] obfuscated and authenticated an IP core by incorporating structural modifications in the netlist. Along with the state transition function, nodes with large fan-in and fan-out were modified such that the design produced undesired outputs until a specific vector was applied at the primary inputs. The circuit was re-synthesized after the modification to hide the structural changes. The FSM, which was inserted into the netlist to modify the state transition graph (STG), was connected to the primary inputs of the circuit. Depending on an initialization key sequence, the FSM operated the IP core in either the normal mode or the obfuscated mode. The Nmax nodes of the netlist with the largest number of unique fan-outs were identified using an iterative ranking algorithm, and the output of the FSM and the modified Nmax nodes were fed to XOR gates. When the FSM output was 0, the design exhibited correct behavior. Although the method incurred low area and power overheads, it did not analyze the security of the design. New methods were proposed to overcome this limitation [20–22]. A metric was developed to
quantify the mismatch between the obfuscated design and the original design [20].
Also, the effect of obfuscation on security of the design was evaluated. Certain mod-
ifications were made to the methodology of [19] such as embedding “modification
kernel function” for modifying the nodes selected by the iterative ranking algorithm
and adding an authentication FSM, which acted as a digital watermark. These modi-
fications prevented attacks from untrusted parties in the design flow with knowledge
of the initialization sequence.
The obfuscation scheme of [20] was extended in [21, 22] to prevent insertion of
trigger-activated hardware Trojans. The new method also ensured that such Trojans,
when activated in the obfuscated mode, did not affect normal operation of the cir-
cuit. To incorporate these changes, the obfuscation state space was divided into (i)
initialization state space and (ii) isolation state space [21]. On applying the correct
key at power on, the circuit transitioned from the initialization state space to nor-
mal state space. However, an incorrect key moved the circuit into the isolation state space, from which it could not return to the normal state space. Due to the extreme rarity of the transition condition into the normal state space, it was assumed that an attacker would remain stuck in the isolation state space. Also, a Trojan inserted with incorrect observability/controllability assumptions in the obfuscation mode would have an increased detection probability during post-manufacturing testing. To further increase the probability of Trojan detection, the state space of the obfuscated mode was made larger than that of the normal mode. The proposed methodology was robust enough to prevent reverse engineering of the modified netlist, even for large sequential circuits. Also, the area
and power overheads of the method were relatively low.
However, the methodology of [20] could not protect an evaluation version of a firm IP core. In [23], this problem was overcome by embedding an FSM in the IP netlist. The FSM disrupted the normal functional behavior of the IP after its evaluation period. The number of cycles required to activate the FSM depended on the number of bits in the state machine, and the activation probability decreased as the number of trigger nodes was increased. This method helped put an expiry date on the evaluation copy of a hardware IP. To distinguish between the legally sold version of an IP and its evaluation version containing the FSM, IP vendors either (i) used a disabling key to deactivate the FSM or (ii) provided an FSM-free version. The method structurally and functionally obfuscated the FSM to conceal it during reverse engineering. The area overhead of this method was directly proportional to the size of the FSM; however, the relative overhead decreased with an increase in the size of the original IP.
Obfuscation by Modifying RTL Code of Design
Apart from netlist, obfuscation can also be carried out in RTL code of the design
[24–27]. A key-based obfuscation approach for protecting synthesizable RTL cores
was developed in [27]. The RTL design was first synthesized into a technology inde-
pendent gate-level netlist and then obfuscated using the method of [19]. Then, the
obfuscated netlist was decompiled into RTL in such a way that the modifications
made on the netlist were hidden and the high-level HDL constructs preserved. A sim-
ple metric was presented to quantify the level of such structural and semantic obfus-
cation. This approach incurred minimal design overhead and did not adversely affect the automatic synthesis of the resultant RTL code. However, decompilation removed some preferred RTL constructs and hence made the design unsuitable under certain design constraints. This limitation was overcome in [26], where the
RTL code was first converted into a control and data flow graph (CDFG) and then a
key-activated “Mode-Control FSM” was inserted to obfuscate the design. The CDFG
was built by parsing and transforming concurrent blocks of RTL code. Small CDFGs
were merged to build larger ones with more nodes. This helped in better obfuscation
of the “Mode-Control FSM,” which operated the design either in normal or obfus-
cated mode. This FSM was realized in the RTL by modifying a set of host registers.
To further increase the level of obfuscation, state elements of the FSM were dis-
tributed in a non-contiguous manner inside one or more registers. After hosting the
FSM in a set of selected host registers, several CDFG nodes were modified using the
control signals generated from the FSM. The nodes with large fan-out cones were
selected for modification, as they ensured maximum change in functional behavior
at minimal design overhead. At the end of modifications on the CDFG, the obfus-
cated RTL was generated, which on powering up, initialized the design at the obfus-
cated mode. Only on the application of a correct input key, the “mode-control” FSM
transited the system through a sequence of states to the normal mode. This method
incurred low area and power overheads.
Another approach obfuscates the RTL core by dividing the overall functionality
of the design into two modes, “Entry/Obfuscated mode” and “Functional mode,”
and encoding the path from the entry/obfuscated mode to the functional mode with a
“code-word” [24]. The functionality of the circuit was divided by modifying the FSM
representing the core logic. To modify the FSM, its states were extended and divided
into entry/obfuscated mode and functional mode states. Only on the application of
the right input sequence at the entry mode, the correct Code-Word was formed, which
produced correct transitions in the functional mode. Unlike those methods where an
invalid key disallowed entry to the normal mode, this method always allowed entry
to the functional mode. However, the behavior in the functional mode depended on
the value of the Code-Word. This Code-Word was not stored anywhere on chip, but it
was formed dynamically during the entry mode. It was integrated into the transition
logic to make it invisible to the attacker. In this method, the length of the Code-Word
was directly related to the area overhead and security level required by the designer.
A longer Code-Word meant higher level of security against brute force attacks, but
at the cost of higher area overhead.
In [25], a key-based obfuscation method having two modes of operation, normal
mode and slow mode, was developed to prevent IP piracy on sequential circuits. This
method modified the state transition graph (STG) in such a way that the design oper-
ated in either mode depending on whether it was initialized with the correct key state.
The key state was embedded in the power-up states of the IC and was known only to
the IP owner. When the IP owner received the fabricated chip, power-up states were
reset from the fixed initial state to the key state. As the number of power-up states
was less in the design, chances of the IC being operational in the normal mode on
random initialization were significantly reduced. Moreover, powering up the design with an incorrect initial state operated the IC in the slow mode, where it functioned more slowly than in the normal mode but without an immediately noticeable performance difference. This prevented IP pirates from suspecting the performance degradation in the IC or the presence of a key state in the design. To modify the STG, four
structural operations were performed: (i) retiming, (ii) resynthesis, (iii) sweep, and
(iv) conditional stuttering. In retiming, registers were moved in the sequential circuit
using either of two operations: (i) adding a register to all outputs and deleting a register from each input of a combinational node, or (ii) deleting a register from all outputs and adding a register to each input of a combinational node. Resynthesis restructured the netlist within register boundaries, whereas sweep removed redundant registers and logic that did not affect the output. Both the resynthesis and the retiming operations preserved
logical functionality of the design. Conditional stuttering involved addition of con-
trol logic to the circuit to stutter the registers under a given logic condition. On the
other hand, inverse conditional stuttering removed certain control logic. Stuttering
operations were done to obtain circuits which were cycle–accurate–equivalent. This
method mainly focused on those real-time applications which were very sensitive
to throughput. Unlike existing IC metering techniques, the secret key in this method
was implicit, thus making it act as a hidden watermark. However, the area and power
overheads of this method were higher than previous approaches.
An active hardware metering approach prevents overproduction of ICs by equipping designers with the ability to lock each IC and unlock it remotely [28]. In this method, new states and transitions were added to the original finite state machine (FSM) of the design to create a boosted finite state machine (BFSM). This structural manipulation preserved the behavioral specification of the design. Upon activation, the BFSM was placed in a power-up state determined by a unique ID generated by the IC. To bring the BFSM into the functional initial state, the designer used an input sequence generated from the transition table. Black hole states were integrated with the BFSM to make the active metering method highly resilient against brute force attacks. This method incurred low overhead and was applicable to industrial-size designs.
Obfuscation Using Reconfigurable Logic
Reconfigurable logic was used in [29, 30] for obfuscation of ASIC designs. In [29],
reconfigurable logic modules were embedded in the design and their final implemen-
tation was determined by the end user after the design and the manufacturing process.
This method assumed that the supply chain adversary has knowledge of the entire
design except the exact implementation of the function and the internal structure of
reconfigurable logic modules. The lack of knowledge prohibited an adversary from
tampering the logic blocks. Combining this method with other security techniques
provided data confidentiality, design obfuscation, and prevention from hardware Tro-
jans. In the demonstration, a code injection Trojan was considered which, when trig-
gered by a specific event, changed input–output behavior or leaked confidential infor-
mation. The Trojan was assumed to be injected at the instruction decode unit (IDU) of a processor during runtime and could not be detected by non-lock-stepping concurrent checking methods, code integrity checkers, or testing. To prevent such a Trojan
attack, instruction set randomization (ISR) was done by obfuscating the IDU. For
obfuscation, reconfigurable logic was used, which concealed opcode check logic,
instruction bit permutation, or XOR logic. This method prevented the Trojan from
monitoring an authentic computation. Moreover, Trojans which were designed to
circumvent this method no longer remained stealthy and those trying to duplicate
the IDU or modify it caused significant performance degradation. It was shown that
the minimum code injection Trojan with a 1 KB ROM resulted in an area increase
of 2.38 % for every 1 % increase in the area of the LEON2 processor.
To hide operations within the design and preserve its functionality, the original
circuit was replaced with PUF-based logic and FPGA in [30]. PUF was also used
to obfuscate signal paths of the circuit. The architecture for signal path obfuscation was placed at locations where the largest number of flip-flops were affected most often.
To prevent this technique from affecting critical paths, wire swapping components
(MUX’es with PUF as select input) were placed between gates with positive slack.
The PUF-based logic and signal path obfuscation techniques were used simultane-
ously to minimize delay constraints of the circuit and maximize its security under
user-specified area and power overheads. Two types of attacks were considered: (i)
adversary can read all flip-flops, but can only write to primary inputs; (ii) adversary
can read and write to all flip-flops of the circuit [30]. It was assumed that the adver-
sary has complete knowledge of circuit netlist, but not of input–output mapping of
any PUFs. For preventing the first type of attack, an FPGA was used after the PUF, and the PUF was made large enough to accept a large challenge. To prevent the second attack, a PUF
was placed in a location that was difficult to control directly using the primary inputs
of the circuit. By preventing this attack, reverse engineering was made difficult for an
adversary. These two methods were demonstrated on ISCAS'89 and ITC'99 benchmarks, and the functionality of the circuits was obfuscated with an area overhead of up to 10 %.
In [31], obfuscation of a DSP circuit was done using high-level transformations, a key-based FSM, and a reconfigurator. The high-level transformation performed structural obfuscation of the DSP circuit at the HDL or netlist level while preserving its functionality. This
transformation was chosen based on the DSP application and performance require-
ments (e.g., area, speed, power, or energy). For performing high-level transformation
on the circuit, reconfigurable secure switches were designed in [31]. These switches
were implemented as multiplexers, whose control signals were obtained from an FSM designed using ring counters. The security of these switches was directly related to the
design of the ring counters. Another FSM, called the obfuscated FSM, was incorpo-
rated in the DSP circuit along with the reconfigurator. A configuration key was given
to the obfuscated FSM for operating the circuit correctly. This key consisted of two
parts: an L-bit initialization key and a K-bit configure data. The initialization key was
used to activate the reconfigurator via the obfuscated FSM, whereas the configure data was
used by the reconfigurator to control the operation of the switches. As configuration
of the switches required correct initialization key and configure data, attacks target-
ing either of them could not affect the design. An adversary attempting to attack a
DSP circuit, obfuscated with this method, had to consider the length of the config-
uration key and the number of input vectors required for learning the functionality
of each variation mode. Structural obfuscation degree (SOD) and functional obfuscation degree (FOD) were used as metrics: SOD estimated the obfuscation degree against manual attacks (visual inspection and structural analysis), whereas FOD estimated the obfuscation degree against simulation-based attacks. Higher values of SOD and FOD indicated a more secure design. The area
and the power overheads of this method were low.
Field-programmable gate arrays (FPGAs) have been widely used for many applications since the 1980s. They provide considerable advantages in terms of design cost and flexibility. With ever-shortening time-to-market windows, designing a complete system on an FPGA from scratch has become a daunting task. To meet the demand, designers have started using/reusing third-party intellectual property (IP) modules rather than developing a system from scratch. However, this has raised the risk of FPGA IP piracy. Protec-
tion methods [33–36] have been proposed to mitigate this issue. In [35], a protection
scheme was proposed which used both public-key and symmetric-key cryptography.
To reduce area overhead, the public-key functionality was moved to a temporary con-
figuration bitstream. Using five basic steps, the protection scheme enabled licensing
of FPGA IP to multiple IP customers. However, this scheme restricted system inte-
grators to the use of IP from a single vendor. The scope of the FPGA IP protection
method of [35] was extended in [36], where system integrators could use cores from
multiple sources. In [33], implementation of the protection methods of [35, 36] was
carried out on commercially available devices. For securely transporting the key,
[33] used symmetric cryptography and a trusted third-party provider. Use of symmet-
ric cryptography also reduced the size of temporarily occupied reconfigurable logic
for building the IP decryption key.
A practical and feasible protection method for SRAM-based FPGA was given
in [34]. This approach allowed licensing of IP cores on a per-device basis, and it did not require contractual agreements with trusted third parties, large bandwidth, or complicated communication processes. The IP instance was encrypted for each system integrator, and the decryption key was generated using the license for the chips.
This procedure ensured that the licensed IP core was used only on the contracted
devices. Moreover, it helped to prevent IP core counterfeiting by tracking the unique
fingerprint embedded in the licensed IP instance of the vendor. The proposed scheme
did not require an external trusted third-party (TTP) and was applicable on IP cores
designed for commercial purposes. It also helped in secure transaction of IP cores and
prevented sophisticated attackers from cloning, reverse engineering, or tampering
contents of the IP core.
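The per-device licensing flow can be sketched schematically as follows; this is not the protocol of [34], and the device fingerprint, license format, key derivation, and the XOR-based stand-in for a real cipher are all assumptions made for illustration:

```python
# Schematic per-device IP licensing sketch (illustrative only, not the scheme of [34]).
# The device fingerprint, license format, and XOR-stream "cipher" are assumptions.
import hashlib, hmac, os

def derive_ip_key(vendor_secret: bytes, device_fp: bytes, license_id: bytes) -> bytes:
    # The decryption key exists only for (device, license) pairs the vendor approves.
    return hmac.new(vendor_secret, device_fp + license_id, hashlib.sha256).digest()

def xor_stream(key: bytes, data: bytes) -> bytes:
    # Stand-in for a real symmetric cipher such as AES-GCM.
    stream, counter = b"", 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + counter.to_bytes(4, "big")).digest()
        counter += 1
    return bytes(d ^ s for d, s in zip(data, stream))

vendor_secret = os.urandom(32)                        # held only by the IP vendor
device_fp = hashlib.sha256(b"device-1234").digest()   # e.g., PUF- or eFuse-derived ID
license_id = b"LIC-0001"                              # hypothetical license identifier

ip_core = b"<third-party IP configuration data>"
key = derive_ip_key(vendor_secret, device_fp, license_id)
protected = xor_stream(key, ip_core)                  # shipped to the system integrator
assert xor_stream(key, protected) == ip_core          # recoverable only with the per-device key
```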
A summary of all the above-described protection methods is shown in Fig. 4.4.
4.3 IP Certification
The proof-carrying hardware (PCH) framework was also extended to support verification of gate-level circuit netlists [44]. With the help of the new gate-level framework, the authors of [44] formally analyzed the security of design-for-test (DFT) scan chains, the industry-standard testing approach, and formally proved that a circuit with a scan chain can violate the data secrecy property. Although various attack and defense methods have been developed to address the security concerns raised by DFT scan chains [53–58], methods for formally proving the vulnerability of scan-chain-inserted designs did not exist. For the first time, the vulnerability of such a design was formally proved using the PCH framework of [44]. The same framework was also applied to built-in self-test (BIST) structures to prove that BIST can also leak internal sensitive information [44].
Equivalence Checking
Orthogonal to the theorem proving approach is equivalence checking, which ensures that the specification and the implementation of a circuit are equivalent. The traditional equivalence checking approach uses a SAT solver to prove functional equivalence between two representations of a circuit: corresponding outputs of the two representations are combined with an “xor” (miter) gate. If the specification and the implementation are equivalent, the output of the “xor” gate is always zero (false); if the output is true for any input sequence, the specification and the implementation produce different outputs for the same input sequence.
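The miter idea can be illustrated with a small, self-contained sketch; the two netlist functions below are invented for the example, and a practical flow would hand the miter to a SAT solver instead of enumerating all inputs:

```python
# Miter-style equivalence check: XOR corresponding outputs of the specification
# and the implementation; the designs are equivalent iff the XOR is 0 for all inputs.
from itertools import product

def spec(a, b, c):
    # Specification: a 2:1 multiplexer selecting b when a = 1, else c.
    return (a & b) | ((1 - a) & c)

def impl(a, b, c):
    # Structurally different candidate implementation of the same function.
    return c ^ (a & (b ^ c))

def equivalent(f, g, n_inputs):
    for bits in product((0, 1), repeat=n_inputs):
        if f(*bits) ^ g(*bits):        # miter output true => counterexample found
            return False, bits
    return True, None

ok, cex = equivalent(spec, impl, 3)
print("equivalent" if ok else f"differ on input {cex}")
```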
Following the equivalence checking approach, [38] proposed a four-step procedure
to filter and locate suspicious logic in third-party IPs. In the first step, easy-to-detect
signals were removed using functional vectors generated by a sequential ATPG. In
the next step, hard-to-excite and/or propagate signals were identified using a full-
scan N-detect ATPG. To narrow down the list of suspected signals and identify the
actual gates associated with the Trojan, a SAT solver was used in the third step for
equivalence checking of the suspicious netlist containing the rarely triggered signals
against the netlist of the circuit exhibiting correct behavior. At the final step, clusters
of untestable gates in the circuit were determined using the region isolation approach
on the suspected signals list.
However, traditional equivalence checking techniques could result in state space explosion when large IP blocks with significantly different specifications and implementations were involved. They also could not be used on complex arithmetic circuits with large bus widths. An alternative approach was to use computer symbolic algebra for equivalence checking of arithmetic circuits. These circuits constitute a significant portion of the datapath in signal processing, cryptography, multimedia, and error-correcting code applications; due to their size and complexity, their chances of malfunctioning are very high. The new equivalence checking approach allowed verification of such large circuits and did not cause state space explosion.
4.4 Conclusion
In this chapter, we analyzed existing prevention and certification methods for soft/firm hardware IP cores. The prevention methods largely consist of various hardware locking/encryption schemes, which protect IP cores from piracy, overbuilding, reverse engineering, cloning, and malicious modification. On the other hand, formal methods, such as theorem proving and equivalence checking, help validate and certify the trustworthiness of IP cores. After a thorough analysis of these proposed IP validation/protection methods, we conclude that no single method is sufficient to eliminate all threats to IP cores. A combination of these methods is therefore necessary to ensure the security of IP cores and, in turn, of the modern semiconductor supply chain.
Acknowledgements This work was supported in part by the National Science Foundation (CNS-
1319105).
References
1. Kocher, P.: Advances in Cryptology (CRYPTO’96). Lecture Notes in Computer Science, vol.
1109, pp. 104–113 (1996)
2. Kocher, P., Jaffe, J., Jun, B.: Advances in Cryptology–CRYPTO’99, pp. 789–789 (1999)
3. Quisquater, J.J., Samyde, D.: Smart Card Programming and Security. Lecture Notes in Com-
puter Science, vol. 2140, pp. 200–210 (2001)
4. Gandolfi, K., Mourtel, C., Olivier, F.: Cryptographic Hardware and Embedded Systems
(CHES) 2001. Lecture Notes in Computer Science, vol. 2162, pp. 251–261 (2001)
5. Chari, S., Rao, J.R., Rohatgi, P.: Cryptographic Hardware and Embedded Systems—Ches
2002. Lecture Notes in Computer Science, vol. 2523, pp. 13–28. Springer, Berlin (2002)
6. Messerges, T.S., Dabbish, E.A., Sloan, R.H.: IEEE Trans. Comput. 51(5), 541 (2002)
7. Tiri, K., Akmal, M., Verbauwhede, I.: Solid-State Circuits Conference, 2002. ESSCIRC 2002.
Proceedings of the 28th European, pp. 403–406 (2002)
8. Fan, Y.C., Tsao, H.W.: Electr. Lett. 39(18), 1316 (2003)
9. Torunoglu, I., Charbon, E.: IEEE J. Solid-State Circuits 35(3), 434 (2000)
10. Kahng, A.B., Lach, J., Mangione-Smith, W.H., Mantik, S., Markov, I.L., Potkonjak, M.,
Tucker, P., Wang, H., Wolfe, G.: IEEE Trans. Comput.-Aided Des. Integr. Circuits Syst. 20(10),
1236 (2001)
11. Lach, J., Mangione-Smith, W.H., Potkonjak, M.: IEEE Trans. Comput.-Aided Des. Integr. Cir-
cuits Syst. 20(10), 1253 (2001)
12. Qu, G., Potkonjak, M.: Proceedings of the 37th Annual Design Automation Conference, pp.
587–592 (2000)
13. Chang, C.H., Zhang, L.: IEEE Trans. Comput.-Aided Des. Integr. Circuits Syst. 33(1), 76
(2014)
14. Roy, J.A., Koushanfar, F., Markov, I.L.: Design, Automation and Test in Europe (DATE), vol. 1 (2008)
15. Rajendran, J., Pino, Y., Sinanoglu, O., Karri, R.: Design Automation Conference (DAC), 2012
49th ACM/EDAC/IEEE, pp. 83–89 (2012)
16. Rajendran, J., Pino, Y., Sinanoglu, O., Karri, R.: Design, Automation Test in Europe Confer-
ence Exhibition (DATE), vol. 2012, pp. 953–958 (2012). doi:10.1109/DATE.2012.6176634
17. Rajendran, J., Zhang, H., Zhang, C., Rose, G., Pino, Y., Sinanoglu, O., Karri, R.: IEEE Trans.
Comput. 99 (2013)
18. Dupuis, S., Ba, P.S., Natale, G.D., Flottes, M.L., Rouzeyre, B.: Conference on IEEE 20th Inter-
national On-Line Testing Symposium (IOLTS), IOLTS ’14, pp. 49–54 (2014)
19. Chakraborty, R., Bhunia, S.: IEEE/ACM International Conference on Computer-Aided Design
2008. ICCAD 2008, pp. 674–677 (2008). doi:10.1109/ICCAD.2008.4681649
20. Chakraborty, R., Bhunia, S.: IEEE Trans. Comput.-Aided Des. Integr. Circuits Syst. 28(10),
1493 (2009). doi:10.1109/TCAD.2009.2028166
21. Chakraborty, R.S., Bhunia, S.: J. Electr. Test. 27(6), 767 (2011). doi:10.1007/s10836-011-
5255-2
22. Chakraborty, R., Bhunia, S.: IEEE/ACM International Conference on Computer-Aided
Design—Digest of Technical Papers, 2009. ICCAD 2009, pp. 113–116 (2009)
23. Narasimhan, S., Chakraborty, R., Bhunia, S.: IEEE Des. Test Comput. 99(PrePrints) (2011).
http://doi.ieeecomputersociety.org/10.1109/MDT.2011.70
24. Desai, A.R., Hsiao, M.S., Wang, C., Nazhandali, L., Hall, S.: Proceedings of the Eighth Annual
Cyber Security and Information Intelligence Research Workshop, CSIIRW ’13, pp. 8:1–8:4.
ACM, New York (2013). doi:10.1145/2459976.2459985
25. Li, L., Zhou, H.: 2013 IEEE International Symposium on Hardware-Oriented Security and
Trust, HOST 2013, Austin, TX, USA, June 2–3, pp. 55–60 (2013). doi:10.1109/HST.2013.
6581566
26. Chakraborty, R., Bhunia, S.: 23rd International Conference on VLSI Design, 2010. VLSID’10,
pp. 405–410 (2010). doi:10.1109/VLSI.Design.2010.54
27. Chakraborty, R., Bhunia, S.: IEEE International Workshop on Hardware-Oriented Security
and Trust, 2009. HOST ’09, pp. 96–99 (2009). doi:10.1109/HST.2009.5224963
28. Alkabani, Y., Koushanfar, F.: USENIX Security, pp. 291–306 (2007)
29. Liu, B., Wang, B.: Design. Automation and Test in Europe Conference and Exhibition (DATE),
pp. 1–6 (2014). doi:10.7873/DATE.2014.256
30. Wendt, J.B., Potkonjak, M.: Proceedings of the 2014 IEEE/ACM International Conference on
Computer-Aided Design, ICCAD’14, pp. 270–277. IEEE Press, Piscataway (2014). http://dl.
acm.org/citation.cfm?id=2691365.2691419
31. Lao, Y., Parhi, K.: IEEE Trans. Very Large Scale Integr. (VLSI) Syst. 99, 1 (2014). doi:10.
1109/TVLSI.2014.2323976
32. Natale, G.D., Dupuis, S., Flottes, M.L., Rouzeyre, B.: Workshop on Trustworthy Manufactur-
ing and Utilization of Secure Devices (TRUDEVICE13) (2013)
33. Maes, R., Schellekens, D., Verbauwhede, I.: IEEE Trans. Inf. Forensics Secur. 7(1), 98 (2012)
34. Zhang, L., Chang, C.H.: IEEE Trans. Inf. Forensics Secur. 9(11), 1893 (2014)
35. Guneysu, T., Moller, B., Paar, C.: IEEE International Conference on Field-Programmable
Technology, ICFPT, pp. 169–176 (2007)
36. Drimer, S., Güneysu, T., Kuhn, M.G., Paar, C.: (2008). http://www.cl.cam.ac.uk/sd410/
37. Wolff, E., Papachristou, C., Bhunia, S., Chakraborty, R.S.: IEEE Design Automation and Test
in Europe, pp. 1362–1365 (2008)
38. Banga, M., Hsiao, M.: IEEE International Symposium on Hardware-Oriented Security and
Trust (HOST), pp. 56–59 (2010)
39. Hicks, M., Finnicum, M., King, S.T., Martin, M.M.K., Smith, J.M.: Proceedings of IEEE Sym-
posium on Security and Privacy, pp. 159–172 (2010)
40. Sturton, C., Hicks, M., Wagner, D., King, S.: 2011 IEEE Symposium on Security and Privacy
(SP), pp. 64–77 (2011)
41. Zhang, X., Tehranipoor, M.: 2011 IEEE International Symposium on Hardware-Oriented
Security and Trust (HOST), pp. 67–70 (2011)
42. Love, E., Jin, Y., Makris, Y.: IEEE Trans. Inf. Forensics Secur. 7(1), 25 (2012)
43. Jin, Y., Yang, B., Makris, Y.: IEEE International Symposium on Hardware-Oriented Security
and Trust (HOST), pp. 99–106 (2013)
44. Jin, Y.: IEEE Computer Society Annual Symposium on VLSI (ISVLSI) (2014)
45. INRIA: The coq proof assistant (2010). http://coq.inria.fr/
46. Guo, X., Dutta, R.G., Jin, Y., Farahmandi, F., Mishra, P.: Design Automation Conference
(DAC), 2015 52nd ACM/EDAC/IEEE (2015) (To appear)
47. Love, E., Jin, Y., Makris, Y.: 2011 IEEE International Symposium on Hardware-Oriented Secu-
rity and Trust (HOST), pp. 12–17 (2011)
48. Drzevitzky, S., Kastens, U., Platzner, M.: International Conference on Reconfigurable Com-
puting and FPGAs, pp. 189–194 (2009)
49. Drzevitzky, S.: International Conference on Field Programmable Logic and Applications, pp.
255–258 (2010)
50. Necula, G.C.: POPL’97: Proceedings of the 24th ACM SIGPLAN-SIGACT Symposium on
Principles of Programming Languages, pp. 106–119 (1997)
51. Drzevitzky, S., Platzner, M.: 6th International Workshop on Reconfigurable Communication-
Centric Systems-on-Chip, pp. 1–8 (2011)
52. Drzevitzky, S., Kastens, U., Platzner, M.: Int. J. Reconfig. Comput. 2010 (2010)
53. Yang, B., Wu, K., Karri, R.: Test Conference, 2004. Proceedings. ITC 2004. International, pp.
339–344 (2004)
54. Nara, R., Togawa, N., Yanagisawa, M., Ohtsuki, T.: Proceedings of the 2010 Asia and South
Pacific Design Automation Conference, pp. 407–412 (2010)
55. Yang, B., Wu, K., Karri, R.: IEEE Trans. Comput.-Aided Des. Integr. Circuits Syst. 25(10),
2287 (2006)
56. Sengar, G., Mukhopadhyay, D., Chowdhury, D.: IEEE Trans. Comput.-Aided Des. Integr. Cir-
cuits Syst. 26(11), 2080 (2007)
57. Da Rolt, J., Di Natale, G., Flottes, M.L., Rouzeyre, B.: 2012 IEEE 30th VLSI Test Symposium
(VTS), pp. 246–251 (2012)
58. Rolt, J., Das, A., Natale, G., Flottes, M.L., Rouzeyre, B., Verbauwhede, I.: Constructive Side-
Channel Analysis and Secure Design. In: Schindler, W., Huss, S. (eds.) Lecture Notes on Com-
puter Science, vol. 7275, pp. 89–104. Springer, Berlin (2012)
Chapter 5
Security of Crypto IP Core: Issues
and Countermeasures
5.1 Introduction
The value of information in the modern world has increased manifold in the last decade. The global spread of the Internet, along with recent advances in the IoT (Internet of Things) and PANs (Personal Area Networks), has forced modern SoCs to handle large amounts of sensitive information that need to be protected. Hence, cryptographic modules protecting this sensitive information are integral parts of modern SoCs. Most SoCs now contain an HSM (Hardware Security Module), which is used for secure key generation and management for cryptographic operations along with secure crypto-processing. HSMs provide logical and physical protection for the digital keys which are used for encryption and authentication of secret information. Along with HSMs, an SoC may also contain dedicated hardware accelerators for some cryptographic operations to reduce the latency of the system. Cryptographic operations are generally computation intensive; hence the use of hardware accelerators is highly encouraged, as it provides significant performance improvement for the SoC.
Both hardware accelerators and software routines generally implement standard cryptographic algorithms which are secure against theoretical attacks. These algorithms are standardized by NIST (National Institute of Standards and Technology) in FIPS (Federal Information Processing Standards) publications, and we can consider them free of any theoretical or mathematical weakness which can be exploited by adversaries. Examples of such standard cryptographic algorithms are AES (Advanced Encryption Standard), RSA (the Rivest–Shamir–Adleman algorithm), and ECC (Elliptic Curve Cryptography). AES is the most popular symmetric key algorithm, whereas RSA and ECC are the popular asymmetric key algorithms. Implementations of these algorithms, both in hardware and in software, can be considered crypto-IP cores. They often form the root of trust of the entire security module and hence need to be protected against malicious adversaries.
However, although these algorithms are proven to be secure in the classical sense, where the attacker has controllability of the input (plaintexts) and observability of the output (ciphertexts), real-life scenarios are often different. This gap leads to the failure of the proofs when the adversary has access to much more information than the classical cryptanalyst anticipated. Furthermore, the conventional designer employs several optimization techniques to improve the performance of a given algorithm, but in the field of cryptographic engineering it has been found repeatedly that naïve implementation optimizations have led to catastrophic failures of crypto-systems. Performance of crypto-systems is thus a delicate issue, and the overhead of adding cryptographic layers should be as small as possible. Hence, a fresh approach is often required for developing cryptographic IP, where security is not an afterthought but is taken care of from the beginning of the design cycle.
In this chapter we will show that security cannot be guaranteed by just implement-
ing a perfectly secure crypto-algorithm. A physical design of a crypto-algorithm
can become vulnerable due to unintended leakages emitting from the design. This
chapter provides a detailed analysis of such unintended leakages, popularly known as
side channel leakages. We will also discuss the available countermeasures which can prevent these kinds of threats.
In the recent past, it has often been found that the security of a design is compromised due to loopholes in the design techniques. An example of this type of phenomenon is the Heartbleed attack on OpenSSL [1, 2], which affected a large number of websites across the Internet. In OpenSSL, a heartbeat echoes back the data that the user has sent; the length of the data can be at most 64 KB. A heartbeat thus comprises two parts: the data and the length of the data. However, if an adversary sets the length to 64 KB and sends just one byte of data, he will still get 64 KB of data as the echo. This 64 KB of data comprises the one byte that the adversary sent in the heartbeat, while the rest is secret information which should not be disclosed to the adversary. The root cause of the Heartbleed attack is thus just a missing bounds check in the code, which does not affect the functionality of OpenSSL but compromises the security of the entire module. The web services affected by the Heartbleed attack included Gmail, Facebook, and Yahoo [3, 4]. This illustrates how much impact this kind of faulty engineering can have on security.
Optimization is another issue that needs to be taken care of during the design of a crypto-primitive. Modern CAD tools are powerful and are capable of analyzing a design and optimizing it where possible. However, optimizations may expose a design to several threats which may leak the key through some stealthy channel. It
is hence imperative for the designer of crypto-IPs to understand the possible threats and ensure that those leakages are properly mitigated. In particular, in the following sections we shall consider threats from side channel attacks, which leak information about the intermediate states of a cipher that can be exploited to recover the secret keys.
Fig. 5.1 A crypto-algorithm viewed as a black box (a keyed mapping from plaintext to ciphertext) versus its physical implementation, which additionally leaks side channel information such as power, timing, electromagnetic emanation, and sound to an adversary
Side channel analysis can be broadly classified into two classes, namely passive and active side channel analysis, which are described as follows:
∙ Passive Side Channel Analysis: In passive side channel analysis, an adversary can only observe or record side channel information emitted by the system. For example, an adversary can observe power signatures or record electromagnetic radiation of the system to learn about the internal states of the circuit. These attacks, called power attacks and electromagnetic attacks, are instances of passive side channel analysis.
∙ Active Side Channel Analysis: In active side channel analysis, an adversary can not only observe the side channel information but can also interfere with the circuit operation. An example of such active side channel analysis is the fault attack. In this scenario an adversary intentionally injects faults during the circuit operation to obtain faulty outputs, which can be used to extract the secret information.
In this chapter we are going to study in detail the different side channel analyses, with more emphasis on power-based side channel attacks. We will start by describing various attack strategies and popular countermeasures. This chapter will also introduce the reader to fault attacks and side channel attacks at the testing phase. In the next section we introduce the rationale behind power-based side channel analysis and the corresponding attack strategies.
Most modern VLSI circuits are made of CMOS (Complementary Metal Oxide Semiconductor) gates, whose power characteristics depend on the transitions of data. This forms the basis for power analysis of cryptographic cores. Figure 5.2 shows a CMOS inverter, which is representative of this class of gates, and highlights the different charge and discharge paths that lead to different energy consumptions when the output capacitance gets charged or discharged, denoted by E0→1 and E1→0 respectively. Likewise, when there is no transition we denote the energy consumption by E0→0 or E1→1 depending on the output voltage. Consider an AND gate built from a similar complementary realization.
The transitions of the AND gate, whose output is denoted as q = AND(a, b) = a ∧ b, where a and b are input bits, are shown in Table 5.1. The energy levels are annotated in the fourth column. This column can be used to estimate the average energy when the output bit is 0 or 1, namely E(q = 0) and E(q = 1) respectively. We show that the power consumption of the device is correlated with the value of the output bit; this observation is central to the working of a DPA attack.
The average energies when q = 0 or q = 1 are:
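Assuming the two inputs of the AND gate are independent and uniformly distributed in consecutive clock cycles, so that all sixteen input-pattern pairs of Table 5.1 are equally likely, the conditional averages can be sketched as

$$E(q=0) = \frac{3\,E_{0\rightarrow 0} + E_{1\rightarrow 0}}{4}, \qquad E(q=1) = \frac{3\,E_{0\rightarrow 1} + E_{1\rightarrow 1}}{4},$$

since a present output of 0 arises from the transitions 0 → 0 (nine of the sixteen patterns) and 1 → 0 (three patterns), while a present output of 1 arises from 0 → 1 (three patterns) and 1 → 1 (one pattern).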
Fig. 5.2 A CMOS inverter with its load capacitance CL, showing the separate charging path (through the pull-up network) and discharging path (through the pull-down network to GND)
Observe that if the four transition energy levels are different, then in general
|E(q = 0) − E(q = 1)| ≠ 0. This simple computation shows that if a large number of
power traces are accumulated and divided into two bins, one for q = 0 and the other for q = 1, then, when the means of the 0-bin and the 1-bin are computed, the difference-of-mean (DoM) is expected to be non-zero at some point in time. This forms the basis of Differential Power Analysis (DPA), where the variation of power of a
circuit w.r.t. data is exploited. There are several types of power analysis. The most primitive form of power analysis is called Simple Power Analysis (SPA). It is a technique that involves directly interpreting power consumption measurements collected during cryptographic operations. The objective of an SPA attack is to obtain the secret key from one or a few traces, which makes SPA quite challenging in practice. We first provide an overview of SPA before describing other forms of power analysis such as Differential Power Analysis (DPA).
This subsection focuses on Simple Power Analysis (SPA), which is extremely easy to execute and, when applicable, can be deadly for the security of crypto-cores. SPA applies to implementations where the value of the key determines the type of operations to be executed; the attacker tries to exploit this operation dependency on the key bits. Generally, two different operations (for example, multiplication and addition/squaring) have different power signatures, and hence it is easy to identify the type of operation from power traces. If the key value determines the operation type, an attacker can easily extract the key by classifying the operations from the power traces.
Apart from attacking, SPA can also be used to extract feature points from the power traces. Feature points are the points on the power traces which, when analyzed by DPA, will reveal the key. For example, while performing DPA on AES, we need to focus on the last-round register update. Using SPA, we can pinpoint the last-round register update and record the power traces for only those points. This reduces both the number of traces that must be acquired and the attack complexity.
We will provide examples for both of the above use cases. Using SPA, it is very easy to mount a successful attack on Elliptic Curve Cryptography (ECC) operations implemented with the Double-and-Add algorithm. The next discussion focuses on this.
SPA on ECC
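As a minimal sketch of why the textbook Double-and-Add algorithm leaks under SPA, consider the following Python fragment; the elliptic curve point arithmetic is abstracted away by dummy callbacks, since only the key-dependent sequence of operations matters here:

```python
# Left-to-right Double-and-Add: the "add" step executes only when the key bit is 1,
# so the sequence of visually distinct power patterns reveals the scalar (key).
def double_and_add(k: int, P, point_double, point_add):
    trace = []              # stand-in for the operation sequence seen in the power trace
    Q = None                # point at infinity
    for bit in bin(k)[2:]:  # scan key bits from MSB to LSB
        Q = point_double(Q); trace.append("D")
        if bit == "1":
            Q = point_add(Q, P); trace.append("A")
    return Q, trace

# Dummy point operations; only the leakage pattern is of interest:
_, ops = double_and_add(0b101101, "P", lambda q: q, lambda q, p: q)
print("".join(ops))   # "DADDADADDA": each "DA" pair marks a 1 bit, a lone "D" a 0 bit
```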
Apart from attacking, SPA can be used to identify operations like register updates, multiplications, etc. If these operations are not key dependent, SPA cannot be used to extract the key, and we need to shift our attack methodology to more sophisticated
attack strategies like DPA. However, the success of DPA depends upon selecting a particular region of the power trace where the data dependency on the key can be exploited (the basic principle of DPA). For example, AES can be attacked by DPA by targeting the last-round register update. SPA helps us identify the last-round register update and in turn enhances the success probability of DPA. Figure 5.4a shows the power trace of an AES implementation, obtained on a SASEBO-GII board. From Fig. 5.4a, we can easily identify each round of the AES algorithm. Moreover, we can now focus only on the last-round register update operation, reducing the attack complexity. Figure 5.4b shows a zoomed view of the last-round register update.
Thus, we have seen how an adversary can exploit operation dependency on key bits through SPA. However, SPA is easy to prevent and can be avoided by very simple countermeasures. For example, SPA on ECC can be countered by employing the Montgomery ladder [5], which removes the operation dependency on the key bits. This is not the case for Differential Power Attacks (DPA), where the data dependency on the key bits is exploited through statistical analysis. The next subsection focuses on this.
In this model, we assume that the power consumption of a CMOS circuit at the tth time instance is proportional to the Hamming weight of the value being processed, say vt. The model is oblivious of the state at the (t − 1)th time instance, vt−1. Thus, given the state vt, the estimated power is denoted as HW(vt), where HW is the Hamming Weight function. It may be argued that this is an inaccurate model, given that the dynamic power of a CMOS circuit depends on the transition vt−1 → vt rather than on the present state. However, the Hamming Weight model provides a usable estimate of the power consumption on several occasions. Consider situations where the circuit is pre-charged to a logic 1 or 0 (i.e., vt−1 = 0 or 1): the power in that case is either directly or inversely proportional to HW(vt). In situations where the previous value vt−1 is uniformly distributed, the dependence still holds, directly or inversely, owing to the fact that the transition power of a 0 → 1 toggle and a 1 → 0 toggle are not the same. This is because of the different charge and discharge paths of a CMOS gate (Fig. 5.2). Thus, assuming that the actual power is better captured by P = HW(vt−1 ⊕ vt), there may still exist a correlation between P and HW(vt) due to this asymmetric power consumption of CMOS gates.
The Hamming Distance model is a more accurate model of the power consumption of a CMOS gate. Here the power consumption of a CMOS circuit is assumed to be proportional to the Hamming distance between the previous and the current state vectors. In short, the power consumption is modeled as P = HD(vt−1, vt) = HW(vt−1 ⊕ vt), where HD denotes the Hamming distance between two vectors. This is more accurate because it captures the number of toggles on the nets of the circuit. However, in order to use this model the attacker needs more knowledge than for the Hamming Weight model: the attacker needs to know the state of the circuit in successive clock cycles. The model is useful for modeling the power consumption of registers and buses. On the other hand, it is less suited to estimating the power consumption of combinational circuits, as their transitions are hard to predict due to the presence of glitches.
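A small sketch of how the two models assign hypothetical power values (the register width and example values are arbitrary):

```python
# Hypothetical power under the Hamming Weight and Hamming Distance models.
def hamming_weight(v: int) -> int:
    return bin(v).count("1")

def hw_model(v_t: int) -> int:
    # Power estimated from the number of 1s in the current state only.
    return hamming_weight(v_t)

def hd_model(v_prev: int, v_t: int) -> int:
    # Power estimated from the number of bits that toggled between consecutive states.
    return hamming_weight(v_prev ^ v_t)

# Example: an 8-bit register updated from 0x3A to 0x5C.
print(hw_model(0x5C))          # 4
print(hd_model(0x3A, 0x5C))    # 4 (bits that toggled)
```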
In a DPA attack, the attacker follows a divide-and-conquer strategy: he guesses a portion of the key, which is required to perform the deciphering of a portion of the cipher for one round. Based on the guessed key he computes a target bit, the computation of which typically requires evaluation of an S-Box. Depending on whether the target bit is 0 or 1, the traces are divided into a 0-bin and a 1-bin. Then the means of all the traces in the 0-bin and the 1-bin are computed, and finally the Difference-of-Mean (DoM) of the two mean curves is computed. It is expected that, for the correct key guess, there will be a time instance with a non-negligible value, manifesting as a spike in the difference curve. The correlation of the power consumption of the device with the target bit is thus exploited to distinguish the correct key from the wrong ones.
Let us consider a sample run of the AES algorithm. We perform several runs of the AES algorithm with NSample randomly chosen plaintexts. Consider an iterated AES hardware in which a register is updated with the output of the AES round function during every encryption. The power trace is stored in the array sample[NSample][NPoint], where NPoint is the length of the power trace corresponding to the power consumption after each round of the encryption. For each of the power traces we also store the corresponding ciphertext in the array Ciphertext[NSample]. One can check that the power consumption corresponds to the state of the register before the encryption, then after the initial key addition, followed by the state after each of the first 9 rounds of AES, and finally the ciphertext after the 10th round.
The attack algorithm first targets one of the key bytes, key, for which one of the 16 S-Boxes of the last round is aimed at. We denote the corresponding ciphertext byte in the array Ciphertext[NSample] by the variable cipher. For each of the NSample plaintexts, the analysis partitions the traces sample[NSample][NPoint] into a zero-bin and a one-bin, depending on a target bit at the input of a target S-Box. To compute or estimate the target bit at the input of the target S-Box, the attack guesses the target key byte and then computes the Inverse S-Box of the XOR of the byte cipher and the guessed key byte key. One may observe that the ciphertexts for which the target byte cipher is the same always go to the same bin. Thus the traces can be stored in a smaller array, sample[NCipher][NPoint], where NCipher is the number of possible values of the ciphertext byte (which is of course 256).
It essentially splits the traces into the two bins based on a target bit, say the LSB.
The algorithm then computes the average of all the 0-bin and the 1-bin traces, and
then computes the difference of the means, denoted as DoM. Let the number of traces
present in the 0-bin be count0 and the number of traces present in the 1-bin be count1 .
Then the value of DoM is calculated according to the following equation:
$$\mathrm{DoM} = \left|\; \frac{\sum_{i=0}^{count_0} sample[i][NPoint]_{\,cipher[0]=0}}{count_0} \;-\; \frac{\sum_{i=0}^{count_1} sample[i][NPoint]_{\,cipher[0]=1}}{count_1} \;\right| \qquad (5.1)$$
It is expected that the correct key will have a significant Difference of Mean, whereas the wrong keys have an almost negligible DoM. We then store the highest value of the DoM in the array biasKey[NKey] for each key guess. The key which has the highest bias value is returned as the correct key. An example of DoM
on AES is shown in Fig. 5.5a.
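The bin-and-average procedure of Eq. (5.1) can be written compactly as in the sketch below; the array names follow the description above, while the inverse S-Box table INV_SBOX and the recorded traces are assumed to be available (and each bin is assumed to be non-empty):

```python
import numpy as np

def dom_attack(sample, ciphertext, inv_sbox, target_bit=0):
    """sample: (NSample, NPoint) power traces; ciphertext: target ciphertext byte per trace."""
    bias = np.zeros(256)
    for key in range(256):
        # Hypothesis: target bit at the input of the targeted last-round S-Box.
        hyp = np.array([(inv_sbox[c ^ key] >> target_bit) & 1 for c in ciphertext])
        mean0 = sample[hyp == 0].mean(axis=0)        # average of the 0-bin
        mean1 = sample[hyp == 1].mean(axis=0)        # average of the 1-bin
        bias[key] = np.max(np.abs(mean0 - mean1))    # peak DoM for this key guess
    return int(np.argmax(bias)), bias

# key_guess, bias = dom_attack(sample, ciphertext, INV_SBOX)
```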
Like in the DoM-based DPA attack, the Correlation Power Attack (CPA) also relies
on targeting an intermediate computation, typically the input or output of an S-Box.
These intermediate values are, as seen previously, computed from a known value (typically the ciphertext) and a portion of the key, which is guessed. The power model is subsequently used to develop a hypothetical power value of the device for a given input to the cipher. These hypothetical power values are then stored in a matrix for several inputs and can be indexed by the known value of the ciphertext and the guessed key byte. This matrix is denoted as H, the hypothetical power matrix. Along with
this, the attacker also observes the actual power traces, and stores them in a matrix
for several inputs. The actual power values can be indexed by the known value of the
ciphertext and the time instance when the power value was observed. This matrix
is denoted as T, the real power matrix. It may be observed that one of the columns
of the matrix H corresponds to the actual key, denoted as kc . In order to distinguish
the key from the others, the attacker looks for similarity between the columns of the
matrix H and those of the matrix T. The similarity is typically computed using the
Pearson’s Correlation coefficient.
The actual power values for all the NSample encryptions are observed and stored
in the array trace[NSample][NPoint]. The attacker first scans each column of this
array and computes the average of each of them, and stores in meanTrace[NPoint].
Likewise, the hypothetical power is stored in an array hPower[NSample][NKey] and
the attacker computes the mean of each column and stores in meanH[NKey] by scan-
ning each column of the hypothetical matrix. The attacker then computes the corre-
lation value to find the similarity of the ith column of the matrix hPower and the
jth column of trace. The correlation is computed as follows and stored in the array
result[NKey][NPoint]:
$$result[i][j] = \frac{\sum_{k=0}^{NSample} \left(hPower[i][k] - meanH[i]\right)\left(trace[j][k] - meanTrace[j]\right)}{\sqrt{\sum_{k=0}^{NSample} \left(hPower[i][k] - meanH[i]\right)^{2} \;\sum_{k=0}^{NSample} \left(trace[j][k] - meanTrace[j]\right)^{2}}}$$
The corresponding attack result is shown in Fig. 5.5b, in which the correct key value is easily distinguishable from the wrong key guesses.
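A compact numpy-based sketch of the correlation computation is given below; hPower and trace follow the matrices described above, and the code is illustrative rather than an optimized CPA implementation:

```python
import numpy as np

def cpa(hPower, trace):
    """hPower: (NSample, NKey) hypothetical power; trace: (NSample, NPoint) measurements.
    Returns the (NKey, NPoint) Pearson correlation matrix and the best key guess."""
    h = hPower - hPower.mean(axis=0)    # subtract meanH for every key guess
    t = trace - trace.mean(axis=0)      # subtract meanTrace for every time sample
    num = h.T @ t                       # cross terms of Pearson's correlation
    den = np.outer(np.sqrt((h ** 2).sum(axis=0)), np.sqrt((t ** 2).sum(axis=0)))
    result = num / den
    key_guess = np.unravel_index(np.abs(result).argmax(), result.shape)[0]
    return result, int(key_guess)

# result, key_guess = cpa(hPower, trace)
```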
Till now we have discussed attack strategies in which the attacker has access to the device only during the attack phase and does not have any control over the device prior to the attack. These attack strategies are known as non-profiling attacks. On the other hand, if the adversary has access to the device prior to the attack phase, during which he can control the inputs and observe power signatures for different key values, he can build more sophisticated attack methodologies, known as profiling attacks. One of the popular profiling attack strategies is the template attack, which is described in the next subsection.
The template attack was introduced by Chari et al. in [6] as “the strongest form of side channel attack possible in an information theoretic sense.” In the template attack, it is assumed that the attacker has access to the device, or a clone of the device, which he can use for profiling. During profiling, the attacker collects power traces for different key values and classes. In the attack phase, the adversary recovers the correct key from the design under attack using the estimated leakage distribution. Profiling attacks are more generic than their non-profiling counterparts in the sense that they require weaker assumptions on the leakage model, but their applicability is limited by the requirement of access to a cloned device. The first template attack, proposed in [6], consists of the following steps:
∙ Template Building:
1. Collect T power traces for each class of key value. For example, an attacker can collect power traces for each Hamming weight class of the key. Let us assume that there are L different classes of key.
2. Compute the mean of the power traces for each of the L classes. Let us denote these mean power traces as M = (M1, M2, M3, …, ML).
3. Next, we need to identify the high-SNR regions in the power traces. This is extremely important for the success of the template attack, as reducing the trace length reduces the execution time of the attack. There are several ways to reduce the trace length to a small number (n) of high-SNR samples. In [6], the authors proposed a method in which the pairwise differences between the mean power traces were taken and only the points with large differences were retained. Let us denote these points as P = (P1, P2, …, Pn). Other methods are also possible; for example, NICV [7] or analysis in principal subspaces [8] can also be used to find the high-SNR points.
4. In the next step, a noise vector is calculated for each of the power traces. The noise vector for the ith trace is calculated as Ni = (Ti[0] − Ml[0], Ti[1] − Ml[1], …, Ti[n] − Ml[n]), where l is the class of the corresponding trace. Then we compute the covariance matrix ΣNi between the Ni's of a particular class, with entries ΣNi[u, v] = cov(Ni(Pu), Ni(Pv)).
5. Thus, we have now built the template (Mi, ΣNi) for each of the classes.
∙ Attack Phase:
1. Let us denote the trace under attack as t. We compute ntesti = Mi − t for each class i.
2. The probability that the trace under attack t belongs to a particular class i can be computed by a maximum likelihood test on the multivariate Gaussian distribution using the following formula:

p(ntesti) = (1 ∕ √((2π)ⁿ |ΣNi|)) ⋅ exp(−½ ntestiᵀ (ΣNi)⁻¹ ntesti) (5.2)
The template attack, being based on a stronger adversary model, requires very few power traces compared to standard DPA. To protect crypto-systems against this kind of sophisticated profiling attack, or against standard non-profiling DPA attacks, we need to develop efficient countermeasures, which are the focus of the next section.
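A minimal sketch of the template building and matching steps described above is given below, assuming the traces have already been reduced to the n high-SNR points P1, …, Pn and grouped by class; the function names build_templates and match_trace are illustrative.

import numpy as np

def build_templates(traces_per_class):
    # traces_per_class: one (T x n) array per class of reduced power traces.
    templates = []
    for traces in traces_per_class:
        mean = traces.mean(axis=0)            # M_l, the mean trace of the class
        noise = traces - mean                 # noise vectors N_i
        cov = np.cov(noise, rowvar=False)     # covariance matrix of the noise
        templates.append((mean, cov))
    return templates

def match_trace(t, templates):
    # Return the class maximising the multivariate Gaussian likelihood of Eq. (5.2).
    scores = []
    for mean, cov in templates:
        diff = mean - t                       # ntest_i
        inv = np.linalg.inv(cov)
        norm = np.sqrt(((2 * np.pi) ** len(t)) * np.linalg.det(cov))
        scores.append(np.exp(-0.5 * diff @ inv @ diff) / norm)
    return int(np.argmax(scores))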
5.3.1 Private Circuits

In this subsection, we give a brief description of the private circuit approach with an analysis of its advantages and disadvantages. For theoretical proofs and a detailed analysis of the circuit, we encourage the readers to go through [9].
In the t-private circuit approach, a circuit is transformed in such a way that an adversary capable of observing any t nets cannot get access to a single bit of sensitive information; the minimum number of probes required by an adversary to extract one bit of information is t + 1. This subsection provides a brief description of such a transformation.
∙ Input Encoding: Any input bit a is transformed into a vector of 2t + 1 bits. The first 2t bits are random values (a1, a2, …, a2t) and the last bit (a2t+1) is computed in the following way:

a2t+1 = a ⊕ a1 ⊕ a2 ⊕ ⋯ ⊕ a2t. (5.3)
∙ AND gate: Inputs a and b of AND gate are transformed into vectors à = (a1 , a2 , … ,
a2t+1 ) and b̀ = (b1 , b2 , … , b2t+1 ). Output of the AND gate is also a vector
c̀ = (c1 , c2 , … , c2t+1 ), which is calculated by following steps:
1. Generate random bits ri,j, where 1 ≤ i < j ≤ 2t + 1.
2. Compute rj,i = (ri,j ⊕ ai bj) ⊕ aj bi, where 1 ≤ i < j ≤ 2t + 1.
3. Compute ci = ai bi ⊕ ⨁j≠i ri,j, where 1 ≤ i, j ≤ 2t + 1.
∙ NOT gate: Input a is transformed into a vector à = (a1, a2, …, a2t+1). The output ā̀ is computed by inverting any one bit of à, e.g., ā̀ = (ā1, a2, …, a2t+1).
∙ XOR gate: Like the AND gate, inputs a and b of the XOR gate are transformed into vectors à = (a1, a2, …, a2t+1) and b̀ = (b1, b2, …, b2t+1). The output c̀ = (c1, c2, …, c2t+1) is calculated in the following way:

ci = ai ⊕ bi, 1 ≤ i ≤ 2t + 1. (5.4)
Using these transformations, one can easily transform any digital circuit to a t-
private circuit, because this set of gates is universal.
Let us consider an AND gate in a t-private circuit for t = 1. The inputs of the AND gate are two vectors à = (a1, a2, a3) and b̀ = (b1, b2, b3), encoded according to (5.3). The output c̀ = (c1, c2, c3) is calculated as follows: three random bits r1,2, r1,3, r2,3 are generated, the correction terms r2,1 = (r1,2 ⊕ a1b2) ⊕ a2b1, r3,1 = (r1,3 ⊕ a1b3) ⊕ a3b1 and r3,2 = (r2,3 ⊕ a2b3) ⊕ a3b2 are computed, and the output shares are c1 = a1b1 ⊕ r1,2 ⊕ r1,3, c2 = a2b2 ⊕ r2,1 ⊕ r2,3, c3 = a3b3 ⊕ r3,1 ⊕ r3,2.
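The transformation can be exercised in software; the following minimal sketch implements the share encoding and the private AND gate for an arbitrary t, with Python's secrets module supplying the random bits (the function names are illustrative).

import secrets

def encode(bit, t):
    # Split a bit into 2t+1 shares: 2t random bits plus the XOR correction bit.
    shares = [secrets.randbits(1) for _ in range(2 * t)]
    last = bit
    for s in shares:
        last ^= s
    return shares + [last]

def decode(shares):
    out = 0
    for s in shares:
        out ^= s
    return out

def private_and(a_shares, b_shares):
    # Compute shares of (a AND b) following the r_{i,j} construction above.
    n = len(a_shares)                          # n = 2t + 1
    r = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            r[i][j] = secrets.randbits(1)
            r[j][i] = (r[i][j] ^ (a_shares[i] & b_shares[j])) ^ (a_shares[j] & b_shares[i])
    c = []
    for i in range(n):
        ci = a_shares[i] & b_shares[i]
        for j in range(n):
            if j != i:
                ci ^= r[i][j]
        c.append(ci)
    return c

# Sanity check for t = 1: decode(private_and(encode(a, 1), encode(b, 1))) == (a & b).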
The t-private circuits are secure against an adversary capable of observing any t
nets of the circuit at any given time instant. Construction of t-private circuits involves
2t random bits for each input bit of the circuit. Moreover, it requires 2t + 1 random bits for each 2-input AND gate present in the circuit. The overall complexity of the design is O(nt²), where n is the number of gates in the circuit.
The complexity of O(nt²) is often considered impractical for many practical implementations. After the publication of [9], there have been many works which try to improve its result, in particular the area overhead. In [10], the authors improved the complexity of private circuits from O(nt²) to O(nt). Moreover, in their recent work [11], they have further improved it to ⌈t∕2⌉ for private circuits. They have also provided theoretical analysis and improvements of private circuits in the context of power-based side channel attacks and glitches [12, 13].
Due to its high area requirement, it is not practical to use the private circuit as a side channel countermeasure despite it being theoretically secure. However, the private circuit provides the basis of one of the most popular countermeasures, masking, which is described in the next subsection.
5.3.2 Masking
Masking is probably the most popular countermeasure against side channel analysis and has been studied in great detail. The objective of the masking scheme is to randomize the intermediate results and thereby make the power consumption of the device independent of the sensitive data processed. The countermeasure is based on the fact that the power consumption of the device is uncorrelated with the actual data, as the data are masked with a random value.
In masking, every intermediate value which is related to the key is concealed by a random value m, which is called the mask. Thus, we transform the intermediate value v as vm = v ∗ m, where m is randomly chosen and varies from encryption to encryption. The attacker does not know the value m. The operation ∗ could be exclusive-or, modulo addition, or modulo multiplication. Boolean masking is the special term given to applying ⊕ as the above operation to conceal the intermediate value. If the operation is addition or multiplication, the masking is often referred to as arithmetic masking.
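A minimal sketch of Boolean masking applied to a key-dependent table lookup (here an S-box) is shown below; SBOX stands for any 256-entry substitution table, and the recomputation of a masked table with fresh masks for every encryption is an assumption of this illustration.

import secrets

def masked_sbox_lookup(x_masked, m_in, m_out, SBOX):
    # Given x_masked = x XOR m_in, return S(x) XOR m_out. The loop builds a
    # masked table S'(u) = S(u XOR m_in) XOR m_out over all u, so the unmasked
    # value x never appears in any variable.
    masked_table = [SBOX[u ^ m_in] ^ m_out for u in range(256)]
    return masked_table[x_masked]

# Fresh masks are drawn for every encryption, e.g.:
# m_in, m_out = secrets.randbits(8), secrets.randbits(8)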
As a concrete example, consider the inversion used in the AES S-box over the composite field, where the input is represented as γ1Y + γ0 with γ1, γ0 ∈ GF(2⁴) and its inverse as δ1Y + δ0, computed as:

δ1 = γ1 d′ (5.10)
δ0 = (γ1 + γ0) d′ (5.11)
d = γ1² μ + γ1 γ0 + γ0² (5.12)
d′ = d⁻¹ (5.13)
Next we consider the masking of these operations. The masked values corresponding to the input are thus (γ1 + mh)Y + (γ0 + ml), of which the inverse is to be computed such that the outputs of Eqs. 5.10–5.13 are also masked by random values, respectively m′h, m′l, md, m′d.
Let us consider the masking of Eq. 5.10. One has to take care when adding correction terms that no intermediate values are correlated with values which an attacker can predict. We thus mask d′ in the correction term as follows: (γ1 + mh)m′d + mh(d′ + m′d) + mh m′d + m′h.
Thus, the entire computation can be written in a masked form, and likewise one can derive the remaining two equations (Eqs. 5.11 and 5.12) in masked form. The resulting circuit complexity can again be reduced by reusing the same masks: we can choose m′l = ml and m′d = mh = m′h. Thus we have:
d + md = γ1² p0 + γ1 γ0 + γ0² + md
       = (γ1 + mh)² p0 + (γ1 + mh)(γ0 + ml) + (γ0 + ml)² + (γ1 + mh) ml + (γ0 + ml) mh + mh² p0 + ml² + mh ml + md
       = fd((γ1 + mh), (γ0 + ml), p0, mh, ml, md) (5.21)
Masking Eq. 5.13 involves masking an inverse operation in GF(2⁴). Hence the same masking approach as above can be applied while reducing the inverse to one in GF(2²). Thus, we can express an element δ of GF(2⁴) as δ = Γ1 Z + Γ0, where Γ1 and Γ0 ∈ GF(2²). Interestingly, in GF(2²) the inverse is a linear operation, making masking easy: we have (Γ + m)⁻¹ = Γ⁻¹ + m⁻¹. This reduces the gate count considerably.
One of the most popular countermeasures to prevent DPA attacks at the gate level is masking. Although there are various techniques to perform masking, the method has largely converged to the technique proposed in [15]. The principle of this masking technique is explained briefly with reference to a 2-input AND gate; the same explanation may be extended to other gates, like OR, XOR, etc. The gate has two inputs a and b and the output is q = a ⋅ b. The corresponding mask values for a, b and q are respectively ma, mb and mq. Thus the masked values are: am = a ⊕ ma, bm = b ⊕ mb, qm = q ⊕ mq. Hence the masked AND gate may be expressed as qm = f(am, bm, ma, mb, mq). The work in [15] proposes a
hardware implementation of the function f for the masked AND gate, which may be easily generalized to a masked multiplier. This is because the 2-input AND gate is a special case of an n-bit multiplier, with n = 1 for the AND gate. The masked multiplier (or masked AND gate, taking n = 1) is depicted in Fig. 5.6. The correctness of the circuit may be established by the following argument:
qm = q ⊕ mq
= (ab) ⊕ mq
= (am ⊕ ma )(bm ⊕ mb ) ⊕ mq
= (am bm ⊕ bm ma ⊕ am mb ⊕ ma mb ⊕ mq )
The ordering of the operations is chosen to ensure that the unmasked values are not exposed during the computation. Further, it should be emphasized that one cannot reuse the mask values. For example, one may attempt to make one of the input masks, ma, the same as the output mask, mq. While this may seem harmless, it can defeat the purpose of the masking. In the subsequent discussion, we will give examples of side channel attacks which take advantage of improper implementations of the masking scheme.
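The algebra of the masked AND gate can be checked with the following minimal sketch; the secure evaluation order of the hardware realization follows [15], so the ordering used here is only illustrative of the value being computed.

def masked_and(am, bm, ma, mb, mq):
    # q_m = am.bm XOR bm.ma XOR am.mb XOR ma.mb XOR mq, never combining a
    # masked value with its own mask.
    t = am & bm
    t ^= mq              # fold the output mask in early
    t ^= bm & ma
    t ^= am & mb
    t ^= ma & mb
    return t

# Correctness check: (masked_and(a ^ ma, b ^ mb, ma, mb, mq) ^ mq) == (a & b).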
Attack on Masking
Masking, being the most widely used countermeasure, has been constantly evaluated against sophisticated side channel analysis, and there are several instances where a masked hardware implementation has failed to prevent side channel attacks. Masking prevents first-order (1o) differential power attacks but can be broken by second-order differential attacks. Moreover, it has been shown in the literature that masking can be broken even by a first-order differential attack if it is not implemented correctly. We will mainly focus on two different attack methodologies: the attack due to glitches and the collision-correlation attack.
The Masked AND Gate and Vulnerabilities Due to Glitches
The circuit for computing the masked AND gate is shown in Fig. 5.6. The same circuit can be applied for masking a GF(2ⁿ) multiplier as well. We observe that the masked multiplier (or AND gate) requires four normal multipliers (or AND gates) and four normal n-bit (or 1-bit) XOR gates. It may also be observed that the multipliers (or AND gates) operate pairwise on (am, bm), (bm, ma), (am, mb) and (ma, mb).
The elements of each pair have no correlation with each other (if the mask values are properly generated) and are independent of the unmasked values a, b and q. One can construct a transition table and obtain the expected energy for generating q = 0 and q = 1. The gate now has 5 inputs and thus there can be 4⁵ = 1024 transitions (like Table 5.1). If we perform a calculation similar to the one for unmasked gates, we find that the energy required to process q = 0 and q = 1 is identical. Thus, if we compute the mean difference of the power consumptions over all the possible 1024 transitions for the two cases q = 0 and q = 1, we should theoretically obtain zero. Likewise, the energy levels are not dependent on the inputs a and b, which supports the theory of masking and shows that the masked gate should not leak against a first-order DPA. However, in this analysis we assume that the CMOS gates switch once per clock cycle, which is true only in the absence of glitches.
But glitches are a very common phenomenon in digital circuits, as a result of which CMOS gates switch more than once in a clock cycle before stabilizing to their steady states. One of the prime reasons for glitches in digital circuits is the different arrival times of the input signals, which may occur in practice due to skew, routing delays, etc. As can be seen in Fig. 5.6, the circuit is unbalanced, which leads to glitches.
The work proposed in [16] investigates various such scenarios which cause glitches and multiple toggles of the masked AND gate. The assumption is that each of the 5 input signals toggles once per clock cycle and that one of the inputs arrives at a different time instant than the others. Moreover, we assume that the delay between the arrival times of two distinct signals is more than the propagation time of the gate. As a special case, consider situations where only one of the five inputs arrives at a different moment in time than the remaining four inputs.
There exist ten such scenarios, as each one of the 5 input signals can arrive either before or after the four other ones. In every scenario there exist 4⁵ = 1024 possible combinations of transitions that can occur at the inputs. However, in each of the ten scenarios where the inputs arrive at two different moments in time, the output of the masked AND gate performs two transitions instead of one: one transition when the single input performs a transition and another when the other four input signals perform a transition. Thus the transition table for such a gate in this scenario would consist of 2048 rows, and we observe that the expected mean for the cases when q = qm ⊕ mq = 0 is different from that when q = 1. Similar results were found in the other scenarios as well. This bias of the leakage of masked gates in the presence of glitches can be exploited to apply successful attacks on the masking countermeasure.
Collision-Correlation Attack
The main objective of a collision-correlation attack is to find the regions in the power traces which handle the same data, and to use this knowledge to get access to the secret information. To illustrate this attack methodology, let us consider an example. Assume that we are going to attack a masked AES implementation. The internal state of the AES design is denoted as (x0, x1, …, x15), where each xi is a data byte. Similarly, the key is denoted as (k0, k1, …, k15). The S-Boxes of the AES implementation are masked and the input mask is the same for all bytes. So each masked S-Box has an input xi′ = xi ⊕ u, where u is the input mask. The output of the masked S-Box (S′) is S′(xi′) = yi′ = yi ⊕ v, where v is the output mask and yi = S(xi). Now we obtain N power traces for the same message M and identify the power trace segments for each masked S-Box access in the first round. Using any statistical distinguisher (for example, Pearson's correlation coefficient), we can find out whether any of the S-Box accesses collide, i.e., whether they handle the same data or not.
Once we get a collision between the ith and jth S-Box accesses, we obtain the following relations:
xi′ = xj′
xi ⊕ u = xj ⊕ u
mi ⊕ ki = mj ⊕ kj
mi ⊕ mj = ki ⊕ kj
This experiment is repeated multiple times with different messages (mi denoting the ith byte of the plaintext) to obtain a sufficient number of collisions so that the guessing entropy of the key is reduced. The attack methodology described here was first introduced in [17]. Subsequently, more advanced attack methodologies with different statistical tools have been presented in [18, 19].
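A minimal sketch of the collision detection step is given below, assuming the trace segments of the 16 first-round masked S-box accesses of the same message have already been extracted; the array layout and the threshold are assumptions of this illustration.

import numpy as np

def find_collisions(sbox_segments, threshold=0.8):
    # sbox_segments: shape (16, N, L) -- 16 S-box accesses, N repeated
    # encryptions of the same message, L samples per access.
    collisions = []
    n_boxes, _, n_points = sbox_segments.shape
    for i in range(n_boxes):
        for j in range(i + 1, n_boxes):
            # Correlate the two segments point-wise across the N repetitions
            # and keep the strongest value as the distinguisher.
            rho = max(abs(np.corrcoef(sbox_segments[i][:, t],
                                      sbox_segments[j][:, t])[0, 1])
                      for t in range(n_points))
            if rho > threshold:
                collisions.append((i, j))  # collision: m_i ^ m_j == k_i ^ k_j
    return collisions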
Generally, when a crypto-algorithm is developed, side channel security is not considered as a design parameter; the crypto-algorithm is initially formulated to thwart theoretical cryptanalysis techniques. After the development of the algorithm, side channel protection is provided during the implementation by adding an external side channel countermeasure like masking. This approach leads to resource-hungry designs with a huge overhead. For example, the gate count of a masked AES implementation is roughly three times that of the original AES implementation.
An alternative approach is to design crypto-algorithms which are secure against side channel attacks by construction. An example of such a technique is given in the next subsection.
DRECON
The secret in DRECON comprises the tuple (t, k), where t is called the tweak and k is the key used in the block cipher. The key k is held constant for all encryptions, while the tweak t changes for each encryption, using a tweak generation algorithm. The tweak is used to select a function from the set {𝖥1, 𝖥2, ⋯, 𝖥r}, where the 𝖥j ∶ 𝔽2n ↦ 𝔽2n (1 ≤ j ≤ r) are cryptographically strong sbox functions. For every application of the sbox on X, a function from this set is selected depending on the value of the tweak t and applied to X. This sbox, known as the tweaked-sbox, is represented by S⃗(⋅, ⋅) and defined as follows:

S⃗(t, X) ← 𝖥t(X), where t is chosen uniformly at random from {1, 2, ⋯, r}.
In a typical iterative block cipher, the first round key is added to the plaintext before the sbox operation, so the sbox operation has the form S(x ⊕ k). In DRECON, however, we choose to omit the key whitening at the beginning and end of the encryption. Thus, each round except the last consists of a substitution layer, a diffusion layer and a key addition layer; the last round consists of only the substitution layer. Each of the sboxes of the substitution layers is replaced by the tweaked-sbox. For all rounds, the same tweaks are used, while the sboxes within a round have different tweaks.
Fig. 5.7 First round of DRECON. The same structure is repeated for all rounds except the last
round which consists of only substitution layer
The first round of DRECON is shown in Fig. 5.7. It may be noted that
DRECON requires no key-whitening at the beginning and end of the block cipher
since the tweaked-sboxes provide the required randomization of the input and output
respectively.
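A minimal structural sketch of a DRECON round built from tweaked sboxes is shown below; the sbox set F, the diffusion function and the per-sbox tweak assignment are assumed inputs, and the names are illustrative.

def tweaked_sbox(F, t, x):
    # Apply the sbox F_t selected by tweak t (1 <= t <= r) to the state element x.
    return F[t - 1][x]

def drecon_round(F, state, tweaks, round_key, diffusion):
    # One round: tweaked substitution, then diffusion, then key addition
    # (no key whitening before the substitution layer).
    substituted = [tweaked_sbox(F, tweaks[i], b) for i, b in enumerate(state)]
    diffused = diffusion(substituted)
    return [z ^ k for z, k in zip(diffused, round_key)]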
From the master tweak agreed upon by the sender and receiver, tweaks need to be
generated for each encryption. The tweak generation needs to produce uniformly
random tweaks in the range of 1 to r in order to select one of the r sboxes (for
DRECON-AES r = 16 or 256). Further, the algorithm needs to be secure against
power attacks as is discussed in detail in [22].
Any mask generation function (MGF) or stream cipher implemented in a secure
manner can be used as a tweak generator. However, given the fact that the adver-
sary has no control or knowledge of the input and output of the tweak generator,
lightweight solutions can be developed by balancing registers and minimizing the
combinatorial logic, which can otherwise leak [23]. A possible construction for a
tweak generation algorithm makes use of an LFSR as shown in Fig. 5.8. The design
uses a pair of shift registers (S and S̄), each comprising 128 flip-flops. The flip-flops in S̄ are the complement of the flip-flops in S. To obtain such a state, the master tweak is used to seed S and the complement of the master tweak is used to seed S̄. Further, the feedback obtained from a degree-128 primitive polynomial is
complemented before being fed back to S̄. Since all the flip-flops are clocked at the same time, the leakage from the registers is minimized. The alternate source of leakage, from the combinatorial paths, is also kept to a minimum by choosing a primitive polynomial with a small number of coefficients. For DRECON-AES, the primitive polynomial chosen is α¹²⁸ ⊕ α⁹⁵ ⊕ α⁵⁷ ⊕ α⁴⁵ ⊕ α³⁸ ⊕ α³⁶ ⊕ 1.
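A minimal behavioural sketch of the tweak-generating LFSR is given below; the tap positions are derived from the primitive polynomial above under one common Fibonacci-LFSR convention, the complementary register S̄ is not modelled (it only balances the leakage), and the way the state is carved into 16 tweaks is an assumption of this illustration.

FEEDBACK_TAPS = (127, 94, 56, 44, 37, 35)   # taps for x^128 + x^95 + x^57 + x^45 + x^38 + x^36 + 1

def lfsr_step(state):
    # state: 128-bit integer; returns the next state after one shift.
    fb = 0
    for tap in FEEDBACK_TAPS:
        fb ^= (state >> tap) & 1
    return ((state << 1) | fb) & ((1 << 128) - 1)

def next_tweaks(state, r=16):
    # Clock the LFSR once and slice the low bits into 16 tweaks in [0, r).
    state = lfsr_step(state)
    bits_per_tweak = r.bit_length() - 1        # 4 bits per tweak for r = 16
    tweaks = [(state >> (i * bits_per_tweak)) & (r - 1) for i in range(16)]
    return state, tweaks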
Table 5.2 Comparing resource requirements for 4 × 4 DRECON-AES with masking on an FPGA
(XC5VLX50-2FF324)
Implementation Slices LUTs Registers Clock cycles Clock period (ns)
4 × 4 AES 1120 3472 1270 11 11.14
Masked 4 × 4 AES 3427 10589 1765 11 23.83
4 × 4 DRECON-AES 1379 3868 1583 11 10.3
Fig. 5.10 Guessing Entropy versus number of measurements for different size of tweaks
When a crypto-chip is designed, it is imperative to analyze the chip for both func-
tionality and side channel security. Now there are two ways to test a design for side
channel security. The first one is to carry out actual attacks on the chip to see whether
it is possible for an adversary to get the secret information. However, this approach
poses several problems like:
1. The evaluator needs to have a deep understanding of the hardware implementa-
tion of the crypto-algorithm.
2. He needs to carry out several different attacks with different attack models to
become absolutely sure about the side channel resistance of the crypto-chip.
3. Due to the above two constraints, this approach is time consuming and not suit-
able for commercial testing mechanisms.
Another approach is to use a simpler test where, instead of carrying out actual attacks, we try to measure the information leakage from the design. This testing methodology is fast and requires neither an understanding of the intrinsic details of the hardware design nor a specific attack model. Moreover, this type of leakage testing can help to identify high-SNR (signal-to-noise ratio) regions in the power traces, leading to more efficient attacks.
In literature, there are two different such test methodologies:
∙ Normalized Inter-Class Variance (NICV) Test: The NICV test [7] is similar to a statistical F-test, where we try to find high-SNR points in the given power traces. The steps involved in computing NICV are as follows:
1. Let us assume that the plaintext input to the design can be separated into n different classes (for example, a plaintext byte can be divided into 9 different classes depending upon its Hamming weight).
2. Collect power traces and separate them into the n classes depending upon the input plaintext value.
3. Compute the mean of each class and then compute the variance of these mean curves. Let us denote this variance as the mean class variance (MCV).
4. Compute the variance of all the power traces. Let us denote this variance as the all-trace variance (ATV).
5. NICV = MCV∕ATV.
Generally, NICV is used to find favorable points in the power traces for executing a power attack. It can also be useful for the profiling and training phase of a template attack. An NICV plot, obtained during power analysis of the block cipher SIMON, is shown in Fig. 5.11a. (A minimal sketch of both the NICV and the TVLA computations is given after this list.)
∙ Test Vector Leakage Assessment (TVLA): The TVLA test is similar to the statistical T-test. Like NICV, this test can be used to find high-SNR regions in the power traces. Moreover, this test provides a yes/no answer to the question whether the device is side channel secure or not. This testing method is fast and can be used in high-speed commercial testing of crypto-chips.
The steps in the TVLA test are as follows:
Fig. 5.11 (a) NICV plot of power analysis of a SIMON implementation; (b) TVLA plot of power analysis of a SIMON implementation
1. Create two different datasets (Q1 and Q2), each with n instances of plaintext and key. One comprises a fixed plaintext and the other contains random plaintexts; both datasets use the same key.
2. Now obtain n power traces for each dataset and compute the TVLA metric
according to the following formula
TVLA = (mean(Q1) − mean(Q2)) ∕ √( Var(Q1)∕n + Var(Q2)∕n ) (5.24)
3. For any sample point in the power trace, if |TVLA| ≥ 4.5, we consider the device
not secure against side channel attack.
This methodology can also be applied to each bit of the plaintext, i.e., it is possible to generate a TVLA plot for each bit of the plaintext. A TVLA plot of a SIMON implementation is shown in Fig. 5.11b.
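Both leakage metrics can be computed with a few lines of NumPy; the following is a minimal sketch assuming traces is an (N, L) array, labels holds the plaintext class of each trace, and q1/q2 are the fixed- and random-plaintext trace sets.

import numpy as np

def nicv(traces, labels, num_classes):
    # MCV / ATV at every sample point (the ratio of step 5 above).
    class_means = np.array([traces[labels == c].mean(axis=0)
                            for c in range(num_classes)])
    mcv = class_means.var(axis=0)     # variance of the per-class mean curves
    atv = traces.var(axis=0)          # variance of all traces
    return mcv / atv

def tvla(q1, q2):
    # Welch-style t-statistic of Eq. (5.24), computed per sample point.
    n1, n2 = q1.shape[0], q2.shape[0]
    return (q1.mean(axis=0) - q2.mean(axis=0)) / np.sqrt(
        q1.var(axis=0, ddof=1) / n1 + q2.var(axis=0, ddof=1) / n2)

# The device is flagged as leaking if np.abs(tvla(q1, q2)).max() >= 4.5.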
In this chapter we have so far given a brief introduction to power-based side channel attacks and the countermeasures to mitigate the resulting threats. Next, we give the reader a brief idea of fault analysis, which can also threaten the security of crypto-systems.
Cryptographic designs are commonly prototyped on FPGA (Field Programmable Gate Array) platforms. Recently, the enhancement of FPGAs has led to the use of these platforms for in-house development of cryptographic IPs. The fact that the entire design can be performed in the laboratory, without relying on an untrusted third-party fab, makes such design flows ideal from the point of view of security [27].
In a System-on-Chip (SoC) the cores are pre-tested and pre-verified. However, the test and verification is mostly functional, and the integrator is satisfied if the core meets its functionality. For cryptographic cores, apart from the normal functionality, it is also important to model the core under abnormal conditions. A related study of cryptographic algorithms and the designs thereof is known as Differential Fault Analysis (DFA) [28]. This analysis technique investigates the nature of induced faults when a device is stressed beyond its normal operating conditions. While the core integrator is mostly satisfied when the faults are not permanent, for the attacker a single transient fault is enough to obtain the complete key of even standard ciphers like AES-128 [29].
In this subsection, we study the basic principle of DFA, which shall subsequently be applied to the AES algorithm. As is apparent from the name, DFA combines the concepts of differential cryptanalysis with those of fault attacks. DFA is applicable to almost any secret key crypto-system proposed so far in the open literature, and has been used to attack many secret key crypto-systems, including DES, IDEA, and RC5 [30].
There has been a considerable amount of work on DFA of AES. Some of the DFA proposals are based on theoretical models [31–38], while others launched successful attacks on ASIC and FPGA devices using previously proposed theoretical models [37, 39–42]. The key idea of DFA is composed of three steps, as shown in Fig. 5.12: (1) run the cryptographic algorithm and obtain non-faulty ciphertexts; (2) inject faults, i.e., rerun the algorithm with the same input but under unexpected environmental conditions, and obtain faulty ciphertexts; (3) analyze the relationship between the non-faulty and faulty ciphertexts to significantly reduce the key space.
AES is probably the block cipher that has been studied most extensively against DFA. There have been several works on DFA of AES using various types of fault models, such as (1) bit faults, (2) random byte faults, and (3) multiple byte faults. In what follows, we provide an understanding of DFA on AES.
We consider the attack based on byte-level faults. We assume that certain bits of a byte are corrupted by the induced fault and that the induced difference is confined within
a byte. The fact that the fault is induced in the penultimate round implies that, apart from using the differential properties of the S-box (as used in the bit-level DFA on the last round of AES), the attacker also exploits the diffusion introduced by the Mix-Columns operation of AES. The AES diffusion is provided by a 4 × 4 MDS matrix in Mix-Columns. Due to this matrix multiplication, if a one-byte difference is induced at the input of a round function, the difference spreads to 4 bytes at the round output. Figure 5.13 shows the flow of the fault.
The induced fault has generated a single byte difference at the input of the 9th round Mix-Columns. Let f be the byte value of the difference; the corresponding 4-byte output difference is (2f, f, f, 3f), where 2, 1, and 3 are the elements of the first column of the Mix-Columns matrix. The 4-byte difference is again converted to (f0, f1, f2, f3) by the non-linear S-box operation in the tenth round. The Shift-Rows operation will shift the differences to 4 different locations. The attacker has access to the fault-free ciphertext C and the faulty ciphertext C∗, which differ only in 4 bytes. Now, we can represent the 4-byte difference (2f, f, f, 3f) in terms of the tenth round key K¹⁰ and the fault-free and faulty ciphertexts by a set of four equations, one for each affected ciphertext byte (Eq. 5.25).
Therefore, the reduced search space is given by (1/2^8)^M ⋅ (2^8)^N = (2^8)^(N−M), where N is the number of unknown byte variables and M is the number of equations. For our case, we have four equations involving five unknown variables: f, K¹⁰₀,₀, K¹⁰₁,₃, K¹⁰₂,₂, and K¹⁰₃,₂. Therefore, the four equations will reduce the search space of the variables to (2⁸)⁵⁻⁴ = 2⁸. That means that out of 2³² hypotheses for the 4 key bytes, only 2⁸ hypotheses will satisfy the above 4 equations. Therefore, using one fault the attacker can reduce the search space of the 4 key bytes to 2⁸. Using two such faulty ciphertexts
one can uniquely determine the key quartet. For one key quartet one has to induce 2
faults in the required location. For all the 4 key quartets, i.e., for the entire AES key
an attacker thus needs to induce 8 faults. Therefore using 8 faulty ciphertexts and a
fault-free ciphertext, it is expected to uniquely determine the 128-bit key of AES.
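The key-space reduction can be illustrated by the following minimal sketch, which checks whether a guessed key quartet is consistent with the (2f, f, f, 3f) difference pattern; INV_SBOX, mul2 and mul3 (multiplication by 2 and 3 in GF(2⁸)) are assumed to come from a standard AES implementation, and the byte ordering of the arguments is an assumption of this illustration.

def quartet_consistent(c_bytes, c_star_bytes, key_bytes, INV_SBOX, mul2, mul3):
    # c_bytes / c_star_bytes / key_bytes: the four ciphertext, faulty-ciphertext
    # and guessed round-key bytes on the fault path, ordered as (2f, f, f, 3f).
    d = [INV_SBOX[c ^ k] ^ INV_SBOX[cs ^ k]
         for c, cs, k in zip(c_bytes, c_star_bytes, key_bytes)]
    # The differences must equal (2f, f, f, 3f) for a single non-zero byte f.
    f = d[1]
    return f != 0 and d[2] == f and d[0] == mul2(f) and d[3] == mul3(f)

# Enumerating all 2^32 quartets and keeping only the consistent ones leaves
# about 2^8 candidates per fault, as argued above.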
The attack can be further improved. It was shown in [36] that instead of inducing the fault in the 9th round, if we induce the fault between the 7th and 8th round Mix-Columns, we can determine the 128-bit key using only 2 faulty ciphertexts. Figure 5.14 shows the spreading of the fault when it is induced in this fashion. The single byte difference at
the input of 8th round Mix-Columns is spread to 4 bytes. The Shift-rows operation
ensures that there is one disturbed byte in each column of the state matrix. Each
of the 4-byte difference again spreads to 4 bytes at 9th round Mix-Columns output.
Therefore, the relation between the fault values in the 4 columns of difference-state
matrix S4 is equivalent to 4 faults at 4 different columns of 9th-round input-state
matrix as explained in the previous attack. This implies that using 2 such faults we
can uniquely determine the entire AES key.
Note that the exact working of the DFA proposed in [36] is slightly different from
above, though the underlying principle is the same. The attack maintains a list for
each column of the difference matrix S4 assuming a one-byte fault in the input of
the penultimate round Mix-Columns. The size of the table is thus 4 × 255 4-byte
values, as the input fault can occur in any byte of a column and can take 255 non-zero
values. Assuming that the fault occurs in the difference matrix S3 in the first column,
then equations similar to equation (5.25) can be written, with the left hand side of the
equations being a 4-byte tuple (𝛥0 , 𝛥1 , 𝛥2 , 𝛥3 ). It is expected that the correct guess of
the keys K¹⁰₀,₀, K¹⁰₁,₃, K¹⁰₂,₂, and K¹⁰₃,₂ should provide a 4-byte tuple which belongs to the list. There are other wrong keys which also pass this test, and analysis shows that on an average 1036 elements pass this test with a single fault. Repeating the same for all the 4 columns of the difference matrix S4 reduces the AES key space to 1036⁴ ≈ 2⁴⁰
(note that as the fault is assumed to be between 7th and 8th round each column of
S3 has a byte disturbed). However, if 2 faults are induced then with a probability of
0.98 the unique AES key is returned.
This is the best-known DFA of AES to date when the attacker does not have access to the plaintext and needs to determine the key uniquely. However, with access to the plaintexts, the attacker can further improve the attack by performing the DFA with only one fault and a further reduced brute-force guess. It is also possible to reduce the time complexity of the attack from 2³² to 2³⁰.
When the attacker has access to the plaintexts in addition to the ciphertexts [43],
the attacker can do brute-force on the possible keys. The objective of this attack or
its extensions is to perform the attack using only one fault. While a unique key may
not be obtainable with a single fault, the AES key size can reduce to such a small size
that a brute force search can be easily performed. It may be noted that reducing the
number of fault requirements from 2 to 1 should not be seen in terms of its absolute
values. In an actual fault attack, it is very unlikely that the attacker can have absolute
control over the fault injection method and hence may need more trials. Rather these
attacks are capable of reducing the number of fault requirements by half compared
to the attacks proposed in [36].
This attack comprises two phases: the first phase reduces the key space of AES to around 2³⁶ values, while the second phase reduces it to around 2⁸ values.
Consider Fig. 5.14: from the first column of S4 we get 4 differential equations analogous to Eq. (5.25), with the fault value p0 in place of f.
In these 4 differential equations we only guess the 2⁸ values of p0 and get the corresponding 2⁸ possible hypotheses of the key quartet by applying the S-box difference distribution table. Therefore, one column of S4 will reduce the search space of one key quartet to 2⁸ choices. Similarly, solving the differential equations from all the 4 columns, we can reduce the search space of all the 4 key quartets to 2⁸ values each. Hence, if we combine all the 4 quartets we get (2⁸)⁴ = 2³² possible hypotheses of the final round key K¹⁰. We have assumed here that the initial fault value was in the (0, 0)th byte of S1. If we allow the fault to be in any of the 16 locations, the key space of AES is around 2³⁶ values. This space can be brute-force-searched within practical time, and hence effectively one fault is sufficient to reduce the key space to practical limits.
The search space of the final round key can be further reduced if we consider the relation between the fault values at the state matrix S2, which was not utilized in the previous attacks. This step serves as a second phase, which is coupled with the first stage on all the 2³² keys (for an assumed location of the faulty byte). Hence, using only one faulty ciphertext, one can reduce the search space of the AES-128 key to 256 choices. However, the time complexity of the attack is 2³², as we have to test all the hypotheses of K¹⁰. The time complexity of the attack can, however, be improved to 2³⁰ by exploiting a property of the key schedule.
In FPGA implementations, the key schedule operation is often performed prior to the AES encryption, and the round keys are stored in a RAM which is read out for each of the 10 rounds of AES-128, implemented in an iterative fashion (the same round datapath being reused for the first nine rounds). An AES round includes all four transformation operations in combinatorial logic. The S-box in the SubBytes module is implemented as a look-up table. Figure 5.15a shows the block diagram of the 32-bit AES-128 key schedule. The four 32-bit registers R0, R1, R2, R3 hold the four words of the round keys (a word represents data of 32-bit width). As the design is 32-bit, only one word of the AES round key is generated in each clock cycle. In the first four clock cycles, the select1 and select0 lines load the initial AES key into the four registers. In the subsequent clock cycles the select0 line loads the value of the output register Wi into R0, R1, R2, R3. The value of Wi is stored in one of the four registers depending on the write-enable signals WR0–WR3 of these registers. SW and RW in the figure represent the SubWord and RotWord operations of the AES key schedule [44]. In each cycle one
word of round key is generated and stored in register Wi and ultimately stored in the
RAM. Therefore, in the first 44 clock cycles, all the ten round keys are generated.
Figure 5.15b shows the fault injection setup. We use two clocks, CLK and FAST_CLK, generated from a Tektronix AFG3252 arbitrary function generator. One is the normal clock (CLK) and the other is a fast clock. The trigger generator generates the CLK_SEL signal which, being initially low, selects CLK. At the beginning of the eighth round, the trigger generator makes the CLK_SEL signal high for one clock pulse, which selects FAST_CLK and thus generates a glitch in the clock line. This creates a setup-time violation in the path LP1 ∶ R0 → SW → RW → MUX1 → XOR → Wi, which results in faulty data in register Wi depending on the time period (glitch width) of the fast clock. This is the critical (longest) path in the key schedule module of the AES-128 architecture. Let the other long paths, in decreasing order of length and affected by timing violations, be LP2, LP3 and so on; we will use them later. As the fault is generated during the key schedule operation, it propagates to the subsequent round keys. The architecture of the AES-128 cipher is implemented using Verilog HDL on a Xilinx Spartan-3E FPGA XC3S500E device with input clock CLK at an operating frequency of 36 MHz. We used the ChipScope Pro 7.1 analyzer to observe the faulty output.
In the experimental setup, the frequency of CLK is held constant at 36 MHz, the operating frequency of the AES, while FAST_CLK is increased in steps of 1 MHz from 36 MHz onwards to generate faults in the AES key schedule [45]. In each step we run 512 encryptions and collect the samples through ChipScope. Up to 85 MHz no fault occurs in register Wi. The experimental observations that follow are specific to our hardware implementation of AES-128. The specific observations of
fault occurrences in other implementations may vary but the trend and comparisons
across different types of faults are similar.
From 87 MHz onwards faults start appearing whose distribution can be seen in
Fig. 5.16a. Initially only single-byte faults occur. The number of samples with single-
byte faults increases with the increase in FAST_CLK frequency. At 93 MHz, all the
512 samples in the trace are infected in one byte. The first fault appears in one of
the bits (corresponding to LP1) of the 4-byte Wi register. When the FAST_CLK fre-
quency is further increased, the next fault occurs in another bit (corresponding to
LP2). If LP1 and LP2 are in the same byte then multiple 1-byte faults occur else it
is a 2-byte fault. But the probability of fault occurrence in the same byte ((8 − 1)∕31, i.e., 7∕31) is less than the probability of fault occurrence in other bytes ((32 − 8)∕31, i.e., 24∕31).
In the overlapping region of 1-byte and 2-byte fault in Fig. 5.16a, the next faulty bit
occurs in a different byte which causes a 2-byte fault in some of the samples. The
same argument applies to other overlapping regions between 2-byte fault and 3-byte
fault and between 3-byte and 4-byte fault.
From this distribution of different faults it is obvious that initially only 1-byte faults occur, and beyond a frequency range (in our case 119 MHz) all the samples have 4-byte faults. From the observed results in Fig. 5.16a, it is seen that beyond some upper limit frequency all bytes can be corrupted by the glitch. Therefore, generating an all-byte fault is much easier than generating a 1-byte fault.
We observed that the number of instances of different types of faults increases with the fault-width of the fault model, i.e., the number of faulty bytes in the fault model. As shown in Fig. 5.16b, in the case of the single byte fault model, we get only one instance of the fault. This can be attributed to the fact that when some samples suffer timing violations in the two long paths LP1 and LP2, the paths more probably lie in two different output bytes rather than in the same output byte.
With increasing operating frequency, the number of instances of 2-byte faults gradually increases. This means that the next few long paths, for example LP3, LP4, …, LPk₂ (where k₂ refers to the final long path involved in the 2-byte fault), which suffer timing violations, lie in the same two bytes corresponding to LP1, LP2. Also from Fig. 5.16b, the number of different instances of 2-byte faults increases as the operating frequency increases beyond 101 MHz. As the FAST_CLK frequency
is increased further, we see that the number of instances of 2-byte faults reduces to one and the number of 3-byte faults increases. This happens because all the timing violations LP1, LP2, …, LPk₂ occur simultaneously, and the separate individual occurrences of the previous fault instances disappear from the samples. Here kᵢ is an integer, where 2 ≤ i ≤ 4. At the same time some new long paths LP(k₂ + 1), …, LPk₃ suffer timing violations in some of the samples. Each of these long paths leads to a faulty byte which is different from the faulty bytes contained in the 2-byte fault. If the frequency is increased in this way, we observe that beyond a certain frequency range only 4-byte fault instances exist. For even higher frequencies, we see increasingly more instances of 4-byte faults, since more and more paths LP(k₃ + 1), …, LP(k₄) suffer timing violations, leading to more affected faulty bytes; this is observed in almost all samples as we keep increasing the FAST_CLK frequency. Also from Fig. 5.16b, the number of different types of 4-byte faults increases beyond 113 MHz and the number of such
different instances is the highest amongst all the different fault patterns. To sum up, we see that 1-byte faults exist only within a limited operating-frequency window (with only a single instance being seen). Beyond this frequency window only multi-byte faults occur, and after a certain maximum operating frequency only 4-byte faults, with many different instances, exist in abundance. The existence of such numerous different instances of 4-byte faults in the experiments makes this class of multi-byte faults the most effective for performing DFA of the AES key schedule, as we can obtain numerous faulty ciphertexts at a particular frequency and hence multiple equations for the many unknown variables in terms of faulty ciphertexts, as revealed in the next subsection.
From the experimental results in the previous subsection it is observed that a 4-byte fault in the AES key schedule can be easily injected using methods like clock glitching. In this subsection, we present DFA on the AES-192 and AES-256 key schedules [45]. The challenge in the 4-byte fault model compared to the single byte fault model is that
it induces more unknown variables in the differential equations that need to be solved in order to reduce the search space of the key. Especially when the fault is induced in the key schedule, the challenge increases manifold due to the diffusion in the key schedule of AES. When it comes to AES-192 and AES-256, we need to find two round keys in order to get the master key, which makes the job of the attacker more difficult as he cannot directly apply the technique of AES-128, which only retrieves the final round key. In this work we present a new technique which shows that the 4-byte fault model can also be used against AES-192 and AES-256 with relatively fewer fault inductions compared to DFA using the single byte fault model in [46].
The proposed attack on AES-192 requires only two faulty ciphertexts to reduce the search space to 2³². The flow of the fault in the AES-192 key schedule is shown in Fig. 5.17a, and Fig. 5.17b shows the corresponding flow of the fault in the AES-192 state. Here, a, b, c, d, e, f, g, and h are the fault values. We have two different faulty ciphertexts C1∗ and C2∗, based on the above figures, corresponding to which we have two sets of values (a1, b1, c1, d1) and (a2, b2, c2, d2) of (a, b, c, d).
Considering the first row of S0, we can represent the value of a in terms of the fault-free and faulty ciphertexts and the final round key K¹². Therefore, we get a set of four equations corresponding to the four key bytes of K¹². Using two faulty ciphertexts we get two sets of equations. This is also true for the other three rows of S0. It may be observed that each set of equations involves two of the variables (a, b, c, d, e, f, g, h) and four key bytes of K¹². For example, the equations from the first row will involve (a, e) and the first-row bytes of K¹². We solve these equations in a row-wise fashion.
For the first row equations, we guess a1, e1, e2, and get the values of the four key bytes from the first set of equations (of C1∗). We test each of these values against the second set of equations (of C2∗). Only the right candidates of the four key bytes will satisfy both
Fig. 5.17 (a) Flow of fault in the AES-192 key schedule, from the first column of K¹¹; (b) flow of fault in the last two rounds of AES-192
the sets of equations [29, Sect. 3.1]. We apply the same technique to the remaining three rows of S0 and uniquely determine the values of the corresponding key quartets. Thus, combining all the four quartets, we get K¹².
It may be observed in Fig. 5.17a that in order to get the master key we need K¹² and the last two columns of K¹¹. For the last column of K¹¹, we consider the relations between the fault values in Fig. 5.17b. The two sets of fault values (a, b, c, d) and (e, f, g, h) in Fig. 5.17a are related by the following equations:
e = SB(K¹¹₁,₃) ⊕ SB(K¹¹₁,₃ ⊕ b)    f = SB(K¹¹₂,₃) ⊕ SB(K¹¹₂,₃ ⊕ c)
g = SB(K¹¹₃,₃) ⊕ SB(K¹¹₃,₃ ⊕ d)    h = SB(K¹¹₀,₃) ⊕ SB(K¹¹₀,₃ ⊕ a),
where Kʳᵢ,ⱼ is the (i, j)-th byte of the r-th round key Kʳ. We have two faulty ciphertexts corresponding to which we get two sets of equations. In these two sets of equations the two sets of values of a, b, c, d, e, f, g, h are already determined while retrieving K¹². Therefore, using the two sets of values of the variables we can uniquely determine the four key bytes (K¹¹₁,₃, K¹¹₂,₃, K¹¹₃,₃, K¹¹₀,₃). Hence, using two faulty ciphertexts the attack reduces the search space of the AES-192 key to 32 bits, which is within practical search limits.
Information Redundancy
In this technique, the input message is encoded to generate a few check bits, and these bits are propagated along with the input message. At the output, check bits are derived from the output message and compared with the check bits predicted from the input. Three information redundancy techniques are discussed below:
1. Parity-1: One can use single bit parity for the entire 128-bit state, and the parity
bit is checked once for the entire round [48].
2. Parity-16: One parity bit can be generated for each input byte (a minimal byte-parity check sketch is given after this list). While some parity-16 techniques depend on the S-box implementation [49, 50], a general parity formation is proposed in [51].
3. Robust Code: The parity code suffers from nonuniform fault coverage [52]; e.g., parity-16 cannot detect an even number of faulty bits in a byte. Robust code provides uniform fault coverage for all types of faults [52]. It uses a prediction circuit at the round input to predict a nonlinear property of the round output, as shown in Fig. 5.18b. The prediction circuit has a linear predictor (L-Predict), a linear compressor (L-Compress), and a cubic function (Cubic). The L-Predict takes the round key and the round input and generates a 32-bit output. The L-Compress and the cubic function reduce the 32-bit data to 28 bits. There are three components at the round output to extract the nonlinear property of the output: the compressor (Compress), the linear compressor, and the cubic function. Each byte of the compressor output L(j) is equivalent to the component-wise XOR of four bytes of the same column. The output of the linear predictor Ll(j) is the same as the output of the compressor.
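A minimal sketch of such a byte-parity check is given below; byte_parity recomputes the parity of each output byte, and the predicted parities are assumed to be produced by the scheme-specific prediction logic of [51].

def byte_parity(b):
    # Fold the byte onto itself to obtain its single-bit parity.
    b ^= b >> 4
    b ^= b >> 2
    b ^= b >> 1
    return b & 1

def ced_check(round_output_bytes, predicted_parities):
    # Flag a fault if any recomputed byte parity differs from its prediction.
    return all(byte_parity(b) == p
               for b, p in zip(round_output_bytes, predicted_parities))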
Time Redundancy
The function is computed twice with the same input, and the results are compared as shown in Fig. 5.18c. One redundant cycle is required to check each round. Time redundancy cannot detect permanent faults, nor transient faults that appear in both the normal and the redundant computations.
A time redundancy scheme is proposed in [53]: the design simply recomputes with the same input and compares the results. A variation of time redundancy is proposed in [54], in which the function is computed on both clock edges to speed up the computation.
Hardware Redundancy
The circuit is duplicated, and both original and duplicated circuits are fed with the
same inputs, and the outputs are compared as shown in Fig. 5.18d. The hardware redundancy technique offers high fault coverage against random faults [53], but it may be bypassed by an attacker who can inject the same fault in both copies of the hardware.
Hybrid Redundancy
Hybrid redundancy techniques combine the characteristics of the previous CED categories, and they often exploit certain properties of the underlying algorithm and/or implementation. In [55], an operation, a round, or the entire encryption is followed by its inverse, and the result is compared with the original input. The details are shown in Fig. 5.18e.
In Recomputing with Permuted Operands (REPO) [56], the authors discover a special invariance of AES and use it to detect faults (Fig. 5.18f). First, the data is computed as usual. Then, the same data is permuted and computed again; after the result is inverse-permuted, it should be the same as the result obtained without any permutation. Redundant rounds are inserted in the encryption: in each redundant round, the input data is permuted and AES computes on the permuted data. Then, the round output is inverse-permuted and compared with the original output. Any mismatch shows that a fault has been detected. REPO provides close to 100 % fault coverage for both permanent and transient faults.
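A minimal structural sketch of such a recompute-and-compare check is shown below; aes_round, permute, inverse_permute and permute_key stand for the scheme-specific mappings of [56] (under which AES is invariant) and are assumptions of this illustration.

def repo_check(state, round_key, aes_round, permute, inverse_permute, permute_key):
    # Normal computation and recomputation on permuted operands.
    normal = aes_round(state, round_key)
    redundant = inverse_permute(aes_round(permute(state), permute_key(round_key)))
    return normal == redundant   # a mismatch signals an injected fault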
Thus, in this section we have provided a brief discussion of fault analysis along with the countermeasures to protect crypto-systems against it. In the next section we discuss the vulnerabilities that may arise in a crypto-system due to conventional chip testing mechanisms.
Reliability of devices has become a serious concern with the increase in complexity of ICs and the advent of deep sub-micron process technology. The growth of applications of cryptographic algorithms and their requirement for real-time processing have necessitated the design of crypto-hardware. But along with the design of such devices, testability is a key issue. What makes testability of these circuits more challenging compared to other digital designs is the fact that popular design-for-testability (DFT) methodologies, such as scan chain insertion, can be used as a double-edged sword. Scan chain insertion, which is a desirable test technique owing to its high fault coverage and small hardware overhead, opens “side channels” for cryptanalysis [57, 58]. Scan chains can be used to access intermediate values stored in the flip-flops, thereby ascertaining the secret information, often the key. Conventional scan chains fail to resolve the conflicting requirements of effective testing and security [57]. So, one of the solutions that has been suggested is to blow off the scan chains from crypto ICs before they are released into the market. But such an approach is unsatisfactory and directly conflicts with the paradigm of DFT. In order to solve this problem of efficiently testing cryptographic ICs, several research works have been proposed.
Fig. 5.19 Round structure of a block cipher: input a, S-boxes S1, …, Sn, diffusion layer, output register b (to next rounds)
The security of the block cipher is obtained from the properties of the round function, as shown in Fig. 5.19, and from the number of rounds in the cipher. However, when the design is prototyped on a hardware platform and a scan chain is provided to test the design, the attacker uses the scan chain to control the input patterns and observe the intermediate values in the output patterns. The security of the block cipher is thus threatened, as the output after a few rounds is revealed to the adversary. The attacker then analyzes the data and applies conventional cryptanalytic methods on a much weakened cipher [59].
We next summarize the scan-based attack with respect to Fig. 5.19. Without loss of generality, let us assume that the S-boxes are byte-wise mappings, though the discussion can be easily adapted for other dimensions. The attack observes the propagation of a disturbance in a byte through a round of the cipher. If one byte of the plaintext is affected, say p0, then one byte of a, namely a0, gets changed (see figure). The byte passes through an S-box and produces an output, which is diffused into the output register b.
The diffusion layer of AES-like ciphers is characterized by a property called the branch number, which is the minimum total number of disturbed bytes at the input and output of the layer. For example, the MixColumns step of AES has a branch number of 5, indicating that if b1 bytes are disturbed at the input of MixColumns and b2 bytes are affected at the output, then b1 + b2 ≥ 5.
Depending upon the branch number of the diffusion layer, the input disturbance spreads to, say, t output bits in the register b. The attacker exploits this property to first ascertain the correspondence between the flip-flops of register b and the output bits in the scanned-out pattern. Next, the attacker applies a one-round differential attack to determine the secret key.
1. The attacker first resets the chip and loads the plaintext p and the key k, and
applies one normal clock cycle. The XOR of p and k is thus transformed by the
S-boxes and the diffusion layers, and is loaded into the register b.
2. The chip is now switched to the test mode and the contents of the flip-flops are
scanned out. The scanned out pattern is denoted by TP1 .
3. Next, the attacker disturbs one byte of the input pattern and repeats the above
two steps. In this case, the output pattern is TP2 .
It may be observed that by taking the difference between TP1 and TP2, the attacker can identify the positions of the contents of register b: the ones in the difference are all due to the contents of register b. In order to observe all the bit positions of register b, the attacker repeats the process with further differential pairs. There can be a maximum of 256 possible differences in the plaintext byte being changed. However, ciphers satisfy the avalanche criterion, which states that if one input bit is changed, on average at least half of the output bits are modified. Thus, in most cases, because of this avalanche property of the round, far fewer plaintexts are necessary to obtain the locations of all the bits of register b.
However, the attacker has so far only ascertained the location of the values of register b in the scanned-out patterns. This is nevertheless an unintended leakage of information: for example, the difference of the scanned-out patterns gives away the Hamming distance after one round of the cipher.
The attacker now studies the properties of the round structure. The S-box is a non-linear layer with the property that not all input and output difference pairs are possible. As an example, for the present-day standard cipher, the Advanced Encryption Standard (AES), given a possible input and output difference pair, on average one value of the input to the S-box is possible. That is, if the input to the S-box S is x, and the input and output differentials are α and β, then there is on average one solution to the equation:

β = S(x) ⊕ S(x ⊕ α)
⎛α 0 0 0⎞    ⎛β 0 0 0⎞    ⎛2β 0 0 0⎞
⎜0 0 0 0⎟ ⇒ ⎜0 0 0 0⎟ ⇒ ⎜ β 0 0 0⎟
⎜0 0 0 0⎟    ⎜0 0 0 0⎟    ⎜ β 0 0 0⎟
⎝0 0 0 0⎠    ⎝0 0 0 0⎠    ⎝3β 0 0 0⎠
Thus the attacker knows that the differential of the scanned-out patterns TP1 and TP2 has the above property. That is, there are 4 bytes in the register b, denoted by d0, d1, d2, d3, such that d0 = 2β, d1 = β, d2 = β and d3 = 3β.
In the previous step, the attacker ascertained the positions of the 32 bits of register b in the scanned-out pattern, but not which of the possible patterns is the correct one. The correct pattern must satisfy the above property. If w is the number of ones in the XOR of TP1 and TP2, then there are 32Cw possible patterns, out of which the correct one satisfies the above property. The probability of a random string satisfying the property is 2^−24. Thus, if w = 24, for example, the number of satisfying patterns is 32C24 × 2^−24 ≈ 1; there is essentially a single value that satisfies the above equations. This gives the attacker the value of 𝛽. The attacker already knows the value of 𝛼 from the plaintext differential. The differential property of the S-box then ensures that, on average, there is a single value for the input byte of the S-box. Thus, the attacker obtains the corresponding byte of a (see figure) and computes one byte of the key by XORing the plaintext byte p0 with the recovered byte a0, that is, k0 = p0 ⊕ a0.
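To make this step concrete, the following is a minimal sketch (not taken from the chapter) that enumerates the inputs x satisfying 𝛽 = S(x) ⊕ S(x ⊕ 𝛼) for the AES S-box and recovers the key byte as k0 = p0 ⊕ x; the S-box construction, parameter names, and the toy scenario are illustrative assumptions.

```python
def gf_mul(a, b):
    # multiplication in GF(2^8) with the AES polynomial x^8 + x^4 + x^3 + x + 1
    p = 0
    for _ in range(8):
        if b & 1:
            p ^= a
        hi = a & 0x80
        a = (a << 1) & 0xFF
        if hi:
            a ^= 0x1B
        b >>= 1
    return p

def build_aes_sbox():
    # S(v) = affine transform of the multiplicative inverse of v (inverse of 0 is 0)
    rotl = lambda b, n: ((b << n) | (b >> (8 - n))) & 0xFF
    inv = [0] * 256
    for x in range(1, 256):
        inv[x] = next(y for y in range(1, 256) if gf_mul(x, y) == 1)
    return [inv[v] ^ rotl(inv[v], 1) ^ rotl(inv[v], 2) ^ rotl(inv[v], 3)
            ^ rotl(inv[v], 4) ^ 0x63 for v in range(256)]

def diff_solutions(sbox, alpha, beta):
    # S-box inputs x consistent with the observed differential pair (alpha, beta)
    return [x for x in range(256) if sbox[x] ^ sbox[x ^ alpha] == beta]

SBOX = build_aes_sbox()
k0, p0, alpha = 0x3C, 0x41, 0x27                 # hypothetical key byte, plaintext byte, input differential
beta = SBOX[p0 ^ k0] ^ SBOX[(p0 ^ alpha) ^ k0]   # output differential learned from the scan-out difference
candidates = {x ^ p0 for x in diff_solutions(SBOX, alpha, beta)}  # k0 = p0 XOR x
assert k0 in candidates and len(candidates) <= 4  # only a handful of key-byte candidates remain
```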
The remaining key bytes may be obtained similarly. Several attacks on the standard block ciphers DES and AES have been reported in the literature [60], but all of them follow the above general idea of attacking through the controllability and observability provided by scan chains.
Countermeasures
An interesting alternative, and one of the best methods, was proposed in [61, 62], where a secure scan chain architecture with a mirror key register was used to provide both testability and security. Figure 5.20 shows the diagram of the secure scan architecture. The design uses a special register called the mirror key register (MKR), which is loaded with the key, stored in a separate register, during encryption. During encryption, the design is in a secure mode and the scan chains are disabled. When the design is in the test mode, the design is in the insecure mode and the scan chains are enabled; during this time the MKR is detached from the key register. The transition from the insecure mode to the secure mode happens by setting the Load_key signal high and the Enable_scan_in and Enable_scan_out signals low. The transition from the secure mode back to the insecure mode happens only through a power_off state and by reversing the above control signals. It is expected that powering off removes the content of the MKR, and thus does not reveal the key to a scan-chain-based attacker.
However, this method has the following shortcomings:
∙ Security is derived from the fact that switching off the power destroys the data in the registers. If the secret is permanently stored on-chip (e.g., credit cards, cell-phone SIM cards, access cards), the information remains inside the chip even after the power is turned off, and can be extracted from a device having such a scan chain in the insecure mode.
∙ At-speed testing or on-line testing is not possible with this scheme.
∙ The cryptographic device can be part of a critical system that remains ON continuously (such as a satellite monitoring system). In such devices, powering off is not possible, and testing in such a scenario requires alternative solutions.
Fig. 5.20 Secure scan architecture with mirror key register [61]
∙ One of the most secure modes of operation of a block cipher like AES is Cipher Block Chaining (CBC), where the ciphertext at any instant of time depends on the previous block of ciphertext [60]. If testing is required at an intermediate stage, the device needs to be switched off; to resume data encryption, all the previous blocks have to be encrypted again, and this entire process has to be synchronized with the receiver that is decrypting the data. Therefore, such modes of block ciphers cannot be tested efficiently using this scheme.
5.6 Conclusions
This chapter has presented the security issues involved in designing IP cores for cryptographic algorithms. It shows the threats that loom over designs of even standard cryptographic algorithms when conventional design approaches are adopted, owing to the availability of several side-channel sources. It starts with the underlying principles of power analysis and discusses in detail several forms of power attacks, along with mitigation schemes and evaluation strategies for such attacks. The chapter also discusses fault attacks on crypto-cores, taking the AES core as an example. Laboratory results have been furnished throughout to demonstrate the practicality of such attack vectors, and various redundancy architectures to prevent such fault attacks are presented. Finally, a discussion of the threats arising from the adoption of conventional testing schemes has been provided, along with a popular method to improve resistance against such threats on the crypto-IC.
References
11. Park, J., Tyagi, A.: t-private systems: unified private memories and computation. In:
Chakraborty, R., Matyas, V., Schaumont, P. (eds.) Security, Privacy, and Applied Cryptography
Engineering. Lecture Notes in Computer Science, vol. 8804, pp. 285–302. Springer Interna-
tional Publishing (2014)
12. Gomathisankaran, M., Tyagi, A.: Glitch resistant private circuits design using HORNS. In:
IEEE Computer Society Annual Symposium on VLSI, ISVLSI 2014, Tampa, FL, USA, July
9–11, 2014, pp. 522–527 (2014)
13. Park, J., Tyagi, A.: Towards making private circuits practical: DPA resistant private circuits.
In: IEEE Computer Society Annual Symposium on VLSI, ISVLSI 2014, Tampa, FL, USA,
July 9–11, 2014, pp. 528–533 (2014)
14. Wong, M., Wong, M., Hijazin, I., Nandi, A.: Composite field GF(((2²)²)²) AES S-box with direct computation in GF(2⁴) inversion. In: 2011 7th International Conference on Information Technology in Asia (CITA 11), pp. 1–6 (2011)
15. Trichina, E.: Combinational logic design for AES subbyte transformation on masked data.
IACR Cryptol. ePrint Arch. 2003, 236 (2003)
16. Mangard, S., Popp, T., Gammel, B.: Side-channel leakage of masked cmos gates. In: Menezes,
A. (ed.) Topics in Cryptology CT-RSA 2005. Lecture Notes in Computer Science, vol. 3376,
pp. 351–365. Springer, Berlin (2005)
17. Clavier, C., Feix, B., Gagnerot, G., Roussellet, M., Verneuil, V.: Improved collision-correlation
power analysis on first order protected aes. In: Preneel, B., Takagi, T. (eds.) Cryptographic
Hardware and Embedded Systems CHES 2011. Lecture Notes in Computer Science, vol. 6917,
pp. 49–62. Springer, Berlin Heidelberg (2011)
18. Moradi, A.: Statistical tools flavor side-channel collision attacks. In: Pointcheval, D., Johans-
son, T. (eds.) Advances in Cryptology EUROCRYPT 2012. Lecture Notes in Computer Sci-
ence, vol. 7237, pp. 428–445. Springer, Berlin (2012)
19. Moradi, A., Mischke, O.: How far should theory be from practice? In: Prouff, E., Schaumont,
P. (eds.) Cryptographic Hardware and Embedded Systems CHES 2012. Lecture Notes in Com-
puter Science, vol. 7428, pp. 92–106. Springer, Berlin (2012)
20. Hajra, S., Rebeiro, C., Bhasin, S., Bajaj, G., Sharma, S., Guilley, S., Mukhopadhyay, D.:
DRECON: DPA resistant encryption by construction. In: Pointcheval, D., Vergnaud, D. (eds.)
Progress in Cryptology - AFRICACRYPT 2014 - 7th International Conference on Cryptology
in Africa, Marrakesh, Morocco, May 28–30, 2014. Proceedings. Lecture Notes in Computer
Science, vol. 8469, pp. 420–439. Springer (2014)
21. Liskov, M., Rivest, R.L., Wagner, D.: Tweakable block ciphers. In: Yung, M. (ed.) CRYPTO.
Lecture Notes in Computer Science, vol. 2442, pp. 31–46. Springer (2002)
22. Medwed, M., Standaert, F.-X., Großschädl, J., Regazzoni, F.: Fresh re-keying: security against
side-channel and fault attacks for low-cost devices. In: Bernstein, D.J., Lange, T. (ed.)
AFRICACRYPT. Lecture Notes in Computer Science, vol. 6055, pp. 279–296. Springer (2010)
23. Guilley, S., Sauvage, L., Flament, F., Vong, V.-N., Hoogvorst, P., Pacalet, R.: Evaluation of
power constant dual-rail logics countermeasures against DPA with design time security met-
rics. IEEE Trans. Comput. 59(9), 1250–1263 (2010)
24. Research Center for Information Security National Institute of Advanced Industrial Science
and Technology. Side-channel Attack Standard Evaluation Board SASEBO-GII Specification
(Version 1.01) (2009)
25. Shah, S., Velegalati, R., Kaps, J.-P., Hwang, D.: Investigation of DPA resistance of block RAMs
in cryptographic implementations on fpgas. In: Prasanna, V.K., Becker, J., Cumplido, R. (eds.)
ReConFig, pp. 274–279. IEEE Computer Society (2010)
26. Standaert, F.-X., Malkin, T., Yung, M.: A unified framework for the analysis of side-channel
key recovery attacks. In: Joux, A. (ed.) EUROCRYPT. Lecture Notes in Computer Science,
vol. 5479, pp. 443–461. Springer (2009)
27. Ali, S., Chakraborty, R.S., Mukhopadhyay, D., Bhunia, S.: Multi-level attacks: an emerging
security concern for cryptographic hardware. In: DATE, pp. 1176–1179. IEEE (2011)
28. Biham, E., Shamir, A.: Differential fault analysis of secret key cryptosystems. In: Kaliski Jr.,
B.S. (eds.) CRYPTO. Lecture Notes in Computer Science, vol. 1294, pp. 513–525. Springer
(1997)
29. Tunstall, M., Mukhopadhyay, D., Ali, S.S.: Differential fault analysis of the advanced encryp-
tion standard using a single fault. In: Ardagna, C.A., Zhou, J. (eds.) WISTP. Lecture Notes in
Computer Science, vol. 6633, pp. 224–233. Springer (2011)
30. Biham, E., Shamir, A.: Differential fault analysis of secret key cryptosystems. In: Proceedings
of Eurocrypt. Lecture Notes in Computer Science, vol. 1233, pp. 37–51 (1997)
31. Blömer, J., Seifert, J.-P.: Fault based cryptanalysis of the advanced encryption standard (AES).
In: Financial Cryptography, pp. 162–181 (2003)
32. Giraud, C.: DFA on AES. In: IACR e-print archive 2003/008, p. 008. http://eprint.iacr.org/
2003/008 (2003)
33. Moradi, A., Shalmani, M.T.M., Salmasizadeh, M.: A generalized method of differential fault
attack against AES cryptosystem. In: CHES, pp. 91–100 (2006)
34. Mukhopadhyay, D.: An improved fault based attack of the advanced encryption standard. In:
AFRICACRYPT, pp. 421–434 (2009)
35. Dusart, G.L.P., Vivolo, O.: Differential fault analysis on AES. In: Cryptology ePrint Archive,
pp. 293–306 (2003)
36. Piret, G., Quisquater, J.: A differential fault attack technique against SPN structures, with appli-
cation to the AES and Khazad. In: CHES, pp. 77–88 (2003)
37. Saha, D., Mukhopadhyay, D., Chowdhury, D.R.: A diagonal fault attack on the advanced
encryption standard. IACR Cryptol. ePrint Arch. 581 (2009)
38. Tunstall, M., Mukhopadhyay, D., Ali, S.: Differential fault analysis of the advanced encryption
standard using a single fault. In: WISTP, pp. 224–233 (2011)
39. Agoyan, M., Dutertre, J.-M., Naccache, D., Robisson, B., Tria, A.: When clocks fail: on critical
paths and clock faults. In: CARDIS, pp. 182–193 (2010)
40. Barenghi, A., Hocquet, C., Bol, D., Standaert, F.-X., Regazzoni, F., Koren, I.: Exploring the
feasibility of low cost fault injection attacks on sub-threshold devices through an example of a
65 nm AES implementation. In: Proceedings of Workshop RFID Security Privacy, pp. 48–60
(2011)
41. Khelil, F., Hamdi, M., Guilley, S., Danger, J.L., Selmane, N.: Fault analysis attack on an AES
FPGA implementation. In: ESRGroups, pp. 1–5 (2008)
42. Selmane, N., Guilley, S., Danger, J.-L.: Practical setup time violation attacks on AES. In: Euro-
pean Dependable Computing Conference, pp. 91–96 (2008)
43. Mukhopadhyay, D.: An improved fault based attack of the advanced encryption standard. In:
Preneel, B. (ed.) AFRICACRYPT. Lecture Notes in Computer Science, vol. 5580, pp. 421–
434. Springer (2009)
44. National Institute of Standards and Technology: Advanced Encryption Standard. NIST FIPS
PUB 197 (2001)
45. Ali, S., Mazumdar, B., Mukhopadhyay, D.: A fault analysis perspective for testing of secured
soc cores. IEEE Des. Test 30(5), 63–73 (2013)
46. Kim, C.H.: Improved differential fault analysis on AES key schedule. IEEE Trans. Inf. Foren-
sics Secur. 7(1), 41–50 (2012)
47. Guo, X.: Fault Attacks and Countermeasures on Symmetric/Key Cryptographic Algorithms.
Ph.D. thesis
48. Wu, K., Karri, R., Kuznetsov, G., Goessel, M.: Low cost concurrent error detection for the
advanced encryption standard. In: ITC, pp. 1242–1248 (2004)
49. Bertoni, G., Breveglieri, L., Koren, I., Maistri, P., Piuri, V.: Error analysis and detection proce-
dures for a hardware implementation of the advanced encryption standard. IEEE Trans. Com-
put. 52(4), 492–505 (2003)
50. Mozaffari-Kermani, M., Reyhani-Masoleh, A.: A lightweight high-performance fault detection
scheme for the advanced encryption standard using composite field. IEEE Trans. VLSI Syst.
19(1), 85–91 (2011)
51. Mozaffari-Kermani, M., Reyhani-Masoleh, A.: Concurrent structure-independent fault detec-
tion schemes for the advanced encryption standard. IEEE Trans. Comput. 59(5), 608–622
(2010)
52. Karpovsky, M., Kulikowski, K.J., Taubin, E., Member, S.: Robust protection against fault-
injection attacks of smart cards implementing the advanced encryption standard. In: DNS, pp.
93–101 (2004)
53. Malkin, T., Standaert, F.-X., Yung, M.: A comparative cost/security analysis of fault attack
countermeasures. In: FDTC, pp. 109–123 (2005)
54. Maistri, P., Leveugle, R.: Double-data-rate computation as a countermeasure against fault
analysis. IEEE Trans. Comput. 57(11), 1528–1539 (2008)
55. Karri, R., Wu, K., Mishra, P., Kim, Y.: Concurrent error detection schemes of fault based side-
channel cryptanalysis of symmetric block ciphers. IEEE Trans. Comput.-Aid. Des. 21(12),
1509–1517 (2002)
56. Guo, X., Karri, R.: Recomputing with permuted operands: a concurrent error detection
approach. IEEE Trans. Comput.-Aid. Des. Integr. Circ. Syst. 32(10), 1595–1608 (2013)
57. Kapoor, R.: Security vs. test quality: Are they mutually exclusive? In: ITC’04: Proceedings
of the International Test Conference, Washington, DC, USA, 2004, p. 1413. IEEE Computer
Society (2004)
58. Yang, B., Wu, K., Karri, R.: Scan based side channel attack on dedicated hardware implementa-
tions of data encryption standard. In: ITC’04: Proceedings of the International Test Conference,
pp. 339–344, Washington, DC, USA, 2004. IEEE Computer Society (2004)
59. Mukhopadhyay, D., Chakraborty, R.: Testability of cryptographic hardware and detection of
hardware trojans. In: 2011 20th Asian Test Symposium (ATS), pp. 517–524 (2011)
60. Stallings, W.: Cryptography and Network Security: Principles and Practice. Pearson Education
(2002)
61. Wu, K., Yang, B., Karri, R.: Secure scan: a design-for-test architecture for crypto-chips. In:
DAC’05: Proceedings of 42nd Design Automation Conference, pp. 135–140 (2005)
62. Yang, B., Wu, K., Karri, R.: Secure scan: a design-for-test architecture for crypto chips. IEEE
Trans. Comput.-Aid. Des. Integr. Circ. Syst. 25(10), 2287–2293 (2006)
Chapter 6
PUF-Based Authentication
Jim Plusquellic
6.1 Introduction
J. Plusquellic (✉)
University of New Mexico, Albuquerque, NM, USA
e-mail: [email protected]
cations by eliminating the need for costly non-volatile memory (NVM). PUFs can
be integrated into any type of system, including system-on-a-chip (SoC), an
application specific integrated circuit (ASIC) or field programmable gate array
(FPGA).
This chapter focuses on the design of authentication protocols which utilize
physical-layer cryptographic primitives such as the PUF, and describes the benefits
(and drawbacks) they offer over traditional software-based authentication protocols.
PUF-based authentication protocols are less than 15 years old and many have not
yet been fully vetted. Therefore, the development of low cost, secure protocols, and
proofs of their attack resilience is still very much a moving target. We provide a
high-level description of algorithmic security primitives and authentication proto-
cols, and then present a snapshot of the current state of the art, fully acknowledging
that the latter is rapidly evolving and still considered an open research problem by
the hardware security and trust community.
The term information security refers to a vast array of mechanisms, protocols, and algorithms, which are designed to protect information from unauthorized access, modification, and destruction [17]. Information security has four primary objectives: confidentiality, data integrity, authentication, and non-repudiation [18].
Confidentiality refers to maintaining privacy or secrecy of information and is tra-
ditionally ensured using encryption techniques. Data integrity relates to a property
of the data, that it has not been altered by an unauthorized party, and is typically
implemented using secure hashing schemes. Authentication is a process that con-
firms the identity of an entity or the original source of data using corroborative
evidence, and can be carried out using modification detection codes (MDCs),
message authentication codes (MACs), and digital signatures. Non-repudiation
refers to a process that associates an entity with a commitment or action, thereby
preventing the entity from claiming otherwise, and is traditionally ensured using
digital signature schemes.
The primary goal of cryptography is to provide a theoretical basis and practical
specifications for techniques that meet these information security goals. A wide
variety of cryptographic primitives have been developed to provide information
security. Menezes et al. [18] propose a taxonomy which partitions cryptographic
primitives into three basic categories, namely unkeyed primitives, symmetric-key
primitives, and public-key primitives. Unkeyed primitives include cryptographic
hash functions, one-way permutations, and random sequences. The keyed primi-
tives include a wide variety of symmetric and public-key ciphers, MACs (which are
keyed hash functions), signatures, and pseudo-random number generators (those
relevant to authentication are described in the next section). Each primitive can be
evaluated according to a set of criteria, such as the level of security they provide.
Random numbers are important in many cryptographic protocols, e.g., for session keys, nonces for authentication, randomized procedures, etc. Random numbers must be selected uniformly from a distribution, thereby ensuring that all possible values are equally likely, as a means of maximizing the difficulty of algorithmic and brute-force attacks carried out by adversaries against the protocol. Requests that are common in cryptographic protocols include “select an element at random from the sequence {1, 2, …, n}” or “generate a random string of symbols of length m over the alphabet G of n symbols.” Uniformly refers to the probability with which a given symbol is selected, which by definition is equal to 1/n for an alphabet of n symbols, and 1/n^m for a string of symbols of length m.
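As a minimal illustration of such requests, the snippet below draws an element uniformly from {1, …, n} and builds a random string of length m over an n-symbol alphabet; the use of Python's secrets module is an assumption of this sketch (any cryptographically secure source of uniform randomness would serve).

```python
import secrets
import string

n = 26
element = secrets.randbelow(n) + 1     # uniform over {1, 2, ..., n}; each value has probability 1/n

alphabet = string.ascii_lowercase      # an alphabet G of n = 26 symbols
m = 16
# each length-m string occurs with probability 1/n**m
random_string = ''.join(secrets.choice(alphabet) for _ in range(m))
```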
Traditionally, deriving random numbers from physical sources was difficult and
costly, spurring the development of software-based alternatives such as techniques
based on pseudorandom sequences and seed parameters (PRNGs) [19]. NIST
recommends several such cryptographically secure PRNGs, each based on different
types of cryptographic primitives such as hash functions, MACs and block ciphers
[20]. Although most are considered cryptographically secure, they each depend on a
random seed with high entropy. An entropy accumulator can be used to derive the
seed from a “non-ideal” physical source of randomness, whereby the input bit-
stream produced by the non-ideal source is processed by the entropy accumulator
into an m-bit pool of high entropy. The entropy accumulator can be a cryptographic
hash function [19]. Alternatively, the physical-layer nature of PUFs makes them cost-effective and well suited as the physical source of randomness. Recent work shows that appropriate post-processing of PUF responses allows them to be used directly as TRNGs, i.e., without the need for PRNGs [21].
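The following sketch illustrates the entropy-accumulator idea under the assumption that a cryptographic hash (here SHA-256) condenses a long, non-ideal bitstream into a high-entropy seed pool; the raw source shown is a placeholder, not a real physical measurement.

```python
import hashlib

def accumulate_entropy(raw_stream: bytes, pool_bytes: int = 32) -> bytes:
    """Condense a biased/correlated input bitstream into a high-entropy pool
    (here 256 bits) using a cryptographic hash as the entropy accumulator."""
    return hashlib.sha256(raw_stream).digest()[:pool_bytes]

# placeholder for a non-ideal physical source, e.g., raw PUF responses or jitter samples
raw = bytes([0x5A, 0x5B] * 512)        # deliberately low-entropy input stream
seed = accumulate_entropy(raw)          # seed for a cryptographically secure PRNG
```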
Cryptographic hash functions are typically classified as one-way hash functions (OWHFs), for which it is difficult to find an input string m that hashes to a specific hash value, and collision-resistant hash functions (CRHFs), for which it is difficult to find two input strings that map to the same hash. OWHFs are preimage and second-preimage resistant, and are considered weak one-way hash functions, while CRHFs typically have all three properties and are called strong one-way hash functions.
Keyed hash functions provide both message authentication and data integrity, and are called message authentication codes (MACs) when used in symmetric-encryption protocols and digital signatures when used in asymmetric-encryption protocols. Both schemes hash the message and then sign it with a key. The receiver authenticates by applying the MAC or digital signature algorithm to the received message and verifying that the received hash matches the locally computed value. Hashing compresses the message and makes this data integrity check more efficient. Although a detailed treatment is outside the scope of this exposition, the chip area and computational complexity of cryptographic hash functions are much larger than those of non-cryptographic hash functions [18, Ch. 9].
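A minimal example of a keyed hash used as a MAC is sketched below with Python's hmac module; the key distribution and message are illustrative assumptions, and a digital-signature variant would substitute an asymmetric signing algorithm.

```python
import hmac
import hashlib

key = b'shared-secret-K'                     # assumed to be pre-distributed out of band
message = b'meter reading: 42.7 kWh'

tag = hmac.new(key, message, hashlib.sha256).digest()        # sender: hash-then-key (MAC)
expected = hmac.new(key, message, hashlib.sha256).digest()   # receiver: recompute locally
assert hmac.compare_digest(tag, expected)                    # data integrity + origin authentication
```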
Similar to authentication protocols, secure hash algorithms continue to evolve, driving periodic changes and additions to the public standards [23, 24]. The term secure hash algorithm (SHA) is used in reference to a set of public standards maintained by the National Institute of Standards and Technology (NIST). In particular, SHA-3 refers to a subset of the cryptographic primitive family Keccak, a standard released in August of 2015 that is designed as an alternative to the SHA-2 family of secure hash functions [25].
Dodis et al. [26, 27] proposed two algorithms for a secure sketch, both based on
binary error-correcting linear block codes. A linear block code is characterized by three parameters given as [n, k, t], which indicate that there are 2^k codewords of length n and that each codeword is separated from all others by at least 2t + 1 bits. The last parameter specifies the error-correcting capability of the linear block code, in particular, that up to t bits can be corrected.
The code-offset construction is the simpler of the two constructions. The Sketch(y) procedure samples a uniform, random codeword c (which is independent of y), and produces an n-bit helper data bitstring w using Eq. 6.2 [19]. The bitstring w represents the binary offset between y and c.

w = y ⊕ c    (6.2)
Recover(y′, w) computes a noisy codeword c′ using Eq. 6.3 and then applies an error-correcting procedure to correct c′ as c″ = Correct(c′).

c′ = y′ ⊕ w  ⇒  c′ = (y ⊕ y′) ⊕ c    (6.3)
If the number of bits that differ between c and c′ is at most t, where t represents the error-correcting capability of the code, then the algorithm guarantees y = y″. Also, w discloses at most n bits of y, of which k are independent of y (with k less than or equal to n). Therefore, the remaining min-entropy is m − (n − k) (specified as m′ above), where (n − k) represents the min-entropy that is lost by exposing w to the adversary.
The second algorithm proposed in [26, 27] is referred to as the syndrome construction. The Sketch(y) procedure produces an (n − k)-bit helper data bitstring w using the operation specified by Eq. 6.5, where H^T is the transpose of a parity-check matrix H of dimension (n − k) × n. The Recover(y′, w) procedure computes a syndrome s using Eq. 6.6.

w = y ⋅ H^T    (6.5)

s = y′ ⋅ H^T ⊕ w  ⇒  s = (y ⊕ y′) ⋅ H^T    (6.6)
Error correction is carried out by finding a unique error word e such that the Hamming weight (the number of ‘1’s) of the bitstring e is less than or equal to t (the error-correcting capability of the code) and e satisfies Eq. 6.7.

s = e ⋅ H^T    (6.7)
In both the code-offset and syndrome techniques, the Recover procedure is more
computationally complex than the Sketch procedure. As discussed below, the first
PUF-based authentication protocols implemented the Recover procedure on the
resource-constrained hardware token. Subsequent work proposes a reverse fuzzy
extractor, which implements Sketch on the hardware token and Recover on the
resource-rich server, making the protocol more cost-effective and attractive for this
type of application environment [28].
Similar to error-correction, there is a broad range of techniques for constructing a
randomness extractor. Section 6.3.1 described the requirements for random
number generation, and practical approaches for extracting randomness from
non-ideal physical sources, e.g., those based on the use of seeded cryptographic
PRNGs. Reference [19], Sect. 6.3.2 provides a survey of techniques proposed for
extracting randomness.
Fuzzy extractors combine a secure sketch with a randomness extractor as
shown in Fig. 6.1 (adapted from [19]). A PUF-based authentication protocol, with
the hardware token, e.g., smart card, shown on the left and the secure server, e.g.,
bank, shown on the right is also shown to illustrate one possible usage scenario. The
Sketch, as noted above, takes an input r, which, e.g., might be a PUF response to a server-generated challenge c, and produces helper data w (labeled 1st in the figure). The Extractor takes both r and a random number (seed) n and produces an entropy-distilled version z, which can be stored as a tuple (c, z, w, n) in a secure database (DB) on the server. This component of the fuzzy extractor is called Generate or Gen.
Authentication in the field begins by selecting a tuple (c, z, w, n) from the DB
and transmitting the challenge c, helper data w and the seed n to the hardware token.
The PUF is challenged a second time with challenge c and produces a “noisy”
response r′ (labeled second in the figure). The Reproduce or Rep process of the
fuzzy extractor uses the Recover procedure of the secure sketch to error correct r′
using helper data w. The output r″ of Recover and the seed n are used by the
Extractor to generate z′. As long as the number of bit flip errors in r′ is less than
t (the chosen error correction parameter), the z′ produced by the token’s Extractor
will match the server-DB z and authentication succeeds. Note that the error cor-
rected z′ establishes a shared secret between the server and token, which can
alternatively be used as input to traditional cryptographic primitives such as hash
and block cipher functions (as opposed to being transmitted to the server as shown
in the figure).
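The sketch below walks through Gen and Rep under simplifying assumptions: the secure sketch is the code-offset construction with a [5, 1, 5] repetition code (majority-vote decoding, t = 2), and the randomness extractor is modeled as HMAC-SHA256 keyed with the public seed n. The PUF response, challenge identifier, and noise pattern are illustrative stand-ins, not the chapter's implementation.

```python
import hmac
import hashlib
import secrets

R = 5  # repetition factor; the [5, 1, 5] repetition code corrects t = 2 errors per block

def sketch(y):
    # code-offset construction: helper data w = y XOR c, with c a random codeword
    c = [b for b in (secrets.randbelow(2) for _ in range(len(y) // R)) for _ in range(R)]
    return [yi ^ ci for yi, ci in zip(y, c)]

def recover(y_noisy, w):
    c_noisy = [yi ^ wi for yi, wi in zip(y_noisy, w)]   # c' = y' XOR w
    c_fixed = []
    for i in range(0, len(c_noisy), R):                 # majority-vote decode each block
        bit = 1 if sum(c_noisy[i:i + R]) > R // 2 else 0
        c_fixed.extend([bit] * R)
    return [ci ^ wi for ci, wi in zip(c_fixed, w)]      # y'' = Correct(c') XOR w

def extract(r_bits, n_seed):
    # randomness extractor modeled, for illustration only, as a seeded keyed hash
    return hmac.new(n_seed, bytes(r_bits), hashlib.sha256).digest()

# --- Gen (server-side enrollment) ---
c_challenge = b'challenge-001'                          # hypothetical PUF challenge identifier
r = [secrets.randbelow(2) for _ in range(20)]           # stand-in for PUF(c_challenge)
w = sketch(r)
n = secrets.token_bytes(16)
z = extract(r, n)
db_tuple = (c_challenge, z, w, n)                       # stored in the secure server DB

# --- Rep (token-side authentication) ---
r_noisy = list(r); r_noisy[3] ^= 1                      # "noisy" re-measurement r'
z_prime = extract(recover(r_noisy, w), n)
assert z_prime == db_tuple[1]                           # authentication succeeds
```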
f(k; n, p) = (n! / (k! (n − k)!)) p^k (1 − p)^(n − k)    (6.8)

μ_binomial = n p    (6.9)
The NIST statistical test suite can be used to evaluate the randomness of PUF
response bitstrings [30]. The NIST tests look for patterns in the bitstrings that are
not likely to be found at all or above a given frequency in a “truly random”
bitstring. For example, long or short strings of 0’s and 1’s, or specific patterns
repeated in many places in the bitstring work against randomness. The output of the
NIST statistical evaluation engine is the number of chips that pass the null
hypothesis for a given test, when evaluated at a significance level α (α is set to the
default value of 0.01 which reflects a confidence of 99 %). The null hypothesis is
specified as the condition in which the bitstring-under-test is random. Therefore, a
good result is obtained when the number of bitstrings that pass the null hypothesis
is large.
The NIST test suite consists of 15 separate tests, all of which have constraints on
the size of the bitstring. The following provides an intuitive overview of what the
tests measure, with details regarding the bitstring size requirements and applied test
statistics omitted (see [30]). The test is always conducted against what is expected
in a truly random sequence of similar length.
• Frequency Test: Counts the number of ‘1’s in a bitstring and assesses the closeness of the fraction of ‘1’s to 0.5 (a minimal sketch of this test is given after the list). All other tests assume this test is passed.
• Block Frequency Test: Same except bitstring is partitioned into M blocks.
Ensures bitstring is “locally” random.
• Runs Test: Analyzes the total number of runs, i.e., uninterrupted sequences of
identical bits, and tests whether the oscillation between ‘0’s and ‘1’s is too fast
or too slow.
• Longest Run Test: Analyzes the longest run of ‘1’s within M-bit blocks, and
tests if it is consistent with the length of the longest run expected in a truly
random sequence.
• Rank Test: Analyzes the linear dependence among fixed length substrings in the
bitstring, and tests if the number of ranks, i.e., number of rows that are linearly
independent, of size M, M − 1, etc., match the number expected in a truly
random sequence.
• Fourier Transform Test: Analyzes the peak heights in the frequency spectrum of
the bitstring, and tests if there are periodic features, i.e., repeating patterns close
to each other.
• Non-overlapping and Overlapping Template Tests: Analyzes the bitstring for
the number of times pre-specified target strings occur, to determine if too many
occurrences of non-periodic patterns occur.
• Universal Test: Analyzes the bitstring to determine the level of compression that
can be achieved without loss of information.
• Linear Complexity Test: Analyzes the bitstring to determine the length of the
smallest set of LFSRs needed to reproduce the sequence.
• Serial and Approximate Entropy Tests: Analyzes the bitstring to test the fre-
quency of all possible 2m overlapping m-bit patterns, to determine if the number
is uniform for all possible patterns.
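As an illustration, the Frequency (monobit) test referenced above can be sketched as follows; the formulation (s_obs = |#1s − #0s| / √n, p-value = erfc(s_obs/√2)) follows the public NIST description, and the toy bitstrings are assumptions.

```python
import math

def monobit_test(bits, alpha=0.01):
    """NIST Frequency (monobit) test sketch: is the fraction of 1s close to 0.5?"""
    n = len(bits)
    s = sum(1 if b else -1 for b in bits)      # +1 for each '1', -1 for each '0'
    s_obs = abs(s) / math.sqrt(n)
    p_value = math.erfc(s_obs / math.sqrt(2))
    return p_value >= alpha                     # True: the null hypothesis (random) is not rejected

assert monobit_test([0, 1] * 512)               # balanced bitstring passes
assert not monobit_test([0] * 1024)             # constant bitstring fails
```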
Encryption ensures the nonce and identifiers are “inseparably” bound as dis-
cussed above.
Challenge-Response using Keyed One-Way Functions: Encryption is considered a “heavyweight” cryptographic primitive, and may be replaced, for authentication in resource-constrained devices, by a one-way function (OWF) or non-reversible function keyed with a shared secret and applied to a challenge. The encryption algorithm EK is replaced by a MAC algorithm hK, i.e., a keyed hash function. The receiver also computes the MAC and compares it with the received MAC. These protocols require an additional cleartext field rA to be transmitted [18].
B confirms that the received hash value, designated as hK(rA, rB, B), is equal to the value it computes locally using the same hash function and shared secret K. A performs a similar validation using the transmitted hash hK(rB, rA, A) from B. As discussed in Sect. 6.3.2, the computational infeasibility of finding a second input to hK that produces the same hash provides the security guarantee in this mutual authentication protocol.
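A compact sketch of this keyed-hash mutual authentication, with the message fields hK(rA, rB, B) and hK(rB, rA, A) taken from the description above, is shown below; the key length, nonce sizes, and field encoding are assumptions of the sketch.

```python
import hmac
import hashlib
import secrets

K = secrets.token_bytes(32)                                   # shared secret between A and B
hK = lambda *fields: hmac.new(K, b'|'.join(fields), hashlib.sha256).digest()

rA = secrets.token_bytes(16)                                  # A -> B: rA (cleartext nonce)
rB = secrets.token_bytes(16)
msg_from_B = (rB, hK(rA, rB, b'B'))                           # B -> A: rB, hK(rA, rB, B)

assert hmac.compare_digest(msg_from_B[1], hK(rA, msg_from_B[0], b'B'))  # A authenticates B
msg_from_A = hK(rB, rA, b'A')                                 # A -> B: hK(rB, rA, A)
assert hmac.compare_digest(msg_from_A, hK(rB, rA, b'A'))      # B authenticates A
```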
Challenge-Response by Public-Key: Here, the prover decrypts a challenge using the secret component of its public–private key pair; the challenge is encrypted by the verifier under the prover's public key PA. Alternatively, the prover can digitally sign a challenge.

A ← B: h(r), B, PA(r, B)
A → B: r
Weak PUFs are those whose challenge-response space is small while strong PUFs
have very large, ideally exponential, challenge-response spaces [108, 109]. The
distinction between strong and weak PUF is rooted in the amount of entropy that
each class can access. The larger the entropy source, the more difficult it is for an
adversary, who has access to the PUF, to collect and analyze challenge-response
pairs (CRPs) until the complete behavior of the PUF can be predicted. The SRAM
PUF is an early example of a weak PUF with only one CRP [68] while the arbiter
PUF is traditionally considered a strong PUF because of its exponentially large
challenge space [41]. However, if the size of the entropy source is considered a
defining characteristic, then the arbiter PUF would fail to meet the definition of a
strong PUF because its response space is derived from a relatively small entropy
source, in particular, as small as a couple hundred gates. Given this latter consid-
eration, very few of the proposed PUFs meet this expanded definition.
Model-building resistance using machine learning techniques has emerged as an
important criterion for determining whether a PUF is strong based only on the size
of its CRP space or whether it is truly strong, i.e., attacks that attempt to learn and
predict its behavior are infeasible [42, 110].
The most widely referenced strong PUF, the arbiter PUF, was one of the first proposed, and is described in [41, 42]. However, it is also widely recognized that it is considered strong based only on the size of its input challenge space, and not on the amount of entropy it possesses.
The arbiter PUF measures path delays from a specialized test structure as its
source of entropy as shown in Fig. 6.2. The test structure implements two paths,
each of which can be individually configured using a set of challenge bits (stored in
FFs along the top of the figure). Each of the challenge bits controls a “Switch box”
that can be configured in either pass mode or switch mode. Pass mode connects the
upper and lower path inputs to the corresponding upper and lower path outputs,
while switch mode reverses the connections. A stimulus, represented as a rising edge on the left side of the figure, causes two edges to propagate along the two paths configured by the challenge bits. The faster path controls the value stored in the
arbiter located on the right side of the figure. If the propagating rising edge on the
upper input to the arbiter arrives first, the response bit output becomes a ‘0’.
Otherwise, the response bit is a ‘1’. The switch boxes are designed identically as a
means of avoiding any type of systematic bias in the delays of the two paths.1
Within-die process variations cause uncontrollable delay variations to occur in the
switch boxes, which in turn, makes each instance of the arbiter PUF unique in terms
of its generated response bit(s). A bitstring can be obtained from the arbiter PUF by
repeating the measurement process under a set of different challenges.
From this design, it is clear that the arbiter PUF has an exponential number of
input challenges that can be applied, in particular, 2n with n representing the
number of switch boxes. However, the total amount of entropy is relatively small,
1 Note that achieving an unbiased layout in an FPGA is a challenging and non-trivial process.
and is represented by the four path segments in each of the switch boxes. For
n equal to 128, the total number of path segments that can vary individually from
one instance to another is 4 * 128 = 512. The exponential number of input chal-
lenges simply combines these individual sources of entropy in different ways.
Model building attacks attempt to learn the delay relationships of the two config-
urations for each switch box [110]. Once known, the response under any challenge
then becomes predictable (limited only by the noise margin of the arbiter mea-
surement circuit).
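The standard additive linear-delay model that underlies such model-building analyses can be sketched as below; the Gaussian stage parameters and the sign-accumulation form are modeling assumptions (a simplified stand-in for the switch-box pass/switch delays), not the chapter's measurement setup.

```python
import random

def make_arbiter_puf(n_stages=128, sigma=1.0, seed=None):
    """Toy additive-delay model of an arbiter PUF: each switch box contributes a
    fixed (process-variation-determined) delay difference whose effective sign
    depends on how many preceding stages are in switch mode."""
    rng = random.Random(seed)
    deltas = [rng.gauss(0.0, sigma) for _ in range(n_stages)]

    def respond(challenge):
        sign, diff = 1, 0.0
        for c, d in zip(challenge, deltas):
            if c:                     # switch mode swaps the two racing edges
                sign = -sign
            diff += sign * d
        return 0 if diff > 0 else 1   # the arbiter records which edge won the race

    return respond

puf = make_arbiter_puf(seed=1)
chal_rng = random.Random(2)
challenge = [chal_rng.randrange(2) for _ in range(128)]
response_bit = puf(challenge)
```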
The model-building weakness of the arbiter PUF is addressed in follow-on work,
where the outputs of n arbiter PUFs are XOR’ed, to create a XOR-mixed arbiter
PUF [44, 111, 112]. Figure 6.3 shows an example in which two arbiter PUF output
bits are XOR’ed. The goal is to create an XOR network large enough to achieve the
avalanche criterion. This criterion is commonly found in cryptographic hash and
encryption functions where flipping one of the input bits (or a bit in the key for
encryption) causes half of the output bits to flip. For the XOR-mixed PUF, the goal
is to achieve the avalanche effect by flipping one of the challenge bits. Although this significantly improves resistance to model building, particularly with XOR networks larger than 4, larger XOR networks also reduce reliability by creating a noise-based avalanche effect, i.e., any odd number of bit flips that occur on the inputs of any given XOR network results in a response bit flip error. As reported in [111], if a single arbiter PUF has an HDintra of 5 % (intra-chip HD measures the PUF’s ability to reproduce the same bitstring over repeated applications of the challenge, usually under different environmental conditions), the HDintra increases to 19 % for a 4-XOR-mixed arbiter PUF, i.e., nearly 1/5 of the response bits have bit flip errors.
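Under an independent-noise assumption, the reliability penalty of XOR mixing can be estimated directly: an XOR output flips whenever an odd number of its k inputs flip, giving a flip probability of (1 − (1 − 2p)^k)/2. The short check below (an illustration, not data from [111]) gives roughly 17 % for p = 5 % and k = 4, in line with the ~19 % HDintra reported above.

```python
def xor_flip_probability(p: float, k: int) -> float:
    # probability that an odd number of k independent inputs, each flipping with
    # probability p, flip -- i.e., the XOR-mixed response bit is in error
    return (1.0 - (1.0 - 2.0 * p) ** k) / 2.0

print(round(xor_flip_probability(0.05, 1), 3))   # 0.05  (single arbiter PUF)
print(round(xor_flip_probability(0.05, 4), 3))   # 0.172 (4-XOR-mixed arbiter PUF)
```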
Similar to arbiter PUFs, the hardware-embedded delay PUF (HELP) derives its
entropy from variations in path delays. However, HELP measures delays from
existing functional units. Therefore, no dedicated test structures are required.
Another major benefit of using existing functional units is the amount of entropy
that can be potentially leveraged. Cryptographic functional units are particularly
attractive because of the complexity of their interconnection networks. On the down
side, the lack of control over the configuration of paths in functional units creates
issues related to systematic bias and reliability, as described in the following
sections.
Interestingly, the authors of the first silicon-based PUF paper describe their notion of a “better PUF” in their “Ongoing and future work” section, which turns out, based on our work, to be well founded [40]. The basic concept of measuring path
delays from a core logic functional unit was implemented first by Li and Lach [83],
but was not fully developed as a PUF primitive. In particular, the authors do not
address the bias introduced by paths of different lengths nor do they deal with the
reliability issues associated with paths that glitch.
Our development of HELP began in 2011 on a 90 nm ASIC implementation
[86], but was fully developed as an intrinsic PUF (with full integration of the
control logic, entropy source, and measurement components) on a 130 nm Xilinx
V2Pro [84, 85], and more recently using a 28 nm Xilinx Zynq architecture [87]. We
have developed solutions for path length bias and glitching that occur when core
logic functional units are used as the source of entropy, as well as techniques that
improve the attack resilience of HELP when used in low cost authentication
applications. This section describes the characteristics of the most recent incarna-
tion of HELP and presents new results.
The original version of HELP made use of an embedded test structure called
REBEL [113] for measuring path delays and detecting glitches [84–86]. Recent
implementations of HELP measure path delays in glitch-free functional units,
which allow a simplified version of REBEL to be used [87]. The simplified version
eliminates the delay chain component and instead samples the path delays at the
capture FF directly.
HELP attaches to an on-chip module, such as a hardware implementation of the
Secure Hashing Algorithm (SHA-3) [23], as shown on the left side of Fig. 6.4. The
data path component of the SHA-3 algorithm, configured as keccak-f [200], is used
in our FPGA experiments. This combinational data path component includes 416
primary inputs (PIs) and 400 primary outputs (POs) and is implemented on a Xilinx
Zynq FPGA using 1936 LUTs.
Fig. 6.4 HELP Block Diagram: a Instantiation of the HELP entropy source and b HELP
processing engine
Similar to the arbiter PUF described in the previous section, within-die variations
in path delays are the main source of entropy for HELP. Manufacturing variations
change the relative path delays through the functional unit in different ways, and
therefore each instance of the functional unit is uniquely characterized by these
delays. However, the structure of the paths in the arbiter PUF is very different than
those in a typical functional unit, i.e., the arbiter PUF paths are symmetric and
regular (by design) while the paths within a typical functional unit exhibit no such
regularity.
Functional unit paths exhibit fan-out and then reconvergence of fan-out at various
points within the logic structure of the functional unit (called reconvergent-fanout),
as shown on the right side of Fig. 6.5. Also, the lengths of the paths can vary widely,
e.g., the short paths shown have 3 or fewer gates while the long paths are 5 or more
gates in length. Both of these characteristics make it more difficult to build a PUF
with good statistical characteristics. Reconvergent-fanout can cause glitching,
i.e., static and dynamic hazards, to occur on the primary outputs, whereby output
signals transition more than once. Glitching creates ambiguity regarding the “cor-
rect” timing value to use for the path. Operating the functional unit under different
Fig. 6.5 Portion of a functional unit schematic, showing fan-out and reconvergence of paths
Clock Strobing
Path delay is defined as the amount of time (Δt) it takes for a set of 0-to-1 and
1-to-0 bit transitions introduced on the PIs of the functional unit (input challenge) to
propagate through the logic gate network and emerge on a PO. HELP uses a
clock-strobing technique to obtain high resolution measurements of path delays as
shown on the left side of Fig. 6.4. A series of launch-capture operations are applied
in which the vector sequence that defines the input challenge is applied repeatedly
to the PIs using the Launch row flip-flops (FFs) and the output responses are
measured on the POs using the Capture row FFs. On each application, the phase of
the capture clock, Clk2, is incremented forward with respect to Clk1, by small Δts (on the order of 20 ps), until the emerging signal transition on a PO is successfully captured in the Capture row FFs. A set of XOR gates connected to the Capture row FF inputs and outputs (not shown) provide a simple means of determining when this occurs. When an XOR gate value becomes 0, the input and output of the FF are the same (indicating a successful capture). The first time this occurs during the clock strobe sweep, the current phase shift value is recorded as the digitized delay value for the path. This operation is applied to all POs simultaneously.
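A software model of this clock-strobing loop is sketched below; launch_capture() is a hypothetical stand-in for one launch/capture test of the functional unit at a given Clk1-to-Clk2 phase offset, and the 1024-step sweep mirrors the fine phase shift range described later.

```python
def digitize_path_delays(launch_capture, num_pos, max_phase=1024):
    """Record, for each primary output (PO), the first phase-shift value at which
    the propagating transition is successfully captured; that value is the
    digitized delay for the path driving the PO."""
    delays = [None] * num_pos
    for phase in range(max_phase):             # each step adds a small delta-t (~20 ps)
        captured = launch_capture(phase)       # one launch-capture trial; booleans, one per PO
        for po in range(num_pos):
            if delays[po] is None and captured[po]:
                delays[po] = phase
        if all(d is not None for d in delays):
            break
    return delays

# toy usage: a fake device whose true digitized delays are known
true_delays = [403, 517, 288]
fake_launch_capture = lambda phase: [phase >= d for d in true_delays]
assert digitize_path_delays(fake_launch_capture, 3) == true_delays
```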
The phase shifting module for Clk2 is shown in the middle of Fig. 6.4. On-chip
digital clock managers (DCMs) are commonly included in FPGA architectures. For
example, Xilinx FPGAs typically incorporate at least one DCM with a digitally
controlled fine phase shift control mechanism even on their lowest cost FPGAs. For
low-cost components that do not include a DCM with this capability, a fine phase
shift mechanism can be implemented with a small area overhead using a multi-
tapped delay chain.
The right side of Fig. 6.4 shows the HELP processing engine. The digitized path
delays are collected by a storage module and stored in an on-chip block RAM
(BRAM). Each digitized timing value is stored as a 14-bit value, with 10 binary
digits serving to cover the fine phase shift sweep range of 0–1023 and 4 binary
PN Processing
Fig. 6.6 a Example rising and falling path delays (PN), b PND and c PNDc
zval_i = (PND_i − μ_TVX) / Rng_TVX    (6.11)
As an example, Fig. 6.7a shows the PND histogram distribution for chip C1 at
25 °C, 1.00 V. The µTVx is shown as −40 while the RngTVx is computed between
the 5 and 95 % as 136. Figure 6.7b superimposes the PND histograms for C1 at
25 °C, 1.00 V and 100°C, 1.05 V. The TVComp process will shift (and scale) this
distribution to the left to remove the adverse effects introduced by the change in
environmental conditions.
A second illustration of the effect of TVComp is shown in Fig. 6.6b, c. The data
in Fig. 6.6c is obtained by applying the TVComp procedure to the 2048 PND measured under each of the 13 TV corners for each chip, i.e., 13 TV corners * 38 chips = 494 separate applications. Since the same reference mean and range are used for all transformations, TVComp eliminates both TV noise and chip-wide performance differences between the chips. Note that the curves in Fig. 6.6c no longer exhibit the saw-tooth behavior introduced by TV noise.2

Fig. 6.7 a PND distribution for chip C1 with µTVx and RngTVx depicted and b Chip C1 PND distributions at 2 TV corners
The differences that remain in the TVComp’ed PND (subsequently referred to as
PNDc) shown in Fig. 6.6c are those introduced by within-die process variations
(WDV) and uncompensated TV noise (UC-TVNoise). For this particular PND, the
TVComp process is able to reduce TV noise to approx. 2 in the worst case, which
translates to approx. 36 ps. In general, PNDc with larger levels of UC-TVNoise are
more likely to introduce bit flip errors.
The implementation of the HELP algorithm shown in Fig. 6.4 constructs a
histogram distribution in the upper 2048 memory locations of the BRAM using the
2048 PND stored in the lower portion and then parses the distribution to obtain µTVx
and RngTVx. Once the distribution constants are available, the PND in the low
portion of the BRAM are converted to PNDc.
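A sketch of the TVComp transformation, assuming the 5 %/95 % range bounds and the zval standardization of Eq. 6.11 followed by rescaling to reference constants (µref, Rngref), is given below; the percentile handling and the use of the arithmetic mean are assumptions, since the hardware builds the distribution as a BRAM histogram.

```python
import statistics

def tv_compensate(pnd, mu_ref, rng_ref):
    """Standardize the PND measured at the current temperature/voltage corner
    (Eq. 6.11) and rescale to the reference mean/range, yielding PNDc."""
    srt = sorted(pnd)
    lo, hi = srt[int(0.05 * len(srt))], srt[int(0.95 * len(srt))]
    mu_tv, rng_tv = statistics.mean(pnd), hi - lo
    zvals = [(p - mu_tv) / rng_tv for p in pnd]    # remove the corner-specific shift and scale
    return [z * rng_ref + mu_ref for z in zvals]   # re-express against the reference constants

# toy usage: the same chip measured at two corners maps onto comparable PNDc values
corner_a = [-60, -45, -40, -35, -20]
corner_b = [v + 15 for v in corner_a]              # global shift caused by a TV change
pndc_a = tv_compensate(corner_a, mu_ref=-40, rng_ref=136)
pndc_b = tv_compensate(corner_b, mu_ref=-40, rng_ref=136)
assert all(abs(a - b) < 1e-9 for a, b in zip(pndc_a, pndc_b))
```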
The last operation applied to the PN is represented by the Modulus operation
shown on the right side of Fig. 6.4. Modulus is a standard mathematical operation
that computes the positive remainder after dividing by the modulus. The Modulus
operation is required by HELP to eliminate the path length bias that exists in the
PNDc, which acts to reduce randomness and uniqueness in the generated bitstrings.
The value of the Modulus is also a user-selectable parameter, similar to the LFSR
seed, mean and range parameters, and is discussed further in the following.
The HELP engine shown in Fig. 6.4 overwrites the PNDc after applying the
Modulus. The final values, called MPNDc, are used in the bitstring generation
process.
2 TV compensation also serves as a countermeasure to prevent adversaries from manipulating temperature and supply voltage as a physical attack mechanism.
The bitstring generation process uses a fifth user-specified parameter, called the
Margin, as a means of improving the reliability of the bitstring regeneration pro-
cess. The bottom portion of Fig. 6.8a plots 18 of the 2048 PNDc from Chip1 along
the x-axis. The red curve line-connects the data points obtained under enrollment
conditions while the black curves line-connect data points under the 12 regeneration
TV corners.
The curves plotted along the top of Fig. 6.8a show the MPNDc values after a
modulus of 20 is applied. Figure 6.8b enlarges the upper portion of Fig. 6.8a and
includes a set of margins of size 2 surrounding two strong bit regions of size 6.
Designators along the top given as ‘s0’, ‘s1’, ‘w0,’ and ‘w1’ classify each of the
enrollment data points as either a strong 0 or 1, or a weak 0 or 1, resp. Data points
that fall on or within the hatched areas are classified as weak as a mechanism to
avoid bit flip errors introduced by UC-TVNoise that occurs during regeneration.
The Margin method improves bitstring reproducibility by eliminating data points
classified as “weak” in the bitstring generation process. For example, the data points
at indexes 4, 6, 7, 8, 10, and 14 would introduce bit flip errors at one or more of the
TV corners during regeneration because at least one of the regeneration data points
is in the opposite bit value region from the corresponding enrollment value. We
refer to this bitstring generation technique as the Single Helper Data
(SHD) scheme since the classification of the MPNDc as strong or weak is deter-
mined solely by the enrollment data.
A second technique, referred to as the Dual Helper Data (DHD) scheme,
requires that both the enrollment and regeneration MPNDc be in strong bit regions
before allowing the bit to be used in the bitstring during regeneration. The helper
data, which represents the classification of the MPNDc as strong or weak, is bitwise
‘AND’ed, and then both the enrollment and regeneration bitstrings are generated
(the enrollment data is assumed to be collected earlier in time and stored on a secure
server). The DHD scheme doubles the protection provided by the margin against bit
flip errors because the MPNDc produced during regeneration must now change and
move across both a ‘0’ and ‘1’ margin before it can introduce a bit flip error. This is
true because both the enrollment and regeneration MPNDc must be classified as
strong to be included in the bitstring and the strong bit regions are separated by
2 * margin.
Figure 6.8 highlights four cases where an enrollment-classified strong bit would
be reclassified as weak in the DHD scheme because 1 or more of the regeneration
PNDc falls within a weak region. This shows that in addition to doubling the
protection against bit flip errors, the DHD scheme can potentially produce different
bitstrings each time the chip regenerates it. Therefore, DHD adds uncertainty by
leveraging UC-TVNoise (and sampling noise to a smaller degree). This feature is a
benefit for authentication applications because only half of the helper data is
revealed to the adversary while the other half is generated and kept on the chip or
server. The missing helper data adds uncertainty for an adversary as to the final
form of the bitstring. Encryption applications can leverage both of these DHD
benefits as well by exchanging the chip and server helper data bitstrings while
keeping the generated keys private. These benefits of DHD are expanded upon in
the following sections.
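The Modulus/Margin classification and the Dual Helper Data AND-ing can be sketched as follows; the exact placement of the strong-0/strong-1 regions and weak margins is an assumption modeled on the description of Fig. 6.8, not the authors' implementation.

```python
def classify_bits(mpndc, modulus=20, margin=2):
    """Split each value (mod Modulus) into a bit value and a strong/weak flag:
    values within `margin` of a 0/1 decision boundary are weak and excluded."""
    half = modulus // 2
    bits, strong = [], []
    for v in mpndc:
        v %= modulus
        bits.append(0 if v < half else 1)
        dist = min(v % half, half - (v % half))    # distance to the nearest boundary
        strong.append(1 if dist >= margin else 0)  # helper data: 1 = strong, 0 = weak
    return bits, strong

def dhd_bitstring(enroll, regen, modulus=20, margin=2):
    """Dual Helper Data: a bit is kept only if it is strong in BOTH the enrollment
    and regeneration data; the enrollment bit values form the strong bitstring."""
    e_bits, e_help = classify_bits(enroll, modulus, margin)
    _, r_help = classify_bits(regen, modulus, margin)
    keep = [eh & rh for eh, rh in zip(e_help, r_help)]
    return [b for b, k in zip(e_bits, keep) if k]

enrollment   = [3, 9, 14, 18, 5, 11]     # toy MPNDc values at enrollment
regeneration = [4, 8, 15, 19, 7, 12]     # same chip, re-measured with UC-TVNoise
sbs = dhd_bitstring(enrollment, regeneration)
```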
Entropy Analysis
The Margin technique using either the SHD or DHD schemes adds uniqueness to
the regenerated bitstring. This is true because weak bits are excluded from the
bitstring based on the position of the PNDc and Margins and therefore, different
chips utilize different bits in the constructed bitstring. Figure 6.9a, b depict several
scenarios that show how the Margin and the position of the PNDc affect bitstring
generation. The line-connected curves in Fig. 6.9 are analogous to those described
earlier in reference to Fig. 6.6c. Figure 6.9a plots a set of 20 different PNDc to illustrate how PNDc distribute across the range defined by the Modulus, which is set to 20. Figure 6.9b is a blow-up of the bottom portion of Fig. 6.9a.

Fig. 6.9 a Example PNDc (20 groups) from 38 chips (y-axis) across 1 enrollment and 12 TV corners (x-axis), and b blow-up of −60 to −80 region
As indicated earlier, within-die process variations change path delays uniquely
in different chips, which is reflected by the y-dimensional spread within each group
of PNDc. For the data set labeled as scenario1 in Fig. 6.9b, the range occupied by
the PNDc is approx. 10. The y position of the overall data set is such that, except for
a few points, the bit generated by this data will be 0 for all 38 chips.
However, the enrollment data points (left-most) for some chips fall within the
weak bit regions and therefore, this bit is skipped for these chips using either the
SHD or DHD schemes. Moreover, UC-TVNoise causes some of the regeneration
data points to move from their strong bit positions in the enrollment data to weak
bits during regeneration. The DHD scheme excludes this bit for these chips as well,
creating differences in the generated bitstring for the same chip at different TV
corners, while simultaneously providing a 2 × Margin to bit flip errors. Moreover,
the relative position of the curve associated with each chip, with respect to the other
chips, changes in each data set so it is unpredictable which data points are excluded
during bitstring generation for any particular chip. The curve for chip C1 is high-
lighted in red in each of the PNDc groups to illustrate the change in its relative
position with respect to other chips in the group.
The data set labeled scenario2 in Fig. 6.9b shows a second possibility, that is
closest to the “ideal” case because the position and range of the curves spans the
y-axis into both the strong 0 and strong 1 bit regions. The number of possible
results regarding the status of the bit includes those described for scenario1 plus an
additional possibility that some chips generate a strong 1 bit and others a strong 0
bit. In contrast, scenario3 labeled in Fig. 6.9a is closest to the “worst” case where
nearly the entire data set is positioned within the strong 0 region. Note that this
scenario is only possible when the Modulus is large enough to create strong bit
regions that upper-bound the smallest range (WDV + UC-TVNoise) found among
the MPNDc groups. Generating bitstrings with Moduli larger than 4 * Margin +
this smallest range begins to reduce their statistical quality. The analysis presented
in subsequent sections shows that the upper-bound for this data set is
Modulus = 28.
The bitstrings generated using the DHD scheme are subjected to the NIST statistical test suite as well as Inter-chip and Intra-chip Hamming distance (HD) tests. The
analysis is carried out using two different reference scaling factors for TVComp,
referred to as minimum (Min) and mean scaling. The µref and Rngref scaling con-
stants derived from the set of path distributions for the 38 chips are used as the
reference values in Eq. 6.12 to scale all chip data before applying the Modulus
operation and DHD bitstring generation procedures described above. The minimum
scaling constants are derived from the chip with smallest distribution, i.e., smallest
mean and range values. The mean scaling constants are computed from the average
mean and range values across the distributions of all chips. We focus our analysis
on these two scaling factors because they represent the extremes of the recom-
mended range. We expect similar results to be produced for all scaling factors
between these limits.
We use the acronym SBS to denote “strong bitstring.” The DHD scheme requires
two helper data bitstrings from the same chip as a means of constructing the two
corresponding SBS’s. The helper data bitstrings, which are derived from the 2048
MPNDc using the Margin technique, are bitwise AND’ed and then used to select
bits for use in the construction of the SBS’s. The SBS’s generated using enrollment
data (TV0) and the nominal regeneration TV corner data (TV2) from the same chip
are used in the NIST statistical tests and Interchip hamming distance (HDInter)
calculations below. UC-TVNoise is smallest using this combination, and therefore
it represents the worst case condition where the effect of the helper data AND’ing
has the smallest impact on the additional entropy as discussed earlier. Only one of
the SBS’s from each chip is used in HDInter and NIST statistical tests, and the SBS’s
are truncated to the length of smallest bitstring among the 38 generated. The same
criteria are used in the Intra-chip HD (HDIntra) calculations except a much larger set
of bits are processed by accumulating the results across a set of 256 different LFSR
seeds (only one LFSR seed is used for NIST and HDInter tests because similar
results are obtained using other seeds).
NIST Statistical Test Results The NIST statistical test results are shown in
Fig. 6.10a, b for minimum and mean scaling, respectively. A test is considered “a
pass” according to the NIST criteria if at least 35 of the 38 chips pass the test
individually. The histogram bar heights indicate the number of chips that pass the
test. The bitstrings generated using a Margin of 3 and a set of Moduli between 14
and 30 are subjected to 10 of the NIST tests. The size of the bitstring was too small
for some values of the Modulus and therefore, the bar heights for these NIST test
results are set to 0 (includes regions along back and left side of the 3-D histogram).
Fig. 6.10 NIST statistical test results using 38 chip bitstrings for each analysis and a Minimum scaled data and b Mean scaled data

Under minimum scaling, all NIST tests are passed except for four associated with Modulus 30. These failures are related to scenario3 discussed in reference to Fig. 6.9, where the range of within-die variation fits entirely within the strong ‘0’ or ‘1’ regions defined by the Modulus. This is supported by the results presented under the
mean scaling, where the bitstrings for Modulus 30 pass all tests (only 1 test is failed
under mean scaling, and with a value of 34 instead of 35). Mean scaling enlarges
the y-dimensional spread of the data points over minimum scaling and reduces the
probability that scenario3 occurs. These results indicate that the bitstrings possess a
high degree of randomness, which is a necessary condition for classifying the
bitstrings as cryptographic quality. The results using Margins of 2 and 4 are very
similar.
Interchip Hamming Distance (HDInter)
HDInter is computed using Eq. 6.13. The symbols NC, NB, and NCC represent
“number of chips,” “number of bits,” and “number of chip combinations,”
respectively. This equation simply sums all the bitwise differences between each of
the possible pairing of chip SBS’s (NCC), and then converts the sum into a per-
centage by dividing by the total number of bits that were examined. The XOR
operator generates a 1 when the pair of bits in the SBS’s at the same position is
different and 0 otherwise.
HD_inter = [ 1 / (NCC × NB) ] × ( ∑_{i=1}^{NC} ∑_{j=i+1}^{NC} ∑_{k=1}^{NB} (SBS_{i,k} ⊕ SBS_{j,k}) ) × 100    (6.13)
Figure 6.11a shows the HDInter results for a set of Moduli (x-axis) and Margins
(y-axis). The ideal value for HDInter is 50 %, which indicates that half of the bits in
any arbitrary pairing of bitstrings from the 38 chips have different values. The best
values are produced for smaller Moduli, as expected. However, all values remain
above 48.5 %, which indicates a high degree of uniqueness among the bitstrings
from different chips.
Fig. 6.11 a Interchip hamming distance (HD), b Probability of failure and c Smallest bitstring
size statistics using 4096 PN
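A direct implementation of Eq. 6.13 is sketched below for a handful of toy bitstrings; the 38-chip data themselves are, of course, not reproduced here.

```python
from itertools import combinations

def hd_inter(bitstrings):
    """Average pairwise Hamming distance (Eq. 6.13), as a percentage, over all
    chip pairs; every bitstring is an equal-length list of 0/1 values."""
    nb = len(bitstrings[0])
    pairs = list(combinations(bitstrings, 2))                  # NCC chip combinations
    diffs = sum(a ^ b for s1, s2 in pairs for a, b in zip(s1, s2))
    return 100.0 * diffs / (len(pairs) * nb)

# three toy 8-bit "chips"; an ideal PUF population would approach 50 %
print(hd_inter([[0, 1, 0, 1, 0, 1, 0, 1],
                [1, 1, 0, 0, 1, 1, 0, 0],
                [0, 0, 0, 0, 1, 1, 1, 1]]))
```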
Figure 6.11b reports HDIntra as the probability of a bit flip failure for the same
set of Moduli and Margins used in Fig. 6.11a (note the x-axis is reversed from that
shown in Fig. 6.11a). The value reported is the exponent x of the probability
expressed as 10^x, so −6 indicates 1 chance in 1 million. Cases where no bit flips
were detected are shown as −10. As expected, the larger Moduli produce lower
probabilities of failure. The probability of failure for Margins 3 and 4 under min-
imum scaling are all set to 10−10 (no bit flip errors were detected), and are less than
10−6 for Margin 2 except for Modulus 10. The probability of failure under mean
scaling is larger but remains below 10−6 for Margins 3 and 4.
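A minimal sketch of how the plotted exponent can be derived from raw bit-flip counts; the −10 floor for the no-flip case follows the convention used in the figure, and the function name is illustrative.

import math

def failure_exponent(num_bit_flips, num_bits_evaluated, floor=-10.0):
    """Return x such that the bit-flip probability is 10^x (floored when no flips occur)."""
    if num_bit_flips == 0:
        return floor
    return math.log10(num_bit_flips / num_bits_evaluated)

print(failure_exponent(1, 1_000_000))   # -6.0: one chance in a million
print(failure_exponent(0, 1_000_000))   # -10.0: no bit flips detected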
Minimum Bitstring Size Figure 6.11c plots the smallest bitstring size for the same
set of Moduli and Margins. Smaller Moduli have smaller strong bit regions for a
given Margin and therefore, fewer bits qualify as strong. However, the bitstring
sizes grow quickly, with at least several hundred bits available for Moduli/Margin
combinations with strong bit regions of size 2 and larger. Bitstring size can be
increased as needed by increasing the number of tested paths beyond 4096.
MPNDc will vary because its value depends on all of the 4096 PNs selected and
used in the bitstring generation process. This complex relationship is leveraged as a
security property in the HELP authentication protocol as a means of both preserving
privacy and adding resilience to model-building attacks.
(which we discuss below), but the large number of CRPs available in strong PUF
implementations also allows for simpler schemes with stronger security properties.
The protocol has the benefit of being simple to implement and is very light-
weight for the token. The inability of the PUF to precisely reproduce the response ri
(in simple schemes that do not attempt error correction or error avoidance) makes it
necessary to implement an error-tolerant matching scheme with HDintra > 0. It
should be noted, however, that large values of HDintra increase the chance of
impersonation, and act to reduce the strength of the authentication scheme.
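A minimal sketch of such an error-tolerant match; the 10 % threshold is illustrative only, chosen in practice above the expected HDintra but well below HDinter.

def fuzzy_match(enrolled_bits, regenerated_bits, max_mismatch_fraction=0.10):
    """Accept the response if the fraction of differing bits stays within the tolerance."""
    mismatches = sum(a != b for a, b in zip(enrolled_bits, regenerated_bits))
    return mismatches / len(enrolled_bits) <= max_mismatch_fraction

# One flipped bit out of ten falls within the tolerance and authenticates.
print(fuzzy_match([0, 1, 1, 0, 1, 0, 0, 1, 1, 0],
                  [0, 1, 1, 0, 1, 0, 1, 1, 1, 0]))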
A second drawback is the large number of challenge-response pairs that must be
recorded during enrollment, as a means of ensuring that authentication can be
carried out over a long period of time. This increases the storage requirements for
the verifier, since the worst-case usage scenario must be accommodated, and/or
creates inconveniences for users who exceed the stored CRP capacity. Other
drawbacks include the lack of resistance to denial of service attacks, whereby
adversaries purposely deplete the server database, the inability to carry out
privacy-preserving or mutual authentication and the susceptibility of the scheme to
model-building attacks [118]. The latter is the primary driver for the requirement
that a truly strong PUF be used for authentication protocols with unprotected
interfaces, of which this simple protocol is an example.
A growing list of proposed protocols addresses these shortcomings by incorpo-
rating cryptographic primitives on the prover and verifier side [19, 21, 39, 40, 119].
The inclusion of cryptographic primitives enables significant improvements to the
security properties of the protocols, and additionally allows for privacy-preserving
and mutual authentication. However, their use, in many cases, requires error-free
response bitstrings from the PUF, which in turn requires helper data to be stored
with the CRPs on the server. Many recent protocols target low-cost, resource-
constrained applications, e.g., RFID, and attempt to minimize the implementation
footprint and energy profile on the token side. Error correction algorithms, such as
secure sketches [26, 27], are asymmetric in terms of their computational cost, with
helper data generation requiring fewer resources than the process of using the helper
data to correct bit flip errors in the regenerated response. Recently proposed
authentication protocols attempt to minimize the area and energy requirements for
token-side operations by leveraging this asymmetrical relationship. We discuss
several of these protocols below. An excellent review of these and other protocols
[28, 38, 40, 120–133] is provided in [117, 134].
correlations that may exist among different challenges are obfuscated, increasing
the difficulty of model-building even further. The main drawback of using an OWF
on the PUF responses as shown is a requirement that the responses from the PUF be
error-free. This is true because even a single bit flip error in the PUF’s response
changes a large number of bits in the output of the OWF (avalanche effect). The
functions Gen and Rep are responsible for error-correcting the response, using
algorithms that were described earlier in Sect. 6.3.3.
The protocol works as follows. During enrollment in a secure environment, a
one-time interface is used to allow the server to obtain PUF responses, rj, produced
from randomly generated, hashed challenges cj. The Gen routine produces helper
data hdj for each rj, which is sent to the token to produce a hashed version of the
PUF response, r′j. The 3-tuples <cj, r′j, hdj> produced by multiple iterations of this
algorithm are stored in the database for token htID. After enrollment, a fuse is blown
to disable the one-time interface. Authentication is very similar except for the Gen
operation. Note that the response r′n must match the stored response rn in order for
the authentication to succeed, i.e., error-correction eliminates the need for the
“fuzzy matching” component in Protocol 1. Otherwise, the benefits and drawbacks
are similar as those described for Protocol 1 with additional drawbacks related to
the need for a cryptographic hash function and the increased computational and
energy cost associated with Rep.
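A minimal sketch of this enrollment/authentication flow; puf(), gen() and rep() are hypothetical stand-ins for the token's one-time PUF interface and the fuzzy-extractor Gen/Rep routines, not a real API.

import hashlib
import os

def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def enroll(puf, gen, num_crps=3):
    """Secure-environment enrollment: store <c_j, r'_j, hd_j> tuples for the token."""
    db = []
    for _ in range(num_crps):
        c = sha256(os.urandom(32))      # randomly generated, hashed challenge
        r = puf(c)                      # raw (possibly noisy) PUF response
        hd = gen(r)                     # helper data for later error correction
        db.append((c, sha256(r), hd))
    return db

def authenticate(puf, rep, db):
    """Field authentication: Rep removes bit-flip errors, so the hashes must match exactly."""
    c, r_hashed, hd = db[0]
    r_corrected = rep(puf(c), hd)
    return sha256(r_corrected) == r_hashed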
Maes et al. propose a protocol based on reversed secure sketching that is designed
to address authentication in resource-constrained environments [19, 119]. Their
protocol uses the syndrome technique proposed in [26] (see Sect. 6.3.3) for error
correction but reverses the roles of the prover and verifier, i.e., the prover
(resource-constrained token) performs the lighter-weight Gen procedure while the
verifier (server) performs the compute-intensive Rep procedure. The same process
is carried out during enrollment and regeneration. Given that the sketching pro-
cedure produces a unique bitstring with bits that are different every time it is
executed on the token, in order to authenticate, the verifier is required to correct the
original bitstring stored during enrollment to match each of the regenerated bit-
strings. In order to accomplish this, the helper data produced by each run of Gen on
the token is transmitted to the verifier.
The mutual authentication protocol proposed in [19] is graphically illustrated in
Fig. 6.14. Similar to previous protocols, enrollment involves the verifier generating
challenges and storing the PUF responses ri for hti in a secure database (not shown).
In the proposed protocol, only a single CRP is stored for each token, which is
indexed by IDi in the server’s database, and then this interface is permanently
disabled on the token. The authentication process begins with the token on the left
generating the bitstring response again as r′i and then multiplying it by the
parity-check matrix HT of the syndrome-based linear block code to produce the
helper data hdi. A random number generator is used to produce nonce n1 that is
exchanged with the verifier as a mechanism to prevent replay attacks (see Sect. 6.4
for an exposition of traditional challenge-response authentication). The tuple <IDi, hdi,
n1> is transmitted over an unsecured channel to the verifier.
Fig. 6.14 “Reversed secure sketching” mutual authentication protocol proposed in [26]
The verifier looks up the response bitstring ri generated by this token during
enrollment in the secure database and invokes the Rep routine of the secure sketch
error correction algorithm with ri and the transmitted helper data hdi. If the PUF
response r′i and corresponding helper data hdi are within the error-correcting
capabilities of the secure sketch algorithm, the output r″i of Rep will match the r′i
generated by the token. A second nonce, n2, is generated to enable secure mutual
authentication (see Sect. 6.4) and a secure hash is applied to the IDi, helper data hdi,
the regenerated response bitstring r″i and both nonces n1 and n2 to produce m1. The
hash m1 conveys to the token that the server has knowledge of the response r′i,
which allows the token to authenticate the server. This verification is carried out by
the token by hashing the same values, except using its own version of r″i and
comparing the output to the transmitted m1. If a match occurs, then r′i must be equal
to r″i, and the token accepts, otherwise authentication of the server fails. The token
then demonstrates knowledge of r′i by hashing it with its IDi and nonce n2 and
transmitting the result m2 to the server. The server then authenticates the token
using a similar process by comparing its result with m2.
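A minimal sketch of this message flow; syndrome() and rep() are hypothetical placeholders for multiplication by the parity-check matrix H^T and the server-side Rep correction, and SHA-256 stands in for the secure hash.

import hashlib
import os

def h(*parts: bytes) -> bytes:
    return hashlib.sha256(b"".join(parts)).digest()

def token_request(token_id: bytes, r_prime: bytes, syndrome):
    n1 = os.urandom(16)
    hd = syndrome(r_prime)               # helper data from this regeneration
    return token_id, hd, n1              # sent over the unsecured channel

def server_response(db, token_id, hd, n1, rep):
    r_i = db[token_id]                   # response stored at enrollment
    r_pp = rep(r_i, hd)                  # should equal the token's r'
    n2 = os.urandom(16)
    m1 = h(token_id, hd, r_pp, n1, n2)   # proves server knowledge of r'
    return n2, m1

def token_verify_and_reply(token_id, r_prime, hd, n1, n2, m1):
    if h(token_id, hd, r_prime, n1, n2) != m1:
        return None                      # server authentication fails
    return h(r_prime, token_id, n2)      # m2: token's proof of knowledge of r'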
The helper data in this “reverse” implementation of the fuzzy extractor changes
from one run of the protocol to the next, based on the number and position of the
bits that flip during each regeneration. The main drawbacks of the proposed scheme
are that it is not privacy-preserving and assumes that the helper data does not leak
any information about the response ri. Moreover, since most PUFs can reliably
reproduce more than 80 % of the secret bitstring, any correlations that occur in the
helper data bitstrings introduced by these “constant” secret bitstring components
may reveal information that the adversary can use to increase the effectiveness of
reverse-engineering attacks.
tolerance of ε. Although the protocol is very lightweight for the token, and avoids
NVM, the level of model-to-hardware correlation attained in the compact model
must be very high and must be able to accommodate changes introduced by
TVNoise, resulting in considerable time and effort at enrollment. PUFs that are
easily modeled simplify the development of the compact model, but this also somewhat
contradicts their required resilience to model-building attacks.
Also, the proposed protocol does not preserve privacy.
Fig. 6.16 Part 1: Mutual, Privacy-preserving authentication protocol proposed in [21, 135]
The server begins the authentication process by generating a nonce n1, which is
transmitted to the token. The token’s challenge c1 is read from the NVM and used
to generate a noisy PUF response r′1. The Gen component of the fuzzy extractor
produces z′1 (an entropy distilled version of r′1) and helper data hd. Helper data hd
is encrypted using the key sk1 from the NVM to produce hdenc. The token then
generates a nonce n2. The PUF-generated key z′1 and the concatenated nonces
(n1||n2) are used as input to a pseudo-random function PRF to produce a set of
unique values t1 through t5 that are used as an ID, keys, and challenges in the
remaining steps of the protocol.
A second response r2 is obtained from the PUF using a new randomly generated
challenge c2, which will serve as the chained key for the next authentication (as-
suming this one succeeds). It is XOR-encrypted as r2_enc for secure transmission to
the server. PRF’ is then used to compute a MAC m using t3 as the key, over the
concatenated, encrypted helper data and new key (hdenc||r2_enc) to allow the server
to check the integrity of hdenc and r2_enc. The encrypted values hdenc and r2_enc plus
n2, t1 and m are transmitted to the server. The nonce n2, as usual, introduces
“freshness” in the exchange, preventing replay attacks. The ID t1 will be the target
of a search in the server database during the server side execution of the protocol.
The server begins an exhaustive search of the database, carrying out the fol-
lowing operations for each entry in the DB: (1) decrypt helper data hdenc using the
current DB-stored ski to produce hd″, (2) construct z″ using the fuzzy extractor’s
Rep procedure and helper data hd″, (3) compute t′1 through t′5 from PRF(z″, n1||n2)
and (4) compare token generated value t1 with t′1. If a match is found, then the
server verifies that the token’s MAC m matches the PRF′(t′3, henc||r2_enc) computed
by the server. If they match, then the token’s PUF-generated key r2 is recovered
using (r2_enc XOR t′2), and the database is updated by replacing (sk1, r1, skold, rold)
with (t′5, r2, sk1, r1). If the exhaustive search fails, then the entire process is repeated
using (skoldi, roldi). If both searches fail, the server generates a random t′4 (which
guarantees failure when the token authenticates). Otherwise, the t′4 produced from a
match during the first or second search is transmitted to the token. The token
compares its t4 with the received t′4. If they match, the token updates its NVM
replacing (sk1, c1) with (t5, c2). Otherwise, the old values are retained.
Note that the old values are needed for de-synchronization attacks where the
adversary prevents the last step, i.e., the proper transmission of t′4 from the server to
the token. In such cases, the server has authenticated the token and has committed
the update to the DB with (t′5, r2, sk1, r1) but the token fails to authenticate the
server, so the token retains its old NVM values (sk1, c1). On a subsequent
authentication, the first search process fails to find the t′5, r2 components but the
second search will succeed in finding sk1, r1. This allows the token and server to
re-synchronize.
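A minimal sketch of the server-side search and re-synchronization; HMAC-SHA256 stands in for PRF/PRF', XOR for the symmetric recovery step, and decrypt_hd() and rep() are hypothetical placeholders. The two database passes of the protocol are folded into one loop here for brevity.

import hashlib
import hmac

def prf(key: bytes, msg: bytes, n_out: int = 5):
    """Expand t'_1 .. t'_5 from the PUF-derived key and the concatenated nonces."""
    return [hmac.new(key, msg + bytes([i]), hashlib.sha256).digest()
            for i in range(n_out)]

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def server_search(db, hd_enc, r2_enc, n1, n2, t1, m, decrypt_hd, rep):
    for entry in db:
        for sk, r in ((entry["sk1"], entry["r1"]), (entry["sk_old"], entry["r_old"])):
            hd = decrypt_hd(sk, hd_enc)                       # (1) decrypt helper data
            z = rep(r, hd)                                    # (2) fuzzy-extractor Rep
            t = prf(z, n1 + n2)                               # (3) derive t'_1 .. t'_5
            if t[0] != t1:                                    # (4) compare IDs
                continue
            mac = hmac.new(t[2], hd_enc + r2_enc, hashlib.sha256).digest()
            if mac != m:
                return None                                   # integrity check fails
            r2 = xor(r2_enc, t[1])                            # recover the chained key
            entry.update(sk_old=entry["sk1"], r_old=entry["r1"],
                         sk1=t[4], r1=r2)                     # commit the DB update
            return t[3]                                       # t'_4 sent back to token
    return None                                               # both searches failed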
The encryption of the helper data hd, as mentioned, prevents the adversary from
repeatedly attempting authentication to obtain multiple copies of the helper data,
and then using them to reverse engineer the PUF’s secret. Note that encryption does
not prevent the adversary from manipulating the helper data, and carrying out
overlaps with those chosen for other tokens, but with no more than 50 % over-
lapping with any one token. This policy prevents the challenges used in the Authen
Phase during in-field authentication from being used to track the token (explained
further below). The set of PN {PNy} generated in the Authen Phase are also stored,
along with the challenge vectors, in the secure database under IDi. The number of
structural paths for the data path component of SHA-3 is larger than 860,000, with
more than 80 % testable, so the set of challenge vectors available is large. Note that
the task of generating 2-vector tests for all paths is likely to be computationally
infeasible for even moderately sized functional units. However, it is feasible and
practical to use random vectors and ATPG to target random subsets of paths for the
enrollment requirements.
The cardinality of {PNy} is approx. twice that of {PNj} at 8192 but both are
relatively small because the parameters, particularly the Path-Selection-Mask, allow
an exponential number of different combinations to be constructed over successive
authentications. The example from Sect. 6.5.4.7 uses the Path-Selection-Mask to
select 50 PN per challenge. In this case, the number of challenges that need to be
applied in the ID and Authen Phases during enrollment is approximately 80 and 160, respectively.
The protocol for token authentication is shown in the bottom portion of
Fig. 6.17. The token initiates the process by generating and sending a nonce n1 to
the server. The server generates a nonce n2 and transmits the fixed set of challenges
{ck} and n2 to the token. The concatenation of nonces n1 and n2 is used as input to a
hash function and a SelPar function is used to derive the Mod, S, µ, Rng, Mar
parameters from the hash output m. The SelPar function selects bit fields in the hash
output m for use in a table lookup operation to pseudo-randomly constrain the Mod
and Mar parameters to a specific set of values (as given in Fig. 6.11). Other bit
fields are used to define µ and Rng, constrained, in this case, to a range of
fixed-point values. The same SelPar operation is carried out on the server. The hash
function limits the amount of control an adversary has over picking specific
values for these parameters in an attack scenario in which the adversary has pos-
session of the token. This component of the protocol is similar to the strategy
proposed for the Slender PUF Protocol described in Sect. 6.4 [133] but is used there
for challenge selection.
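A minimal sketch of this parameter-derivation step; the bit-field widths and the lookup tables below are illustrative placeholders, not the values used by the actual HELP implementation.

import hashlib

MOD_TABLE = [14, 16, 18, 20, 22, 24, 26, 28, 30]   # assumed allowed Moduli
MAR_TABLE = [2, 3, 4]                               # assumed allowed Margins

def sel_par(n1: bytes, n2: bytes):
    """Derive (Mod, S, mu, Rng, Mar) from the hash of the concatenated nonces."""
    m = hashlib.sha256(n1 + n2).digest()
    mod = MOD_TABLE[m[0] % len(MOD_TABLE)]   # table lookup constrains the Modulus
    mar = MAR_TABLE[m[1] % len(MAR_TABLE)]   # and the Margin to allowed values
    s = m[2]                                 # seed/selector field
    mu = 1.0 + m[3] / 255.0                  # mu and Rng constrained to a
    rng = 0.5 + m[4] / 255.0                 # fixed-point range (illustrative)
    return mod, s, mu, rng, mar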
The set {ck} of challenges is applied to the PUF to generate the set {PN′j}. The
difference, TVComp and modulus operations shown on the right side of Fig. 6.4 are
applied to {PN′j} to generate the set {MPNDc′j}. Bitstring generation using the
single helper data scheme, BitGenS, is then performed as described in Sect. 6.5.4.4
using the Mar parameter. BitGenS produces a strong bitstring SBS′ and helper data
string hd′, which are both transmitted to the server.
A search process is carried out on the server, where the {PNj}i data for each
token i in the database is processed in a similar fashion. However, bitstring gen-
eration is carried out using the dual helper data scheme (BitGenD). BitGenD returns
an SBS computed using the server data and a modified bitstring SBS″, which is a
reduced-in-size version of the token’s SBS′ (see Sect. 6.5.4.4 for details). The
search process terminates when the number of bits that differ in SBS and SBS″ is
less than a tolerance ε (which may be zero) or the database is exhausted. In the
former case, the token identifier IDi is passed to the Authen Phase. Otherwise,
authentication terminates with failure at the end of the ID Phase.
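A minimal sketch of the ID-Phase search loop; bitgen_d() is a hypothetical stand-in for the BitGenD routine of Sect. 6.5.4.4.

def id_phase_search(database, sbs_prime, hd_prime, params, bitgen_d, eps=0):
    """Return the matching token identifier, or None if the database is exhausted."""
    for token_id, pn_server in database.items():
        sbs_server, sbs_reduced = bitgen_d(pn_server, sbs_prime, hd_prime, params)
        mismatches = sum(a != b for a, b in zip(sbs_server, sbs_reduced))
        if mismatches <= eps:
            return token_id          # identifier is passed to the Authen Phase
    return None                      # authentication fails at the end of the ID Phase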
Note that token privacy is preserved in the ID Phase because, with high prob-
ability, the transmitted information SBS′ and hd′ will be different from one run of
the protocol to the next, given the diversity of the parameter space provided by
Mod, S, µ, Rng, Mar, and Path-Select-Mask. Also note that this is a
compute-intensive operation for large databases because the difference, TVComp,
modulus, and BitGenD operations must be applied to each server data set. How-
ever, the search operation can be carried out in parallel on multiple CPUs given the
independence of the operations. Trial run experiments without any type of explicit
parallelism yield runtimes of 200 µs per database entry using a database of 10,000
elements when evaluated on an Intel i7-4702HQ CPU @ 2.2 GHz running Linux.
The Authen Phase is not shown but is identical to the ID Phase with the
following exceptions. The subset of 80 token-specific challenges {c1} is randomly
selected from the larger set of 160 in {cx} that were applied during enrollment. As
indicated earlier, the 160 challenges selected for a token overlap with those selected
for other tokens, making it impossible for adversaries to track specific tokens across
multiple authentications. A second difference is that the Authen Phase represents
the mutual authentication step, in which the server is authenticated to the token.
Therefore, the server generates the SBS′ and hd′ using the Single Helper Data
scheme, which is then transmitted to the token, and the token implements the Dual
Helper Data scheme and fuzzy match operations (opposite to that shown in
Fig. 6.17). This is possible in a resource-constrained environment because of the
symmetry in energy requirements of the proposed error avoidance schemes, i.e., the
work performed by the Single Helper Data and Dual Helper Data schemes is
nearly the same. Note that an optional third phase can be implemented to carry out a
second token authentication using the {cx} challenges if needed.
6.8 Conclusion
References
1. Goertzel, K.M.: Integrated circuit security threats and hardware assurance countermeasures.
In: Real-Time Information Assurance, CrossTalk, Nov/Dec 2013
2. Pope, S., Cohen, B.S., Sharma, V., Wagner, R.R., Linholm, L.W., Gillespie, S.: Verifying
Trust for Defense Use Commercial Semiconductors
3. Grand Challenges for Engineering. http://www.engineeringchallenges.org/cms/8996/9042.
aspx
4. Defense Science Board Task Force On High Performance Microchip Supply, Office of the
Under Secretary of Defense. http://www.acq.osd.mil/dsb/reports/2005-02-HPMS_Report_
Final.pdf. Accessed Feb 2005
5. Dean Collins: TRUST, A Proposed Plan for Trusted Integrated Circuits. http://www.
stormingmedia.us/95/9546/A954654.html
6. Senator Joe Lieberman: National Security Aspects of the Global Migration of the U.S.
Semiconductor Industry. http://lieberman.senate.gov/documents/whitepapers/semiconductor.
pdf. Accessed June 2003
7. TRUST in Integrated Circuits (TIC). http://www.darpa.mil/mto/solicitations/baa07-24/index.html
8. National Cyber Leap Year Summit 2009: Co-Chairs’ Report. http://www.qinetiq-na.com/
Collateral/Documents/English-US/InTheNews_docs/National_Cyber_Leap_Year_Summit_
2009_Co-Chairs_Report.pdf. Accessed 16 Sept 2009
9. Integrity and Reliability of Integrated Circuits. DARPA-BAA-10-33 (2010)
10. Trusted Integrated Chips (TIC). IARPA-BAA-11-09 (2011)
11. Bureau of Industry and Security, U.S. Department of Commerce. Defense Industrial Base
Assessment: Counterfeit Electronics. http://www.bis.doc.gov/index.php/forms-documents/
doc_download/37-defense-industrial-base-assessment-of-counterfeit-electronics-2010
12. Grow, B., Tschang, C.-C., Edwards, C., Burnsed, B.: Dangerous fakes. Businessweek. http://
www.businessweek.com/stories/2008-10-01/dangerous-fakes (2008)
13. Kessler, L.W., Sharpe, T.: Faked parts detection. http://www.circuitsassembly.com/cms/
component/content/article/159/9937-smt (2010)
14. Stradley, J., Karraker, D.: The electronic part supply chain and risks of counterfeit parts in
defense applications. IEEE Trans. Compon. Packag. Technol. 29(3), 703–705 (2006)
15. Ke, H., Carulli, J.M., Makris, Y.: Counterfeit electronics: a rising threat in the semiconductor
manufacturing industry. In: International Test Conference (ITC), pp. 1–4 (2013)
16. Gassend, B., Clarke, D.E., van Dijk, M., Devadas, S.: Controlled physical random functions.
In: Conference on Computer Security Applications, pp. 149–160 (2002)
17. https://en.wikipedia.org/wiki/Information_security
18. Menezes, A.J., van Oorschot, P.C., Vanstone, S.A.: Handbook of Applied Cryptography.
CRC Press. ISBN 0-8493-8523-7. http://cacr.uwaterloo.ca/hac/. Accessed Oct 1996
19. Maes, R.: Physical Unclonable Functions, Constructions, Properties and Applications.
Springer (2013). ISBN 978-3-642-41394-0
20. Barker, E., Kelsey, J.: Recommendation for random number generation using deterministic
random bit generators. NIST SP800-90A. https://en.wikipedia.org/wiki/NIST_SP_800-90A
21. Aysu, A., Gulcan, E., Moriyama, D., Schaumont, P., Yung, M.: End-to-end design of a
PUF-based privacy preserving authentication protocol. In: CHES (2015)
22. https://en.wikipedia.org/wiki/Cryptographic_hash_function
23. https://en.wikipedia.org/wiki/SHA-3
24. https://en.wikipedia.org/wiki/Secure_Hash_Algorithm
25. http://www.nist.gov/manuscript-publication-search.cfm?pub_id=919061
26. Dodis, Y., Reyzin, L., Smith, A.: Fuzzy extractors: how to generate strong keys from
biometrics and other noisy data. In: Advances in Cryptology (EUROCRYPT), pp. 523–540
(2004)
27. Dodis, Y., Ostrovsky, R., Reyzin, L., Smith, A.: Fuzzy extractors: how to generate strong
keys from biometrics and other noisy data. SIAM J. Comput. 38(1), 97–139 (2008)
28. Van Herrewege, A., Katzenbeisser, S., Maes, R., Peeters, R., Sadeghi, A.-R., Verbauwhede,
I., Wachsmann, C.: Reverse fuzzy extractors: enabling lightweight mutual authentication for
PUF-enabled RFIDs. Lecture Notes in Computer Science, vol. 7397, pp. 374–389 (2012)
29. https://en.wikipedia.org/wiki/Binomial_distribution
30. NIST: Computer Security Division, Statistical Tests. http://csrc.nist.gov/groups/ST/toolkit/
rng/stats_tests.html
31. Needham, R., Schroeder, M.: Using encryption for authentication in large networks of
computers. Commun. ACM 21(12), 993–999 (1978)
32. Lenstra, A.K., Hughes, J.P., Augier, M., Bos, J.W., Kleinjung, T., Wachter, C.: Ron was
wrong, whit is right. Cryptology ePrint Archive, Report 2012/064 (2012)
33. Torrance, R., James, D.: The state-of-the-art in IC reverse engineering. In: Lecture Notes in
Computer Science (LNCS), Workshop on Cryptographic Hardware and Embedded Systems,
vol. 5747, pp. 363–381 (2009)
34. Kocher, P.C., Jaffe, J., Jun, B.: Differential power analysis. In: Lecture Notes in Computer
Science (LNCS), Advances in Cryptology, vol. 1666, pp. 388–397 (1999)
35. Lofstrom, K., Daasch, W.R., Taylor, D.: IC identification circuits using device mismatch. In:
International Solid State Circuits Conference, pp. 372–373 (2000)
36. Puntin, D., Stanzione, S., Iannaccone, G.: CMOS unclonable system for secure authenti-
cation based on device variability. In: Conference on Solid-State Circuits Conference,
pp. 130–133 (2008)
37. Stanzione, S., Iannaccone, G.: Silicon physical unclonable function resistant to a 10^25-trial
Brute Force Attack in 90 nm CMOS. In: Symposium VLSI Circuits, pp. 116–117 (2009)
38. Pappu, R.: Physical one-way functions. Ph.D. thesis, MIT, ch. 9, 2001
39. Pappu, R.S., Recht, B., Taylor, J., Gershenfeld, N.: Physical one-way functions. Science 297
(6), 2026–2030 (2002)
40. Gassend, B., Clarke, D.E., van Dijk, M., Devadas, S.: Silicon physical random functions. In:
Conference on Computer and Communications Security, 148–160 (2002)
41. Lee, J.W., Lim, D., Gassend, B., Suh, G.E., van Dijk, M., Devadas, S.: A technique to build
a secret key in integrated circuits for identification and authentication applications. In:
Symposium of VLSI Circuits, pp. 176–179 (2004)
42. Lim, D.: Extracting secret keys from integrated circuits. M.S. thesis, MIT, 2004
43. Lim, D., Lee, J.W., Gassend, B., Suh, G.E., van Dijk, M., Devadas, S.: Extracting secret
keys from integrated circuits. Trans. Very Large Scale Integr. Syst. 13(10), 1200–1205
(2005)
44. Suh, G.E., Devadas, S.: Physical unclonable functions for device authentication and secret
key generation. In: Design Automation Conference, pp. 9–14 (2007)
45. Majzoobi, M., Koushanfar, F., Potkonjak, M.: Lightweight secure PUFs. In: Conference on
Computer-Aided Design (2008)
46. Majzoobi, M., Koushanfar, F., Potkonjak, M.: Testing techniques for hardware security. In:
International Test Conference, pp. 185–189 (2008)
47. Ozturk, E., Hammouri, G., Sunar, B.: Physical unclonable function with tristate buffers. In:
Symposium on Circuits and Systems, pp. 3194–3197 (2008)
48. Ozturk, E., Hammouri, G., Sunar, B.: Towards robust low cost authentication for pervasive
devices. In: Conference on Pervasive Computing and Communications, pp. 170–178 (2008)
49. Gassend, B., Van Dijk, M., Clarke, D., Torlak, E., Devadas, S., Tuyls, P.: Controlled
physical random functions and applications. ACM Trans. Inf. Syst. Secur. 10(4) (2008)
50. Devadas, S., Suh, E., Paral, S., Sowell, R., Ziola, T., Khandelwal, V.: Design and
implementation of PUF-based ‘Unclonable’ RFID ICs for anti-counterfeiting and security
applications. In: Conference on RFID, pp. 58–64 (2008)
51. Qu, G., Yin, C.: Temperature-aware cooperative ring oscillator PUF. In: Workshop on
Hardware-Oriented Security and Trust, pp. 36–42 (2009)
52. Maiti, A., Schaumont, P.: Improving the quality of a physical unclonable function using
configurable ring oscillators. In: Conference on Field Programmable Logic and Applications,
pp. 703–707 (2009)
53. Maiti, A., Casarona, J., McHale, L., Schaumont, R.: A large scale characterization of
ROPUF. In: Symposium on Hardware-Oriented Security and Trust, pp. 94–99 (2010)
54. Hori, Y., Yoshida, T., Katashita, T., Satoh, A.: Quantitative and statistical performance
evaluation of arbiter physical unclonable functions on FPGAs. In: Conference on
Reconfigurable Computing and FPGAs, pp. 298–303 (2010)
55. Yin, C.-E.D., Qu, G.: LISA: maximizing RO PUF’s secret extraction. In: Symposium on
Hardware-Oriented Security and Trust, pp. 100–105 (2010)
56. Costea, C., Bernard, F., Fischer, V., Fouquet, R.: Analysis and enhancement of ring
oscillators based physical unclonable functions in FPGAs. In: Conference on Reconfigurable
Computing and FPGAs, pp. 262–267 (2010)
57. Majzoobi, M., Koushanfar, F., Devadas, S.: FPGA PUF using programmable delay lines. In:
Workshop on Information Forensics and Security, pp. 1–6 (2010)
58. Xin, X., Kaps, J., Gaj, K.: A configurable ring-oscillator-based PUF for Xilinx FPGAs. In:
Conference on Digital System Design, pp. 651–657 (2011)
59. Qingqing, C., Csaba, G., Lugli, P., Schlichtmann, U., Ruhrmair, U.: The bistable ring PUF: a
new architecture for strong physical unclonable functions. In: Symposium on
Hardware-Oriented Security and Trust, pp. 134–141 (2011)
60. Qingqing, C., Csaba, G., Lugli, P., Schlichtmann, U., Ruhrmair, U.: Characterization
of the bistable ring PUF. In: Design, Automation & Test in Europe Conference,
pp. 1459–1462 (2012)
61. Mansouri, S.S., Dubrova, E.: Ring oscillator physical unclonable function with multi level
supply voltages. In International Conference on Computer Design, pp. 520–521 (2012)
62. Addabbo, T., Fort, A., Mugnaini, M., Rocchi, S., Vignoli, V.: Statistical characterization of a
FPGA PUF module based on ring oscillators. In: Instrumentation and Measurement
Technology Conference, pp. 1770–1773 (2012)
63. Maiti, A., Inyoung, K., Schaumont, P.: A robust physical unclonable function with enhanced
challenge-response set. Trans. Inf. Forensics Secur 7(1), Part: 2, pp. 333–345 (2012)
64. Meng-Day, Y., Sowell, R., Singh, A., M’Raihi, D., Devadas, S.: Performance metrics and
empirical results of a PUF cryptographic key generation ASIC. In: Symposium on
Hardware- Oriented Security and Trust, pp. 108–115 (2012)
65. Maeda, S., Kuriyama, H., Ipposhi, T., Maegawa, S., Inoue, Y., Inuishi, M., Kotani, N.,
Nishimura, T.: An artificial fingerprint device (AFD): a study of identification number
applications utilizing characteristics variation of polycrystalline silicon TFTs. Trans.
Electron Dev. 50(6), 1451–1458 (2003)
66. Simpson, E., Schaumont, P.: Offline hardware/software authentication for reconfigurable
platforms. In: Cryptographic Hardware and Embedded Systems, vol. 4249, Oct 2006,
pp. 10–13
67. Habib, B., Gaj, K., Kaps, J.-P.: FPGA PUF based on programmable LUT delays. In:
Euromicro Conference on Digital System Design (DSD), pp. 697–704 (2013)
68. Guajardo, J., Kumar, S.S., Schrijen, G.-J., Tuyls, P.: Physical unclonable functions and
public key crypto for FPGA IP protection. In: Conference on Field Programmable Logic and
Applications, 189–195 (2007)
69. Su, Y., Holleman, J., Otis, B.: A 1.6pJ/bit 96 % stable chip ID generating circuit using
process variations. In: International Solid State Circuits Conference, pp. 406–407 (2007)
70. Guajardo, J., Kumar, S.S., Schrijen, G., Tuyls, P.: Brand and IP protection with physical
unclonable functions. In: Symposium on Circuits and Systems, pp. 3186–3189 (2008)
71. Kumar, S.S., Guajardo, J., Maes, R., Schrijen, G.-J., Tuyls, P.: Extended abstract: the
butterfly PUF protecting IP on every FPGA. In: Workshop on Hardware-Oriented Security
and Trust, pp. 70–73 (2008)
72. Kassem, M., Mansour, M., Chehab, A., Kayssi, A.: A sub-threshold SRAM based PUF. In:
Conference on Energy Aware Computing, pp. 1–4 (2010)
73. Bohm, C., Hofer, M., Pribyl, W.: A microcontroller SRAM-PUF. In: Conference on
Network and System Security, pp. 25–30 (2011)
74. Bhargava, M., Cakir, C., Mai, K.: Reliability enhancement of bi-stable PUFs in 65 nm bulk
CMOS. In: Workshop on Hardware-Oriented Security and Trust, pp. 79–83 (2012)
75. Alkabani, Y., Koushanfar, F., Kiyavash, N., Potkonjak, M.: Trusted integrated circuits: a
nondestructive hidden characteristics extraction approach. In: Information Hiding (2008)
76. Ganta, D., Vivekraja, V., Priya, K., Nazhandali, L.: A highly stable leakage-based silicon
physical unclonable functions. In: Conference on VLSI Design, pp. 135–140 (2011)
77. Helinski, R., Acharyya, D., Plusquellic, J.: Physical unclonable function defined using power
distribution system equivalent resistance variations. In: Design Automation Conference,
pp. 676–681 (2009)
78. Helinski, R., Acharyya, D., Plusquellic, J.: Quality metric evaluation of a physical
unclonable function derived from an IC’s power distribution system. In: Design Automation
Conference, pp. 240–243 (2010)
79. Ju, J., Chakraborty, R., Rad, R., Plusquellic, J.: Bit string analysis of physical unclonable
functions based on resistance variations in metals and transistors. In: Symposium on
Hardware-Oriented Security and Trust, pp. 13–20 (2012)
80. Ju, J., Chakraborty, R., Lamech, C., Plusquellic, J.: Stability analysis of a physical
unclonable function based on metal resistance variations. In: Symposium on
Hardware-Oriented Security and Trust (HOST), pp. 143–150 (2013)
81. Ismari, D., Plusquellic, J.: IP-level implementation of a resistance-based physical
unclonable function. In: Accepted to Symposium on Hardware-Oriented Security and
Trust (HOST) (2014)
82. Chakraborty, R., Lamech, C., Acharyya, D., Plusquellic, J.: A transmission gate physical
unclonable function and on-chip voltage-to-digital conversion technique. In: Design
Automation Conference (DAC), pp. 1–10 (2013)
83. Li, J., Lach, J.: At-speed delay characterization for IC authentication and trojan horse
detection. In: International Workshop on Hardware-Oriented Security and Trust (HOST),
pp. 8–14 (2008)
84. Aarestad, J., Ortiz, P., Acharyya, D., Plusquellic, J.: HELP: a hardware-embedded delay—
based PUF. IEEE Des. Test Comput. 30(2), 17–25 (2013)
85. Aarestad, J., Acharyya, D., Plusquellic, J.: An error-tolerant bit generation technique for use
with a hardware-embedded path delay PUF. In: Symposium on Hardware-Oriented Security
and Trust (HOST), pp. 151–158 (2013)
86. Saqib, F., Areno, M., Aarestad, J., Plusquellic, J.: An ASIC implementation of a
hardware-embedded physical unclonable function. IET Comput. Digit. Tech. 8(6), 288–
299 (2014)
87. Che, W., Saqib, F., Plusquellic, J.: PUF-based authentication, invited paper. In: International
Conference on Computer Aided Design, Nov 2015
88. Kursawe, K., Sadeghi, A.-R., Schellekens, D., Skoric, B., Tuyls, P.: Reconfigurable physical
unclonable functions—enabling technology for tamper-resistant storage. In: Workshop on
Hardware-Oriented Security and Trust, pp. 22–29 (2009)
89. Rosenfeld, K., Gavas, E., Karri, R.: Sensor physical unclonable functions. In: Symposium on
Hardware-Oriented Security and Trust, pp. 112–117 (2010)
90. Xiaoxiao, W., Tehranipoor, M.: Novel physical unclonable function with process and
environmental variations. In: Conference on Design, Automation & Test in Europe,
pp. 1065–1070 (2010)
91. Lin, L., Holcomb, D., Krishnappa, D.K., Shabadi, P., Burleson, W.: Low-power
sub-threshold design of secure physical unclonable functions. In: Symposium on
Low-Power Electronics and Design, pp. 43–48 (2010)
92. Ruhrmair, U., Jaeger, C., Bator, M., Stutzmann, M., Lugli, P., Csaba, G.:
Applications of high-capacity crossbar memories in cryptography. Trans. Nanotechnol.
10(3), 489–498 (2011)
93. Simons, P., van der Sluis, E., van der Leest, E.: Buskeeper PUFs, a promising
alternative to D flip-flop PUFs. In: Symposium on Hardware-Oriented Security and
Trust (HOST), pp. 7–12 (2012)
94. Maiti, A., Schaumont, P.: A novel microprocessor-intrinsic physical unclonable function. In:
Field Programmable Logic and Applications, pp. 380–387 (2012)
95. Sreedhar, A., Kundu, S.: Physically unclonable functions for embedded security based on
lithographic variation. In: Conference on Design, Automation & Test in Europe, pp. 96–105
(2012)
96. Kumar, R., Dhanuskodi, S.N., Kundu, S.: On manufacturing aware physical design to
improve the uniqueness of silicon-based physically unclonable functions. In: International
Conference on Embedded Systems, pp. 381–386 (2014)
97. Forte, D., Srivastava, A.: On improving the uniqueness of silicon-based physically
unclonable functions via optical proximity correction. In: Design Automation Conference,
pp. 7–12 (2012)
98. Meguerdichian, S., Potkonjak, M.: Device aging-based physically unclonable functions. In:
Conference on Design Automation Conference, pp. 288–289 (2011)
99. Kalyanaraman, M., Orshansky, M.: Novel strong PUF based on nonlinearity of MOSFET
subthreshold operation. In: Symposium on Hardware-Oriented Security and Trust (HOST),
pp. 13–18 (2013)
100. Rose, G.S., McDonald, N., Lok-Kwong, Y., Wysocki, B., Xu, K.: Foundations of memristor
based PUF architectures. In: IEEE/ACM International Symposium on Nanoscale Architec-
tures (NANOARCH), pp. 52–57 (2013)
101. Che, W., Bhunia, S., Plusquellic, J.: A non-volatile memory-based physically unclonable
function without helper data. In: International Conference on Computer-Aided Design
(ICCAD) (2014)
102. Yu, Z., Krishna, A.R., Bhunia, S.: ScanPUF: robust ultralow-overhead PUF using scan
chain. In: Asia and South Pacific Design Automation Conference (ASP-DAC), pp. 626– 631
(2013)
103. Zhang, L., Kong, Z.H., Chang, C-H.: PCKGen: a phase change memory based cryptographic
key generator. In: International Symposium on Circuits and Systems (ISCAS), pp. 1444–
1447 (2013)
104. Konigsmark, S.T.C., Hwang, L.K., Deming, C., Wong, M.D.F.: CNPUF: a carbon
nanotube-based physically unclonable function for secure low-energy hardware design. In:
Asia and South Pacific Design Automation Conference (ASP-DAC), pp. 73–78 (2014)
105. Zhang, F., Henessy, A., Bhunia, S.: Robust counterfeit PCB detection exploiting intrinsic
trace impedance variations. In: VLSI Test Symposium, Apr 2015
106. Areno, M., Plusquellic, J.: Securing trusted execution environments with PUF generated
secret keys. In: TrustCom (2012)
107. Areno, M., Plusquellic, J.: Secure mobile association and data protection with enhanced
cryptographic engines. In: PRISMS (2013)
108. Guajardo, J., Kumar, S.S., Schrijen, G.T., Tuyls, P.: FPGA intrinsic PUFs and their use for
IP protection. Cryptogr. Hardware Embedded Syst. 4727, 63–80 (2007)
109. Rührmair, U., Busch, H., Katzenbeisser, S.: Strong PUFs: models, constructions, and
security proofs. In: Sadeghi, A.-R., Naccache, D. (eds.) Towards Hardware-Intrinsic
Security, pp. 79–95. Springer (2010)
110. Gassend, B., Lim, D., Clarke, D., van Dijk, M., Devadas, S.: Identification and
authentication of integrated circuits. Concurr. Comput. 16(11), 1077–1098 (2004)
111. Majzoobi, M., Koushanfar, F., Potkonjak, M.: Testing techniques for hardware security. In:
International Test Conference, pp. 1–10 (2008)
112. Paral, Z., Devadas, S.: Reliable and efficient PUF-based key generation using pattern
matching. In: Symposium on Hardware-Oriented Security and Trust, pp. 128–133 (2011)
113. Lamech, C., Aarestad, J., Plusquellic, J., Rad, R., Agarwal, K.: REBEL and TDC: two
embedded test structures for on-chip measurements of within-die path delay variations. In:
International Conference on Computer-Aided Design, pp. 170–177 (2011)
114. Tiri, K., Verbauwhede, I.: A logic level design methodology for a secure DPA resistant
ASIC or FPGA implementation. In: DATE, pp. 246–251 (2004)
115. Tiri, K., Verbauwhede, I.: A digital design flow for secure integrated circuits. IEEE Trans.
Comput.-Aided Des. Integr. Circ. Syst. 25(7), 1197–1208 (2006)
116. Ranasinghe, D.C., Engels, C.W., Cole, P.H.: Security and privacy: modest proposals for
low-cost RFID systems. In: Auto-ID Labs Research Workshop (2004)
117. Delvaux, J., Gu, D., Schellekens, D., Verbauwhede, I.: Secure lightweight entity
authentication with strong PUFs: mission impossible? In: CHES, pp. 451–475 (2014)
118. Rührmair, U., Sehnke, F., Solter, J., Dror, G., Devadas, S., Schmidhuber, J.: Modeling
attacks on physical unclonable functions. In: Conference on Computer and Communications
Security, pp. 237–249 (2010)
119. Van Herrewege, A., Katzenbeisser, S., Maes, R., Peeters, R., Sadeghi, A.-R., Verbauwhede,
I., Wachsmann, C.: Reverse fuzzy extractors: enabling lightweight mutual authentication for
PUF-enabled RFIDs. In: International Conference on Financial Cryptography and Data
Security (2012)
120. Bolotny, L., Robins, G.: Physically unclonable function-based security and privacy in RFID
systems. In: PerCom, pp. 211–220 (2007)
121. Ozturk, E., Hammouri, G., Sunar, B.: Towards robust low cost authentication for pervasive
devices. In: PerCom, pp. 170–178 (2008)
122. Hammouri, G., Ozturk, E., Sunar, B.: A tamper-proof and lightweight authentication
scheme. Pervasive Mobile Comput. 807–818 (2008)
123. Kulseng, L., Yu, Z., Wei, Y., Guan, Y.: Lightweight mutual authentication and ownership
transfer for RFID systems. In: INFOCOM, pp. 251–255 (2010)
124. Sadeghi, A.-R., Visconti, I., Wachsmann, C.: Enhancing RFID security and privacy by
physically unclonable functions. In: Information Security and Cryptography, pp. 281–305
(2010)
125. Katzenbeisser, S., Kocabas, U., van der Leest, V., Sadeghi, A., Schrijen, G.J., Schroder,
H., Wachsmann, C.: Recyclable PUFs: logically reconfigurable PUFs. In: CHES, pp. 374–
389 (2011)
126. Kocabas, U., Peter, A., Katzenbeisser, S., Sadeghi, A.: Converse PUF-based authentication.
In: TRUST, pp. 142–158 (2012)
127. Lee, Y.S., Kim, T.Y., Lee, H.J.: Mutual authentication protocol for enhanced RFID security
and anticounterfeiting. In: WAINA, pp. 558–563 (2012)
128. Jin, Y., Xin, W., Sun, H., Chen, Z.: PUF-based RFID authentication protocol against secret
key leakage. Lect. Notes Comput. Sci. 7235, 318–329 (2012)
129. Xu, Y., He, Z.: Design of a security protocol for low-cost RFID. In: WiCOM, pp. 1–3 (2012)
130. Lee, Y.S., Lee, H.J., Alasaarela, E.: Mutual authentication in wireless body sensor networks
based on physical unclonable function. In: IWCMC, pp. 1314–1318 (2013)
131. Yu, M.-D.M., M’Raihi, D., Verbauwhede, I., Devadas, S.: A noise bifurcation architecture
for linear additive physical functions. In: HOST, pp. 124–129 (2014)
132. Konigsmark, S.T.C., Hwang, L.K., Chen, D., Wong, M.D.F.: System-of-PUFs: multilevel
security for embedded systems. In: CODES, pp. 27:1–27:10 (2014)
133. Majzoobi, M., Rostami, M., Koushanfar, F., Wallach, D.S., Devadas, S.: Slender PUF
protocol: a lightweight, robust, and secure authentication by substring matching. In:
Symposium on Security and Privacy Workshop, pp. 33–44 (2012)
134. Delvaux, J., Gu, D., Peeters, R., Verbauwhede, I.: A survey on lightweight entity
authentication with strong PUFs. Cryptology ePrint Archive: Report 2014/977
135. Moriyama, D., Matsuo, S., Yung, M.: PUF-based RFID authentication secure and private
under complete memory leakage. IACR Cryptology ePrint Archive 2013, 712 (2013). http://
eprint.iacr.org/2013/712
136. Che, W., Saqib, F., Plusquellic, J.: A privacy-preserving, mutual PUF-based authentication
protocol. Submitted to special issue “Physical Security in Cryptography Environment”,
Cryptogr. J. http://www.mdpi.com/journal/cryptography. Accessed Aug 2016
137. Das, A., Kocabas, U., Sadeghi, A.-R., Verbauwhede, I.: PUF-based secure test wrapper
design for cryptographic SoC testing. In: Design, Automation and Test in Europe, pp. 866–
869 (2012)
138. Hoffman, C., Cortes, M., Aranha, D.F., Araujo, G.: Computer security by hardware-intrinsic
authentication. In: Hardware/Software Codesign and System Synthesis, pp. 143–152 (2015)
139. Wang, X., Zheng, Y., Basak, A., Bhunia, S.: IIPS: infrastructure IP for secure SoC design.
Trans. on Comput. 64(8), 2226–2238 (2015)
140. Trimberger, S.M., Moore, J.J.: FPGA security: motivations, features, and applications. Proc.
IEEE 1248–1265 (2014)
Chapter 7
FPGA-Based IP and SoC Security
7.1 Introduction
D. Saha (✉)
A. K. Choudhury School of Information Technology,
University of Calcutta, Kolkata, India
e-mail: [email protected]
S. Sur-Kolay
Advanced Computing & Microelectronics Unit,
Indian Statistical Institute, Kolkata, India
e-mail: [email protected]
[Fig. 7.1b design flow steps: Design Entry → Synthesis → Technology Mapping → Place & Route → Generation of Bitstream]
Fig. 7.1 FPGA architecture: a a slice of a CLB in Xilinx Virtex family, b FPGA design flow, and
c Xilinx FPGA chip with ultrascale architecture (Courtesy: Xilinx)
its usage. Alternatively, an SoC with multiple IP cores from different IP core ven-
dors may entirely be configured on an FPGA. An increasing number of companies
are providing IP support to FPGA.
In an FPGA, an application-specific design is optimized for performance, power,
and size so that it can be loaded into the smallest possible FPGA chip, and is taken in the
form of a configuration bitstream file, also known as a bitfile. IP for an FPGA may
be in various forms, namely HDL design, FPGA-based design after place-and-route,
stored bitfile before loading, or even bitfile running on an FPGA. The soft bitfile is
the most valuable and vulnerable IP.
Widespread usage of FPGA IPs on an FPGA-based SoC, and support for partial
reconfiguration, raise concerns, specifically regarding (i) IP exchange, (ii) iden-
tification and partial decoding of bitstream, (iii) IP management on FPGA-based
SoCs, and also (iv) detection of Trojan sources active across the IPs. Discussion on
these threats on FPGA-based SoCs appears in Sect. 7.5.
(a) encryption of the bitstream to prevent cloning, reverse engineering, and integrity
check of the bitstream to detect malicious modification;
(b) appropriate design of cryptoprocessor unit on an FPGA to protect secret infor-
mation from side-channel attacks;
(c) embedding in the design a signature that is tamper resistant as well as resilient
against reverse engineering to protect the IP against counterfeiting;
(d) applying techniques for detection of Trojans.
These mechanisms ensure trusted use of an IP. On one hand, only the genuine IP
core vendor gets the patent by correctly proving ownership, and also receives the desired
royalty fee for each legal IP instance, since only an authorized user can access the IP
core. On the other hand, an IP core purchased by a legitimate buyer retains its desired IP
value, ensuring protection of the buyer’s rights [3]. Some of the above-mentioned security aspects are
usually included in standard FPGA products, whereas the rest are in research domain.
The FPGA products are still vulnerable to the threats which have not been covered
by the security aspects in them, along with other new types of attacks.
Several surveys on FPGA security, such as [4–6], have reported on a number of security
challenges and their countermeasures. In the current perspective of partial recon-
figurability of FPGAs and FPGA-based SoCs, this chapter highlights the present
state-of-the-art of FPGA security and existing vulnerabilities.
In commercial FPGAs, the following techniques as countermeasures have been
introduced to enhance FPGA security [7].
∙ The bitfile (i.e., the .bit file) generated in the final stage of a design tool for FPGA
is difficult to read, and the bitstream in the bitfile for Xilinx starting from the
old Virtex-II family to the recent Virtex-7 series remains encrypted, and therefore
cloning of the design is prevented.
∙ In Virtex-6, Spartan-6, and other recent families from Xilinx, a unique but public
57-bit device identifier, known as Device DNA, is programmed into a one-time
programmable (OTP) e-fuse to uniquely identify an FPGA device. It attempts
to make the encryption device-specific so that the encrypted bitfile cannot be
decrypted and utilized in other chips.
In spite of several security measures adopted in (i) the manufacturing flow, (ii) the
design tools, (iii) configuration, and (iv) operation of FPGAs, not all the threats
described in Sect. 7.1.3 can be tackled. While countermeasures to the threats
T1, T2, and T5 have been implemented, several research proposals exist for T3
and T4. Although the security against the risk in threat T6 for HTH has been
enhanced, the risk is still alive. An unbiased verification team within an FPGA ven-
dor for verifying their design tool and testing their device can help to tackle threats
T7 and T8.
Moreover, a few new attacks have been mounted against the security mechanisms
incorporated. For example, the entire secret key for decryption has already been
recovered by analyzing side-channel information. After measuring the power con-
sumption of a single power-up of Xilinx Virtex-II Pro, all the three different keys
used by its triple DES encryption module could be retrieved. The full 128-bit AES
key [8] of an Altera Stratix II has been discovered by applying side-channel analy-
sis with 30,000 measurements in less than three hours. Some internals of hard-
ware crypto engines of the corresponding Xilinx and Altera devices have also been
revealed through these attacks. A keyed test mechanism has already been discovered
for enabling readback for Microsemi FPGAs.
A large number of possible Trojan sources in a design tool as well as in the hard-
ware render complete assurance of a trusted environment to be difficult. The bitfile
with high IP value is more vulnerable for such Trojan intrusion. Of late, attackers
can even interpret the format of a bitfile and insert Trojans very effectively.
A rich adversary may employ costlier attacks like tampering of configuration data
by applying radiation or physical stress. Nowadays, high-profit applications also uti-
lize FPGA IPs. Therefore, these costlier attacks become feasible as an adversary can
gain monetarily notwithstanding the cost of the attacks. Techniques with low perfor-
mance overhead to counter costlier attacks are in demand.
Sometimes, certain security measures to counter various types of attacks may be
conflicting with each other—such as encryption and trustworthy signature verifica-
tion, encryption and reconfigurability.
In the platform of partially reconfigurable FPGA-based SoCs and embedded sys-
tems, some security mechanisms have been adopted to cope with the increased
vulnerability to attacks mostly targeting the content of memory, the operations of
processors and configuration controller, interfaces and data transmission on a sys-
tem. But, more effective and efficient measures are in demand. Further, during distri-
bution of multiple IPs to several IP tool vendors, key management is quite complex,
and the possibility of partial interception of IPs remains to be handled properly.
The above-mentioned vulnerabilities to attacks are still alive and new security
holes are being introduced. Therefore, IP security in FPGA domain has interesting
challenges and needs special attention.
For an FPGA design, the corresponding bitfile core is to be kept encrypted when
it is shipped or transmitted to its buyer and also at buyers’ site to prevent cloning
as well as reverse engineering of the bitfile. The encryption algorithm to be used
must be fast and robust against cryptanalysis, and the hardware implementing the
decryption algorithm must consume low area and low power, operate at high speed,
and be robust against side-channel attacks. Besides encryption, this section discusses
other crypto primitives for authentication, integrity check and freshness of bitstream,
generation of chip-specific encryption keys and partial encryption.
For bitstream encryption, symmetric key encryption is used. The configuration bitfile
is encrypted using a secret key at the vendors’ end. At the users’ end, the encrypted
configuration bitstream from some external nonvolatile memory is loaded into the
FPGA at each system power-up. The same secret key stored on-chip is used to
decrypt the configuration file. Both encryption of bitstream and storing of the key
for decryption take place at the IP vendors’ end, and the legitimate user cannot access
the secret key.
Xilinx FPGAs apply either triple-data encryption standard (3DES) or 256-bit
advanced encryption standard (AES) [8] in cipher block chaining (CBC) mode.
Altera Stratix II and Stratix II GX devices use 128-bit AES in counter (CTR) mode
for configuration bitstream encryption. Key length in this range provides desired
strength against attacks. CBC or CTR mode prevents the propagation of errors.
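As an illustration only, a minimal sketch of AES-256 bitfile encryption in CBC mode using the PyCryptodome package; commercial flows perform the equivalent step inside the vendor tool, and the on-chip engine performs the decryption, so the key handling and file contents here are toy placeholders.

import os
from Crypto.Cipher import AES
from Crypto.Util.Padding import pad, unpad

def encrypt_bitfile(bitfile: bytes, key: bytes) -> bytes:
    iv = os.urandom(16)                               # fresh IV per encryption
    cipher = AES.new(key, AES.MODE_CBC, iv)
    return iv + cipher.encrypt(pad(bitfile, AES.block_size))

def decrypt_bitfile(blob: bytes, key: bytes) -> bytes:
    iv, body = blob[:16], blob[16:]
    cipher = AES.new(key, AES.MODE_CBC, iv)
    return unpad(cipher.decrypt(body), AES.block_size)

key = os.urandom(32)                                  # 256-bit secret key held on-chip
bitfile = b"\x00" * 4096                              # toy stand-in for a .bit file
assert decrypt_bitfile(encrypt_bitfile(bitfile, key), key) == bitfile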
Configuration bitstream on decryption is typically stored in SRAM memory,
which facilitates higher performance, greater logic density, improved power effi-
ciency, reduced manufacturing cost, and higher flexibility of self-test. But, SRAM
is volatile, i.e., loses data at power-off. So, battery-backed SRAM is used in Xilinx
to support throughout the application life. In addition, SRAM-based memory facili-
tates fast in-site partial reconfiguration. However, the possibility of data interception
due to the external memory persists in SRAM-based memory.
Actel FPGAs of Microsemi and some other FPGAs use nonvolatile on-chip flash
resource for configuration bitstream to eliminate the risk of using external memory.
But, integration of flash memory on SRAM-based FPGA is costlier as it requires
complex fabrication steps. Use of flash memory for PR is also technologically pos-
sible and several research directions for partially reconfigurable flash memory are
available. But, the presence of configuration data permanently on flash memory-
based chip and reconfigurability of flash may cause similar security threats as in
SRAM.
For key storage, Xilinx uses either battery-backed SRAM with a key clear prop-
erty as volatile storage, or OTP e-fuse as nonvolatile storage. For enhanced secrecy
and persistence of the key in the buyers’ site, OTP nonvolatile memory (flash or
e-fuse) is preferred. The key may be programmed on-chip, or off-chip during the
regular manufacturing flow.
Encryption of bitstream has an overhead due to the additional bitstream storage
and the decryption unit on FPGA. Instead of a built-in decryptor, the decryption unit
may be configured in the configuration logic of the FPGA also.
An AES decryption unit in an FPGA must incur very low overhead in terms of area
and power. Table 7.1 shows the implementation details of a few fast and compact
AES processors in FPGAs. Two FPGA designs for AES are presented by Good and
Benaissa [9]—while one is the fastest design, the second one based on 8-bit data-
path and only 124 slices and 2 BRAMs on Spartan-II is believed to be the smallest
compared to the other designs using 32-bit datapath.
AES or any other cryptoprocessor is designed for FPGA implementation in a way
that defends against leakage of the secret key through side-channel information. In order to
defend against differential power analysis (DPA)-based side-channel attacks, whose resistance
is measured in terms of normalized energy deviation (NED) and normalized stan-
dard deviation (NSD), randomization of computations and equalization of consumed
power are applied in general. These techniques will be discussed in detail in another
chapter. For FPGAs, ROM-based substitution box (S-box) for AES is proposed [14]
which outperforms logic S-box in area, power, performance, and also in power-
analysis resistance. For power-analysis resistance, they propose modification of tra-
ditional ROM to create matched bitline and wordline capacitances across the mem-
ory. One tricky approach to resist any power analysis-based attack is to place a sensor
or detector circuit, which can detect whether any device is attached to the power
pin or not. In order to prevent leakage of information from the electromagnetic field
measured by an antenna, the computations are distributed across the FPGA.
confidentiality and SHA-2 for integrity—may cause speed mismatch and significant
area overhead. Furthermore, such an attempt cannot ensure authentication. For confi-
dentiality, authentication, and integrity check, message authentication code (MAC)
function is to be used over encryption [8]. Alternatively, AES in Galois/Counter
mode (GCM) is preferred as fast authenticated encryption (AE) algorithm with
integrity check. It facilitates area efficiency and high-speed implementation using
dynamic PR. For encryption/decryption, counter (CTR) mode is used which is highly
parallel. For MAC-based authentication, hashing based on product–sum operation in
Galois field GF(2^w) (GHASH) [12] is used, which enables faster and more compact
hardware implementation. Interleaving of CTR and GHASH in a single function
improves performance. Several implementation details of AES-GCM are given in
Table 7.1. Analyzing parallel implementations, 8-block parallel implementation is
found as a sweet point.
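As a point of reference for the mode itself (rather than for any of the FPGA designs in Table 7.1), the following minimal Python sketch exercises the AES-GCM authenticated-encryption interface—CTR-mode encryption combined with a GHASH-based tag—using the widely available cryptography package; the payload and header values below are placeholders.

import os
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

# Software sketch of AES-GCM authenticated encryption (AE); it illustrates the
# CTR-encryption-plus-GHASH-tag behaviour described above, not an FPGA design.
key = AESGCM.generate_key(bit_length=256)    # bitstream-encryption key
aesgcm = AESGCM(key)
nonce = os.urandom(12)                       # 96-bit IV, as recommended for GCM

bitstream = b"partial configuration bitstream ..."   # placeholder payload
header = b"frame address / length fields"            # authenticated but not encrypted

ciphertext = aesgcm.encrypt(nonce, bitstream, header)   # ciphertext || 16-byte tag
plaintext = aesgcm.decrypt(nonce, ciphertext, header)   # raises on any tampering
assert plaintext == bitstream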
If the embedded secret encryption key is specific to an FPGA architecture, i.e., a device family, the user may re-sell the encrypted bitfile, which can then be decrypted on any FPGA chip of the same architecture family. In order to prevent such an attempt, the secret encryption key is made chip-specific, i.e., for each fabricated FPGA chip, a unique key is used for encryption of the bitfiles to be loaded into that specific chip. The
authors in [17] propose to encrypt the bitfile IP core based on secure device identi-
fication. Using both public-key and secret-key cryptography, the system and the IP
exchange protocol are designed in this work. Several techniques have been developed
to generate the FPGA chip-specific secret information—either secret keys, or some
random number used as initialization vector or a seed for cryptographic primitives.
for the PUF circuit. A nonlinear transformation applied on the responses of PUFs
to generate the output ensures robustness against reverse engineering. The mixing property of XOR logic, or of parity generators in general, provides resilience against emulation and statistical guessing.
Encryption of the entire bitfile, as discussed so far, without any provision for encrypting a partial bitstream, causes a security concern in the case of partial reconfiguration (PR).
Fig. 7.2 True Random Number Generator [21]: coarse blocks create delay differences over a wide range; the decoder block maps the values of the counter to the number of 1's at the input of the programmable delay lines. The bit rate is 16 Mbit/s and the propagation delay is 61.06 ns. This TRNG core uses 128 LUTs packed into 16 Virtex-5 CLBs
Earlier, the authors in [23] suggested encryption of only judiciously selected portions of the configuration bitstream. Later on, various FPGA tools have been updated so that they facilitate partial encryption, i.e., encryption of the partial bitstreams for the reconfigurable modules to be loaded thereafter. For Xilinx, Virtex-6 supports partial encryption but Virtex-5 does not. BitGen pads the FPGA partial bitstream with NOP commands so that the entire bitstream is evenly divided into AES-256 encryption blocks and encrypted. Then, the encrypted partial reconfiguration (EPRC) system is used to perform a frame-by-frame CRC check before loading it.
PR has both positive and negative security aspects (Table 7.2). Loading different cryptoprocessors using module-based PR facilitates the use of different encryption algorithms for different applications. Difference-based PR may be used if the design for security is adaptive in nature, requiring minute changes at consecutive times based on some controls. Both types of PR introduce the security threat of identifying the bitstream of the cryptoprocessors or of other crypto designs, such as a PUF circuit used to regenerate the secret key. Therefore, the encryption algorithm should be strong enough to prevent reverse engineering of the bitstream. There is another concern in the generation of a PUF-based key. For partially reconfiguring an FPGA with a design module that includes a PUF circuit, it is mandatory to bind the PUF design to a proper location of the hardware at the time of reconfiguration; otherwise the metrics of the PUF circuit are likely to change, and so will its response. Thus, the PUF-based secret key cannot be reproduced.
However, all these techniques cannot prevent a legal buyer from intentionally reselling his encrypted bitfile core along with the corresponding FPGA hardware to an unauthorized user, who can download the bitfile core into that particular FPGA hardware. In order to prevent such events, authentication of legitimate buyers by the IP vendor is required.
Techniques for embedding signatures of the IP vendor and the legitimate buyer, and for verifying those signatures for the purpose of authenticating FPGA-based IPs, are discussed in this section.
The techniques for embedding the signatures of the IP vendor and the legitimate buyer are termed watermarking and fingerprinting, respectively. Sometimes, design tool-specific information is embedded instead of the signature of the IP vendor. A signature embedding technique must be fast, robust against tampering, and incur low overhead. In a design tool for FPGAs, the constraint file, which is used to incorporate constraints on objectives such as timing, or on technology mapping or placement of I/Os and logic, may be used to embed signatures.
The authors in [24] propose two protocols for embedding user- and tool-specific information into a logic circuit while performing multilevel logic minimization and technology mapping. The approach embeds additional constraints, derived uniquely from the author's signature, into the problem specification, such that the final solution can be retrieved only within a subset of the set of all solutions for multilevel logic minimization and technology mapping.
A watermark may be embedded during logic synthesis through incremental technology mapping of selected disjoint closed cones [25]. A closed cone is a portion of a logic network that contains no outgoing edge to other logic nodes. In order to minimize and isolate perturbations of the topology, disjoint closed cones are used as watermark hosts. After logic synthesis, some disjoint closed cones in the optimized circuit are selected and re-mapped based on the signature bits. In order to retrieve the signature, the watermarked copy needs to be compared against the original master copy.
The technique proposed in [26] embeds the bits of the watermark in a constraint file as the least-significant bit of the timing constraints on signal delays. It has practically zero overhead on delay. However, the watermark bits can easily be tampered with. Moreover, the watermark embedded through this technique offers limited verification possibilities.
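A hedged sketch of the idea in [26] is given below: each watermark bit is forced into the least-significant bit (here, the parity) of a timing constraint expressed in picoseconds. The constraint values and the one-picosecond adjustment are illustrative assumptions, not the exact encoding of [26]; the ease of extraction also shows why such a mark is easy to tamper with.

def embed_watermark(constraints_ps, watermark_bits):
    """Return timing constraints whose parities (LSBs in ps) encode the watermark."""
    marked = []
    for value, bit in zip(constraints_ps, watermark_bits):
        if value % 2 != bit:      # adjust by 1 ps so the parity equals the mark bit
            value += 1
        marked.append(value)
    return marked

def extract_watermark(constraints_ps, length):
    """Recover the watermark as the parity of each constraint value."""
    return [value % 2 for value in constraints_ps[:length]]

original = [5400, 7123, 6010, 4999]     # hypothetical delay constraints in ps
bits = [1, 0, 1, 1]                     # designer's signature bits
marked = embed_watermark(original, bits)
assert extract_watermark(marked, len(bits)) == bits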
There are several works which emphasize verification issues of the watermark embedded in an FPGA design. A centralized verification team may be biased and
prove to the verifier that a "statement is true," without revealing any knowledge other than the veracity of the statement. The entire proof is split into two parts, say, P1 and P2. There are several rounds, in each of which the verifier randomly picks one of the two parts and asks the prover to prove that part. The prover is declared successful if the part demanded by the verifier in each round is proven correctly. Failure in any one round is taken as failure in the verification of the signature.
In the ZKP-based signature verification technique Verify_ZKP, the watermark is obtained from public information (say, the company name) IP_b of the IP vendor, encoded with a function h_s representing s shifts of a nonlinear feedback shift register (NLFSR). It is embedded as the configuration bitstring of some of the unused CLBs of an FPGA, based on a secret key K_C.
The encoding key and the locations of the watermark remain private to the IP
vendor. Verify_ZKP proves with trust that the “desired watermark is present” in the
bitfile core D of the FPGA design without revealing the locations or the bitstring
in the mark. The watermark is verified in a mapped version of D to keep the mark
secure. For verifying a watermark in a mapped design, the proof part P1 is verifica-
tion of the mapping function, whereas the part P2 is verification of the presence of
the public information IP_b in the watermark from the mapped design.
Fig. 7.3 Overview of the interactive zero-knowledge protocol Verify_ZKP [34] for secure yet trustworthy signature verification
In each round r, the prover (IP vendor) generates a distinct mapped design D_M^r from D and commits it
to the verifier (buyer). A genuine IP vendor succeeds in proving P1 or P2 as demanded by the verifier in that round. If the core D does not have the desired watermark, or the committed (marked) design is not a mapped version of D, the probability of success in a round is 1/2 in either case. Therefore, after a sufficiently large number of rounds R, a cheating prover can succeed only with a very low probability of 1/2^R. The main steps of the ZKP-based verification protocol Verify_ZKP are illustrated in Fig. 7.3.
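The round structure behind the 1/2^R bound can be illustrated with a toy simulation: a cheating prover can prepare only one of the two proof parts in advance, so it survives each round with probability 1/2. The sketch below models only this probability argument, not the actual mapping and encoding functions of Verify_ZKP.

import random

def cheating_prover_survives(rounds):
    """One protocol run: the cheater survives only if every demanded part was guessed."""
    for _ in range(rounds):
        prepared = random.choice(("P1", "P2"))   # the cheater commits to one part
        demanded = random.choice(("P1", "P2"))   # the verifier's random choice
        if prepared != demanded:
            return False                         # fails this round
    return True

R = 10
trials = 200_000
successes = sum(cheating_prover_survives(R) for _ in range(trials))
print(f"empirical success rate: {successes / trials:.2e} (theory: {2**-R:.2e})")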
In order to enforce the zero-knowledge property, the mapping function M_C^r in each round r should not be a self-mapping, should be distinct, and should have low correlation with the mappings in the other rounds. The mapping in each round consists of two main steps: (i) location mapping, which generates a different assignment of the CLBs to the CLB locations, and (ii) content encoding, which encodes the configuration bitstring of each CLB separately. For location mapping, space-filling curves and Latin rectangles are used to achieve the desired properties listed above. For content encoding, the shifts of the NLFSR used differ for each round. The time complexity of Verify_ZKP is linear in the size of the design. The embedded watermark is robust against typical attacks such as tampering or deletion, finding a ghost signature, and additive attacks. Using partial reconfiguration, different mappings can be configured for verification. The strength of Verify_ZKP is assessed by the Pearson product–moment correlation coefficients between the location of a CLB and the Manhattan distance to its location after mapping, for all CLBs over 20 rounds of interaction. The values are of the order of 10^-2 for FPGA IWLS'05 benchmark designs implemented with the Xilinx ISE tool. Similar correlation coefficients for the content encoding are also quite low. Verify_ZKP facilitates public verification without any additional design overhead. It reduces the design overhead due to marking by 56.2 % when compared to [31] and has negligible CPU time requirement.
A Trojan circuit may be inserted in one or more of the following possible modes:
(i) HDL description of a design, (ii) technology mapped design, (iii) placed-and-
routed design, or (iv) the configuration bitstream for a target FPGA. An adversary
aims at inserting a Trojan in an FPGA-based design so that it remains undetected by
the design and validation tool corresponding to the product. Different techniques for
detecting HTH are highlighted below.
A widely used technique for detecting a Trojan inserted in the HDL design is to use ring oscillators (ROs) as a locking mechanism for binding an FPGA design to a specific area of the FPGA hardware. This results in a specific physical placement of the design on the hardware. A ring oscillator is a delay loop circuit, typically built from an odd number of inverting stages connected in a loop.
The authors of [1] propose an IP protection (IPP) technique for the detection of tampers, such as changes to or deletion of existing logic, and addition of extraneous logic such as Trojans, inserted in FPGA design files. The technique is parity-based and uses an error-correcting code structure for this purpose. For each test vector, the parity of the outputs of the CLBs in a parity group (PG) produces one parity bit; for a test set, a parity vector (PV) is generated for each PG. During a trust-checking phase, a
test-pattern generator (TPG) and an output response analyzer (ORA) are configured
in FPGA. The TPG is connected to the inputs of each PG of CLBs, one at a time,
and it feeds identical input/test vectors to each CLB in a parity group, while the
output vector produced by the ORA is checked against the expected PV for this PG
(Fig. 7.4). Failing to detect a desired parity relation signals the possible existence of
additional circuitry, i.e., Trojan in the FPGA design. The technique uses two-level
randomization: (a) randomization of the mapping of the parity groups to the CLB
array, and (b) randomization within each parity group of odd and even parities for
different input combinations. The two-level randomization is meant to counter the
attacks by an adversary who tries to either detect the parity groups and inject tampers
to mask each other, or tamper with the TPG and the ORA in an undetectable manner.
This method using an underlying error-correcting code and its 2-level randomiza-
tion was validated by inserting 1–10 circuit CLB tampers and 1–5 extraneous logic
CLBs in two medium-size circuits and a RISC processor circuit implemented on a
Xilinx Spartan-3 FPGA. The results of 100 % tamper detection and 0 % false alarms,
obtained at a hardware overhead of only 7–10 %, were promising. This technique can
detect extraneous logic implemented completely in some unused CLBs, as it maps the error-correcting code onto all the CLBs, irrespective of whether they are functional or not. Support for partial reconfiguration in modern FPGAs makes it easier to connect the TPG to the inputs of each PG of CLBs.
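The parity check of [1] can be sketched in a few lines, under the simplifying assumption that each CLB is abstracted as a function from a test vector to a single output bit; the parity groups, test vectors, and the Trojan below are made-up examples.

from functools import reduce
from operator import xor

def parity_vector(parity_group, test_set):
    """One parity bit per test vector: XOR of the CLB outputs in the parity group."""
    return [reduce(xor, (clb(vec) for clb in parity_group)) for vec in test_set]

def trust_check(parity_group, test_set, golden_pv):
    """Flag possible tampering if any parity bit mismatches the golden parity vector."""
    return parity_vector(parity_group, test_set) == golden_pv

# A 3-CLB parity group; the Trojan flips one CLB's output for one specific vector.
clean_pg = [lambda v: v & 1, lambda v: (v >> 1) & 1, lambda v: (v >> 2) & 1]
trojan_pg = [lambda v: (v & 1) ^ (v == 5), clean_pg[1], clean_pg[2]]
tests = list(range(8))
golden = parity_vector(clean_pg, tests)
print(trust_check(clean_pg, tests, golden))    # True  -> no tamper detected
print(trust_check(trojan_pg, tests, golden))   # False -> possible Trojan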
Fig. 7.4 Detection of Trojans [1] in FPGA using error-correcting code: a test-pattern generator
(TPG) for sending test vectors to each CLB of a parity group and output response analyzer (ORA)
for checking responses against desired parity vector, b column and row parity groups in solid and
dashed line, respectively
The authors in [36] propose the inclusion of Trojans directly in the unencrypted bitstream of an FPGA. The process of inclusion has the difficulty of understanding the structure or format of the bitfile, but has the advantage of bypassing all the checks in the FPGA design tool except CRC, which can also be disabled. The Trojan circuits are based on ring oscillators, which have the effect of increasing the operating temperature and hence cause increased device aging. The bitstrings corresponding to these Trojan circuits are inserted at appropriate locations in the configuration bitstream corresponding to some unused CLBs. Two different types of Trojans have been inserted, namely (i) isolated Trojans, whose insertion is quite easy, and (ii) Trojans connected with the original design, which require appropriate modifications at several locations in the bitstream for their insertion.
Applying radiation to an FPGA may cause bit flips in its memory blocks, resulting in a change in its functionality. If the supply voltage or the external clock is altered intentionally, it may induce glitches and introduce faulty operations. These attacks may
Let us now discuss the major security issues in FPGA-based advanced architectures such as systems-on-chip, embedded systems, and cloud architectures.
An SoC usually contains reusable IPs (based on either ASIC or FPGA), embedded processor(s) (a general-purpose processor and multiple special-purpose processors depending on the requirements) or controller(s), memory elements such as SRAM and ROM, a bus architecture for interfacing the IPs and other components on the SoC, and programmable blocks (FPGA). Figure 7.5 shows the components of an SoC, an SoC configured on an FPGA, and the way IPs are reused in an SoC environment.
There are two possible ways of using FPGAs in an SoC. The first is inclu-
sion of FPGA blocks on a system along with other ASIC components, where the
Fig. 7.5 a Programmable FPGA with other components on an SoC, b entire SoC configured on an FPGA chip [6], c reuse of IPs in SoCs and in systems
The first requirement in SoC design is to ensure secure distribution of FPGA IP cores to the system developer. The objectives are to (i) assure confidentiality and authenticity of the core, (ii) limit the number of instances of the core, and (iii) make every instance operate on a specific set of devices. The second and the third objectives together aim at preventing over-deployment of an IP. Public-key cryptography is applied along with symmetric-key cryptography to ensure secure distribution of cores.
One solution for the secure distribution of a core is proposed in [37] (Fig. 7.6a). In step 1, the CV receives the ID of the target FPGA (FID) from the SD. The CV generates a key K_CV from the private key of his own private–public key pair, the public key of the FV's key pair, and the FID. This key K_CV is used as the symmetric key for encrypting the bitstream of the IP core. In step 2, at the SD's end, K_CV is again generated using public-key cryptography. The FV provides a personalization bitstream in encrypted form, which contains the key generation function as well as the private key of the FV. The key K_CV is then generated from the FV's private key, the CV's public key, and the FID.
In order to handle different IP cores from independent CVs, different keys K_CVi are generated and used accordingly. When the same core is used on multiple FPGA devices, a distinct K_CVij, based on the FID of the jth device, is generated by the ith CV.
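The essence of the scheme in [37] is that both ends derive the same symmetric key K_CV from their own private key, the other party's public key, and the FID. The sketch below models this with X25519 key agreement followed by an HKDF, using Python's cryptography package; the concrete primitives and key-derivation details of [37] may differ, and the FID value is a made-up placeholder.

from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric.x25519 import X25519PrivateKey
from cryptography.hazmat.primitives.kdf.hkdf import HKDF

cv_priv = X25519PrivateKey.generate()      # core vendor's key pair
fv_priv = X25519PrivateKey.generate()      # FPGA vendor's key pair
fid = b"FPGA-device-id-0001"               # device ID (placeholder value)

def derive_kcv(own_private, peer_public, fid):
    """Derive K_CV from one private key, the peer's public key, and the FID."""
    shared = own_private.exchange(peer_public)
    return HKDF(algorithm=hashes.SHA256(), length=32, salt=None,
                info=b"K_CV" + fid).derive(shared)

k_cv_at_cv = derive_kcv(cv_priv, fv_priv.public_key(), fid)       # at the CV's place
k_cv_in_device = derive_kcv(fv_priv, cv_priv.public_key(), fid)   # inside the device
assert k_cv_at_cv == k_cv_in_device    # both ends obtain the same symmetric key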
Another solution is a symmetric-key-based pay-per-use licensing scheme [38] (Fig. 7.6b). A TTP acts as a metering authority (MA) that generates a license allowing an SD to use an IP core only once, at a small fee paid by the SD to the CV. The SD does not need to make a large payment for indefinite use of the IP, and the CV's IP remains protected from over-use (overbuilding).
(a) At the CV's place: CV's private key + FV's public key + FID generate K_CV (symmetric key)
At the SD's place: CV's public key + FV's private key + FID generate K_CV
(b) CV enrolls an IP with the MA, which stores ID(IP) and K_CV; FV enrolls a device with the MA, and K_F, (K_MA)_KF are stored in the device
At the SD's place: the SD receives (IP)_KCV from the CV and the license (K_CV)_KMA from the MA
License processing in the device: K_F, (K_MA)_KF → K_MA; (K_CV)_KMA → K_CV; (IP)_KCV → IP
Fig. 7.6 Secure core distribution protocols: a as described in [37], b as described in [38]
In step 1, each CV enrolls each of his IP cores with the MA, whereupon the MA stores the ID(IP) and the secret key K_CV of the CV. Similarly, each FV enrolls each of his devices with the MA, whereupon the MA programs a unique device secret key K_F into the nonvolatile memory (NVM) of the device. The secret key K_MA of the MA, encrypted with K_F, is also incorporated so that K_MA can be recovered within the device at the time of license processing. In step 2, in order to build an IP on a particular FPGA device, the SD receives the bitstream IP encrypted with K_CV from the CV, but he needs a license from the MA. So, the MA sends the license containing K_CV encrypted with K_MA. In step 3, during license processing, first K_MA is recovered as mentioned above, and then K_CV is extracted from the license for loading the bitstream IP on the device.
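The license-processing chain of [38] amounts to three nested unwrappings, which the following sketch reproduces with AES-GCM key wrapping; the use of AES-GCM and the key sizes are illustrative assumptions rather than the exact construction of [38].

import os
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

def wrap(key, data):      # encrypt 'data' under 'key' (nonce prepended)
    nonce = os.urandom(12)
    return nonce + AESGCM(key).encrypt(nonce, data, None)

def unwrap(key, blob):    # inverse of wrap()
    return AESGCM(key).decrypt(blob[:12], blob[12:], None)

k_f, k_ma, k_cv = (AESGCM.generate_key(bit_length=128) for _ in range(3))
ip_bitstream = b"IP core bitstream ..."           # placeholder payload

stored_in_nvm = wrap(k_f, k_ma)                   # (K_MA)_KF, programmed by the FV
license_blob = wrap(k_ma, k_cv)                   # (K_CV)_KMA, issued by the MA
ip_blob = wrap(k_cv, ip_bitstream)                # (IP)_KCV, shipped by the CV

# License processing inside the device:
k_ma_rec = unwrap(k_f, stored_in_nvm)
k_cv_rec = unwrap(k_ma_rec, license_blob)
assert unwrap(k_cv_rec, ip_blob) == ip_bitstream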
If the FPGA IPs are in HDL form and the system integrator needs to apply multiple EDA tools to synthesize (and place-and-route) the IPs into a single chip, the encryption scenario is more complex. One possible way is to encrypt each IP using the secret key K_CV of its vendor, then encrypt K_CV using the public key of the vendor of the EDA tool, and send both encrypted items to the EDA tool.
The output of an EDA tool for one design phase is sent to another EDA tool for the next design phase. This scenario has the following security risks:
(i) The secret keys of the CVs of the many IPs involved may be extracted by an untrusted EDA tool.
(ii) One IP vendor may recover a part of the synthesized output netlist.
In order to overcome the risk in (ii), either an All-or-Nothing principle is used, or a separate set of keys, other than the K_CVs, is used to send the output netlist to the next EDA tool. However, the risk in (i) can only be tackled if end-to-end security can be established from core to device using the PUF response of the device. For example, a pre-routed IP core is encrypted using the public part of a key obtained from the IP core and the PUF response of the device [39].
The work in [40] proposes a method for relocation of partial bitstream IPs on a
device. It ensures enhanced security in IP exchange as it does not disclose to the IP
vendor the information about the FPGA design on which an IP is to be deployed. The
method receives information about the resource requirements of the IP and the bus
macros at the boundary of the IP from the IP vendor. It calculates the value of frame
address register (FAR) of the desired location for the IP in the FPGA. The system
developer SD may deploy an IP using this protocol onto a number of different FPGA
designs reconfiguring the same device, without communicating with the IP vendor
multiple times.
When an entire SoC is configured on an FPGA, the various IPs obtained from different IP vendors may not be trustworthy. The system developer needs to partition the FPGA to allocate space to each of these FPGA-based IPs. The partitions for the various cores should be hard (not flexible). By controlling the mapping onto the device at the floorplanning stage, spatial isolation of IP cores can be maintained in the FPGA. Fences containing buffers or unused logic are placed between the IPs to isolate their regions. Fences are wide enough so that a single-bit failure in the configuration cannot connect neighboring partitions. Continuous monitoring of the configuration data, known as bitstream scrubbing, is applied particularly in the isolation fences to detect any bit flip. Restricted use of FPGA interconnects is allowed through the fence for communication between the IPs. Increased modularization reduces the possibility of interference and enhances the ease of checking correctness.
attempt to configure a device. But, access to the configuration controller from the
internal configuration bitstream is allowed through ICAP for self-reconfiguration.
As a consequence, proper security mechanisms are enforced at this level.
Fig. 7.7 FPGA-based design of an ECC cryptoprocessor [43]: a simplified structure of the DSP
blocks in advanced FPGAs, b l-bit multiplication circuit with a cascade of parallel DSP blocks
for P-224 (i.e., the key length is 224) using DSP cores with other FPGA resources
(Fig. 7.7) provides the highest speed among the ECC processors over prime field.
Among the area-efficient compact implementations, the details of the designs in [44] and in [45] are provided in Table 7.3. The ECC design in [44], with fewer than 100 slices, may be considered the smallest or most compact one implemented on a modern Virtex-5 device. Khan and Benaissa [46] achieved throughput/slice figures of 19.65, 65.30, and 64.48 (10^6/(s × slices)) on Virtex-4, Virtex-5, and Virtex-7 FPGAs, respectively.
The implementations discussed so far are made resistant to simple power analysis at the algorithmic level, by using the Montgomery ladder for scalar multiplication [47], and by making the number of integer additions and subtractions independent of the input values in the modular addition/subtraction component. In order to ensure resistance against DPA, several elliptic curve models, such as Edwards curves and binary Huff curves, have been employed. The details of DPA-resistant FPGA implementations of Edwards curves are given in several works, e.g., [48], and those of Huff curves in works such as [49]. These details appear in another chapter.
Cryptographic algorithms based on addition (A), rotation (R), and exclusive-or (X) operations are classified as ARX algorithms. Many FPGA implementations of cryptoprocessors supporting ARX-based cryptographic primitives are available in the literature [50], as given in Table 7.3.
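As a reminder of what the A, R, and X operations look like in software, the sketch below applies a generic 32-bit ARX mixing step with illustrative rotation amounts; it is an example in the spirit of the primitives surveyed in [50], not the CoARX coprocessor itself.

MASK32 = 0xFFFFFFFF

def rotl32(x, r):
    return ((x << r) | (x >> (32 - r))) & MASK32

def arx_step(a, b, r):
    a = (a + b) & MASK32      # A: modular addition
    b = rotl32(b, r)          # R: rotation
    b ^= a                    # X: exclusive-or
    return a, b

a, b = 0x01234567, 0x89ABCDEF
for r in (7, 9, 13, 18):      # illustrative rotation amounts
    a, b = arx_step(a, b, r)
print(hex(a), hex(b))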
The work in [51] assumes that the entire hardware component of an embedded system is in an FPGA, and a software executes on the FPGA hardware only if the hardware manifests a uniqueness based on process variation and device aging. A benign hardware Trojan (BHT) is designed as delay-logic arbiters and is implemented on an FPGA platform supporting reconfigurability, in order to realize the required device aging. The output of the BHT, exploiting process variation and device aging of the hardware, either enables or disables writing to particular general-purpose registers (GPRs). Thus, the BHT embedded in the FPGA guarantees that the software is authorized to execute on the hardware and vice versa. It is shown in [51] that the worst-case performance penalty is 8 % for the zlib benchmark with 22 GPRs.
McIntyre et al. first proposed, at IOLTS 2010, the use of a distributed software scheduling algorithm to avoid low-trust cores in a hardcoded multi-core system.
Among the modern architectures using FPGAs, the cloud architecture is growing in popularity, after SoCs and embedded systems. In secure yet fast cloud computing, computation logic targeting secure hardware is separated from the code for I/O and coordination, which may run on untrusted hardware. A tamper-proof FPGA with a small feature size is suitable as a computation-specific processing chip. The FPGA architecture, due to its partial reconfigurability, provides a better power-performance ratio and does not have the security threats caused by cache sharing [53]. The FPGA-based computation circuit thus comes with a strong security guarantee. Further, the communication channel between a processor chip and a state chip that stores state across power cycles, such as in a smart card, is also made secure. In order to prevent side-channel attacks between circuits sharing an FPGA, a supervisor module such as a TPM is used.
7.6 Summary
In the domain of FPGAs, several IP threats remain alarming. Most of the crypto units implemented on FPGAs have been cracked through side-channel analysis. Vendor-specific configuration bitstream formats, support for partial reconfigurability in modern FPGAs, the increased demand for FPGAs in systems-on-chip, and the growth of remote computing are the sources of open challenges for IP and SoC security in FPGAs.
References
1. Dutt, S., Li, L.: Trust-based design and check of FPGA circuits using two-level randomized
ECC structures. ACM Trans. Reconfig. Technol. Syst. 2(1) (2009)
2. http://homepages.cae.wisc.edu/~ece554/website/Lectures/Xilinx_Vertex_Tech_s03.pdf
3. Qu, G., Potkonjak, M.: Intellectual Property Protection in VLSI Designs: Theory and Practice.
Springer, Heidelberg (2003)
4. Drimer, S.: Volatile FPGA design security–a survey. http://www.cl.cam.ac.uk/techreports/
UCAM-CL-TR-763.pdf (2008)
5. Majzoobi, M., Koushanfar, F., Potkonjak, M.: FPGA-oriented security. In: Introduction to
Hardware Security and Trust, Chapter 1. Springer (2011)
6. Trimberger, S.M., Moore, J.J.: FPGA security: motivations, features, and applications, invited
paper. Proc. IEEE 102(8) (2014)
7. McNeil, S.: Solving Today’s Design Security Concerns, Xilinx White paper FPGAs, WP365
(v1.2) July 30 (2012)
8. Menezes, A., Oorschot, P., Vanstone, S.: Handbook of Applied Cryptography. CRC Press
(1996)
9. Good, T., Benaissa, M.: AES on FPGA from the fastest to the smallest. In: CHES 2005: Pro-
ceedings of International Conference on Cryptographic Hardware and Embedded Systems,
LNCS 3659, pp. 427–440. Springer (2005)
10. Granado-Criado, J.M., Vega-Rodríguez, M.A., Sánchez-Pérez, J.M., Gómez-Pulido, J.A.: A new methodology to implement the AES algorithm using partial and dynamic reconfiguration. Integr. VLSI J. 43(1), 72–80 (2010)
11. Chu, J., Benaissa, M.: Low area memory-free FPGA implementation of the AES algorithm.
In: FPL 2012: Proceedings of International Conference on Field Programmable Logic and
Applications, pp. 623–626 (2012)
12. Hori, Y., Katashita, T., Sakane, H., et al.: Bitstream protection in dynamic partial reconfigu-
ration systems using authenticated encryption. IEICE Trans. Inf. Syst. E96-D(11), 2333–2343
(2013)
13. Abdellatif, K.M., Chotin-Avot, R., Mehrez, H.: Improved method for parallel AES-GCM cores
using FPGAs. In: Proceedings of International Conference on Reconfigurable Computing and
FPGAs, pp. 1–4 (2013)
14. Teegarden, C., Bhargava, M., Mai, K.: Side-channel attack resistant ROM-based AES S-box.
In: HOST 2010: Proceedings of IEEE International Symposium on Hardware-Oriented Secu-
rity and Trust, pp. 124–129 (2010)
15. Drimer, S., Kuhn, M.G.: A protocol for secure remote updates of FPGA configurations. In: Pro-
ceedings of International Workshop on Applied Reconfigurable Computing, Reconfigurable
Computing: Architectures, Tools and Applications, pp. 50–61. Springer, Berlin (2009)
16. Vliegen, J., Mentens, N., Verbauwhede, I.: A single-chip solution for the secure remote config-
uration of FPGA using bitstream compression. In: Proceedings of International Conference on
Reconfigurable Computing and FPGAs, pp. 1–6 (2013)
17. Adi, W., Ernst, R., Soudan, B., Hanoun, A.: VLSI design exchange with intellectual property
protection in FPGA environment using both secret and public-key cryptography. In: ISVLSI
2006: Proceedings of IEEE Computer Society Annual Symposium on VLSI, pp. 24–29 (2006)
18. Morozov, S., Maiti, A., Schaumont, P.: An analysis of delay based PUF implementations on
FPGA. In: Proceedings of International Conference on Reconfigurable Computing: Architec-
tures, Tools and Applications, pp. 382–387 (2010)
19. Pappala, S., Niamat, M., Sun, W.: FPGA based trustworthy authentication technique using
physically unclonable functions and artificial intelligence. In: HOST 2012: Proceedings of
IEEE International Symposium on Hardware-Oriented Security and Trust, pp. 59–62 (2012)
20. Majzoobi, M., Koushanfar, F., Potkonjak, M.: Techniques for design and implementation of
secure reconfigurable PUFs. ACM Trans. Reconfig. Technol. Syst. 2(1) (2009)
21. Majzoobi, M., Koushanfar, F., Devadas, S.: FPGA-based true random number generation using
circuit metastability with adaptive feedback Control. In: CHES 2011: Proceedings of Crypto-
graphic Hardware and Embedded Systems, LNCS 6917, pp. 17–32. Springer (2011)
22. Varchola, M., Drutarovsky, M., Fischer, V.: New universal element with integrated PUF and
TRNG capability. In Proceedings of International Conference on Reconfigurable Computing
and FPGAs, pp. 1–6 (2013)
23. Yip, K., Ng, T.: Partial-encryption technique for intellectual property protection of FPGA-
based products. IEEE Trans. Consum. Electr. 46(1), 183–190 (2000)
24. Kirovski, D., Hwang, Y., Potkonjak, M., Cong, J.: Protecting combinational logic synthesis
solutions. IEEE Trans. Comput.-Aided Des. Integr. Circuit Syst. 25(12), 2687–2696 (2006)
25. Cui, A., Chang, C.H., Tahar, S.: IP watermarking using incremental technology mapping. IEEE
Trans. Comput.-Aided Des. Integr. Circuit Syst. 27(9), 1565–1570 (2008)
26. Jain, A., Yuan, L., Puri, P., Qu, G.: Zero overhead watermarking technique for FPGA designs.
In: GLSVLSI 2003: Proceedings of ACM Great Lakes symposium on VLSI, pp. 147–152
(2003)
27. Lach J., Mangione-Smith, W.H., Potkonjak, M.: Fingerprinting techniques for field-
programmable gate array intellectual property protection. IEEE Trans. Comput.-Aided Des.
Integr. Circuit Syst. 20(10), 1253–1261 (2001)
28. Saha, D., Sur-Kolay, S.: Robust intellectual property protection of VLSI physical design. J.
IET Comput. Dig. Tech. 4(5), 388–399 (2010)
29. Castillo, E., Meyer-Baese, U., Garcia, A., Parrilla, L., Lloris, A.: IPP@HDL: efficient intel-
lectual property protection scheme for IP cores. IEEE Trans. Very Large Scale Integr. (VLSI)
Syst. 15(5), 578–591 (2007)
30. Kerckhof, S., Durvaux, F., Standaert, F., Gerard, B.: Intellectual property protection for FPGA
designs with soft physical hash functions: first experimental results. In: HOST 2013: Proceed-
ings of IEEE International Symposium on Hardware-Oriented Security and Trust, pp. 7–12
(2013)
31. Qu, G.: Publicly detectable techniques for the protection of virtual components. In: Proceedings
of Design Automation Conference, pp. 474–479 (2001)
32. Ziener, D., Teich, J.: Power signature watermarking of IP cores for FPGAs. J. Signal Process.
Syst. 51(1), 123–136 (2008)
33. Kean, T., McLaren, D., Marsh, C.: Verifying the authenticity of chip designs with the design tag
system. In: HOST 2008: Proceedings of IEEE International Workshop on Hardware-Oriented
Security and Trust, pp. 59–64 (2008)
34. Saha, D., Sur-Kolay, S.: Secure public verification of IP marks in FPGA design through a
zero-knowledge protocol. IEEE Trans. Very Large Scale Integr. (VLSI) Syst. 20(10), 1749–1757 (2012)
35. Rilling, J., Graziano, D., Hitchcock, J., et al.: Circumventing a ring oscillator approach to
FPGA-based hardware Trojan detection. In: ICCD 2011: IEEE International Conference on
Computer Design, pp. 289–292 (2011)
36. Chakraborty, R.S., Saha, I., Palchaudhuri, A., Naik, G.K.: Hardware Trojan insertion by direct
modification of FPGA configuration bitstream. IEEE Des. Test Comput. 30(2), 45–54 (2013)
37. Drimer, S., Guneysu, T., Kuhn, M.G., Paar, C.: Protecting multiple cores in a single FPGA
design. http://www.saardrimer.com/sd410/papers/protect_many_cores.pdf (2007)
38. Maes, R., Schellekens, D., Verbauwhede, I.: A Pay-per-use licensing scheme for hardware IP
cores in recent SRAM-based FPGAs. IEEE Trans. Inf. Forensics Secur. 7(1), 98–108 (2012)
39. Guajardo, J., Guneysu, T., Kumar, S.S., Paar, C.: Secure IP-block distribution for hardware
devices. In: HOST 2009: IEEE International Workshop on Hardware-Oriented Security and
Trust, pp. 82–89 (2009)
40. Ebrahim, A., Benkrid, K., Khalifat, J., Hong, C.: A platform for secure IP integration in Xilinx
Virtex FPGAs. In: International Conference on Reconfigurable Computing and FPGAs, pp.
1–6 (2013)
41. Khan, Z.U.A., Benaissa, M.: High speed ECC implementation on FPGA over GF(2^m). In:
FPL 2015: Proceedings of International Conference on Field Programmable Logic and Appli-
cations, pp. 1–6 (2015)
42. Roy, S.S., Rebeiro, C., Mukhopadhyay, D.: Theoretical modeling of elliptic curve scalar mul-
tiplier on LUT-based FPGAs for area and speed. IEEE Trans. Very Large Scale Integr. (VLSI)
Syst. 21(5), 901–909 (2013)
43. Guneysu, T., Paar, C.: Ultra high performance ECC over NIST primes on commercial FPGAs,
In: CHES 2008: Proceedings International Workshop on Cryptographic Hardware and Embed-
ded Systems, LNCS 5154, pp. 62–78. Springer (2008)
44. Basu Roy, D., Das, P., Mukhopadhyay, D.: ECC on your fingertips: a single instruction
approach for lightweight ECC design in GF(p). IACR Cryptology ePrint Archive 2015: 1225
45. Vliegen, J., Mentens, N., Genoe, J., Braeken, A., Kubera, S., Touhafi, A., Verbauwhede, I.: A
compact FPGA-based architecture for elliptic curve cryptography over prime fields. In: ASAP
2010: Proceedings of IEEE International Conference on Application-specific Systems Archi-
tectures and Processors, pp. 313–316 (2010)
46. Khan, Z.U.A., Benaissa, M.: Throughput/area-efficient ECC processor using Montgomery
point multiplication on FPGA. IEEE Trans. Circuits Syst. 62-II(11), 1078–1082 (2015)
47. Cho, S.M., Seo, S.C., Kim, T.H., Park, Y.-H., Hong, S.: Extended elliptic curve Montgomery
ladder algorithm over binary fields with resistance to simple power analysis. J. Inf. Sci. 245,
304–312 (2013)
48. Azarderakhsh, R., Reyhani-Masoleh, A.: Efficient FPGA implementations of point multipli-
cation on binary Edwards and generalized Hessian curves using Gaussian normal basis. IEEE
Trans. Very Large Scale Integr. (VLSI) Syst. 20(8), 1453–1466 (2012)
49. Chatterjee, A., Sengupta, I.: High-speed unified elliptic curve cryptosystem on FPGAs using
binary Huff curves. In: VDAT 2012: Proceedings of VISI Design and Test, LNCS 7373, pp.
243–251. Springer (2012)
50. Shahzad, K., Khalid, A., Rákossy, Z.E., Paul, G., Chattopadhyay, A.: CoARX: a coprocessor for
ARX-based cryptographic algorithms. In: Proceedings of Annual Design Automation Confer-
ence, Article No. 133 (2013)
51. Zheng, J.X., Chen, E., Potkonjak, M.: A benign hardware Trojan on FPGA-based embedded
systems. In: FPL 2012: Proceedings of International Conference on Field Programmable Logic
and Applications, pp. 464–470 (2012)
52. Kliem, D., Voigt, S.-O.: Scalability evaluation of an FPGA-based multi-core architecture with
hardware-enforced domain partitioning. Microprocess. Microsyst. (2014)
53. Costan, V., Devadas, S.: Security challenges and opportunities in adaptive and reconfigurable
hardware. In: HOST 2011: Proceedings of IEEE International Symposium on Hardware-
Oriented Security and Trust (2011)
Chapter 8
Physical Unclonable Functions and
Intellectual Property Protection Techniques
8.1 Introduction
Mathematically strong cryptographic primitives and protocols assume that the underlying hardware is trustworthy and rely on it to store secrets. However, because of vulnerabilities in the hardware, an attacker can retrieve these secret keys [1]. Thus, one needs to prevent an attacker from extracting secret keys from the hardware. Additionally, semiconductor companies invest billions of dollars in designing a chip. Such designs become their intellectual property (IP), and hence, they are called IP designs or IP cores. However, because of vulnerabilities in the hardware design flow, one also needs to prevent an attacker from stealing IP designs [2]. In this chapter, we will explore hardware design techniques that can thwart these attacks.
Mobile and embedded devices, which are becoming more ubiquitous day by day, often handle sensitive private information. Such devices need to authenticate the user and the data, and protect against attackers who have physical access to those devices. Furthermore, several security protocols that run on these devices require cryptographic applications such as encryption, which require secret keys. However, when secret keys are stored in an IC, an attacker can easily retrieve them by performing side-channel attacks and/or tampering with the chip. Traditionally, secret keys are stored in a nonvolatile electrically erasable programmable read-only memory (EEPROM) or in battery-backed static random access memory (SRAM). Unfortunately, such techniques are not only prone to tampering attacks but also result in tremendous area, power, and delay overheads, thereby increasing the cost of the chip.
To thwart such attacks, researchers have developed a security primitive referred to as physical unclonable functions (PUFs) [3]. PUFs use random process variations inherent in chip manufacturing to produce keys unique to the chip. PUFs are attractive because one can use them to store secret keys in an efficient and secure way. Section 8.2 details the two main types of PUFs (weak- and strong-PUFs), security metrics to evaluate their capabilities, their applications and protocols for using PUFs, and the challenges in implementing PUF circuits.
Due to the ever increasing complexity and cost of constructing and/or maintaining a
foundry with advanced fabrication capabilities, many semiconductor companies are
becoming fabless. Such fabless companies design ICs and send them to an advanced
foundry, which is usually off-shore, for manufacturing. Also, the criticality of time-
to-market has forced companies to buy several IC/IP blocks to use them in their
systems-on-chips (SoCs). The buyers and sellers of these IP blocks are distributed
worldwide.
As the IC design flow is distributed worldwide today, hardware is susceptible
to new kinds of attacks such as counterfeiting, reverse engineering, and IP piracy
[4–7]. ICs may be recycled or remarked and sold illegally. An attacker, anywhere in
this design flow, can reverse engineer the functionality of an IC/IP. One can then steal
and claim ownership of the IP. An untrusted IC foundry may overbuild ICs and sell
them illegally. Finally, rogue elements in the foundry may insert malicious circuits
(hardware Trojans) into the design without the designer's knowledge [8]. Because of IP rights violations alone, the semiconductor industry loses up to $4 billion annually [2]. The annual losses due to counterfeit ICs, which include recycled, remarked, tampered, and overproduced ICs, are estimated to be about $169 billion [9]. Such attacks have led IP and IC designers to reevaluate trust in hardware [10].
To thwart such attacks, researchers have developed IP protection techniques: watermarking [11], fingerprinting [12], metering [13], logic locking [6, 14], and split manufacturing [15]. Section 8.3 explains these different classes of IP protection techniques in detail, along with the security metrics used to evaluate their effectiveness.
8.2.1 Motivation
PUFs are classified based on the number of challenge–response pairs (CRPs) that
they generate. The two main types of PUFs are weak-PUFs and strong-PUFs, which
are described below. Apart from these two types of PUFs, one can also consider
unique objects as a type of PUF [16]. Unique objects use random unclonable prop-
erties which require external equipment to measure their responses.
Weak-PUFs
In this type of PUF, the number of CRPs generated is polynomial in the number of
components in the PUF circuit. The responses generated by weak-PUFs are used as
a “fingerprint” of the chip and/or as the physical keys of the chip. Thus, they are also
called physically obfuscated keys. Since the number of CRPs is only polynomial
in the number of components in the PUF, these PUFs may not be useful in certain
applications such as authentication, which requires a large number of CRPs (in the
order of millions).
Example—SRAM-based PUF. Static random access memory (SRAM) cells can
be used as a weak-PUF [17]. An SRAM cell, shown in Fig. 8.1, consists of two cross-
coupled inverters and access transistors. “A” and “B” are the two nodes in this cell,
and the voltages at these two nodes determine the output (response) of the SRAM.
When an SRAM cell is powered on, its output transitions to either 0 (AB = 01)
or 1 (AB = 10). This transition is usually driven by the strength of cross-coupled
inverters, which is, in turn, determined by process variations. The strength of the
cross-coupled inverters is dictated by different transistor parameters such as length
of the channel (Leff ), threshold voltage (Vth ), dopant concentration, etc.
Strong-PUFs
In this type of PUF, the number of CRPs generated is exponential in the number of
components, and thus this type of PUF has a large number of CRPs.
Example—Arbiter-based PUF. One example of a strong-PUF is an arbiter-
based PUF shown in Fig. 8.2 [22]. An arbiter-PUF consists of N stages, where N is
the number of bits in the challenge. Each stage consists of a pair of multiplexers,
whose inputs are connected to the outputs of the previous stage as shown in Fig. 8.2.
The inputs of the first stage are tied together. The output of the last stage is fed to a
D latch, which acts as an “arbiter.”
When a challenge is applied to the arbiter-PUF, two paths are selected. A rising
edge is then applied to the input of the arbiter-PUF. Due to process variations, the
relative speed of the two paths will be different in different chips. Consequently, the
latch may hold a 0 or 1. The output of the latch is the response bit. The number of
path pairs, and hence the response bits is exponential in the number of stages (or
challenges). Thus, this PUF is considered as a strong-PUF.
The disadvantage of the arbiter-PUF is that an adversary can model the PUF by obtaining a polynomial number of challenge–response pairs and using linear-delay models. This way, he can predict the response of the PUF [23]. One can solve this problem by introducing nonlinearity into the PUF structure, thereby preventing an adversary from modeling the PUF.
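The arbiter-PUF and the modeling attack on it are commonly described with an additive (linear) delay model, which the following behavioural sketch assumes: each stage contributes a small random delay difference, and a challenge bit decides whether downstream stages see the accumulated lead straight or swapped. The Gaussian parameters and stage count are arbitrary choices for illustration.

import random

class ArbiterPUFModel:
    """Behavioural model of an N-stage arbiter-PUF (not a hardware description)."""
    def __init__(self, n_stages=64, seed=None):
        rng = random.Random(seed)                 # per-chip process variation
        self.deltas = [rng.gauss(0.0, 1.0) for _ in range(n_stages)]

    def response(self, challenge):
        """challenge: list of 0/1 bits, one per stage."""
        lead = 0.0
        for delta, bit in zip(self.deltas, challenge):
            # bit == 1 swaps the two racing paths, negating the accumulated lead
            lead = (lead if bit == 0 else -lead) + delta
        return 1 if lead > 0 else 0               # the arbiter latch decision

chip_a, chip_b = ArbiterPUFModel(seed=1), ArbiterPUFModel(seed=2)
challenge = [random.randint(0, 1) for _ in range(64)]
print(chip_a.response(challenge), chip_b.response(challenge))   # chip-specific bits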
One can use the following security metrics to quantify the security of PUFs:
∙ Uniqueness is defined as the Hamming distance between the responses from PUFs
in two different circuits upon applying the same challenge. This metric helps one
uniquely differentiate an IC from other ICs containing the same PUF structure. Its
ideal value should be 50 % because one can then differentiate the maximum number
of ICs. Note that sometimes this metric is called inter-Hamming distance.
∙ Uniformity is defined as the proportion of 1’s and 0’s in a response. It ensures the
randomness of the response. Its ideal value should be 50 % because any affinity
toward either 1 or 0 reduces the randomness in the response.
∙ Bit-aliasing is defined as the affinity of a response bit toward either 0 or 1. Because
of bit-aliasing, different PUFs may produce similar response bits. Consequently,
the responses of these PUFs will be more predictable. Ideally, the value should be
50 %.
∙ Steadiness, or robustness, is defined as the ratio of response bits that remain
unchanged at different time intervals. Ideally, the value for steadiness should be
100 %. Note that this metric is different from the uniformity metric. Steadiness
ensures that the responses are stable across different time intervals; uniformity
ensures that responses are random, making them unpredictable. Note that some-
times this metric is called intra-Hamming distance.
A comprehensive analysis of the metrics used to evaluate the security of PUFs can
be found in [27].
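Two of these metrics are easy to state concretely. The sketch below computes uniformity (the fraction of 1's in a response) and uniqueness (the average pairwise inter-chip Hamming distance for one challenge) on made-up response vectors; real evaluations average over many challenges and devices, as discussed in [27].

from itertools import combinations

def uniformity(response):
    """Percentage of 1's in a single response (ideal: 50 %)."""
    return 100.0 * sum(response) / len(response)

def uniqueness(responses):
    """Average pairwise Hamming distance, in %, over chips for one challenge (ideal: 50 %)."""
    pairs = list(combinations(responses, 2))
    hd = [sum(a != b for a, b in zip(r1, r2)) / len(r1) for r1, r2 in pairs]
    return 100.0 * sum(hd) / len(hd)

chips = [[0, 1, 1, 0, 1, 0, 0, 1],   # response of chip 1 (made-up data)
         [1, 1, 0, 0, 1, 1, 0, 0],   # chip 2
         [0, 0, 1, 1, 0, 1, 1, 0]]   # chip 3
print(f"uniformity of chip 1: {uniformity(chips[0]):.1f} %")
print(f"uniqueness across chips: {uniqueness(chips):.1f} %")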
One can use the strong-PUF to authenticate a user. The protocol is as follows:
1. A strong-PUF is manufactured.
2. The authentication server obtains the strong-PUF. It then applies a set of ran-
domly generated challenges and records the corresponding responses. This set
of challenge–response pairs (CRPs) is used to create the CRP table.
3. The user obtains the strong-PUF.
4. Whenever the user wants to be authenticated, he sends a request to the server.
5. The server randomly picks a challenge from the CRP table and sends it to the
user.
6. The user applies this challenge to his strong-PUF and obtains the response. This
response is sent to the server.
7. The server checks if the received response matches with the one from the CRP
table. If so, the user is authenticated; otherwise, the user is not authenticated.
Though, in theory, a perfect match is required, in practice a “close” match suffices
in order to tolerate errors.
8. The server deletes the used CRP from the table.
The last step is needed because an attacker can record the response, while the user
sends the correct response, and reuse it later to spoof the server if the server sends
the same challenge. Such an attack is known as the replay attack. Since there is a
finite number of CRPs in the CRP table, there is a non-negligible probability that
the server may reuse the same challenge if the CRP is not deleted. Hence, the used
CRPs are deleted to avoid such attacks.
Since the server stores only a finite number of CRPs and a CRP is deleted after
being used, the above protocol becomes obsolete when the server runs out of CRPs.
To avoid such problems, researchers have proposed to store a compact model of the
PUF on the server [28]. The server can use this compact model to produce CRPs on
the fly.
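The enrolment/verification flow above can be summarized in a short sketch. The physical strong-PUF is stood in for by a keyed function (purely for illustration), verification here demands an exact match rather than the "close" match a real system would tolerate, and the used CRP is deleted to block replay.

import hashlib
import hmac
import os
import random

def puf(device_secret, challenge):           # stand-in for the physical strong-PUF
    return hmac.new(device_secret, challenge, hashlib.sha256).digest()

class AuthServer:
    def __init__(self, device_secret, n_crps=1000):
        self.crp_table = {}
        for _ in range(n_crps):               # enrolment: record random CRPs
            c = os.urandom(16)
            self.crp_table[c] = puf(device_secret, c)

    def issue_challenge(self):
        return random.choice(list(self.crp_table))

    def verify(self, challenge, response):
        expected = self.crp_table.pop(challenge, None)    # delete the used CRP
        return expected is not None and hmac.compare_digest(expected, response)

secret = b"process variation of this chip"    # what the physical PUF embodies
server = AuthServer(secret)
c = server.issue_challenge()
print(server.verify(c, puf(secret, c)))       # True: legitimate user
print(server.verify(c, puf(secret, c)))       # False: the replayed CRP was deleted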
Since a weak-PUF generates unique responses per chip, such responses can be used
to generate secret keys [23]. Such secret keys can be used by other security primitives
such as an encryption engine or a hash engine. Weak-PUF generated keys are used
for certified execution of software on a processor [29]. Weak-PUFs can also be used
to prevent piracy of IC designs (See Sect. 8.3 for more details).
8.2.5 Challenges
Attacks on PUFs
Several attacks on PUFs have been proposed in the literature [30–32]. These attacks
try to build a simulation model of the PUF, especially a strong-PUF, by monitoring
several CRPs. Such attacks use machine learning techniques. Researchers have also
developed side-channel attacks on PUFs. In this type of attack, a simulation model
of PUF is developed based on its power or delay characteristics. This model is then
used to mimic the PUF.
Reliability Issues
The response of a PUF can vary due to environmental conditions, such as temperature and voltage, and due to aging effects. For example, in the case of the arbiter-PUF, a change in operating voltage changes the delays of the transistors, which in turn affects the response bits. Such changes in response bits at run time impact the usage of PUFs in security applications. For instance, when a weak-PUF is used to generate secret keys, a change in a response bit results in a different key. Thus, PUF responses are required to be reliable rather than error-prone.
To make the response of a PUF more reliable, encoding schemes and "helper data" are provided [33–35]. Such schemes tolerate errors and improve security by not leaking a significant amount of secret information.
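A toy helper-data construction in the spirit of [33–35] is sketched below using a 3x repetition code: at enrolment, each key bit is expanded into a 3-bit codeword and XORed with the PUF response to form public helper data; at reconstruction, the noisy response is XORed with the helper data and majority-voted. Real designs use much stronger codes (e.g., BCH) and take care to bound the information leaked by the helper data.

import random

def encode(key_bits):                    # 3x repetition code
    return [b for b in key_bits for _ in range(3)]

def enroll(response, key_bits):
    """Helper data = enrolment response XOR codeword of the key."""
    return [r ^ c for r, c in zip(response, encode(key_bits))]

def reconstruct(noisy_response, helper):
    """Undo the masking, then majority-vote each 3-bit block back to a key bit."""
    codeword = [r ^ h for r, h in zip(noisy_response, helper)]
    return [int(sum(codeword[i:i + 3]) >= 2) for i in range(0, len(codeword), 3)]

key = [1, 0, 1, 1]
response = [random.randint(0, 1) for _ in range(3 * len(key))]   # enrolment readout
helper = enroll(response, key)

noisy = list(response)
noisy[4] ^= 1                                 # one re-measurement bit error
assert reconstruct(noisy, helper) == key      # the key is still recovered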
A public PUF (PPUF) is a variant of the PUF whose simulation models are made public [36–38], unlike a PUF whose simulation models are hidden from the attacker. Although an attacker can simulate the PPUF on a given challenge to obtain a response, the simulation time is too large (e.g., several years) compared to the time it takes to apply a challenge and obtain its response on the PUF primitive (e.g., a few nanoseconds).
A PPUF using XOR gates, as shown in Fig. 8.3, is constructed in [36]. Because of process variations, different gates will have different delays. The simulation time of the XOR PPUF is exponential in the number of rows of gates [36]. This PPUF uses three values: the previous input to the gates in the bottom row, the current input to those gates, and the output sampling time. When a server wants to authenticate a user, it sends these three values. The user applies the previous and current inputs to the XOR gates, samples the output, and sends it back to the server within a stipulated time. The server can verify the output through simulation using the publicly available simulation model of the user's PPUF. Here, the server simulates only a predetermined subset of the PPUF, not the entire PPUF. This subset is known only to the server. An attacker can only simulate the PPUF, as he does not have the PPUF circuit. However, since the simulation time is exponential in the number of devices, he cannot predict the correct response within the stipulated time. Thus, an attacker cannot break the security offered by the PPUF.
Similar to XOR gates, one can also use emerging technology devices to implement PPUFs [37, 38]. These implementations have a smaller overhead when compared to the XOR-based implementation.
A PPUF can implement two-party security protocols such as authentication, key exchange, bit commitment, and time stamping. One cannot use an ordinary PUF to implement many of these protocols, as doing so would require both parties to know the challenge–response pairs a priori.
The sensor physical unclonable function was originally proposed by Rosenfeld et al.
[39]. Rosenfeld described a technique to build a PUF which accepts light as another
input to the challenge–response generation mechanism. The uniqueness of the sensor
PUF is determined by nonhomogeneous coatings over the photodiode light sensing
elements.
A sensor PUF builds upon the concept of the PUF by introducing a sensed quan-
tity as another challenge input. The PUF is fully defined by a challenge–sensor–
response triple rather than a challenge–response pair. The concept of a sensor PUF
can be interpreted as the utilization of a standard PUF while taking advantage of the
noise associated with variable operating conditions.
An ideal implementation of a sensor-PUF should exhibit the following properties:
1. Stability. Given a fixed challenge and a fixed sensor input, the response bits
should be the same across all operating conditions for an IC.
2. No leakage. No challenge–sensor–response triple should leak information about
any other triple.
3. Manufacturer resistance. The manufacturer should have no control over the
response of the PUF due to the limits of the manufacturing process. Therefore it
should be infeasible to generate two PUFs with identical responses.
Sensor PUFs may similarly be classified as weak or strong, although the number
of sensor inputs cannot be considered as part of the challenge space in this respect.
Rather, the number of distinct sensor inputs defines the sensor resolution and can be
as simple as a binary value (i.e., whether a physical quantity exceeds some threshold)
or as complicated as the whole input space of an image sensor [40].
8.3 IP Protection
8.3.1 Motivation
While the IC design flow spans many countries, not all countries have strict laws
against intellectual property theft. Some of the few exceptions are countries such as
the USA and Japan [42–45]. Thus, every IC/IP designer bears an additional respon-
sibility to protect his/her design. If a designer can harden the functionality of an
IC while it passes through the different, potentially untrustworthy phases of the
design flow, these attacks can be thwarted [6, 7]. Alternatively, a designer should at least be able to track down the source of piracy, enabling him to file litigation against the
attacker [12]. Techniques that enable a designer to achieve these objectives are col-
lectively called IP protection techniques.
8.3.2 Classification
8.3.3 Watermarking
Fig. 8.4 A motivational example for IP watermarking based on graph partitioning. Source [56]
Fig. 8.5 Watermarking: number of possible watermarks vs. quality of solutions for the graph when the following pairs of vertices are merged together: (16, 14), (6, 2), (16, 4), (9, 8), (5, 16), (9, 4), (11, 10), (9, 4). Source [56]
As shown in Fig. 8.5, while there is only one solution for an edge-cut value of 9 (and hence, this is not a good watermark constraint), there are 37 different solutions for an edge-cut value of 13. There is a trade-off between the number of possible solutions and the output quality.
A watermark should have the following characteristics [11]:
1. The watermark should not alter the functionality of the design. For example,
in case of embedding the watermark during high-level synthesis, the watermark
embedded as a choice of variable-to-register mapping should yield the same func-
tionality as the original design.
2. The watermark should clearly prove the ownership of the designer. In other
words, the probability of an attacker embedding the same watermark signature
should be very low. This is possible when there is a large number (e.g., 10^80) of optimal solutions, from which the designer randomly selects one as his watermark.
3. Apart from the designer, no one should be able to identify and/or remove the
watermark from the original design.
8.3.4 Fingerprinting
While watermarking enables one to identify that the design has been pirated, it does
not reveal the source of piracy. To solve this problem, fingerprinting has been intro-
duced. Here, along with the watermark of the designer, the signature of the buyer (for
instance, his public key) will be embedded into the design [12]. When challenged,
the designer can reveal the watermark to claim the ownership and the buyer’s sig-
nature to reveal the source of piracy. For example, the power, timing, or thermal
fingerprint of an IC is revealed on applying a set of input vectors.
Logic locking1 hides the functionality and the implementation of a design by inserting additional gates into the original design. In order for the design to exhibit its correct functionality (i.e., produce correct outputs), a valid key has to be supplied to the locked design. The gates inserted for locking are called key gates. Upon applying a wrong key, the locked design exhibits a wrong functionality (i.e., produces wrong outputs).
EPIC [6] incorporates logic locking into the IC design flow, as shown in Fig. 8.6. In the untrusted design phases, the IC is locked and its functionality is not revealed. Post fabrication, the IP vendor activates the locked design by applying the valid key. The keys are stored in a tamper-evident memory inside the design, rendering these key inputs inaccessible to an attacker.
Logic locking prevents attacks such as IP piracy and hardware Trojans. Since the
design is locked by the designer, the foundry cannot use any copies or overproduced
ICs without the secret keys. Furthermore, it prevents an attacker from analyzing the
structural behavior of the design, thereby hindering Trojan insertion.
Logic locking techniques can prevent IP piracy, overbuilding and reverse engi-
neering attacks. Some logic locking techniques offer protection against multiple
attacks such as IP piracy attacks along with Hardware Trojan insertion [58, 62, 63].
Classification
1 Researchers have previously used the terms “logic obfuscation” [6, 14] and “logic encryption” [58–
60] for this purpose. However, echoing the call for consistent terminology by Plaza and Markov [61],
we use the term “logic locking” in this chapter.
Fig. 8.6 IC design flow with logic locking [6]. The design is in the locked form in the untrusted
design regime, shown as dotted lines. Upon fabrication, the IC is activated by applying the secret
key. The attacker in the untrusted foundry can reverse engineer the design, but he can only obtain
the locked design. This prevents an attacker from pirating the design and/or identifying safe places
to insert Trojans
Fig. 8.7 Logic locking: a The original design. b Design locked by randomly inserting key gates (XOR or XNOR)
If the key gates are left as plain XOR/XNOR gates, an attacker can deduce each key bit directly from the gate type: an XOR key gate implies a key bit of 0, and an XNOR key gate implies a key bit of 1. To eliminate such a simple deduction analysis between the key gate types and the key values, the following post-processing steps can be applied:
1. The netlist can be synthesized such that the XOR/XNOR key gates are replaced
with other gates like AND/OR/NAND, etc., rendering it difficult to identify the
key gates as XOR/XNOR.
2. The existing inverters in the design can be absorbed into the key gates, changing
their polarity in a manner oblivious to the attacker; similarly, additional inverters
can be added next to the key gates to change the polarity of the key gates.
Metrics
Preventing an attacker from deducing the key. Multiple attacks have been presented against existing logic locking techniques. The objective of an attacker is to find out the key used for locking the circuit [61, 66, 69–71]. Based on the capabilities of an attacker, these attacks can be broadly classified into two types:
8.3.6 Metering
2 X refers to a don't care value. It can be freely set to either 1 or 0.
Fig. 8.8 Hardening a controller. Approach 1: Existing states are replicated [46]. Approach 2:
State transitions are modified [14, 72–74]. Approach 3: Additional states are added [47, 50, 75].
Approach 4: Black-hole states are added [47, 50, 75]. S0 through S6 are the states in the original
FSM. All the other states are added for obfuscation. Solid edges are the state transitions in the orig-
inal FSM. Dashed edges are state transitions from an invalid state to a valid state, on applying the
valid key. Dotted edges are the state transitions from a valid state to an invalid state, on applying an
invalid key or when the key is withdrawn
identified ICs are matched against their record in a database. This will reveal unreg-
istered ICs or overbuilt ICs. In active metering, parts of the IC's functionality can only be accessed, locked, or unlocked by the designer and/or the IP rights owners [47]. The difference between metering and locking is that metering uses a unique unlock key per IC, whereas locking just locks the IC.
Example: Figure 8.8 shows an example finite state machine (FSM) of a con-
troller and how it has been hardened using several metering techniques. Each node
represents a state, and the edges represent state transitions. The solid edges represent the state transitions in the original design; the dashed and dotted edges represent the ones added for metering.
A controller can be hardened by adding extra states and/or transitions to its FSM, in one of the following ways (a small sketch follows the list):
1. State replication: Some states in the original FSM may be replicated [46]. For
example, in Fig. 8.8, the state S2 has been replicated three times. Only the orig-
inal state S2 has an outward transition; none of the other states has an outward
transition. On applying an incorrect key, the design enters into one of the repli-
cated states. Consequently, it enters into a lock-down state.
2. Additional transitions: Additional transitions between the states of the original design are added [14, 72–74]. For example, in Fig. 8.8, the transitions from S1 to S3 and from S3 to S5 are added. On applying an incorrect key, the design will skip the S2 and S4 states, thus exhibiting a different, and therefore wrong, functionality.
3. Additional states: Extra states are added into the design [47, 50, 75]. For exam-
ple, in Fig. 8.8, states S7 through S31 are the additional states. On applying an
incorrect key, the design enters into one of these states and eventually enters the
reset state.
4. Black-hole states: An invalid key leads the design into invalid states via illegal
transitions, and eventually into black-hole states, where the design is stuck [47,
50, 75]. In Fig. 8.8, H1 through H4 are the black-hole states.
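The following Python sketch is a hedged toy model of these ideas (the state names, key value, and transition table are our assumptions, not the chapter's design): with the correct key the controller follows its normal transitions, while a wrong key diverts it into a black-hole state from which it never escapes.

```python
# Toy obfuscated FSM: wrong key drives the controller into a black-hole state.

CORRECT_KEY = 0b101

def next_state(state: str, key: int) -> str:
    if state == "H":                      # black-hole state: design is stuck
        return "H"
    if key != CORRECT_KEY:                # invalid key -> illegal transition
        return "H"
    valid_transitions = {"S0": "S1", "S1": "S2", "S2": "S3", "S3": "S0"}
    return valid_transitions[state]

if __name__ == "__main__":
    s = "S0"
    for _ in range(4):
        s = next_state(s, CORRECT_KEY)
    print("with the correct key the FSM cycles normally:", s)   # back to S0

    s = "S0"
    for _ in range(4):
        s = next_state(s, 0b000)
    print("with a wrong key the FSM is trapped in:", s)         # "H"
```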
8.3.7 Split Manufacturing
Leading fabless semiconductor companies such as AMD and research agencies such as the Intelligence Advanced Research Projects Activity (IARPA) have proposed split manufacturing to protect against IP piracy and Trojan insertion [15, 76]. In split man-
ufacturing, the layout of the design is split into the Front End Of Line (FEOL) layers
and Back End Of Line (BEOL) layers which are then fabricated separately in differ-
ent foundries. The FEOL layers consist of transistors and other lower metal layers
(≤M4) and the BEOL layers consist of the top metal layers (>M4). Post fabrication,
the FEOL and BEOL wafers are aligned and integrated together using either elec-
trical, mechanical, or optical alignment techniques. The final ICs are tested upon
integration of the FEOL and BEOL wafers [15, 76]. The asymmetrical nature of the
metal layers facilitates split manufacturing. The top BEOL metal layers are usually
thicker and have a larger pitch than the bottom FEOL metal layers [77]. Hence, a
designer can easily integrate the BEOL and FEOL wafers.
Figure 8.9 shows a possible split manufacturing aware IC design flow. A gate-
level netlist is partitioned into blocks which are then floorplanned and placed. The
transistors and wires inside a block form the FEOL layers. The top metal wires con-
necting the blocks and the IO ports form the BEOL layers. The BEOL and FEOL
wires are assigned to different metal layers and routed such that the wiring delay
and routing congestion are minimized. The layout of the entire design is split into
two—one layout just contains the FEOL layers and the other layout just contains the
BEOL layers. The two layouts are then fabricated in two different foundries. In one
embodiment, the FEOL layout is first fabricated and then sent to a trusted second
foundry where the BEOL layout is built on top of it [15]. In another embodiment,
the fabricated FEOL and BEOL layouts are obtained by the system integrator, and
are then integrated by using electrical, mechanical, or optical alignment techniques
and tested for defects [76].
Split manufacturing aims to improve the security of an IC as the FEOL and BEOL
layers are fabricated separately and combined post fabrication. This prevents a sin-
gle foundry (especially the FEOL foundry) from gaining full control of the IC. For
instance, without the BEOL layers, an attacker in the FEOL foundry can neither iden-
tify the safe places within a circuit to insert Trojans nor pirate the design.
Fig. 8.9 Split manufacturing: the layout of a design is split into two Front End-of-Line (FEOL) and Back End-of-Line (BEOL) parts. These two parts are manufactured at two different places. This prevents the attacker at the FEOL foundry from accessing the BEOL part. With only an incomplete design, an attacker can neither pirate the design nor identify safe places to insert hardware Trojans in it
The economic benefit of split manufacturing comes from performing
the low cost BEOL layer fabrication in-house and outsourcing the expensive FEOL
layer fabrication [15, 76, 78].
Even though two foundries are involved in manufacturing a single IC, split manufac-
turing should not degrade the manufacturing quality of an IC. Recently, researchers
have proved the practicality of split manufacturing by manufacturing an FPGA in
two different foundries [79]. One can also leverage the 3D manufacturing technology
where security sensitive components can be placed in one layer and manufactured
in a trusted low-end foundry; and other components of the design can be placed in
another layer and manufactured in an untrusted high-end foundry [80]. Feasibility
of split manufacturing for analog designs has also been demonstrated [81].
The attacker in the foundry should not be able to determine the missing BEOL con-
nections from the FEOL connections. Naive split manufacturing is vulnerable to a
proximity attack. This attack exploits the heuristic that floorplanning and placement (F&P) tools use to reduce the wiring delay between the pins to be connected [82]: partitions that are connected are placed close to each other and oriented so that the pins to be connected face each other. This heuristic of most F&P tools is a security vulnerability that can be exploited by an attacker in the FEOL foundry who does not have access to the BEOL layers: the attacker simply makes each missing connection between the two closest compatible3 pins.
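A hedged sketch of this attack is shown below; the pin names, coordinates, and the simple nearest-neighbor rule are our assumptions chosen only to illustrate the heuristic, not the actual algorithm used in the literature.

```python
# Toy proximity attack on naive split manufacturing: connect each dangling FEOL
# output pin to the closest compatible input pin, mimicking the placement
# heuristic that puts connected partitions next to each other.
import math

out_pins = [("A.out", 0.0, 0.0), ("B.out", 5.0, 5.0)]   # (name, x, y)
in_pins  = [("C.in", 1.0, 0.5), ("D.in", 4.5, 5.5)]

def proximity_attack(outputs, inputs):
    guessed = []
    for oname, ox, oy in outputs:
        iname, _ = min(
            ((iname, math.hypot(ox - ix, oy - iy)) for iname, ix, iy in inputs),
            key=lambda t: t[1],
        )
        guessed.append((oname, iname))
    return guessed

if __name__ == "__main__":
    print(proximity_attack(out_pins, in_pins))
    # [('A.out', 'C.in'), ('B.out', 'D.in')]: correct only if the F&P tool really
    # placed connected pins closest; pin swapping [83] is designed to break this.
```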
To thwart such attacks, a fault-analysis-based pin swapping technique is proposed
in [83]. The idea here is to find the best set of pins to swap such that when an attacker
performs the proximity attack, the obtained netlist differs from the original netlist; the difference between the two netlists can be quantified via the Hamming distance of their outputs. Furthermore, instead of splitting the design at M4, one can also split at M1, thereby increasing the effort of an attacker [84].
In [85], an algorithm to select wires for the BEOL layers is provided, along with a formal notion of an attacker's inability to determine the missing BEOL connections. However, this approach has a significant performance overhead, potentially offsetting the benefits of a high-end FEOL foundry.
8.4 Conclusion
In this chapter, we elaborated on two important classes of security techniques: PUFs and IP protection techniques. Weak PUFs have already found applications in securing FPGA designs [86]. Several companies, such as Verayo [87] and Intrinsic ID [88], are trying to commercialize strong PUFs as well. While reliability challenges still exist for
PUF circuits, PPUFs, PUFs using emerging technology devices, and sensor PUFs
provide a wide variety of applications.
In case of IP protection techniques, a designer can use a technique depending
upon the threat model he faces. For instance, if he does not have access to a BEOL
foundry, he can pursue logic locking. Otherwise, he can pursue split manufacturing
to protect his design, as logic locking requires one to store keys on the chip. However,
note that split manufacturing requires the designer to trust the end user. Thus, one needs to carefully select an IP protection technique that suits one's business model.
References
1. Kocher, P., Jaffe, J., Jun, B.: Differential power analysis. Advances in cryptology (CRYPTO
99). Lect. Notes Comput. Sci. 1666, 388–397 (1999)
2. SEMI. Innovation is at risk as semiconductor equipment and materials industry loses up to $4
billion annually due to IP infringement (2008). www.semi.org/en/Press/P043775
3 Two pins are compatible if one pin is the output of a gate or an input port, and the other pin is an input of a gate or an output port.
3. Herder, C., Yu, M.-D., Koushanfar, F., Devadas, S.: Physical unclonable functions and appli-
cations: a tutorial. Proc. IEEE 102(8), 1126–1141 (2014)
4. Guin, U., DiMase, D., Tehranipoor, M.: Counterfeit integrated circuits: detection, avoidance, and the challenges ahead. J. Electron. Test. 30(1), 9–23 (2014)
5. Rostami, M., Koushanfar, F., Karri, R.: A primer on hardware security: models, methods, and
metrics. P. IEEE 102(8), 1283–1295 (2014)
6. Roy, J.A., Koushanfar, F., Markov, I.L.: EPIC: ending piracy of integrated circuits. IEEE/ACM
Design, Automation and Test in Europe, pp. 1069–1074 (2008)
7. Roy, J.A., Koushanfar, F., Markov, I.L.: Ending piracy of integrated circuits. Computer 43(10),
30–38 (2010)
8. Karri, R., Rajendran, J., Rosenfeld, K., Tehranipoor, M.: Trustworthy hardware: identifying
and classifying hardware Trojans. IEEE Comput. 43(10), 39–46 (2010)
9. Top 5 Most Counterfeited Parts Represent a $ 169 Billion Potential Challenge for Global
Semiconductor Market. http://press.ihs.com/press-release/design-supply-chain/top-5-most-
counterfeited-parts-represent-169-billion-potential-cha
10. DARPA. Defense Science Board (DSB) study on High Performance Microchip Supply (2005).
www.acq.osd.mil/dsb/reports/ADA435563.pdf
11. Koushanfar, F., Hong, I., Potkonjak, M.: Behavioral synthesis techniques for
intellectual property protection. ACM Trans. Des. Autom. Electron. Syst. 10(3), 523–545
(2005)
12. Caldwell, A.E., Choi, H.-J., Kahng, A.B., Mantik, S., Potkonjak, M., Qu, G., Wong, J.L.: Effec-
tive iterative techniques for fingerprinting design IP. IEEE Trans. Comput.-Aided Des. Integr.
Circuits Syst. 23(2), 208–215 (2004)
13. Koushanfar, F., Qu, G., Potkonjak, M.: Intellectual Property Metering. Information Hiding,
Workshop (2001)
14. Chakraborty, R.S., Bhunia, S.: HARPOON: an obfuscation-based soc design methodology for
hardware protection. IEEE Trans. Comput.-Aided Des. Integr. Circuits Syst. 28(10), 1493–
1502 (2009)
15. Intelligence Advanced Research Projects Activity. Trusted Integrated Circuits Program. https://
www.fbo.gov/utils/view?id=b8be3d2c5d5babbdffc6975c370247a6
16. Rührmair, U., Devadas, S., Koushanfar, F.: Security Based on Physical Unclonability and Dis-
order. Introduction to Hardware Security and Trust, pp. 65–102 (2012)
17. Holcomb, D.E., Burleson, W.P., Fu, K.: Power-up SRAM state as an identifying fingerprint
and source of true random numbers. IEEE Trans. Comput. 58(9), 1198–1210 (2009)
18. Guajardo, J., Kumar, S.S., Schrijen, G.-J., Tuyls, P.: FPGA intrinsic PUFs
and their use for IP protection. Cryptographic Hardware Embed. Syst. 4727, 63–80 (2007)
19. Tuyls, P., Schrijen, G.-J., Škorić, B., van Geloven, J., Verhaegh, N., Wolters, R.: Read-proof hard-
ware from protective coatings. Cryptographic Hardware Embed. Syst. 4249, 369–383 (2006)
20. Helinski, R., Acharyya, D., Plusquellic, J.: A physical unclonable function defined using power
distribution system equivalent resistance variations. ACM/IEEE Design Automation Confer-
ence, pp. 676–681 (2009)
21. Helinski, R., Acharyya, D., Plusquellic, J.: Quality metric evaluation of a physical unclon-
able function derived from an IC’s power distribution system. ACM/IEEE Design Automation
Conference, pp. 240–243 (2010)
22. Gassend, B., Clarke, D., van Dijk, M., Devadas, S.: Silicon physical random functions. ACM
Conference on Computer and Communications Security, pp. 148–160 (2002)
23. Suh, G.E., Devadas, S.: Physical unclonable functions for device authentication and secret key
generation. IEEE/ACM Design Automation Conference, pp. 9–14 (2007)
24. Lee, J.W., Lim, D., Gassend, B., Suh, G.E., van Dijk, M., Devadas, S.: A technique to build a
secret key in integrated circuits for identification and authentication applications. IEEE Inter-
national Symposium on VLSI Circuits, pp. 176–179 (2004)
25. Majzoobi, M., Koushanfar, F., Potkonjak, M.: Lightweight secure PUFs. IEEE/ACM Interna-
tional Conference on Computer-Aided Design, pp. 670–673 (2008)
26. Pappu, R., Recht, B., Taylor, J., Gershenfeld, N.: Physical one-way functions. Science
297(5589), 2026–2030 (2002)
27. Maiti, A., Gunreddy, V., Schaumont, P.: A Systematic Method to Evaluate and Compare the
Performance of Physical Unclonable Functions (2011). https://eprint.iacr.org/2011/657.pdf
28. Devadas, S.: Non-networked RFID PUF authentication. U.S. Patent 8 683 210, U.S. Patent
Appl. 12/623 045 (2008)
29. Suh, G.E., O’Donnell, C.W., Devadas, S.: Aegis: a single-chip secure processor. IEEE Des.
Test Comput. 24(6), 570–580 (2007)
30. Rührmair, U., Sehnke, F., Sölter, J., Dror, G., Devadas, S., Schmidhuber, J.: Modeling attacks
on physical unclonable functions. ACM Conference on Computer and Communications Secu-
rity, pp. 237–249 (2010)
31. Schuster, D.: Side-channel analysis of physical unclonable functions (PUFs). PhD Dissertation,
Technische Universität München (2010)
32. Wei, S., Wendt, J.B., Nahapetiany, A., Potkonjak, M.: Reverse engineering and prevention tech-
niques for physical unclonable functions using side channels. IEEE/ACM Design Automation
Conference, pp. 1–6 (2014)
33. Devadas, S., Yu, M.-D.M.: Secure and robust error correction for physical unclonable functions.
IEEE Des. Test 99 (2013)
34. Paral, Z., Devadas, S.: Reliable and efficient PUF-based key generation using pattern matching.
IEEE International Symposium on Hardware-Oriented Security and Trust, pp. 128–133 (2011)
35. Yin, C.-E., Qu, G.: Improving PUF security with regression-based distiller. IEEE/ACM Design
Automation Conference, pp. 1–6 (2013)
36. Beckmann, N., Potkonjak, M.: Hardware-based public-key cryptography with public physically unclonable functions. Information Hiding, pp. 206–220 (2009)
37. Rajendran, J., Rose, G.S., Karri, R., Potkonjak, M.: Nano-PPUF: a memristor-based security
primitive. IEEE Computer Society Annual Symposium on VLSI, pp. 84–87 (2012)
38. Rührmair, U., Chen, Q., Stutzmann, M., Lugli, P., Schlichtmann, U., Csaba, G.: Towards elec-
trical, integrated implementations of SIMPL systems. Information Security Theory and Prac-
tices. Security and Privacy of Pervasive Systems and Smart Devices, vol. 6033, pp. 277–292
(2010)
39. Rosenfeld, K., Gavas, E., Karri, R.: Sensor physical unclonable functions. IEEE International
Symposium on Hardware-Oriented Security and Trust, pp. 112–117 (2010)
40. Cao, Y., Zalivaka, S.S., Zhang, L., Chang, C.-H., Chen, S.: CMOS image sensor based phys-
ical unclonable function for smart phone security applications. International Symposium on
Integrated Circuits, pp. 392–395 (2014)
41. Maes, R., Verbauwhede, I.: Physically Unclonable Functions: A Study on the State of the Art
and Future Research Directions. Towards Hardware-Intrinsic Security, pp. 3–37 (2010)
42. Council Decision 96/644/EC of 11 November 1996 on the extension of the legal protection of
topographies of semiconductor products to persons from the Isle of Man (2015). http://eur-lex.
europa.eu/legal-content/EN/TXT/?uri=celex:31996D0644
43. Law on the Circuit Layout of a Semiconductor Integrated Circuits (Act No. 43 of May 31,
1985, as last amended by Act No. 50 of June 2, 2006) (2015)
44. Malbon, J., Lawson, C., Davison, M.: The WTO Agreement on Trade-Related Aspects of Intellectual Property Rights: A Commentary. Edward Elgar Publishing (2014). ISBN 9781845424435
45. Government Printing Office. The Copyright Law of the United States and Related Laws Con-
tained in Title 17 of the United States Code (2012). ISBN 9780160795084
46. Alkabani, Y., Koushanfar, F., Potkonjak, M.: Remote activation of ICs for piracy prevention
and digital right management. In: Proceedings of IEEE/ACM International Conference on
Computer-Aided Design, pp. 674–677 (2007)
47. Alkabani, Y., Koushanfar, F.: Active hardware metering for intellectual property protection and security. USENIX Security, pp. 291–306 (2007)
48. Huang, J., Lach, J.: IC activation and user authentication for security-sensitive systems. IEEE
International Workshop on Hardware-Oriented Security and Trust, pp. 76–80 (2008)
49. Roy, J.A., Koushanfar, F., Markov, I.L.: Protecting bus-based hardware IP by secret sharing.
ACM/IEEE Design Automation Conference, pp. 846–851 (2008)
50. Koushanfar, F., Qu, G.: Hardware metering. IEEE/ACM Design Automation Conference, pp.
490–493 (2001)
51. Lofstrom, K., Daasch, W.R., Taylor, D.: IC identification circuit using device mismatch. IEEE
International Solid-State Circuits Conference, pp. 372–373 (2000)
52. Pentium III serial numbers. http://www.pcmech.com/article/pentium-iii-serialnumbers/
53. Kahng, A.B., Lach, J., Mangione-Smith, W.H., Mantik, S., Markov, I.L., Potkonjak,
M., Tucker, P., Wang, H., Wolfe, G.: Watermarking techniques for intellectual property pro-
tection. IEEE/ACM Design Automation Conference, pp. 776–781 (1998)
54. Kahng, A.B., Mantik, S., Markov, I.L., Potkonjak, M., Tucker, P., Wang, H., Wolfe, G.: Robust
IP watermarking methodologies for physical design. IEEE/ACM Design Automation Confer-
ence, pp. 782–787 (1998)
55. Lach, J., Mangione-Smith, W.H., Potkonjak, M.: FPGA fingerprinting techniques for protecting
intellectual property. IEEE Custom Integrated Circuits Conference, pp. 299–302 (1998)
56. Wolfe, G., Wong, J.L., Potkonjak, M.: Watermarking graph partitioning solutions. IEEE/ACM
Design Automation Conference, pp. 486–489 (2001)
57. Alpert, C.J., Kahng, A.B.: Recent directions in netlist partitioning. Integration, the VLSI Journal (1995)
58. Dupuis, S., Ba, P.-S., Di Natale, G., Flottes, M.L., Rouzeyre, B.: A novel hardware logic
encryption technique for thwarting illegal overproduction and hardware Trojans. IEEE Inter-
national On-Line Testing Symposium, pp. 49–54 (2014)
59. Rajendran, J., Pino, Y., Sinanoglu, O., Karri, R.: Logic encryption: a fault analysis perspective.
In: Proceedings of the IEEE/ACM Design, Automation and Test in Europe, pp. 953–958 (2012)
60. Rajendran, J., Zhang, H., Zhang, C., Rose, G.S., Pino, Y., Sinanoglu, O., Karri, R.: Fault
analysis-based logic encryption. IEEE Trans. Comput. 64(2), 410–424 (2015)
61. Plaza, S.M., Markov, I.L.: Solving the third-shift problem in ic piracy with test-aware logic
locking. IEEE Trans. Comput.-Aided Des. Integr. Circuits Syst. 34(6), 961–971 (2015)
62. Chakraborty, R.S., Bhunia, S.: Security against hardware Trojan through a novel application
of design obfuscation. IEEE/ACM International Conference on Computer-Aided Design, pp.
113–116 (2009)
63. Colombier, B., Bossuet, L.: Survey of hardware protection of design data for integrated circuits
and intellectual properties. IET Comput. Digital Tech. 8(6), 274–287 (2014)
64. Baumgarten, A., Tyagi, A., Zambreno, J.: Preventing IC piracy using reconfigurable logic bar-
riers. IEEE Des. Test Comput. 27(1), 66–75 (2010)
65. Khaleghi, S., Da Zhao, K., Rao, W.: IC piracy prevention via design withholding and entan-
glement. Asia-Pacific Design Automation Conference, pp. 821–826 (2015)
66. Lee, Y.-W., Touba, N.A.: Improving logic obfuscation via logic cone analysis. IEEE Latin-
American Test Symposium, pp. 1–6 (2015)
67. Contreras, G.K., Rahman, M.T., Tehranipoor, M.: Secure split-test for preventing IC piracy by untrusted foundry and assembly. IEEE International Symposium on Defect and Fault Toler-
ance in VLSI and Nanotechnology Systems, pp. 196–203 (2013)
68. Roy, J.A., Koushanfar, F., Markov, I.L.: Protecting bus-based hardware ip by secret sharing.
In: Proceedings of IEEE/ACM Design Automation Conference, pp. 846–851 (2008)
69. Plaza, S.M., Markov, I.L.: Protecting Integrated Circuits from Piracy with Test-aware Logic
Locking (2014)
70. Rajendran, J., Pino, Y., Sinanoglu, O., Karri, R.: Security analysis of logic obfuscation.
IEEE/ACM Design Automation Conference, pp. 83–89 (2012)
71. Subramanyan, P., Ray, S., Malik, S.: Evaluating the Security of Logic Encryption Algorithms.
IEEE International Symposium on Hardware Oriented Security and Trust, pp. 137–143 (2015)
72. Chakraborty, R.S., Bhunia, S.: Hardware protection and authentication through netlist level
obfuscation. IEEE/ACM International Conference on Computer-Aided Design, pp. 674–677
(2008)
73. Chakraborty, R.S., Bhunia, S.: Security against hardware trojan through a novel application
of design obfuscation. IEEE/ACM International Conference on Computer-Aided Design, pp.
113–116 (2009)
74. Chakraborty, R.S., Bhunia, S.: RTL hardware ip protection using key-based control and data
flow obfuscation. IEEE International Conference on VLSI Design, pp. 405–410 (2010)
75. Koushanfar, F.: Provably secure active IC metering techniques for piracy avoidance and
digital rights management. IEEE Trans. Inf. Forensics Secur. 7(1), 51–63 (2012)
76. Jarvis, R.W., McIntyre, M.G.: Split manufacturing method for advanced semiconductor cir-
cuits. US Patent no. 7195931 (2004)
77. FreePDK45:Metal Layers. http://www.eda.ncsu.edu/wiki/FreePDK45:Metal_Layers
78. Jagasivamani, M., Gadfort, P., Sika, M., Bajura, M., Fritze, M.: Split fabrication obfuscation:
metrics and techniques. IEEE Symposium on Hardware Oriented Security and Trust (2014)
79. Hill, B., Karmazin, R., Otero, C.T.O., Tse, J., Manohar, R.: A split-foundry asynchronous
FPGA. IEEE Custom Integrated Circuits Conference, pp. 1–4 (2013)
80. Valamehr, J., Sherwood, T., Kastner, R., Marangoni-Simonsen, D., Huffmire, T., Irvine, C.,
Levin, T.: A 3-D split manufacturing approach to trustworthy system development. IEEE Trans.
Comput.-Aided Des. Integr. Circuits Syst. 32(4), 611–615 (2013)
81. Vaidyanathan, K., Liu, R., Sumbul, E., Zhu, Q., Franchetti, F., Pileggi, L.: Efficient and secure
intellectual property (IP) design for split fabrication. IEEE Symposium on Hardware Oriented
Security and Trust (2014)
82. Sherwani, N.A.: Algorithms for VLSI Physical Design Automation. Springer (2002)
83. Rajendran, J., Sinanoglu, O., Karri, R.: Is split manufacturing secure? IEEE Design, Automa-
tion and Test in Europe Conference, pp. 1259–1264 (2013)
84. Vaidyanathan, K., Das, B.P., Sumbul, E., Liu, R., Pileggi, L.: Building trusted ICs using split
fabrication. IEEE Symposium on Hardware Oriented Security and Trust (2014)
85. Imeson, F., Emtenan, A., Garg, S., Tripunitara, M.: Securing Computer Hardware Using 3D
Integrated Circuit (IC) Technology and Split Manufacturing for Obfuscation. USENIX Secu-
rity (2013)
86. Altera. Altera Reveals Stratix 10 Innovations Enabling the Industry's Fastest and Highest
Capacity FPGAs and SoCs. http://newsroom.altera.com/press-releases/nr-altera-stratix10.htm
87. Verayo, P.: Physical unclonable function. http://www.verayo.com/tech.php
88. Intrinsic ID. Physical unclonable function. https://www.intrinsic-id.com/technology/
physically-unclonable-functions-puf/
Chapter 9
A Systematic Approach to Fault Attack
Resistant Design
Electronic systems are subject to temporary and permanent faults caused by imper-
fections in the manufacturing process as well as by anomalies of the environment.
Fault effects in electronics have been intensively studied in the context of system
reliability as well as error resiliency. However, faults can also be used as a hacking
tool. In a fault attack, an adversary injects an intentional fault in a circuit and ana-
lyzes the response of that circuit to the fault. The objective of a fault attack is to
extract cryptographic key material, to weaken cryptographic strength, or to disable security mechanisms altogether. Unlike some other attacks, such as power-based or electromagnetic-
based side-channel analysis, fault attacks do not require complex signal measure-
ment equipment. The threat model of a fault attack assumes an adversary who can
influence the physical environment of the electronic system—a condition that holds
for a large class of embedded electronics such as smart cards, key fobs, access con-
trols, embedded controllers, and so on. Fault attacks have been studied since the turn
of the century, and today a great variety of methods are available to attack all forms
of cryptography [3, 4, 16, 17].
A generic solution against faults is to use redundancy, such as by replicating the
hardware implementation, by repeating computations, or by applying data error-
coding techniques. The idea of redundancy is to tolerate sporadic faults by ensur-
ing that at least part of the circuit obtains a correct result. The advantage of fault
tolerant design is that it can handle (within some limits) any fault regardless of the
fault location and fault timing in the circuit. However, fault tolerant design using
redundancy is expensive. Spatial redundancy multiplies the hardware cost, and time redundancy reduces the performance, in each case by a significant factor.
tolerance is therefore only available to systems that can afford over-design. Despite
the costly overhead, most of these fault tolerant solutions are still not applicable to
the fault attack problem because in these designs, the fault is assumed to be random
and sporadic.
In fault attacks, faults are injected by an adversary rather than by nature. The
adversary is intelligent and determined, rather than random and indifferent. The
adversary also makes specific assumptions about the objectives of the fault attack,
and about the algorithm being cryptanalyzed. Indeed, because of the widespread
adoption of cryptographic standards, these assumptions are quite reasonable. This
means that the objective of a fault attack is quite specific: the objective is to extract
a secret key. In this chapter, we assume that rendering a circuit inoperable is not a
valid objective for a fault attack. Instead, such an attack belongs to a class of attacks
known as denial-of-service. We will concentrate instead on fault attacks that extract
a cryptographic key.
Another common assumption of the adversary is that many design details of the
cryptographic implementation are known. Indeed, by using basic reverse engineer-
ing techniques, the adversary may learn the execution schedule of a cryptographic
algorithm as it operates clock cycle by clock cycle, or the meaning of memory loca-
tions and registers used by the digital circuit. Knowledge of such design details is
often helpful for a fault attack, and therefore the worst-case assumption is to assume
that the adversary is fully knowledgeable about the implementation details of a cryp-
tographic design.
The question we wish to address in this chapter is the following: How to systemat-
ically build a fault attack resistant design? To answer this question, we first provide
an analysis of the requirements of a successful fault attack. This analysis leads to two main contributions of this chapter. First, it differentiates intentional fault injection from random, sporadic faults. Second, it provides insight for designing a
fault attack resistant design. Although a designer cannot prevent a fault attack from
happening, a designer can control the effects of injected faults. By suitable design
techniques, it is therefore possible to create circuits that are harder to attack using
common fault attack techniques.
The chapter is organized as follows. In Sect. 9.2, we review common fault injec-
tion techniques, and their effects on digital circuits. In Sect. 9.3, we will describe the
four essential steps that an adversary has to take in order to complete a fault attack.
In Sect. 9.4, we review several common fault analysis methods, and their require-
ments with respect to fault injection. In Sect. 9.5, we combine the insights from fault
injection (Sect. 9.3) with those from fault analysis (Sect. 9.4) to define fault attack
resistant design techniques. Finally, Sect. 9.6 concludes the chapter.
The fault characteristics resulting from a fault injection are commonly captured in a
fault model, which is also the starting point of various cryptanalytic methods. A fault
model expresses the important fault characteristics: the location of the fault within
the circuit, the number of bits affected by the fault, and the fault effect on the bits
(stuck-at, bit-flip, random, set/reset).
Table 9.1 lists four common fault effects: chosen bit fault, single bit fault, byte fault, and random fault. For example, in the chosen bit fault model, the attacker must precisely select the location of the faulty bit and set its value to either 0 or 1.
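As a hedged, purely illustrative sketch (the helper names, byte/bit indices, and the use of a 16-byte state are our assumptions), the following Python snippet models these four fault effects on an AES-sized state:

```python
# Toy models of the fault effects in Table 9.1 applied to a 16-byte state.
import random

def chosen_bit_fault(state, byte_idx, bit_idx, value):
    """Force a chosen bit to a chosen value (0 or 1)."""
    out = bytearray(state)
    if value:
        out[byte_idx] |= (1 << bit_idx)
    else:
        out[byte_idx] &= ~(1 << bit_idx)
    return bytes(out)

def single_bit_fault(state, rng):
    """Flip one bit; the attacker does not choose its final value."""
    out = bytearray(state)
    out[rng.randrange(16)] ^= (1 << rng.randrange(8))
    return bytes(out)

def byte_fault(state, byte_idx, rng):
    """Replace one byte with a random value."""
    out = bytearray(state)
    out[byte_idx] = rng.randrange(256)
    return bytes(out)

def random_fault(state, rng):
    """Corrupt an unpredictable subset of the state."""
    return bytes(b ^ rng.randrange(256) for b in state)

if __name__ == "__main__":
    rng = random.Random(0)
    state = bytes(range(16))
    print(chosen_bit_fault(state, 0, 7, 1).hex())
    print(single_bit_fault(state, rng).hex())
```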
As mentioned, the objective of the attacker is to build the fault model for a successful
post-processing of the information. There are several fault injection tools and tech-
niques for building the fault model. The following are six possible mechanisms of
fault injection.
∙ Clock Glitches are used to shorten the clock period of a digital circuit during
selected clock cycles [2, 36]. If the instantaneous clock period decreases below
the critical path of the circuit, then a faulty value will be captured in the memory or
state of the circuit. An adversary can inject a clock glitch by controlling the clock
line of the digital circuit, triggering a fault in the critical path of the circuit. If the
adversary knows the circuit structure, he or she will be able to predict location of
the circuit faults. Glitch injection is one of the least complicated methods of fault
injection, and therefore it can be considered as a broad threat to secure circuits.
∙ Voltage Starving can be used to artificially lengthen the critical path of a circuit,
to a point where it extends beyond the clock period [5]. This method is similar to
injection of clock glitches, but it does not offer the same precise control of fault
timing.
∙ Voltage Spikes cause an immediate change in the logic threshold levels of the
circuit [3]. This changes the logic value held on a bus. Voltage spikes can be used,
for example, to mask an instruction read from memory while it is moving over the
bus. Similar to clock glitches, voltage spikes have a global effect, affecting the entire circuit.
∙ Electromagnetic Pulses cause eddy currents in a chip, leading to erroneous
switching and isolated bit faults [30]. Using special probes, EM pulses can be
targeted at specific locations of the chip.
∙ Laser and Light Pulses cause transistors on a chip to switch with photoelectric
effects [39]. Through focusing of the light, a very small area of the circuit can be
targeted, enabling precise control over the location of the fault injection.
∙ Hardware Trojans can be a source of faults as well. This method requires that
the adversary has access to the circuit design flow, and that the design is directly
modified with suitable trigger/fault circuitry. For example, recent research reports
on an FPGA with a backdoor circuit which disables the readback protection of the
design [33].
The fault injection mechanism determines the timing of the fault, the duration of
the fault (transient or permanent), and the fault intensity (weak/strong). Together,
these characteristics enable the adversary to select a specific fault model, which
is needed as the starting point of cryptanalysis by fault injection. The method of
fault injection also influences the difficulty of performing it. Depending on the level
of tampering required with the actual circuit, one distinguishes noninvasive, semi-
invasive [34], and invasive attacks. Table 9.2 illustrates the relation between the
aforementioned six possible fault injection mechanisms, along with the fault models
resulting from their use.
A fault model and a fault injection mechanism to trigger the fault model are two
essential ingredients of a fault attack. But to apply them in a successful fault attack,
we need to consider a larger scope. Figure 9.1 shows that a successful fault attack
consists of two steps, fault measurement and fault analysis. The fault injection is
part of the fault measurement phase, while the fault model is a building block in
fault analysis.
Indeed, Fig. 9.1 shows both the requirements of a successful fault attack and the
principles of the fault-attack resistant design. From the adversary’s point of view,
each step of the pyramid should be followed in order. An adversary first needs to
choose a fault model and a fault analysis technique based on the target cryptosystem. Then, he needs to obtain exploitable faults by following the steps of fault measure-
ment, from fault injection access to fault observation.
From the designer’s side, the steps of this pyramid should be considered while
designing a fault attack resistant device. For each step, the designer should evaluate
the costs and benefits of securing the design against this step. Using the evaluation
results, the designer is able to make design decisions to prevent the adversary from
building the required fault model. Next, we explain the steps of fault measurement
and demonstrate them using a case study.
The reality of fault measurement is more complicated than simply injecting a fault. First of all, the adversary needs to be able to physically inject a fault. The fault injection also needs to have the desired effect and result in an exploitable fault that matches the required fault model. Finally, the exploitable fault needs to be observable. The following is a more comprehensive definition of these four levels.
∙ Fault Injection Access: The first and foremost step of the fault measurement is
getting physical access to the device under test (DUT). For example, an adversary
needs to control external clock and supply voltage ports of DUT for clock and volt-
age glitching attacks, respectively [2]. Similarly, the adversary must have physical
access to the chip surface for laser and electromagnetic pulse-based fault attacks [35].
In addition, an adversary may also need to control data inputs and outputs of DUT.
The amount of physical access needed for each attack is different.
∙ Actual Fault Injection: The second step of the fault measurement is disturbing
the operation of DUT by applying a physical stress on it. The applied physical
stress pushes DUT out of its normal operating conditions and causes faulty oper-
ation. Based on the chosen fault injection method, the adversary can control the
timing, location, and intensity of the applied physical stress. Each value of these
three parameters affects DUT differently and causes different faults in DUT oper-
ation. Therefore, the adversary needs to carefully set these parameters to create
an exploitable fault in DUT operation. In clock glitching, for example, the adver-
sary causes setup time violation by temporarily applying shorter clock cycles. The
adversary can control the timing and length (i.e., intensity) of the applied shorter
clock cycle. However, there is no control on the location of the applied physical
stress (i.e., a shorter clock cycle) in this case because clock is a global signal for
DUT.
∙ Fault Effect: The third step is creating a fault effect on DUT operation as a conse-
quence of the applied physical stress. The fault effect can be defined as the logical
(or digital) effect of the applied physical stress on DUT operation. For example, an
applied clock glitch might create 2-bit faults at the fault injection point. Similarly,
a laser pulse might affect 1-bit of DUT. On the other hand, it is also possible to not
create any fault effect even though a physical stress is applied on DUT. The adver-
sary has a limited and indirect control on the fault effect through controlling the
physical stress. The fault effect depends on various factors such as circuit imple-
mentation, used fault injection method, applied physical stress, etc. The adversary
may need to apply several physical stresses with different parameters to create the
desired fault effect on DUT operation [13, 22].
∙ Fault Observation: The final step toward the fault measurement is to observe the
effects of the fault injection in the output of the block cipher or the algorithm under
the attack. The authors of [41] show that there are methods that can compute the probability of success of a fault injection attempt in this phase using observability analysis, i.e., the probability that the effect of a fault at the injection point propagates to the output of the cipher. In this work, we use a probability-based observability analysis method to estimate the probability of propagating an exploitable fault to the output. Observability analysis, which is widely used
in VLSI test area, reflects the difficulty of propagating the value of a signal to
primary outputs [38].
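As a hedged sketch (the symbol names below are ours), the setup-time constraint discussed in the next paragraph can be written as

$$ T_{clk} \;\geq\; t_{clk2Q} + t_{logic} + t_{setup}, $$

where $T_{clk}$ is the clock period, $t_{clk2Q}$ the clock-to-output delay of the launching flip-flop, $t_{logic}$ the combinational path delay, and $t_{setup}$ the setup time of the capturing flip-flop.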
This equation specifies the setup time constraints of a circuit. The setup time
constraint of the longest (i.e., critical) path determines the minimum clock period
for the circuit. Applying a shorter clock period than this value will fail the setup
time constraints.
In this case, we inject faults into the operation of a circuit by violating its setup
time constraints. Setup time violation is a widely used low-cost fault injection mech-
anism [1]. In the following paragraphs, we explain the fault measurement process for
fault injection using setup time violation.
In this section, we will explain the pyramid in Fig. 9.1 from the adversary’s point of
view. Any fault attack consists of two phases: a fault measurement phase and a fault
analysis phase.
Based on the adversary’s access to the target device and the information required
to attack a specific block cipher, the adversary aims for a fault model. Therefore,
choosing the fault model is a part of the fault analysis process. Then, in the mea-
surement process, the adversary goes through the four steps mentioned in Sect. 9.3
to build the fault model using actual measurements. In this section, we explain the
fault attack process with three different fault models. All of the example attacks are
on the advanced encryption standard (AES) algorithm. The details of this algorithm
are explained in the following section.
The AES algorithm consists of 10 rounds. The first 9 rounds have 4 main operations: SubBytes (SBOX), ShiftRows (SR), MixColumns (MC), and AddRoundKey (ADK). Round 10 omits the MixColumns operation. Figure 9.3 shows the structure of the AES algo-
rithm. In this figure, P is the applied plaintext to the AES algorithm, S10 is the inter-
mediate state variable for round 10. K10 is the key for round 10 and C represents the
ciphertext. The faulty value of the variable x is shown by x′ .
Differential fault analysis (DFA) is one of the most studied types of attack on cryptographic systems such as
RSA [9], DES [6], and AES. DFA assumes that the attacker is in possession of the
device and is able to obtain two faulty and fault free ciphertexts for the same plaintext.
DFA also assumes that the attacker is aware of some characteristics of the injected
fault. There are many proposed types of DFA attack on the AES algorithm [7, 27,
29]. These attacks are based on different fault models and choose various methods
of injection techniques based on the fault model. In this section, we will explain the
steps of a simple electromagnetic pulse-based DFA attack on AES, which is proposed
by Dehbaoui et al. [11].
Fault Model: This attack adopts Piret’s fault model [28]. This fault model requires
an adversary to induce one byte fault in AES state between the start of round 9 and
the MixColumns operation.
Fault Measurement: The DUT for the attack is a RISC microcontroller running
AES algorithm. In this attack, the adversary injects faults by means of transient elec-
tromagnetic (EM) pulses. A small magnetic foil is used to apply EM pulses without any
physical contact to DUT. The adversary can control the timing, energy, and position
of the applied EM pulses. Using different combinations of these three parameters,
the adversary can affect only one byte of the computation and can select the affected
byte. As a result, an adversary can induce the exploitable faults using this setup.
Fault Analysis: Due to the MixColumns in round 9, one byte fault in the beginning
of round 9 will cause 4 faulty bytes in the ciphertext. Therefore, we can find 4 bytes
of the key of round 9. Assuming that the attacker injects a fault into only one byte, C and C′ differ in four bytes. There are 255 × 4 possible difference values for these four bytes, which are saved in a list D. For each key guess, the adversary computes the value of these four bytes using the inverse equations of the AES operations. The key guess is potentially a correct candidate if the computed value is in the list D. The adversary should continue injecting faults in the same location until only one key candidate remains.
The fault sensitivity analysis (FSA) attack was proposed by Li et al. at CHES 2010 [22]. This attack is based on the fact that fault behavior is biased and can have a data dependency on a secret value in the circuit.
Fault Model: If an adversary gradually increases the fault intensity, a circuit can
reach a point at which the output of the circuit becomes faulty. This threshold point is
called fault sensitivity. For setup time violation, the critical path delay determines the
fault sensitivity point. As the path delay distribution of a circuit is data-dependent,
the fault sensitivity of the circuit is also data-dependent. Therefore, the FSA fault
model is defined as a dependency between the input values of the SBOX to their
fault sensitivity. Based on the experiments shown in [22], the input values with larger
Hamming weight, have longer critical path delays as well. In this attack, the target
of the fault injection is round 10 of AES.
Fault Measurement: Based on the fault model, the authors of [22] choose the setup
time violation or clock glitch for their measurements. The assumption is that the
attacker is in physical possession of the device and therefore has access to the external clock signal. The fault injection should target the output of round 10 of the AES algorithm for each byte of the ciphertext.
To inject the actual fault, the attacker gradually increases the intensity of the fault injection in the last round by increasing the frequency of the applied external clock. As explained in Sect. 9.3, as the clock frequency increases, the timing paths of the circuit will eventually be violated. Therefore, there is a moment at which the output of
round 10 of the block cipher is not correct anymore. To build the fault model, the
adversary should apply several input values to the block cipher and record the clock
frequency corresponding to fault sensitivity for each input data.
Fault Analysis: For FSA, the effect of fault bias is the data dependency of fault
sensitivity. The adversary first inverts the ciphertext to round 10 input (S10 ) using
a key guess. Then, he estimates the effect of fault bias as the Hamming weight of the round 10 input, HW(S10). In this step, the attacker uses the Pearson correlation coefficient to find the key guess for which the fault sensitivity is most strongly correlated with HW(S10) over all inputs.
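A hedged sketch of this key-ranking step is shown below; the inverse S-box is replaced by a placeholder random permutation and the measurements are simulated, so this illustrates only the correlation-based distinguisher loop, not a real AES attack.

```python
# Toy FSA distinguisher: correlate measured fault sensitivities with the Hamming
# weight of the hypothesized round-10 input for each key guess.
import numpy as np

rng = np.random.default_rng(0)
INV_SBOX = rng.permutation(256)          # placeholder for the AES inverse S-box

def hw(x):
    return bin(int(x)).count("1")

def rank_keys(ciphertext_bytes, sensitivities):
    """Return key guesses sorted by |correlation(HW(S10), sensitivity)|."""
    scores = {}
    for k in range(256):
        hws = [hw(INV_SBOX[c ^ k]) for c in ciphertext_bytes]
        scores[k] = abs(np.corrcoef(hws, sensitivities)[0, 1])
    return sorted(scores, key=scores.get, reverse=True)

if __name__ == "__main__":
    true_key = 0x3C
    cts = rng.integers(0, 256, size=2000)
    # Simulated measurements: sensitivity grows with HW(S10), plus noise.
    meas = np.array([hw(INV_SBOX[c ^ true_key]) for c in cts]) + rng.normal(0, 0.5, 2000)
    print(hex(rank_keys(cts, meas)[0]))  # expected to recover 0x3c
```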
Differential fault intensity analysis (DFIA) was proposed by Ghalaty et al. [13]. This attack is also based on the concept of biased fault behavior, but it views the fault bias differently than the FSA attack.
Fault Model: Fault intensity is the strength by which a circuit is pushed outside of its
nominal operating conditions with the intent of inducing a fault. For example, when
faults are introduced using clock glitches, then the fault intensity corresponds to the
length of clock cycle that is obtained as a result of the glitches. The target of fault
injection for DFIA is the output of round 9 of AES. The fundamental assumption
of DFIA on fault model relies on the fact that a small change in fault intensity will
result in a small change in fault behavior.
Fault Measurement: The target of fault injection for this attack is the output of
round 9 in AES algorithm. The adversary can determine the timing of fault injection
by power analysis methods such as the methods mentioned in [19] to find the start
and end of each round of AES. The fault injection process is similar to FSA attack.
In this attack, first, the attacker applies an input to the AES algorithm. Then, he
gradually increases the fault intensity by increasing the clock frequency at the output
of round 9. Due to the biased effect of the fault injection and non-uniform distribution
of timing paths in the circuit, the number of faulty bits increases by increasing the
fault intensity.
Fault Analysis: To estimate the small change, the adversary computes the input of round 10 (S′10, S″10, ...) by inverting the faulty ciphertexts with a key guess for several fault intensity levels. Then, he computes the distance between the hypothesized intermediate variables using the Hamming distance function.
The fault bias assumption for DFIA enables the use of a distinguisher that looks
for the smallest change. Unlike the previous techniques, DFIA can combine fault
behaviors collected at multiple fault intensities. Hence, the complete fault bias char-
acteristic of a circuit can be exploited. Based on the fault model assumption of DFIA, the error values are close to each other for the correct key guess. For wrong key guesses,
the distance between injected error values will be random due to the non-uniform
behavior of the SBOX module. Therefore, the distinguisher function simply chooses
the key that shows the minimal distance between intermediate variables.
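A hedged formalization of this distinguisher (the notation below is ours, not the chapter's): if $S_{10}^{(j)}(k)$ denotes the round 10 input hypothesized from the $j$-th faulty ciphertext under key guess $k$, and $m$ fault intensity levels are used, the selected key is

$$ k^{*} = \arg\min_{k} \sum_{j=1}^{m-1} \mathrm{HD}\!\left(S_{10}^{(j)}(k),\; S_{10}^{(j+1)}(k)\right), $$

where HD is the Hamming distance.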
the effects of the injected fault at the output of the system. An interruption in any of
these steps would lead to an unsuccessful fault measurement and therefore a failure
in the fault attack process. From the designer’s point of view, to build a fault attack
resistant design, he should be able to thwart any of the steps of the pyramid and
prevent the attacker from progressing to the next step.
Accordingly, the designers of fault attack resistant hardware systems should aim
at thwarting the steps of fault measurement. Therefore, for each step of the pyramid
given in Fig. 9.4, the options for securing the design against this step should be eval-
uated at design time. Using the formulation in Fig. 9.4, the designer can address
each step independently. Depending on the security requirements and design con-
straints, it is also possible to combine different countermeasures that are designed
for different steps of the fault measurement. Each countermeasure brings an overhead
on the design while increasing the level of security. Thus, there is always a tradeoff
between the cost and security of a design. The designers can use the pyramid given in
Fig. 9.4 for a better and more systematic security evaluation of the countermeasures
while comparing the cost-security tradeoff of different countermeasure options.
Next, we provide a survey of existing countermeasures against fault attacks.
The countermeasures thwarting this step aim at preventing an adversary from gaining physical access to the device, so that the adversary cannot apply any physical stress on the DUT.
Shielding
Filtering
The main principle of filtering is to reduce or filter out the effects of the external
physical stress by placing on-chip components between some external pins (e.g.,
power supply pin) and the internal circuitry of DUT. For example, some devices
have built-in voltage regulators, which first condition the external supply voltage and then apply the conditioned supply voltage to the internal circuitry [40]. The voltage regulators filter out some of the noise and glitches on the external supply voltage. However, the filtering capability of a voltage regulator depends on its design and the load capacitance. Therefore, some glitches are able to pass through the regulator and affect the internal circuitry. The limitations of this countermeasure are its cost and its physical capabilities: a voltage regulator can only filter glitches with specific parameters. Therefore, an adversary might create exploitable faults by applying a physical stress that is outside the filtering capabilities of the regulator.
The purpose of the countermeasures at this step is to detect dangerous physical changes in the DUT's environment. After detecting a dangerous change, they produce an alarm signal to indicate a possible fault in the DUT operation.
Randomization
Fig. 9.5 a Operation of the Clock Monitor in the case of a Glitch-Free External Clock. b Operation
of the Clock Monitor in the case of a Glitchy External Clock
Detectors
The fault injection can also be thwarted using detectors that detect anomalies in the
physical environment of DUT. These detectors can sense the changes in the voltage,
light, temperature, and clock frequency. After the detection of an anomaly, an alarm
signal is raised and the required security action is taken by the circuit.
For example, Luo et al. proposed a clock monitor that detects if there is an anom-
aly in the clock signal of a circuit and raises an alarm [24]. The proposed clock
monitor relies on the fact that a clock glitch creates irregularity in the clock signal.
To detect such irregularity, they sample the external design clock (clk_d) with a faster internal sampling clock (clk_s), as illustrated in Fig. 9.5. For each cycle i of the external clock clk_d, they measure the length of the high phase (n_i^H) and the low phase (n_i^L) using
counters. Then, they compare the measured parameters of two consecutive clock
cycles i and i + 1. If the parameters do not match, an alarm signal is raised. If there
is no glitch in the external clock, the following equations are satisfied, as shown in Fig. 9.5(a):

n_0^L = n_1^L,    n_0^H = n_1^H
If a glitch is injected in the external clock signal, the parameters of two consecutive
cycles do not match, as shown in Fig. 9.5(b). In this case, the following inequalities are obtained, and thus an alarm signal is generated:

n_0^H ≠ n_1^H,    n_1^H ≠ n_2^H
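The following Python sketch is a hedged illustration of this monitoring idea (the sample waveforms and helper names are our assumptions, not the circuit of [24]): the external clock is sampled with a faster clock, high/low phase lengths are counted, and consecutive cycles are compared.

```python
# Toy clock monitor: flag a glitch when consecutive cycles have mismatching
# high/low phase lengths, measured in fast internal-clock samples.

def phase_lengths(samples):
    """Return (high_len, low_len) pairs for each full external clock cycle."""
    cycles, i = [], 0
    while i < len(samples):
        high = 0
        while i < len(samples) and samples[i] == 1:
            high += 1; i += 1
        low = 0
        while i < len(samples) and samples[i] == 0:
            low += 1; i += 1
        if high and low:
            cycles.append((high, low))
    return cycles

def glitch_alarm(samples):
    """Raise an alarm if two consecutive cycles differ in phase lengths."""
    cycles = phase_lengths(samples)
    return any(a != b for a, b in zip(cycles, cycles[1:]))

if __name__ == "__main__":
    clean    = [1]*4 + [0]*4 + [1]*4 + [0]*4 + [1]*4 + [0]*4
    glitched = [1]*4 + [0]*4 + [1]*1 + [0]*2 + [1]*4 + [0]*4   # one short cycle
    print(glitch_alarm(clean))     # False
    print(glitch_alarm(glitched))  # True
```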
The main limitation of the detectors is their physical capabilities. They are gen-
erally designed to detect physical stresses with specific parameters. If an adversary
applies a physical stress outside of the specified parameters, an exploitable fault may
occur. In addition, a detector designed against a specific fault injection means might
be vulnerable to another fault injection means or to a combination of multiple fault
injection means.
In this step, the countermeasures are designed to catch the fault effects caused by the
physical stress applied in the fault injection step. The countermeasures monitor the
values of the signals and generate an alarm signal in case of a faulty signal value.
The main principle of concurrent error detection (CED) is to detect faults in parallel with the normal operation of the DUT. Most of the proposed CED techniques follow the general scheme shown in Fig. 9.6 [26]. In this scheme, a design consists of three blocks: operation, prediction, and checker. The operation block takes inputs and produces outputs based on the DUT specification. The prediction block takes the same inputs as the operation block and predicts some special characteristics of the system outputs (for instance, their parity) based on these inputs, and the checker compares the predicted characteristics against those of the actual outputs.
Fig. 9.6 A conceptual diagram for concurrent error detection (CED) countermeasure
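The Python sketch below is a hedged illustration of this operation/prediction/checker structure using a parity predictor; the protected function and the parity choice are our assumptions, not a design from the chapter.

```python
# Toy CED scheme: operation block, parity-prediction block, and checker.

def operation(x):
    return (x * 3 + 1) & 0xFF            # the protected computation

def predict_parity(x):
    # In hardware this would be a separate, cheaper circuit deriving the
    # output parity directly from the inputs; here we simply recompute it.
    return bin(operation(x)).count("1") & 1

def checker(y, predicted_parity):
    return (bin(y).count("1") & 1) == predicted_parity   # True = no error flagged

if __name__ == "__main__":
    x = 0x2A
    y = operation(x)
    print(checker(y, predict_parity(x)))          # True: fault-free run
    y_faulty = y ^ 0x01                           # inject a single-bit fault
    print(checker(y_faulty, predict_parity(x)))   # False: alarm
```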
Canary Logic
The fault effect caused by setup time violation [32] can be detected using Canary
logic [31], which predicts timing errors through circuit-level timing speculation.
In Canary logic, each flip-flop (FF) in the design is converted into a timing error-
predicting Canary FF by adding a delay element, a shadow FF, and an XOR gate
(Fig. 9.7). The input data of the shadow FF is a delayed version of the input data of the main FF. Timing errors are predicted by comparing the outputs of the main and shadow FFs via the XOR gate. The output of the XOR gate is used as an alarm signal indicating a
timing error is about to occur. Because of the delay element, the shadow FF encoun-
ters a timing error before the main FF. Therefore, the shadow FF protects the main
FF against timing errors.
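As a hedged, purely numerical sketch of this behavior (the delay values are our assumptions), the snippet below models a canary FF: because the shadow FF sees a delayed copy of the data, it violates setup timing before the main FF does, and the mismatch between the two raises the alarm.

```python
# Toy timing model of a canary flip-flop (delays in ns are assumed values).

def canary_ff(path_delay, clk_period, canary_delay=0.5, setup=0.1):
    main_ok   = path_delay + setup <= clk_period                  # main FF meets timing
    shadow_ok = path_delay + canary_delay + setup <= clk_period   # shadow FF meets timing
    alarm = main_ok and not shadow_ok    # mismatch of main/shadow outputs (XOR gate)
    return main_ok, alarm

if __name__ == "__main__":
    for d in (3.0, 3.6, 4.2):            # increasing path delay vs. a 4.0 ns clock
        ok, alarm = canary_ff(d, clk_period=4.0)
        print(f"delay={d}ns  main_correct={ok}  alarm={alarm}")
    # d=3.0: no alarm; d=3.6: alarm raised before a real error; d=4.2: both FFs
    # fail, the undetected case discussed in the next paragraph [21].
```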
The main problem with this countermeasure is its area overhead. The area of a canary FF is at least two times larger than that of a regular FF because of the additional logic. Therefore, replacing each regular FF with a canary FF brings a high area overhead. A weakness of the canary logic is that it cannot raise the alarm signal if a timing error occurs in both the main and shadow FFs [21]. Considering that fault attacks rely on intentional fault injection rather than random faults, this case is likely to happen.
The main purpose of the countermeasures at this step is to make a faulty output independent of the processed secret data. These countermeasures let an adversary inject faults and let the injected faults propagate to the output of the DUT. However, they guarantee that the adversary cannot exploit the faulty output in the fault analysis.
Delay Balancing
Delay balancing can be used to thwart FSA attacks that use setup time violation as the
fault injection means. In these attacks, characterizing the data dependency of the DUT's
fault sensitivity (i.e., its dynamic critical path) is the main task for the adversary. There-
fore, eliminating the factors that cause this data dependency of fault sensitivity is an
effective countermeasure for such attacks. Ghalaty et al. [12] proposed a systematic
delay balancing countermeasure to remove the dependency of the critical timing
delay on the processed data values in the circuit. They propose a transformation that
operates at two levels of abstraction, at the netlist level and at the gate level.
∙ Netlist level: The delay of the netlist must be independent of the input data.
∙ Gate level: The switching time of gates must be random during circuit evaluation,
meaning that the switching distribution is uniform over the computation time of
the circuit.
Dual-rail with precharge logic (DPL) is a countermeasure that was originally pro-
posed against side-channel attacks [10]. However, it is also inherently resistant
against some fault attacks. The main principle of DPL is to make the power con-
sumption independent of the processed data by consuming a constant amount of
power at each cycle.
In DPL, every signal 𝛼 is represented by two complementary wires (𝛼f , 𝛼t ). Every
computation has two phases, namely, precharge and evaluation. In the precharge
phase, all wires are initialized to the same value. Depending on the implementa-
tion, this initialization value is (0, 0) or (1, 1), called NULL0 and NULL1. These
two values are NULL tokens that do not contain any meaningful information. In the
evaluation phase, the actual computation takes place and the NULL tokens are replaced by
VALID tokens: (1, 0) or (0, 1), called VALID0 and VALID1. These two values are
VALID tokens that contain the value of the signal 𝛼. During evaluation, exactly one
of the complementary wires is toggled.
The inherent fault attack resistance of DPL is based on the fact that a fault turns a
VALID token into a NULL token [14]. The output value of a gate will be NULL if any of
its inputs is a NULL token. Considering the high diffusion capabilities of cryptographic
algorithms, a NULL token diffuses very quickly as it propagates to the outputs.
As a result, the faulty output does not carry any information about the secret
data. Note that faults are not detected in DPL. Instead, faulty values are allowed
to propagate to the outputs, knowing that they are not exploitable by an adversary.
Converting a single-rail logic into a dual-rail logic, however, incurs both area and timing
overhead.
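The following minimal sketch illustrates the dual-rail encoding and the NULL-propagation property for a single dual-rail AND gate; the gate model is a simplified behavioral abstraction, not an actual DPL cell implementation.

```python
# Sketch of dual-rail precharge encoding and NULL-token propagation through a
# dual-rail AND gate (illustrative encoding: a signal 'x' is the pair (x_f, x_t)).

NULL0, NULL1 = (0, 0), (1, 1)          # precharge tokens, carry no information
VALID0, VALID1 = (1, 0), (0, 1)        # evaluation tokens for logic 0 and 1

def is_valid(tok):
    return tok in (VALID0, VALID1)

def dpl_and(a, b):
    """Dual-rail AND: any NULL input forces a NULL output."""
    if not (is_valid(a) and is_valid(b)):
        return NULL0
    bit = (a == VALID1) and (b == VALID1)
    return VALID1 if bit else VALID0

print(dpl_and(VALID1, VALID1))  # (0, 1): valid logic 1
print(dpl_and(VALID1, NULL0))   # (0, 0): a faulted (NULL) input infects the output
```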
Infective Computation
The main principle of the infective computation is to make the faulty output look
random (i.e., non-exploitable) [23]. This is achieved by propagating the fault effects
to the whole computation with a diffusion scheme. The diffusion scheme has no
effect if the fault injection is not successful. Infective computation techniques do not
require checking procedures, and thus they do not alter the computation flow.
At CHES 2014, Tupsamudre et al. proposed an infective countermeasure for AES
that utilizes redundant and dummy rounds [37]. The conceptual diagram of their coun-
termeasure is shown in Fig. 9.8. In the cipher and redundant rounds, the round func-
tion of AES (fAES ) is applied on the plaintext. Thus, each AES round is executed
twice in this countermeasure. There are also dummy rounds that are randomly exe-
cuted throughout the execution of the algorithm. In a dummy round, the AES round
function is applied on a random data 𝛽 and a dummy secret key kd . The output of
the dummy round is the random data 𝛽. After each computation of an AES round, a
selection logic decides the output value. The selection logic computes if there is an
error in any of the dummy, redundant, and cipher rounds by applying simple XOR
and OR operations. Then, the output signal of the selection logic (i.e., select signal)
selects the output value. In other words, the select signal activates/deactivates a dif-
fusion mechanism. If the select signal is 0, the result of the cipher round is assigned
to the output (Fig. 9.8). Otherwise, the result of dummy round, which is random and
independent of the secret key, is assigned to output (Fig. 9.8). Therefore, the faulty
output cannot be exploited by the adversary.
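A conceptual sketch of this selection-and-diffusion principle is shown below; the round function, the fault model, and the selection logic are simplified stand-ins and do not reproduce the exact CHES 2014 algorithm.

```python
# Conceptual sketch of the infective scheme of Fig. 9.8 (not the exact CHES 2014
# algorithm): each round is computed twice, dummy rounds run on random data beta,
# and any mismatch makes the random dummy result, rather than the faulty cipher
# state, reach the output.
import secrets

def f_round(state, key):                  # stand-in for the AES round function fAES
    return (state * 31 + key) & 0xFFFF

def infective_round(state, key, beta, dummy_key, inject_fault=False):
    cipher    = f_round(state, key)
    redundant = f_round(state, key)
    if inject_fault:
        cipher ^= 0x0040                  # model a fault in the cipher round
    dummy = f_round(beta, dummy_key)      # dummy round on random data beta
    error = (cipher ^ redundant) | (dummy ^ f_round(beta, dummy_key))
    select = 1 if error else 0            # selection logic: any mismatch sets select
    return dummy if select else cipher    # faulty runs leak only the random dummy value

beta = secrets.randbits(16)
print(infective_round(0x1234, 0xBEEF, beta, 0x0D0D))                     # normal output
print(infective_round(0x1234, 0xBEEF, beta, 0x0D0D, inject_fault=True))  # randomized output
```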
9.6 Conclusion
A fault attack resistant design must prevent the leakage of secure data. The proposed
hierarchy classifies the requirements of fault measurement and fault analysis for attacking
a device into four steps. The interruption of any of these steps prevents the attacker from
launching a successful fault attack. Therefore, the pyramid of fault measurement (Fig. 9.4)
can be used by the designer to build a fault attack resistant design considering the
costs and security coverage of the countermeasures in each step. The designer can find
an optimized combination of countermeasures to prevent fault measurement by an
attacker. While the pyramid provides a road map for designers to apply fault attack
countermeasures, it does not provide information on the cost and security efficiency of
each countermeasure. This remains an open research problem.
Acknowledgements This research was supported through the National Science Foundation Grant
1441710, and through the Semiconductor Research Corporation.
References
1. Agoyan, M., Dutertre, J.M., Naccache, D., Robisson, B., Tria, A.: When Clocks Fail: On Criti-
cal Paths and Clock Faults. In: Smart Card Research and Advanced Application, pp. 182–193.
Springer (2010)
2. Balasch, J., Gierlichs, B., Verbauwhede, I.: An in-depth and black-box characterization of the
effects of Clock Glitches on 8-bit MCUs. In: 2011 Workshop on Fault Diagnosis and Tolerance
in Cryptography (FDTC), pp. 105–114 (2011)
3. Bar-El, H., Choukri, H., Naccache, D., Tunstall, M., Whelan, C.: The Sorcerer’s Apprentice
guide to fault attacks. Proc. IEEE 94(2), 370–382 (2006). Feb
4. Barenghi, A., Breveglieri, L., Koren, I., Naccache, D.: Fault injection attacks on cryptographic
devices: theory, practice, and countermeasures. Proc. IEEE 100(11), 3056–3076 (2012). Nov
5. Barenghi, A., Bertoni, G.M., Breveglieri, L., Pelliccioli, M., Pelosi, G.: Injection technolo-
gies for fault attacks on microprocessors. In: Joye, M., Tunstall, M. (eds.) Fault Analysis in
Cryptography. Information Security and Cryptography, pp. 275–293. Springer, Berlin (2012)
6. Biham, E., Shamir, A.: Differential fault analysis of secret key cryptosystems. In: Advances in
Cryptology—CRYPTO '97, pp. 513–525. Springer (1997)
7. Blömer, J., Seifert, J.P.: Fault based cryptanalysis of the advanced encryption standard (AES).
In: Financial Cryptography, pp. 162–181. Springer (2003)
8. Bo, Y., Xiangyu, L., Cong, C., Yihe, S., Liji, W., Xiangmin, Z.: An AES chip with DPA resis-
tance using hardware-based random order execution. J. Semicond. 33(6), 065009 (2012)
9. Boneh, D., DeMillo, R.A., Lipton, R.J.: On the importance of eliminating errors in crypto-
graphic computations. J. Cryptol. 14(2), 101–119 (2001)
10. Danger, J.L., Guilley, S., Bhasin, S., Nassar, M.: Overview of dual rail with Precharge logic
styles to thwart implementation-level attacks on hardware cryptoprocessors. In: 2009 3rd Inter-
national Conference on Signals, Circuits and Systems (SCS), pp. 1–8. IEEE (2009)
11. Dehbaoui, A., Dutertre, J.M., Robisson, B., Orsatelli, P., Maurine, P., Tria, A.: Injection of
transient faults using electromagnetic pulses-practical results on a cryptographic system. IACR
Cryptol. ePrint Arch. 2012, 123 (2012)
12. Ghalaty, N.F., Aysu, A., Schaumont, P.: Analyzing and eliminating the causes of fault sensi-
tivity analysis. In: Proceedings of the Conference on Design, Automation & Test in Europe. p.
204. European Design and Automation Association (2014)
13. Ghalaty, N.F., Yuce, B., Taha, M., Schaumont, P.: Differential Fault Intensity Analysis. In:
2014 Workshop on Fault Diagnosis and Tolerance in Cryptography (FDTC), pp. 49–58. IEEE
(2014)
14. Guilley, S., Sauvage, L., Danger, J.L., Selmane, N.: Fault injection resilience. In: 2010 Work-
shop on Fault Diagnosis and Tolerance in Cryptography (FDTC), pp. 51–65. IEEE (2010)
15. Guo, X., Mukhopadhyay, D., Karri, R.: Provably secure concurrent error detection against
differential fault analysis. IACR Cryptol. ePrint Arch. 2012, 552 (2012)
16. Joye, M., Tunstall, M. (eds.): Fault Analysis in Cryptography. Information Security and Cryp-
tography. Springer, Berlin (2012)
17. Karaklajic, D., Fan, J., Verbauwhede, I.: A systematic M safe-error Detection in hardware
implementations of cryptographic algorithms. In: 2012 IEEE International Symposium on
Hardware-Oriented Security and Trust (HOST), pp. 96–101 (2012)
18. Karri, R., Wu, K., Mishra, P., Kim, Y.: Concurrent error detection schemes for fault-based side-
channel cryptanalysis of symmetric block ciphers. IEEE Trans. Comput.-Aided Des. Integr.
Circuits Syst 21(12), 1509–1517 (2002)
19. Kocher, P., Jaffe, J., Jun, B., Rohatgi, P.: Introduction to differential power analysis. J. Cryptogr. Eng. 1(1), 5–27 (2011)
20. Kömmerling, O., Kuhn, M.G.: Design principles for tamper-resistant Smartcard processors.
In: USENIX Workshop on Smartcard Technology, vol. 12, pp. 9–20 (1999)
21. Kunitake, Y., Sato, T., Yasuura, H., Hayashida, T.: Possibilities to miss predicting timing errors
in canary flip-flops. In: 2011 IEEE 54th International Midwest Symposium on Circuits and
Systems (MWSCAS), pp. 1–4. IEEE (2011)
22. Li, Y., Sakiyama, K., Gomisawa, S., Fukunaga, T., Takahashi, J., Ohta, K.: Fault sensitiv-
ity analysis. In: Cryptographic Hardware and Embedded Systems, CHES 2010, pp. 320–334.
Springer (2010)
23. Lomné, V., Roche, T., Thillard, A.: On the need of randomness in fault attack countermeasures-
application to AES. In: 2012 Workshop on Fault Diagnosis and Tolerance in Cryptography
(FDTC), pp. 85–94. IEEE (2012)
24. Luo, P., Fei, Y.: Faulty clock detection for crypto circuits against differential fault analysis
attack. Cryptol. ePrint Arch. Report 2014/883. http://eprint.iacr.org/ (2014)
25. Markantonakis, K., Mayes, K.: Secure Smart Embedded Devices. Platforms and Applications.
Springer, Berlin (2013)
26. Mitra, S., McCluskey, E.J.: Which concurrent error detection scheme to choose? In: Test Con-
ference, 2000. Proceedings. International, pp. 985–994. IEEE (2000)
27. Moradi, A., Shalmani, M.T.M., Salmasizadeh, M.: A generalized method of differential fault
attack against AES cryptosystem. In: Cryptographic Hardware and Embedded Systems-CHES
2006, pp. 91–100. Springer (2006)
28. Piret, G., Quisquater, J.J.: A differential fault attack technique against SPN structures, with
application to the AES and KHAZAD. In: Cryptographic Hardware and Embedded Systems-
CHES 2003, pp. 77–88. Springer (2003)
29. Quisquater, J.J., Samyde, D.: Electromagnetic analysis (EMA): measures and counter-
measures for Smart Cards. In: Smart Card Programming and Security, pp. 200–210. Springer
(2001)
30. Quisquater, J., Samyde, D.: Eddy current for magnetic analysis with active sensor. In: Esmart
(2002)
31. Sato, T., Kunitake, Y.: A simple flip-flop circuit for typical-case designs for DFM. In: 8th
International Symposium on Quality Electronic Design, 2007. ISQED’07, pp. 539–544. IEEE
(2007)
32. Selmane, N., Guilley, S., Danger, J.L.: Practical setup time violation attacks on AES. In:
Seventh European Dependable Computing Conference, 2008. EDCC 2008, pp. 91–96. IEEE
(2008)
33. Skorobogatov, S., Woods, C.: Breakthrough silicon scanning discovers backdoor in military
chip. In: CHES, pp. 23–40 (2012)
34. Skorobogatov, S.P.: Semi-invasive attacks—A new approach to hardware security analysis.
Technical report. UCAM-CL-TR-630, University of Cambridge, Computer Laboratory (2005)
35. Skorobogatov, S.P., Anderson, R.J.: Optical fault induction attacks. In: Cryptographic Hard-
ware and Embedded Systems-CHES 2002, pp. 2–12. Springer (2003)
36. Takahashi, J., Fukunaga, T., Gomisawa, S., Li, Y., Sakiyama, K., Ohta, K.: Fault injection
and key retrieval experiments on an evaluation board. In: Joye, M., Tunstall, M. (eds.) Fault
Analysis in Cryptography, pp. 313–331. Information Security and Cryptography, Springer,
Berlin (2012)
37. Tupsamudre, H., Bisht, S., Mukhopadhyay, D.: Destroying fault invariant with randomiza-
tion. In: Cryptographic Hardware and Embedded Systems–CHES 2014, pp. 93–111. Springer
(2014)
38. Wang, L.T., Wu, C.W., Wen, X.: VLSI Test Principles and Architectures: Design for Testabil-
ity. Academic Press (2006)
39. van Woudenberg, J., Witteman, M., Menarini, F.: Practical optical fault injection on secure
microcontrollers. In: 2011 Workshop on Fault Diagnosis and Tolerance in Cryptography
(FDTC), pp. 91–99 (2011)
40. Yanci, A.G., Pickles, S., Arslan, T.: Characterization of a voltage Glitch attack detector for
secure devices. In: Symposium on Bio-inspired Learning and Intelligent Systems for Security,
2009. BLISS’09, pp. 91–96. IEEE (2009)
41. Yuce, B., Ghalaty, N.F., Schaumont, P.: TVVF: Estimating the vulnerability of hardware cryp-
tosystems against timing violation attacks. In: 2015 IEEE International Symposium on Hard-
ware Oriented Security and Trust (HOST), pp. 72–77. IEEE (2015)
Chapter 10
Hardware Trojan Attacks
and Countermeasures
Hassan Salmani
10.1 Introduction
As reported by Ernst & Young LLP [1], modern trends such as smart mobility and cloud
computing are demanding more from silicon chips (e.g., lower power consumption
for mobile devices and data centers, and increasing integration of functions).
System-on-chip (SoC) solutions that integrate an increasing number of functions on a
single chip serve as a primary way to address customer demands. Semiconductor
companies are integrating processor and memory cores with power management,
graphics processors, a potentially large number of different wireless communication
technologies (e.g., CDMA, GSM, WiFi, Bluetooth), and many other functions. With
the increasing complexity of modern devices, proliferating specialized require-
ments, and shrinking time-to-market windows, some companies are turning to
third-party IP instead of designing in-house as a cost-effective approach. According
to the market research report “Semiconductor (Silicon) IP Market by Form
Factor (Integrated Circuit IP, SOC IP), Design Architecture (Hard IP, Soft IP),
Processor Type (Microprocessor, DSP), Application, Geography and Verifica-
tion IP - Forecast & Analysis to 2013–2020” published by MarketsandMarkets, this
market is expected to grow at a CAGR of 12.6 % from 2014 to 2020 and reach $5.63 billion
in 2020 [2].
There are three main categories of IPs [3]: soft, firm, and hard, with Fig. 10.1
depicting their relationships and tradeoffs. Soft IP blocks are specified using RTL or
higher level descriptions. As a hardware description language (HDL) is
process-independent, they are more suitable for digital cores. They are highly
flexible, portable, and reusable, but not necessarily optimized in terms of timing and
power. Presented at the layout level, hard IP blocks are highly optimized for a
Fig. 10.2 Worldwide—Intelligent systems as percentage of total systems in each major industry
(%) [4]
In the early days of the semiconductor industry, a single company would often be able to
design, manufacture, and test a new chip. However, the cost of building manu-
facturing facilities—more commonly referred to as “fabs”—has become extremely high.
A fab could cost over $200 million back in the 1980s; however, with the use of
advanced semiconductor manufacturing equipment to produce chips with
ever-smaller features, a modern fab costs much more [8]. For example, in late 2012
Samsung built a new fab in Xi'an, China, at a cost of $7 billion. It has been estimated
that “[i]ncreasing costs of manufacturing equipment will drive the average cost of
semiconductor fabs between $15 billion and $20 billion by 2020” [9].
Due to the confluence of increasingly complex supply chains and cost pressures, the
horizontal supply chain has become prevalent [10]. Figure 10.3 shows the per-
centage of design activities outsourced, with chip-level design outsourcing reaching about
40 % by 2005. Integrated circuits (ICs) or chips are at the core of any modern
computing system, and their security grounds the security of the entire system.
Notwithstanding the central impact of IC security, malicious modification of IC
circuits by untrusted parties has raised serious concerns for critical applications such
as fail-safe military applications.
To address the issue, the Department of Defense and the National Security
Agency of United States jointly funded a “Trusted Foundry” at an IBM semicon-
ductor manufacturing facility in Vermont, US in 2004. The Trusted Foundry
program is “to ensure that mission-critical national defense systems have access to
leading-edge integrated circuits from secure, domestic sources.” Although the
Trusted Foundry program is used to produce the most sensitive chips, these chips
constitute only a small fraction of the chips used for military applications. The Department
of Defense heavily relies on the commercial supply chain to provide routers, navigation
equipment, and most other electronics hardware—and it is therefore exposed to any
associated vulnerabilities.
A complete system may require a variety of components, such as memories and chips with different
applications and functionalities.
After providing the system specifications and choosing the structure of system
and its required components, design development requires different tools. Each
component demands specific attention to meet all the system specifications. To
expedite system development and to reduce the final cost, outsourced alternatives
have gradually replaced in-house processes. Third-party IP cores have displaced the
in-house libraries of logic cells for synthesis. Commercial software has supplanted
homegrown computer-aided design (CAD) tool software. In the next step,
designed chips are signed off for fabrication. Nowadays, most companies are fab-
less, outsourcing mask production and fabrication. Besides custom designs, com-
panies can reduce total cost and accelerate system development using
commercial off-the-shelf (COTS) components and reprogrammable modules, such as
microcontrollers, reconfigurable components, or field-programmable gate arrays
(FPGAs). Afterwards, they manufacture printed circuit boards (PCBs) and assemble
system components on them. Finally, the PCBs are put together to develop units;
the entire system is the integration of these units.
In each step, different verifications or tests are performed to ensure its correct-
ness, as shown in Fig. 10.4. Functional and parametric verifications ascertain the
correctness of design implementation in terms of service and associated require-
ments, like power and performance. Wafer and package tests after the fabrication of
custom designs separate defective parts and guarantee delivered chips. The PCB
fabrication is a photolithographic process and susceptible to defects; therefore, a
PCB should be tested before placing devices on it. After the PCB assembly, the
PCB is again tested to verify that the components are properly mounted and have
not been damaged during the PCB assembly process. The tested PCBs create units
and finally the system, which is also tested before shipping for field operation [11].
Each step of system development is susceptible to security breaches. An
adversary may change system specifications to make a system vulnerable to
malicious activities or susceptible to functional failures. As external resources, like
third-party IPs and COTSs, are widely used in design process and system inte-
gration, adversaries may hide extra circuit(s) in them to undermine the system at a
specific time or to gain control over it. The untrusted foundry issue is rooted in the
outsourcing of design fabrication. Establishing a chip fabrication factory is extre-
mely expensive and most semiconductor companies have become fabless in recent
years. They ask foundries to fabricate their designs to reduce the overall cost. The
third party, however, may change the designs by adding extra circuits, like back
doors to receive confidential information from the chip, or altering circuit param-
eters, like wire thickness, to cause reliability problems in the field. The PCB
assembly stage is also susceptible, as it is possible to mount extra components on
interfaces between genuine components. In short, the cooperative system devel-
opment process creates opportunities for malicious parties to take control of the
system and to carry out malicious activities. Therefore, as a part of the system development
process, security features should be installed to facilitate validation, and to unveil
any deviation from genuine specifications.
The practice of outsourcing design and fabrication in the interest of economy has
raised serious national security concerns, since an adversary can subvert a design by
adding extra circuits, called hardware Trojans [12]. In general, a hardware Trojan is
defined as any intentional alteration to a design in order to alter its characteristics.
A hardware Trojan has a stealthy nature and can alter design functionality under
rare conditions. It can serve as a time bomb and disable a system at a specific time,
or it can leak secret information through side-channel signals.
A Trojan may affect circuit AC parameters such as delay and power; it also can
cause malfunction under rare conditions. As shown in Fig. 10.5, a hardware Trojan
consists of a Trojan payload and a Trojan trigger. A functional Trojan taps inputs from
some internal nets of the main circuit into the Trojan payload and re-stitches some
other nets of the main circuit through the Trojan payload to modify design function-
ality. The Trojan trigger determines the activation condition(s) under which the
Trojan payload can propagate erroneous values into the main circuit.
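The following toy sketch illustrates this trigger/payload structure on a small combinational function; the victim logic and the monitored nets are hypothetical.

```python
# Gate-level sketch of a functional hardware Trojan (hypothetical nets): the
# trigger is a rare AND condition over internal nets, and the payload XORs the
# trigger onto a tapped net, flipping it only when the trigger fires.

def original_logic(a, b, c, d):
    return (a & b) | (c ^ d)                # victim net of the (toy) main circuit

def trojan_trigger(nets):
    # Fires only when a rarely occurring combination of internal nets is observed.
    return int(all(nets))                   # e.g., all monitored nets at logic 1

def infected_logic(a, b, c, d, monitored_nets):
    payload_in = original_logic(a, b, c, d)
    return payload_in ^ trojan_trigger(monitored_nets)   # payload re-stitches the net

# Behaves identically to the original circuit until the rare trigger condition occurs:
print(infected_logic(1, 0, 1, 0, monitored_nets=[1, 1, 0, 1]))  # 1: matches original
print(infected_logic(1, 0, 1, 0, monitored_nets=[1, 1, 1, 1]))  # 0: flipped output
```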
The first detailed taxonomy for hardware Trojans was presented in [13, 14]. This
comprehensive taxonomy lets researchers examine their methods against different
Trojan types. Currently, the industry lacks metrics to evaluate the effectiveness of
methods in detecting Trojans. Such metrics could foster a comprehensive taxonomy
to help analyze Trojan detection techniques. Because malicious alterations to a
chip’s structure and function can take many forms, the Trojan taxonomy is
decomposed into three main categories (see Fig. 10.6) according to their physical,
activation, and action characteristics. Although Trojans could be hybrids of this
classification (for instance, they could have more than one activation characteristic),
this taxonomy captures the elemental characteristics of Trojans and is useful for
defining and evaluating the capabilities of various detection strategies.
The physical characteristics category describes the various hardware manifes-
tations of Trojans. The type category partitions Trojans into functional and para-
metric classes. The functional class includes Trojans that are physically realized
through the addition or deletion of transistors or gates, whereas the parametric class
refers to Trojans that are realized through modifications of existing wires and logic.
The size category accounts for the number of components in the chip that have been
added, deleted, or compromised. The distribution category describes the location of
the Trojan in the chip’s physical layout. The structure category refers to the case
when an adversary is forced to regenerate the layout to insert a Trojan, which could
then cause the chip’s physical form factor to change. Such changes could result in
different placement for some or all design components. Any malicious changes in
physical layout that could change the chip’s delay and power characteristics would
facilitate Trojan detection. Wang and colleagues identified current adversaries’
capabilities for minimizing the probability of detection.
Activation characteristics refer to the criteria that cause a Trojan to become
active and carry out its disruptive function. Trojan activation characteristics fall into
two categories: externally activated (e.g., by an antenna or a sensor that can interact
with the outside world) and internally activated (which are further classified as
always on and condition based), as Fig. 10.6 shows. “Always on” means the
Trojan is always active and can disrupt the chip’s function at any time. This
subclass covers Trojans that are implemented by modifying the chip’s geometries
such that certain nodes or paths have a higher susceptibility to failure. The
adversary can insert the Trojans at nodes or paths that are rarely exercised. The
condition-based subclass includes Trojans that are inactive until a specific condition
is met. The activation condition could be based on the output of a sensor that
monitors temperature, voltage, or any type of external environmental condition
(such as electromagnetic interference, humidity, altitude, or temperature). Alter-
natively, this condition could be based on an internal logic state, a particular input
pattern, or an internal counter value. The Trojan in these cases is implemented by
adding logic gates and/or flip-flops to the chip, and hence is represented as a
combinational or sequential circuit.
Action characteristics identify the types of disruptive behavior introduced by the
Trojan. The classification scheme, shown in Fig. 10.6, partitions Trojan actions into
three categories: modify function, modify specification, and transmit information.
The modify-function class refers to Trojans that change the chip’s function by
adding logic or by removing or bypassing existing logic. The modify-specification
class refers to Trojans that focus their attack on changing the chip’s parametric
properties, such as delay when an adversary modifies existing wire and transistor
geometries. Finally, the transmit-information class includes Trojans that transmit
key information to an adversary.
Trojan circuits are sly, triggering only under rare conditions. Trojans are
designed to be silent most of their lifetime, to have a very small size relative to their
host designs, and to make only limited contributions to circuit characteristics.
Analyzing the vulnerabilities of IC development process requires the knowledge of
design, fabrication, and test processes. To ensure a client’s IC is authentic, the entire
design and fabrication process must be made trustworthy or manufactured ICs
should be verified by clients for trustworthiness.
Hardware Trojans have a negligible effect on a circuit and rarely become fully
activated. While a considerable amount of work has been presented on hardware Trojan
detection, the proposed techniques can be broadly categorized into two groups: side-channel
signal analysis and logic-value analysis. The majority of work on Trojan detection based on
side-channel analysis has focused on power and delay side-channel signals. To
enhance Trojan detection resolution, some techniques have proposed embedding
monitoring systems into main circuits to capture any abnormality in circuit per-
formance or power consumption. Other work has also recommended design-for-hardware-
trust techniques to magnify the Trojan impact during authentication. In addition to
side-channel based techniques, detection techniques based on logic-value analysis
mainly focus on generating effective test patterns to fully activate Trojans and
propagate design malfunction to primary outputs.
In [15], it has been shown that the extra capacitance incurred by a hardware Trojan, attributable
to added wire and gate capacitances, changes the delay of the path connected to the Trojan
payload or Trojan trigger. Simulation results for a Trojan (a minimum sized NAND
Table 10.2 The impact of different Trojan locations on the delay of a path in the original circuit [15]

                       Without Trojan  Location 1  Location 2  Location 3  Location 4
Path delay (ps)        764.5           794.5       837.7       890.0       953.8
Increased delay (ps)   0               30          73.2        125.5       189.3
Fig. 10.9 The basic architecture for shadow register Trojan prevention scheme [16]
In the shadow register scheme of [16] (Fig. 10.9), a Trojan is suspected when
path delays are extended beyond the threshold determined by the process variations.
The measurement circuit characterizes a selected path by measuring its exact delay.
CLK1 is the main clock that drives all flip-flops in the main circuit. CLK2 is a clock
with the same frequency as CLK1 but shifted, and it drives a shadow register whose
input is the input of the register at the end of the path being characterized. By shifting
CLK2, the exact delay of the selected path is obtained, with a precision of the skew step
size, whenever the comparison result is unequal.
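A simple behavioral model of this sweep-based delay measurement is sketched below; the clock period, skew step, and path delays are hypothetical values chosen only to show how the first unequal comparison bounds the path delay.

```python
# Conceptual model of the shadow-register delay measurement of Fig. 10.9
# (hypothetical numbers, in ns): CLK2 is swept in skew steps until the shadow
# register disagrees with the main register, which bounds the path delay.

SKEW_STEP = 0.05   # ns, resolution of the phase shift of CLK2
PERIOD    = 2.0    # ns, common period of CLK1 and CLK2

def sampled_ok(path_delay, capture_time):
    return path_delay <= capture_time

def measure_path_delay(path_delay, period=PERIOD, step=SKEW_STEP):
    """Shift CLK2 earlier step by step until the main and shadow registers differ."""
    shift = 0.0
    while shift < period:
        if sampled_ok(path_delay, period) != sampled_ok(path_delay, period - shift):
            return period - shift + step   # first unequal comparison bounds the delay
        shift += step
    return None

print(measure_path_delay(1.43))   # ~1.45: genuine path
print(measure_path_delay(1.61))   # ~1.65: extra Trojan load lengthens the delay
```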
One of the pioneering works in hardware Trojan detection is [17]. In this work, a set of
patterns is applied to a batch of chips. With each pattern application, the power
trace on each chip is collected. The chips are reverse engineered and inspected to
ensure they are Trojan free. The collection of power traces from Trojan-free chips
serves as a reference. After obtaining the reference, the same set of patterns is
applied to a design under authentication and its power traces are collected. The
power traces are compared with the reference and any measurable difference
beyond a specific threshold flags Trojan existence.
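The comparison step of such a fingerprinting flow can be sketched as follows, using synthetic traces and a simple illustrative threshold rather than the statistical processing of the original work.

```python
# Sketch of the IC-fingerprinting idea of [17] (synthetic traces): power traces of a
# chip under authentication are compared against a golden reference built from
# verified Trojan-free chips, and a difference beyond a threshold flags a Trojan.
import numpy as np

rng = np.random.default_rng(0)
golden = rng.normal(10.0, 0.2, size=(20, 256))       # 20 Trojan-free chips, 256 samples
reference = golden.mean(axis=0)
threshold = 4 * golden.std(axis=0).mean()             # simple, illustrative threshold

def has_trojan(trace):
    return np.max(np.abs(trace - reference)) > threshold

clean_chip  = rng.normal(10.0, 0.2, size=256)
trojan_chip = clean_chip.copy()
trojan_chip[100:140] += 2.0                            # extra switching current of a Trojan

print(has_trojan(clean_chip), has_trojan(trojan_chip))  # expected: False True
```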
One of the major challenges with power-based techniques is process variations.
Manufactured chips of one circuit, although they have the same functionality, present
different characteristics in terms of transistor parameters, such as voltage threshold and
channel length, due to the limited accuracy of manufacturing equipment. Variations are
broadly categorized into inter-chip and intra-chip variations. Inter-chip variations
show a slight shift in parameters from one chip to another. On the other hand,
intra-chip variations imply random process variations inside a chip, where the voltage
threshold of one transistor is reduced while that of a nearby one is increased. As a result,
two chips manufactured by the same company may present a noticeable difference in
their power consumption. The difference may be so large that the Trojan con-
tribution to power consumption is masked by process variations.
A number of techniques have been proposed to reduce the impact of process
variations and enhance power-based Trojan detection techniques.
To mitigate the impact of process variations, a multi-supply transient-current
integration methodology is proposed in [18]. Figure 10.10 presents the
concept: a set of random patterns is applied to both a chip without a Trojan and a
chip under authentication. The vertical axis presents charge (Q), the inte-
gration of current over time (t), and any measurable difference above a predefined
threshold (D(t)) indicates Trojan existence. The technique benefits from the fact that the
impact of intra-chip process variations is canceled over time by activating
different portions of the circuit through random test patterns. Furthermore, the
technique does not incur any area overhead.
In [25], the intrinsic relationship between dynamic current (IDDT) and maximum
operating frequency (Fmax) of a circuit is used to isolate the effect of a Trojan circuit
from process noise. Figure 10.11a, b show average IDDT and Fmax values for an
8-bit ALU circuit (c880 from ISCAS-85 benchmark suite) obtained from simulation
in HSPICE for 100 chips which lie at different process corners. The process corners
are obtained by only considering inter-die variations on transistors’ voltage
threshold. A combinational Trojan (8-bit comparator circuit) is inserted on a
non-critical path in c880; therefore, Trojan impact can be only observed on IDDT
and it does not affect Fmax. As shown in Fig. 10.11a, the spread in IDDT due to
variations easily masks the effect of the Trojan. The problem becomes more severe
with decreasing Trojan size or increasing variations in device parameters in scaled
technologies. Figure 10.11b indicates Fmax for each process corner. While Fmax is
used for calibrating the process corner of the chips, the delay of any path in the
circuit can be used for this purpose.
To distinguish Trojan contribution from process variations impact, the intrinsic
relationship between IDDT and Fmax can be utilized to differentiate between the
original and tampered versions. The plot for IDDT versus Fmax for the ISCAS-85
c880 circuit is shown in Fig. 10.11c. It can be observed that two chips (e.g., Chipi
and Chipj) can have the same IDDT value, one due to the presence of Trojan and the
other due to process variation. By considering only one side-channel parameter, it is
not possible to distinguish between these chips. However, the correlation between
IDDT and Fmax can be used to distinguish malicious changes in a circuit under
process noise. The presence of a Trojan will cause the chip to deviate from the trend
line. As seen in Fig. 10.11c, the presence of a Trojan in Chipi causes a variation in
IDDT when compared to a golden chip (Chipk), while it does not have similar effect
on Fmax as induced by process variation, i.e., the expected correlation between IDDT
and Fmax is violated by the Trojan.
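The following sketch illustrates the idea on synthetic data: a trend line relating IDDT to Fmax is fitted from golden chips, and a chip whose IDDT deviates from the value predicted by its own Fmax is flagged. The numerical model of the process corner is invented for illustration only.

```python
# Sketch of the multiple-parameter idea of [25] (synthetic data): IDDT and Fmax
# are correlated through the process corner, so a Trojan-infected chip shows an
# IDDT value that deviates from the trend line fitted to golden chips.
import numpy as np

rng = np.random.default_rng(1)
corner = rng.normal(0.0, 1.0, 100)                        # inter-die process corner, 100 chips
fmax = 500 + 25 * corner + rng.normal(0, 2, 100)          # MHz (synthetic model)
iddt = 2.0 + 0.12 * corner + rng.normal(0, 0.02, 100)     # mA (synthetic model)

slope, intercept = np.polyfit(fmax, iddt, 1)              # trend line from golden chips
residual_limit = 3 * np.std(iddt - (slope * fmax + intercept))

def is_suspect(chip_fmax, chip_iddt):
    return abs(chip_iddt - (slope * chip_fmax + intercept)) > residual_limit

print(is_suspect(520, 2.10))   # False: on the trend line, not suspicious
print(is_suspect(520, 2.35))   # True: elevated IDDT for its Fmax -> flagged
```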
In another work [26], some methods based upon post-silicon multimodal thermal
and power characterization techniques are presented to detect and locate IC Trojans.
The approach first estimates the detailed post-silicon spatial power consumption
using thermal maps of the IC, and it then applies the two-dimensional principal
component analysis to extract features of the spatial power consumption. Finally, it
uses statistical tests against the features of authentic ICs to detect the Trojan.
Fig. 10.11 a Average IDDT values at 100 random process corners (with maximum variation
of ±20 % in inter-die Vth) for c880 circuit. The impact of Trojan (8-bit comparator) in IDDT is
masked by process noise. b Corresponding Fmax values. The Fmax versus IDDT plot can help
identify Trojan-containing ICs under process variations [25]
While the majority of work has focused on Trojan detection based on side-channel
signal analysis, comparatively little work has addressed the full activation of hardware Trojans
and the propagation of the erroneous logic values generated by the Trojan payload to an
observation point.
The authors of [30] first perform a Trojan target analysis and then apply a Trojan
detection procedure. In the first step, the analysis identifies Trojan trigger vectors
(q), shown in Fig. 10.12, whose occurrence probability is less than a specific threshold. The
analysis also isolates possible nets used as Trojan payload. In the next step, the
Trojan detection procedure generates a specific set of test vectors to produce
rare-triggering vectors and propagate erroneous logic values to an observation
point. Trojan test vectors are combined with traditional test patterns, such as
stuck-at fault test patterns, and applied during design testing.
Related efforts fall into two groups: one group has proposed design techniques to
detect hardware Trojans after design manufacturing, and the other group has mainly
analyzed switching activities in gate-level netlists to capture hardware Trojans.
Design Techniques
sensors or the design path), then its presence can be detected by observing a poor
correlation between these two delay ranges. The post-silicon self-authentication
process is shown on the right-hand-side of Fig. 10.13.
In another work [33], a temporal self-referencing approach is proposed for
detecting sequential Trojans. The approach compares the current signature of a chip
at two different time windows to completely eliminate the effect of process noise,
thus providing high detection sensitivity for Trojans of varying size. The effec-
tiveness of the technique stems from the fact that the transient current “signature” of a Trojan-free
circuit should remain constant over different time windows when the circuit
undergoes the same set of state transitions multiple times. However, in a
Trojan-infected circuit, the current signature varies over multiple time windows for
the same set of state transitions of the original circuit, due to uncorrelated state
transitions in the Trojan.
The unused circuitry identification (UCI) technique is one of the first such tech-
niques to distinguish minimally used logic from the other parts of the circuit
[34]. First, UCI creates a data-flow graph for a circuit. The nodes of the graph are signals
(wires) and state elements, and its edges indicate data flow between the nodes.
Based on this data-flow graph, UCI generates a list of all direct and indirect signal
pairs where data flows from a source signal to a sink signal. Next, UCI
simulates the HDL code using design verification tests to find the set of data-flow
pairs where intermediate logic does not affect the data that flows between the source
and sink signals. UCI centers on the fact that the HT circuitry mostly remains
inactive within a design, and hence such minimally used logic can be distinguished
from the other parts of the circuit.
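A much-simplified sketch of this analysis is shown below for a three-signal toy design; the data-flow pairs and verification tests are illustrative, and the real UCI tool operates on HDL designs rather than Python models.

```python
# Simplified sketch of the UCI idea (toy design, not the original tool): signal
# pairs connected by a data-flow path are simulated under verification tests, and
# pairs whose values never differ are reported as minimally used (suspicious) logic.

def simulate(a, b, trigger):
    s1 = a & b                       # source signal
    s2 = s1 | trigger                # Trojan OR gate: dormant while trigger == 0
    s3 = s2 ^ a                      # downstream logic
    return {"s1": s1, "s2": s2, "s3": s3}

dataflow_pairs = [("s1", "s2"), ("s2", "s3"), ("s1", "s3")]
tests = [(0, 0, 0), (0, 1, 0), (1, 0, 0), (1, 1, 0)]   # verification never sets trigger

suspicious = set(dataflow_pairs)
for a, b, trig in tests:
    vals = simulate(a, b, trig)
    for src, dst in dataflow_pairs:
        if vals[src] != vals[dst]:
            suspicious.discard((src, dst))   # intermediate logic did affect the data

print(sorted(suspicious))   # [('s1', 's2')]: the pair bridged by the dormant gate survives
```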
VeriTrust [35] flags suspicious circuitry by identifying potential trigger inputs
used in HTs, based on the observation that these inputs remain dormant under
non-trigger conditions and hence are redundant to the normal logic function of the
circuit. In order to detect the redundant inputs, it first performs functional testing
and records the activation history of the inputs in the form of sum-of-products
(SOP) and product-of-sums (POS) terms. It then further analyzes the unactivated SOPs
and POSs to find the redundant inputs. However, because of functional veri-
fication constraints, VeriTrust may see several unactivated SOPs and POSs and thus
regard the circuit as potentially infected, resulting in false positives.
FANCI [36] applies Boolean function analysis to flag suspicious wires in a
design which have weak input-to-output dependency. For each input in the com-
binational logic cone of an output wire, a control value (CV), which represents the
percentage impact of changing an input on the output, is computed. If the mean of
all the CVs is lower than a threshold, then the resulting output wire is considered
malicious. This is a probabilistic method where the threshold is computed with
some heuristic to achieve a balance between security and the false positive rate.
A very high threshold may result in a high false positive rate by considering most of
the wires (even non-malicious ones) as malicious, whereas a very low threshold may
result in false negatives by considering an HT-related (malicious) wire to be
not malicious.
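The control-value computation can be sketched as follows for two toy logic cones, an ordinary XOR cone and a rare 6-input trigger; the functions and the exhaustive enumeration are illustrative (for large cones the actual tool approximates CVs instead of enumerating all patterns).

```python
# Sketch of FANCI's control value (CV) computation on toy logic cones: the CV of an
# input is the fraction of input patterns for which toggling that input changes the
# output; wires driven mostly by near-unused inputs obtain low mean CVs.
from itertools import product

def control_value(f, n_inputs, i):
    """Fraction of input patterns for which toggling input i changes f's output."""
    flips = 0
    for pattern in product((0, 1), repeat=n_inputs):
        flipped = list(pattern)
        flipped[i] ^= 1
        flips += f(pattern) != f(tuple(flipped))
    return flips / 2 ** n_inputs

normal_wire  = lambda x: x[0] ^ x[1] ^ x[2]                          # ordinary logic cone
trigger_wire = lambda x: x[0] & x[1] & x[2] & x[3] & x[4] & x[5]     # rare 6-input trigger

cv_normal  = [control_value(normal_wire, 3, i) for i in range(3)]
cv_trigger = [control_value(trigger_wire, 6, i) for i in range(6)]

print(sum(cv_normal) / 3)    # 1.0     -> strong dependency, not flagged
print(sum(cv_trigger) / 6)   # 0.03125 -> weak dependency, flagged as suspicious
```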
An information-theoretic approach for Trojan detection has been proposed in
[37]. It estimates the statistical correlation between signals in a circuit for
Trojan detection using the OPTICS clustering algorithm. To study the cor-
relation between the signals, input patterns are applied and a weighted graph of the
design is created. While the technique presents full coverage for the selected benchmarks,
its accuracy highly depends on observing enough activity on each
signal to study signal correlations, and the presented results indicate a nonzero false
positive rate. Furthermore, the application of the technique to large circuits may
require considerable processing time and memory usage. In another effort, a
score-based classification method is presented for identifying hardware Trojans
[38]. The proposed technique extracts Trojan characteristics introduced at
Trust-HUB [39] and defines an incremental metric to isolate some of the Trojan
nets from the rest of the circuit.
After circuit synthesis and during physical design, placement tools spread cells such
that circuit routability is guaranteed and circuit constraints in terms of power, per-
formance, and size are met. This often leaves small gaps between cells; it is
impossible to fill 100 % of the area with regular standard cells in VLSI designs.
After completing placement and routing, designers usually fill the empty spaces
with filler cells or decoupling capacitor (DECAP) cells to reduce design rule check
(DRC) violations attributed to the base layers and to ensure power rail connection.
However, filler cells have no functionality. If designers want to make some
changes, commonly known as an Engineering Change Order (ECO), the filler cells can be
deleted and the empty spaces can be utilized for new gates. On the other hand,
intelligent attackers can identify and remove some filler cells for Trojan insertion,
because removing these non-functional filler cells does not change the original
functionality of the circuit.
In [40], the built-in self-authentication (BISA) technique is introduced to fill
unused spaces in a circuit layout by functional filler cells, called BISA cells, instead
of non-functional filler cells. These BISA cells are connected together to form a
combinational circuit, the BISA circuit, that is independent of the original cir-
cuit. The BISA circuit is designed so that stuck-at patterns can test all of its gates;
thus any change on BISA cells will be detected. Furthermore, BISA cells are the
same as standard cells that the circuit uses, thus identifying these cells will be
extremely difficult. Thus, BISA can be used to prevent Trojan insertion or make
Trojan insertion extremely difficult.
Figure 10.14 shows the structure of BISA consisting of a test pattern generator
module (TPG), BISA circuit under test, and output response analyzer (ORA). In
10 Hardware Trojan Attacks and Countermeasures 267
this paper, a linear feedback shift register (LFSR) is used as the TPG and a multiple-
input signature register (MISR) as the ORA. The output of the ORA is used as a signature
to detect hardware Trojans. The BISA circuit under test is composed of all BISA
cells that are inserted into unused spaces. The smaller a combinational circuit is and the
fewer gates it has, the higher the achievable test coverage. Therefore, the BISA circuit is
divided into a number of smaller combinational logic blocks, called BISA blocks.
Each BISA block can be considered as an independent combinational logic block.
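The self-test flow can be sketched as follows with a toy 8-bit LFSR and MISR (the paper uses 32-bit structures); the BISA block functions and the tap positions are illustrative choices, not taken from [40].

```python
# Sketch of the BISA test flow (toy 8-bit LFSR/MISR, illustrative taps): the LFSR
# generates patterns for the BISA blocks and the MISR compacts their responses into
# a signature; removing or altering a BISA cell changes the resulting signature.

def lfsr_patterns(seed, n, taps=(7, 5, 4, 3)):         # 8-bit LFSR pattern generator
    state = seed
    for _ in range(n):
        yield state
        fb = 0
        for t in taps:
            fb ^= (state >> t) & 1
        state = ((state << 1) | fb) & 0xFF

def misr(responses, taps=(7, 5, 4, 3)):
    sig = 0
    for r in responses:
        fb = 0
        for t in taps:
            fb ^= (sig >> t) & 1
        sig = (((sig << 1) | fb) ^ r) & 0xFF           # shift, feed back, absorb response
    return sig

bisa_block          = lambda p: (p ^ (p >> 1)) & 0xFF  # genuine BISA block
tampered_bisa_block = lambda p: (p ^ (p >> 2)) & 0xFF  # one cell altered by an attacker

patterns = list(lfsr_patterns(0xA5, 500))
print(hex(misr(bisa_block(p) for p in patterns)))            # golden signature
print(hex(misr(tampered_bisa_block(p) for p in patterns)))   # expected to differ from golden
```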
Figure 10.15 shows the application of BISA to System05 in a 90 nm technology node.
Table 10.3 shows BISA's effectiveness under ten attacks. In the system05 circuit,
418 BISA cells are inserted to fill unused spaces. An LFSR and a MISR of size 32
are used to form the BISA structure. 616 ATPG patterns can reach 99.65 % testable
coverage. When 500 patterns from the LFSR are applied, the stuck-at fault test cov-
erage is 81 %. In Table 10.3, case 0 shows the result for the genuine BISA circuit.
Fig. 10.15 a System05 before BISA insertion. b System05 after BISA insertion [40]
Table 10.3 Test suite strength and search space reduction [2]

Strength  Suite size  Covered patterns
t = 2     11          32,512
t = 3     37          2,731,008
t = 4     112         170,688,000
t = 5     252         8,466,124,800
t = 6     720         347,111,116,800
t = 7     2,462       12,099,301,785,600
t = 8     17,544      366,003,879,014,400
Five kinds of gates are selected to be removed from different BISA blocks sepa-
rately. In addition, another five types of gates are selected to be changed to other
types of gates in different BISA blocks separately. The results of the ten cases are
shown in Table 10.3. In each case, the signature generated by the MISR is different
from the genuine signature, which shows that BISA has detected these attacks. In
Table 10.3, an internal cell means a cell that has children cells, and a leaf cell is a cell that
does not have children cells (Table 10.4).
Although there has been a significant amount of work on hardware Trojan detection
and prevention, no systematic approach to assess the susceptibility of a circuit to
Trojan insertion has been developed. Sections in a circuit with low controllability
and observability are considered potential areas for implementing Trojans. This
necessitates a thorough circuit analysis to identify potential Trojan locations. As pre-
sented in [41], a comprehensive flow has been developed to perform independent
Fig. 10.17 The dummy flip-flop structures when (a) Pi0 << Pi1, and (b) Pi0 >> Pi1 [42]
For Trojan detection purposes, it is possible to shut down a part of the circuit and reduce circuit
switching activity using the scan-chain reordering technique, while increasing
Trojan activity using the dummy scan flip-flop technique.
Infrastructure IP for SoC security (IIPS) [44] is another approach to incorporate
security into a design for its protection against (1) scan-based attacks for infor-
mation leakage, through low-overhead authentication; (2) counterfeiting attacks,
through integration of a physical unclonable function (PUF); and (3) hardware
Trojan attacks, through a test infrastructure for trust validation. Figure 10.19 pre-
sents IIPS’s block diagram consisting of a Master Finite State Machine (M-FSM)
that controls the working mode of IIPS, a Scan Chain Enabling FSM (SE-FSM) to
provide individual control over activation of scan chains in the SoC, and a clock
control module to generate necessary clock and control signals for performing
ScanPUF authentication and path delay-based hardware Trojan detection.
Regarding hardware Trojan detection, IIPS enables the clock sweeping technique
for Trojan detection through monitoring of delay shift by observing the latched
value under clock sweep.
Fig. 10.19 Block diagram of the IIPS module showing interconnection with other IP cores in a
SoC using SoC boundary scan architecture [44]
abrupt transition of blocks to sleep states that results in loss of data and performance
degradation.
Hardware Trojans in networks-on-chip (NoCs) are another serious challenge in
complex designs. In [46], it is shown that a hardware Trojan can mask itself as
transient errors which can only be activated under very specific conditions to avoid
detection. Induced transient errors in data can exploit vulnerabilities created by the
fault-tolerant techniques. For example, intentional data corruption requires data
retransmissions that may lead to a Denial-of-Service (DoS) attack by creating false
congestion between the routers by consuming network resources. While the error
may appear as benign for the system, this is intentionally created by HTs to create
DoS attack and disrupt the system. Another study [46] examined DoS attacks
in NoC routers, where a hardware Trojan can maliciously change the flit
source/destination address or the flit type information of a packet that has left the
transmitter network interface (NI). If a Trojan payload modifies the destination
address of a packet, that packet could be directed to an unauthorized IP core. A drop
of the header flit or tail flit will result in the incomplete packet being retained in the
router until some operation arrives to reset the router. To limit hardware
Trojans in NoCs, a collaborative dynamic permutation and flit integrity check
method is proposed in [46] that is capable of examining the invariants of the NoC to
immediately terminate the detected HTs.
A novel technique with a low-overhead security framework for a custom
many-core router using machine learning techniques has been presented in [47].
It is assumed that the processing cores and memories are safe, and that anomalies are
introduced only through the router. The attack corrupts the router packet by changing the
destination address, which results in traffic diversion, route looping, or a core spoofing
attack. To detect hardware Trojans in routers a “Golden Data Set” based on
hardware feature analysis and anomaly insertion effects has been developed. The
Golden Data Set considers Source Core, Destination Core, Packet Transfer Path,
Distance, Dynamic Power Range, Execution Time Range, Clock Frequency,
Supply Voltage, and studies their correlations to reduce the complexity of the proposed
machine-learning-based detection technique.
There have been significant efforts to address hardware Trojans, a very challenging
issue in electronic chips. Although a variety of techniques and methodologies have
been proposed, the majority of them rely on certain assumptions or do not scale well
to modern designs, which limits their applicability. One of the main
assumptions is the existence of a golden model as a reference in side-channel-based
Trojan detection techniques. A golden model can be obtained by reverse engi-
neering, which is a costly and destructive process. Meanwhile, it may not provide a
perfect reference because of variations in process parameters between chips. Fur-
thermore, the scalability of the proposed techniques is not well studied, as most experiments
are performed on small circuits compared with industrial ones. In addition, it is
difficult to model hardware Trojans, in contrast to defects caused by manufacturing,
as hardware Trojans are intentional and malicious modifications by nature. Another
challenge in hardware Trojan detection is the lack of standard metrics to quanti-
tatively determine the security of a design. Such metrics would make it possible to measure
the effectiveness of different techniques and compare their strengths and weak-
nesses. Finally, although there have been some efforts on developing Trojan
designs, there is a need for more comprehensive trust benchmarks that different
researchers can use to evaluate and compare their solutions.
References
15. Xiao, K., Zhang, X., Tehranipoor, M.: A clock sweeping technique for detecting hardware
Trojans impacting circuits delay. IEEE Des. Test 30(2), 26–34 (2013)
16. Li, J., Lach, J.: At-speed delay characterization for IC authentication and Trojan horse
detection. In: Proceedings of IEEE International Symposium Hardware-Oriented Security and
Trust, pp. 8–14 (2008)
17. Agrawal, D., Baktir, S., Karakoyunlu, D., Rohatgi, P., Sunar, B.: Trojan detection using IC
fingerprinting. In: Proceedings of the Symposium on Security and Privacy, pp. 296–310
(2007)
18. Wang, X., Salmani, H., Tehranipoor, M., Plusquellic, J.: Hardware Trojan detection and
isolation using current integration and localized current analysis. In: Proceedings of the
International Symposium on Fault and Defect Tolerance in VLSI Systems, pp. 87–95 (2008)
19. Huang, H., Bhunia, S., Mishra, P.: MERS: statistical test generation for side-channel analysis
based Trojan detection. In: ACM Conference on Computer and Communications Security
(CCS), Vienna, Austria, 24–28 Oct 2016
20. Ferraiuolo, A., Zhang, X., Tehranipoor, M.: Experimental analysis of a ring oscillator network
for hardware Trojan detection in a 90 nm ASIC. In: Proceedings of IEEE/ACM International
Conference on Computer-Aided Design, pp. 37–42 (2012)
21. Rajendran, J., Jyothi, V., Sinanoglu, O., Karri, R.: Design and analysis of ring oscillator based
design-for-Trust technique. In: Proceedings of IEEE VLSI Test Symposium, pp. 105–110
(2011)
22. Rad, R., Plusquellic, J., Tehranipoor, M.: A sensitivity analysis of power signal methods for
detecting hardware trojans under real process and environmental conditions. IEEE Trans.
Very Large Scale Integr. Syst. 18(12), 1735–1744 (2010)
23. Narasimhan, S., Yueh, W., Wang, X., Mukhopadhyay, S., Bhunia, S.: Improving IC security
against Trojan attacks through integration of security monitors. IEEE Des. Test Comput. 29
(5), 37–46 (2012)
24. Karimian, N., Tehranipoor, F., Rahman, M.T., Kelly, S., Forte, D.: Genetic algorithm for
hardware Trojan detection with ring oscillator network (RON). 2015 IEEE International
Symposium on Technologies for Homeland Security (HST), Waltham, MA, pp. 1–6 (2015)
25. Narasimhan, S., Du, D., Chakraborty, R.S., Paul, S., Wolff, F., Papachristou, C., Roy, K.,
Bhunia, S.: Hardware Trojan detection by multiple-parameter side-channel analysis. IEEE
Trans. Comput. 62(11), 2183–2195 (2013)
26. Hu, K., Nowroz, A.N., Reda, S., Koushanfar, F.: High-sensitivity hardware Trojan
detection using multimodal characterization. In: Proceedings of the Conference on Design,
Automation and Test in Europe, pp. 1271–1276 (2013)
27. Koushanfar, F., Mirhoseini, A.: A unified framework for multimodal submodular integrated
circuits Trojan detection. IEEE Trans. Inf. Forensics Secur. 6(1), 162–174 (2011)
28. Potkonjak, M., Nahapetian, A., Nelson, M., Massey, T.: Hardware Trojan horse detection
using gate-level characterization. In: Proceedings of Design Automation Conference,
pp. 688–693 (2009)
29. Alkabani, Y., Koushanfar, F.: Consistency-based characterization for IC Trojan detection. In:
Proceedings of IEEE/ACM International Conference on Computer-Aided Design, pp. 123–
127 (2009)
30. Wolff, F., Papachristou, C., Bhunia, S., Chakraborty, R.S.: Towards Trojan free trusted ICs:
problem analysis and detection scheme. In: Proceedings of ACM Design, Automation and
Test in Europe Conference, pp. 1362–1365 (2008)
31. Voyiatzis, A.G., Stefanidis, K.G., Kitsos, P.: Efficient triggering of Trojan hardware logic. In:
2016 IEEE 19th International Symposium on Design and Diagnostics of Electronic Circuits &
Systems (DDECS), Kosice, pp. 1–6 (2016)
32. Li, M., Davoodi, A., Tehranipoor, M.: A sensor-assisted self-authentication framework for
hardware Trojan detection. IEEE Des. Test 30(5), 74–82 (2013)
33. Narasimhan, S., Wang, X., Du, D., Chakraborty, R.S., Bhunia, S.: TeSR: A robust temporal
self-referencing approach for hardware trojan detection. In: Proceedings of IEEE International
Symposium on Hardware-Oriented Security and Trust, pp. 71–74 (2011)
34. Hicks, M., Finnicum, M., King, S.T., Martin, M., Smith, J.M.: Overcoming an untrusted
computing base: detecting and removing malicious hardware automatically. In: IEEE
Symposium on Security and Privacy, pp. 64–77 (2010)
35. Zhang, J., Yuan, F., Wei, L., Sun, Z., Xu, Q.: VeriTrust: verification for hardware trust. In:
ACM/EDAC/IEEE Design Automation Conference (DAC), pp. 61:1–61:8 (2013)
36. Waksman, A., Suozzo, M., Sethumadhavan, S.: FANCI: identification of stealthy malicious
logic using Boolean functional analysis. In: Proceedings of the 2013 ACM SIGSAC
Conference on Computer & Communications Security (CCS), pp. 697–708 (2013)
37. Çakir, B., Malik, S.: Hardware Trojan detection for gate-level ICs using signal correlation
based clustering. In: Proceedings of the 2015 Design, Automation & Test in Europe
Conference & Exhibition (DATE), pp. 471–476 (2015)
38. Oya, M., Shi, Y., Yanagisawa, M., Togawa, N.: A score-based classification method for
identifying hardware-Trojans at gate-level Netlists. In: Proceedings of the 2015 Design,
Automation & Test in Europe Conference & Exhibition (DATE), pp. 465–470 (2015)
39. Salmani, H., Tehranipoor, M., Karri, R.: On design vulnerability analysis and trust benchmark
development. In: IEEE International Conference on Computer Design (ICCD) (2013)
40. Xiao, K., Tehranipoor, M.: BISA: Built-in self-authentication for preventing hardware Trojan
insertion. In: Proceedings of IEEE International Symposium on Hardware-Oriented Security
and Trust (HOST), pp. 45–50 (2013)
41. Tehranipoor, M., Salmani, H., Zhang, X.: Integrated Circuit Authentication Hardware Trojans
and Counterfeit Detection. Springer (2014)
42. Salmani, H., Tehranipoor, M., Plusquellic, J.: A novel technique for improving hardware
Trojan detection and reducing trojan activation time. In: IEEE Trans. Very Large Scale Integr.
(VLSI) Syst. 20(1), 112–125 (2012)
43. Salmani, H., Tehranipoor, M.: Layout-aware switching activity localization to enhance
hardware trojan detection. IEEE Trans. Inf. Forensics Secur. 7(1), 76–87 (2012)
44. Wang, X., Zheng, Y., Basak, A., Bhunia, S.: IIPS: infrastructure IP for secure SoC design.
IEEE Trans. Comput. 64(8), 2226–2238 (2015)
45. Jayashankara Shridevi, R., Rajamanikkam, C., Chakraborty, K., Roy, S.: Catching the Flu:
emerging threats from a third party power management unit. In: 2016 53rd
ACM/EDAC/IEEE Design Automation Conference (DAC), Austin, TX, pp. 1–6 (2016)
46. Frey, J., Yu, Q.: A hardened network-on-chip design using runtime hardware Trojan
mitigation methods. Integration (VLSI J.) (2016)
47. Kulkarni, A., Pino, Y., French, M., Mohsenin, T.: Real-time anomaly detection
framework for many-core router through machine-learning techniques. J. Emerg. Technol.
Comput. Syst. 13(1), Article 10, 22 pp. (June 2016). doi:http://dx.doi.org/10.1145/2827699
48. Mokhoff, N., Wallace, R.: Outsourcing trend proves: complex by design. EE Times. http://
www.eetimes.com/document.asp?doc_id=1152570 (2005)
Chapter 11
In-place Logic Obfuscation for Emerging
Nonvolatile FPGAs
11.1 Introduction
∙ Nonvolatile memory (NVM) FPGAs utilize antifuse [21] or Flash memory [22] to maintain configuration data on-chip. Physical attack, e.g., probing after reverse engineering [4], is the major security threat.
∙ Partial run-time reconfiguration has also emerged as an important security issue. For example, Xilinx products realize run-time reconfiguration via the internal configuration access port (ICAP), which is less secure due to port vulnerabilities [3, 13]. Updating configurations remotely may traverse public, insecure networks, which requires more than one authentication scheme to enhance the security level.
Many novel FPGA architectures have been proposed that utilize emerging NVM technologies such as phase change memory (PCM), spin-transfer torque RAM (STT-RAM), and resistive RAM (RRAM) [5, 20, 27]. On the one hand, the use of NVM technologies promises operation as fast as conventional SRAM-based FPGAs (SRAM-FPGAs), increases configuration capacity, and lowers system power consumption significantly [18]. On the other hand, the nonvolatile storage of logic configuration in these architectures raises a serious design security concern: powerful attackers could access the configuration memory, which contains the entire design, without any further protection. Note that this situation does not exist in SRAM-FPGAs, where data are not retained after power-off. Certainly, a user can erase the data in configuration memory after use and initialize it from external or in-package memory when needed. The security concern in such an operation mode then becomes similar to that of the SRAM-FPGA at its communication port. In summary, logic and storage components made of NVMs are more vulnerable to physical attacks, making IP protection and data security even more challenging.
This work targets the security issues in NVM-based FPGAs. In particular, a hardware security scheme is proposed for RRAM-based FPGAs (RRAM-FPGAs), in which RRAM devices are used to construct look-up tables (LUTs) for logic functions as well as block RAMs (BRAMs) for configuration and temporary data storage [7]. The design demonstrates a high density of logic integration and supports partial run-time reconfiguration well. The hardware security scheme in this work protects the RRAM-FPGA in three aspects:
1. An obfuscated configuration is loaded into the BRAMs, combined with a Chip DNA for logic function identification. The FPGA system operates the designed functionality only when all the pieces of the logic configuration are correctly selected from the BRAMs and assembled in the proper sequence.
2. When a higher security level is needed, the system enters a blank mode by erasing the contents of the nonvolatile logic and routing elements. Even if attackers obtain the obfuscated configuration in the BRAMs through physical attacks, the design cannot be revealed or reproduced without the Chip DNA.
3. We combine the communication ports for initialization and run-time reconfiguration in the RRAM-FPGA. The bitstream loading scheme is enhanced by encrypted addressing, which enables partial random configuration loading and secret key updating to resist bitstream piracy and protocol-based denial-of-service attacks.
The three key components together offer a high level of protection for the hardware and data communication of the RRAM-FPGA. Our evaluations show that, at acceptable system loading and execution performance, the proposed scheme can resist level 3 attackers [1, 3]. Meanwhile, the communication port protected by the encrypted addressing demonstrates a much lower probability of protocol-based denial-of-service attack compared to modern FPGAs with AES encryption.
The rest of this chapter is organized as follows. Section 11.2 gives a brief introduction to hardware security in FPGAs and the preliminaries of the RRAM-FPGA design. Section 11.3 describes the threat models in RRAM-FPGA and the corresponding solutions in this work. Section 11.4 presents the design details of the proposed security scheme. We present the security evaluation and system performance analysis in Sect. 11.5. Finally, Sect. 11.6 concludes the chapter.
11.2 Background
Most commercial FPGAs use SRAM-based look-up tables (LUTs) to realize logic functions [2, 32]. An external memory is needed to store the design configuration and initialize the system during power-up. The connection between the FPGA and its external memory, therefore, is the weakest point in data protection. Attackers could probe the signals on the connection to discover the bitstream and even force the system into denial of service [13]. Encryption technologies, such as AES, are widely adopted to protect bitstreams in modern FPGAs [34]. The physical unclonable function (PUF) is another popular solution for preventing bitstream reverse engineering attacks [17].
Nonvolatile FPGAs equipped with antifuse [19] or Flash memory [24] do not require external memory, and therefore offer a higher security level than SRAM-FPGAs. However, the on-chip logic configuration could be pirated through a probing attack after reverse engineering [13]. For example, Lattice products [19] use an in-system programmable scheme, which integrates NVM for bitstream storage and SRAM for functional logic into one package. Distributed security bits are placed in the silicon as security fuses for loading configuration data. The technique targets non-invasive attacks and provides a moderately high (MODH) security level. Microsemi, previously known as Actel, supplies FPGAs with functional memory in a Flash fabric, which is innately a MODH device [24]. A large variety of cryptography services, e.g., AES-128, SHA-256, and PUF, are offered [23]. Moreover, an anti-tamper protection scheme is provided to further protect designs from physical attacks. It includes a physical containment to detect physical attacks and a system-level protective loop to detect disturbances of the protective mesh [24]. Once tampering exceeds the limit, penalties such as erasure of the entire design are triggered.
FPGAs built with various emerging NVMs have been proposed previously [5, 7, 20, 27]. In this work, the FPGA architecture built with RRAM technology is taken as the example case for its extremely high density, fast execution speed, and better support for run-time reconfiguration [7].
As illustrated in Fig. 11.1, the smallest reconfiguration unit (RU) in the RRAM-FPGA includes not only the logic and routing elements but also a block RAM (BRAM). The BRAM stores temporary data and logic configuration to enhance functional flexibility and execution performance [7]. In this architecture, the logic configuration is divided into two steps.
∙ Step 1: Through bitstream loading, the design configuration is broadcast to and stored in the BRAMs.
∙ Step 2: Each RU distributes the configuration in its BRAM to the corresponding logic and routing elements through special tracks.
By leveraging RRAM technology, the RRAM-FPGA significantly reduces leakage power consumption and increases logic integration density [6]. However, system and data protection becomes more challenging.
Nanoscale memory devices such as PCM, STT-RAM, and RRAM demonstrate superior advantages for security primitives compared with complementary metal oxide semiconductor (CMOS)-based memory devices in modern system-on-chip (SoC) designs. Nanoscale memory devices have native characteristics of ultra-low power consumption, fast access time, and strong robustness, which can be applied to address security issues such as piracy, counterfeiting, and side-channel attacks. Emerging hardware security solutions, including physical unclonable functions (PUFs), public physical unclonable functions (PPUFs), nonvolatile memories (NVMs), memristor-based true random number generators (MTRNGs), unique signatures, tamper detection circuits, and cryptographic architectures, have received intense study in recent years. The following provides an introduction to PUFs, PPUFs, NVMs, and MTRNGs in modern SoC designs.
Though nanoscale memory devices have advantages in SoC designs, the innate nonvolatility of these devices raises a concern of data leakage in memory-system applications, since data persist much longer than in conventional memory devices such as SRAM and DRAM. For a high-density RRAM-FPGA, the concern becomes even more serious since the FPGA relies on its on-chip memory system to configure logic functions. In the following sections, we introduce a simple and effective solution to overcome the security concerns of nanoscale memory devices in FPGAs. The technique also helps to address conventional FPGA security flaws, such as spoofing and replay attacks [8, 10].
Figure 11.2 summarizes three major security threats to the RRAM-FPGA and the corresponding hardware solutions proposed in this work. The design and implementation details of the proposed security scheme are described in Sect. 11.4.
Threat 1: Pirating configurations in BRAMs. As mentioned above, the RRAM-FPGA first loads a design into the distributed BRAMs [7]. It is impractical to encrypt the logic configuration at this step because the BRAMs are also used as data memory during system operation. Moreover, introducing an encryption scheme into each BRAM would severely increase design area and complexity. Thus, physical attack is a major threat when distributing an RRAM-FPGA with preloaded IP: attackers may obtain the BRAM content through a probing attack and then duplicate it on other FPGAs. As illustrated in Fig. 11.2a, we propose to leverage the extremely high storage density of RRAM technology and place obfuscated copies of configurations. A Chip DNA is used to enable the logic function. As such, even if an unauthorized attacker has obtained the data in the BRAMs, he or she is not able to identify the correct logic combination or discover the system functionality without the Chip DNA.
Fig. 11.2 Three major threats in RRAM-FPGA: a Pirating configuration in BRAMs. b Physical attack on logic components. c Attacking bitstream at communication port
Threat 2: Physical attack on logic components. After FPGA initialization, attackers can obtain the exact design information by probing the memory units used for logic and routing operations. Therefore, it is even more important to protect the data in the logic and routing components. For designs requiring a higher security level, we introduce a blank mode in which the configuration data on the logic and routing elements is erased at the end of normal operations or when a power failure occurs, as shown in Fig. 11.2b. Note that the design is still maintained in the BRAMs and can be re-initialized with the Chip DNA.
Threat 3: Attacking the bitstream at the communication port. Remote and partial run-time reconfigurations are naturally supported in the RRAM-FPGA. Hence, bitstream protection is important to prevent piracy or protocol-based denial-of-service attacks. Bitstream piracy aims at revealing the logic function by predicting configurations from the loading sequence or by comparing bitstreams from different customers. Protocol-based denial-of-service attacks, including random pattern injection and replay attacks, can also hurt system integrity. Here, we propose an encrypted addressing scheme. As illustrated in Fig. 11.2c, it blends the configuration bitstream by mixing up the loading sequence and adding redundant pieces. Meanwhile, it supports changing the secret key for encryption and decryption to prevent replay attacks during remote reconfiguration.
Attacker model: Powerful attacker. This work assumes powerful attackers who have the equipment to perform physical attacks and possess cutting-edge supercomputers for brute-force attacks. They have the capability to fetch data stored in RRAM through reverse engineering. They also have the knowledge to attack the communication port through the network or by eavesdropping. Such attackers are level 3 attackers based on IBM's report [1]. In the proposed secure RRAM-FPGA, three elements are needed to acquire the logic configuration data and activate the chip functionality: the preloaded logic configuration with obfuscation, the Chip DNA, and control of the communication network. In this work, we assume powerful attackers can obtain any two, but not all three, of these elements. Attackers with the logic configuration and the Chip DNA would be able to reveal the logic design; however, a probing attack to obtain the data from all the RRAM cells is extremely time- and cost-consuming. Attackers with the logic configuration and control of the communication port still require the Chip DNA to activate the chip functionality; otherwise, it is practically impossible to acquire the real design from the obfuscated data. By taking control of the communication port and possessing the Chip DNA, attackers may perform a denial-of-service attack; however, they have no way to acquire the logic function.
By blending the real design with redundant data, the obfuscated configurations are generated and then loaded into the BRAMs. Each BRAM contains only one required configuration; all the remaining copies result in denial of service. The FPGA system operates normally only when the required logic configurations are properly selected and assembled. The indices of the correct copies in all the RUs form a configuration indicator, which is defined as the Chip DNA.
Here, let us use the FPGA design with four RUs (R1 ∼ R4) in Fig. 11.3 to demonstrate the configuration obfuscation. We assume an RU contains up to four different design configurations (C1 ∼ C4). The different colors in the figure represent logic belonging to different designs. The FPGA can operate under the following conditions:
1. Blank configuration. Even though the configuration data has been loaded into the BRAMs, the logic and routing elements cannot be set up without the Chip DNA. In other words, the FPGA is not functional yet.
2. Normal configuration. The FPGA operates normally under a given design configuration (i.e., the Yellow function) after applying the proper DNA sequence (i.e., C1-C2-C4-C3) to the RUs (i.e., R1-R2-R3-R4).
3. Faulty configuration occurs if an incorrect DNA is applied. For instance, the DNA sequence C1-C2-C4-C4 triggers unmatched logic functions in different RUs: R4 is set up for the Cyan design, which is not compatible with the Yellow functions of the other RUs. Thus, the FPGA cannot work properly.
4. Multi-boot is naturally supported in the proposed design. For example, besides the aforementioned Yellow function, the sample FPGA can also realize the Magenta function when another DNA sequence, C3-C4-C2-C1, is applied. To further enhance the security level, we can generate the obfuscated data by packing logic with similar routing connections within the same RU or by intentionally injecting certain redundant routing information. In this way, guessing the logic pattern through the connection relations becomes even harder.
During system initialization, the Chip DNA is loaded through the DNA module (see Fig. 11.3). The DNA sequence is encrypted with AES to resist wiretapping. After being loaded into the FPGA, the DNA sequence is partitioned and distributed to the corresponding RUs. In the hardware implementation, we can allocate an index to each RU, or share the same piece of DNA among a group of RUs to reduce the DNA length.
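To make the selection mechanism concrete, the following minimal Python sketch (illustrative only; the data layout and names such as ru_copies and chip_dna are assumptions, not the actual implementation) models four RUs that each hold four configuration copies and a Chip DNA that selects one copy per RU; an incorrect DNA yields a mismatched, non-functional assembly.

# Illustrative model of Chip-DNA-based configuration selection (hypothetical names).
ru_copies = [
    ["Y0", "C0", "M0", "R0"],  # RU R1: copies C1..C4, pieces of different designs
    ["R1", "Y1", "C1", "M1"],  # RU R2
    ["C2", "M2", "R2", "Y2"],  # RU R3
    ["M3", "R3", "Y3", "C3"],  # RU R4
]

def assemble(chip_dna):
    """Select one configuration copy per RU according to the DNA indices."""
    return [copies[idx] for copies, idx in zip(ru_copies, chip_dna)]

def is_functional(assembly):
    """The chip works only if every selected piece belongs to the same design."""
    return len({piece[0] for piece in assembly}) == 1

yellow_dna  = [0, 1, 3, 2]  # C1-C2-C4-C3: enables the Yellow function
magenta_dna = [2, 3, 1, 0]  # C3-C4-C2-C1: multi-boot into the Magenta function
faulty_dna  = [0, 1, 3, 3]  # C1-C2-C4-C4: R4 picks a Cyan piece, nothing works

print(is_functional(assemble(yellow_dna)))   # True
print(is_functional(assemble(magenta_dna)))  # True
print(is_functional(assemble(faulty_dna)))   # False

The DNA sequences above mirror the Yellow, Magenta, and faulty examples in the text; in hardware, of course, the selection is performed by the distribution tracks inside each RU rather than by software.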
The introduction of obfuscated copies requires larger BRAMs and potentially increases the size of the RUs, which is a major concern for the proposed scheme. Fortunately, RRAM technology offers very compact data storage (∼40× denser than SRAM) and can be integrated in a 3D monolithic stacking structure [14], making high-density BRAMs possible. Based on the design parameters in [7], we analyzed and compared the area cost of the BRAM and the RRAM-FPGA. The results are summarized in Table 11.1. Here, the area values of these designs are all normalized to that of the baseline SRAM-FPGA, which does not utilize BRAMs. Note that the area increment in the RRAM-FPGA has a nonlinear relationship with BRAM capacity: adding a BRAM with one or four copies of configuration logic induces 47.5 % or 76.9 % area overhead, respectively, over the design without BRAM. This is because a 3D structure is adopted in constructing the BRAMs [7]. Overall, an RRAM-FPGA with four copies of configuration logic occupies 34 % less area than the baseline SRAM-FPGA.
The capacity used for logic obfuscation is the user's choice. For economical usage, users can create the obfuscation logic with a BRAM holding two copies to save area cost. In this work, we suggest a BRAM capacity of four copies of configuration data. As such, the system can protect highly sensitive designs by utilizing all four copies for logic obfuscation. Less sensitive applications, instead, can use only half of the capacity for logic obfuscation and the remaining memory to store temporary data, which improves system performance by reducing I/O accesses.
In a general-purpose application built with a nonvolatile FPGA, the configuration data is kept in the logic and routing elements. However, the nonvolatile configuration data is in danger of physical attack, e.g., a probing attack, potentially resulting in security concerns. Here, a blank mode operation is proposed for the RRAM-FPGA to prevent physical attacks. Whenever the system completes normal operations or detects a power failure, it automatically erases the content of the logic and routing elements with the assistance of a power-off erasing scheme. At the end of normal operations, an instruction is issued to enable the blank configuration before the system is powered off. In the case of power failure, a backup power source is necessary to erase at least part of the configuration data. Since the programming energy of RRAM is relatively small, an on-board soldered battery [19] or a supercapacitor [16] is sufficient. The blank configuration in Sect. 11.4.1 is indeed a form of blank mode. Even if attackers obtain the content of the BRAMs, recovering the design functionality is difficult because the correct configuration copies are hidden among the dummy copies.
Note that the blank mode operation cannot be supported in conventional and many emerging NVM FPGAs, in which the logic configuration is stored only in the logic and routing elements. In contrast, our design keeps a copy of the obfuscated design in the BRAMs. Therefore, the design can always be recovered as long as the end user keeps the proper Chip DNA.
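The control flow of the blank mode can be sketched as follows (a minimal illustration under assumed names such as RramFpga and assemble_from_bram; the actual erase and backup-power mechanisms are hardware features, not software calls).

# Illustrative control flow of blank-mode operation (hypothetical hooks).
class RramFpga:
    def __init__(self, bram_image, blank_mode=True):
        self.bram_image = bram_image   # obfuscated copies, always retained in BRAMs
        self.fabric = None             # logic/routing configuration (erasable)
        self.blank_mode = blank_mode

    def configure(self, chip_dna):
        # Step 2 of the reconfiguration flow: distribute BRAM content to the fabric.
        self.fabric = assemble_from_bram(self.bram_image, chip_dna)

    def shutdown(self):
        if self.blank_mode:
            self.fabric = None         # erase logic and routing elements

    def on_power_failure(self):
        # A backup battery or supercapacitor supplies the erase energy.
        self.fabric = None

def assemble_from_bram(image, chip_dna):
    """Pick the genuine copy per RU; without the DNA the fabric stays blank."""
    return [copies[i] for copies, i in zip(image, chip_dna)]

As long as the obfuscated BRAM image and the Chip DNA survive, configure() restores the design after a blank period.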
Fig. 11.4 Various loading schemes: a serial loading; b random loading; c dummy loading; and d instruction and key loading
dress decryption, are shown in Fig. 11.3. However, such a simple loading scheme inevitably faces security issues, such as bitstream piracy and protocol-based denial-of-service attacks, due to the sequential address loading and the unprotected configuration. In this work, we propose encrypted addressing to protect the communication port under different attack scenarios.
Scenario 1—Bitstream piracy: Attackers could reveal configurations from the loading sequence or by comparing two or more bitstreams belonging to different customers. Our Solution—Random loading: A random loading scheme can be easily realized in the address-based design by hashing the loading sequence, as shown in Fig. 11.4b. As a result, the address sequences of different bitstreams are not comparable. The address hashing is done by designers/IP providers at the software level and introduces no extra hardware cost to the RRAM-FPGA system.
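As a rough illustration of this software-level step (a sketch with assumed names; the chapter does not specify the hashing method), the bitstream generator can derive a per-customer permutation of the (address, configuration) pairs from a seed, so that two customers' bitstreams are not directly comparable.

import hashlib
import random

def randomized_load_order(pairs, customer_seed):
    """Shuffle (address, configuration) pairs with a per-customer seed (illustrative)."""
    rng = random.Random(hashlib.sha256(customer_seed).digest())
    shuffled = list(pairs)
    rng.shuffle(shuffled)
    return shuffled

bitstream = [(ru, "cfg_%d" % ru) for ru in range(8)]
print(randomized_load_order(bitstream, b"customer-A"))
print(randomized_load_order(bitstream, b"customer-B"))  # different order, same content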
Scenario 2—Protocol-based denial-of-service attack: While partial reconfiguration is in progress, attackers may record the encrypted addresses and inject fake configurations at those addresses; they do not even have to find the decrypted addresses to make the system malfunction. Our Solution—Dummy loading: We can intentionally insert dummy data during the loading process, as illustrated in Fig. 11.4c. These dummy data are eventually thrown away once the system detects that the dummy address corresponds to an invalid RU location. Note that a large pool of invalid addresses is available for dummy loading. For example, an FPGA with 32,768 RUs needs only 15 bits to address all the RUs, while the length of the coded address is 128 bits.
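The receiving side of the dummy-loading scheme can be sketched as follows (illustrative; decrypt_address stands in for the AES-based address decryption, and all names are assumptions): every decrypted address that does not map to a valid RU is silently discarded, so injected or dummy entries never reach the fabric.

NUM_RUS = 32768                      # only 15 of the 128 address bits are meaningful
VALID_ADDRESSES = set(range(NUM_RUS))

def decrypt_address(addr_code, key):
    """Stand-in for the AES-based address decryption (illustrative only)."""
    return (addr_code ^ key) % 2**128

def load_bitstream(encrypted_stream, key, bram):
    for addr_code, config in encrypted_stream:
        addr = decrypt_address(addr_code, key)
        if addr in VALID_ADDRESSES:  # genuine configuration piece
            bram[addr] = config
        # otherwise: dummy or maliciously injected data, silently dropped
    return bram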
Scenario 3—Replay attack: A replay attack refers to the situation in which attackers record (a piece of) an old bitstream and replay it after the designer has configured a new design into the system. Our Solution—Instruction and key loading: We define a small instruction set by leveraging a few unused bits of the address code. A key-renew operation is introduced to update the secret key of AES, as illustrated in Fig. 11.4d. The instruction I1 requests the key verification operation, and the old key K1 is loaded through the configuration port for comparison. If K1 matches the secret key stored in the Key module, the system approves the key updating operation. The instruction I2 then initiates the loading of the new key K2, which is encrypted under the old secret key.
The major security concerns of FPGAs can be categorized into two aspects: (1) piracy of the logic configuration and (2) denial-of-service attacks. Our proposed methodology can successfully protect a design from physical attacks, analytic attacks, and pattern recognition attacks. In this section, we use an RRAM-FPGA composed of 32,768 RUs, each of which allows four design copies, to evaluate the security level of the proposed scheme.
An end user can design his or her own logic function or buy IP cores from IP companies or silicon vendors. In RRAM-FPGAs, the obfuscated configuration, the encrypted addressing, and the Chip DNA provide different protection schemes. Without the complete set of all three parts, it is very difficult for an attacker to pirate the correct logic function.
Powerful attackers who have the FPGA board and obtain all the obfuscated configuration data still need the Chip DNA to activate the function. Here, the length of the Chip DNA determines the complexity of an analytic attack, which in general is very high: since each of the 32,768 RUs stores four candidate copies (two DNA bits per RU), the number of trials in a brute-force attack is theoretically 4^32,768 = 2^65,536, which is far too large to allow the correct DNA to be found.
Loading a bitstream to the FPGA over an insecure network is subject to protocol eavesdropping. Attackers could obtain the unencrypted obfuscated logic and the encrypted addresses; even then, it is still hard to place a piece of logic function into the correct location. Attackers may be able to duplicate the bitstream onto other FPGAs. However, a different AES key can be applied to each device, so the location mapping of the logic function pieces is different. Even if the AES module is compromised by a side-channel attack, we still have the Chip DNA to protect the chip function: an incorrect DNA cannot activate the device or reveal the logic configuration, as explained in Sect. 11.4.1.
Run-time and remote reconfigurations have become more and more popular in FPGA-based embedded systems, e.g., space systems and set-top boxes [3]. Modern FPGAs use the ICAP for partial run-time reconfiguration, which is more vulnerable than the initialization port. Remote configuration usually transmits data through existing networks, which could be insecure. Multiple authentication steps are needed to guarantee that the authentication process is not compromised by attackers. The proposed RRAM-FPGA does not differentiate between initialization and partial run-time reconfiguration. Both operations go through the same communication port and are protected by the same hardware scheme, and hence have the same security level.
The major threats in run-time and remote reconfiguration include protocol-based denial-of-service attacks and replay attacks. Among the 128 bits of an address code, only 15 bits are valid for the RRAM-FPGA with 32,768 RUs. Hence, when attackers inject false or blank configurations to damage system integrity, the chance of hitting a valid address is very low. We evaluated the probability of a protocol-based denial-of-service attack against the proposed encrypted addressing while varying the address length from 128 bits to 256 bits and the size of the RRAM-FPGA from 16,384 RUs to 65,536 RUs. For the FPGA with 32,768 RUs and 128-bit addresses, the probability of a successful protocol-based denial-of-service attack is as low as 2^-116.3. A larger FPGA design with more RUs has a higher attack probability because more address bits are valid. For such a system, extending the address length appropriately should be sufficient. A replay attack could succeed if attackers record the information during transfer and resend it through the same network to attack the FPGA system. The secret key updating of the encrypted addressing loading scheme can prevent this type of attack. When attackers replay the old configurations, the addresses decrypted with the new secret key map to invalid locations, so nothing is loaded into the RUs. In contrast, the system designers, who hold the new secret key, are still able to generate the ciphertext of the addresses and the DNA and load the function for the next remote configuration.
The time to distribute the genuine configurations from the BRAMs to the logic and routing elements is independent of the number of obfuscated copies. Figure 11.6b shows the configuration time of different RRAM-FPGAs with various capacities. Because the programming mechanism serially programs memory cells to reduce area cost, the design with 8-input LUTs needs a longer configuration time, and larger FPGAs with more RUs take longer to configure if the DNA indices are loaded to the RUs in sequence. However, if we assume the Chip DNA is prefetched and stored, and the functions in all RUs are distributed in parallel ("fast"), the configuration time can be shortened to 10^-6 to 10^-5 s.
11.6 Conclusion
to resist physical attacks, and the encrypted addressing to guard against bitstream attacks. Our simulations show that the proposed design can resist level 3 attackers with only slightly prolonged system loading and execution times. The proposed RRAM-FPGA is suitable for embedded and high-performance computing systems, offering better design flexibility and hardware security.
References
1. Abraham, D., Dolan, G., Double, G., Stevens, J.: Transaction security system. IBM Syst. J.
30(2), 206–229 (1991)
2. Altera: Logic Array Blocks and Adaptive Logic Modules in Stratix V Devices (2011). http://
www.altera.com
3. Badrignans, B., Danger, J., Fischer, V., Gogniat, G., Torres, L.: Security Trends for FPGAS:
From Secured to Secure Reconfigurable Systems. Springer (2011)
4. Bottom Line Technology: Reverse Engineering/Re-Engineering Services (2011). http://www.
bltinc.com
5. Chen, Y., Zhao, J., Xie, Y.: 3D-NonFAR: three-dimensional non-volatile FPGA architecture
using phase change memory. In: International Symposium on Low Power Electronics and De-
sign (ISLPED), pp. 55–60 (2010)
6. Chen, Y.C., Wang, W., Li, H., Zhang, W.: Non-volatile 3D stacking RRAM-based FPGA. In:
International Conference on Field Programmable Logic and Applications (FPL), pp. 367–372
(2012a)
7. Chen, Y.C., Wang, W., Zhang, W., Li, H.: uBRAM-based run-time reconfigurable FPGA
and corresponding reconfiguration methodology. In: International Conference on Field-
Programmable Technology (FPT), pp. 80–86 (2012b)
8. Devic, F., Torres, L., Badrignans, B.: Secure protocol implementation for remote bitstream
update preventing replay attacks on FPGA. In: IEEE International Conference on Field Pro-
grammable Logic and Applications (FPL), pp. 179–182 (2010)
9. Dimou, K., Wang, M., Yang, Y., Kazmi, M., Larmo, A., Pettersson, J., Muller, W., Timner,
Y.: Handover within 3GPP LTE: design principles and performance. In: 2009 IEEE Vehicular
Technology Conference Fall (VTC), pp. 1–5 (2009)
10. Drimer, S., Kuhn, M.G.: A protocol for secure remote updates of FPGA configurations. In:
International Workshop on Applied Reconfigurable Computing, pp. 50–61 (2009)
11. Dworkin, M.: Recommendation for Block Cipher Modes of Operation. Technical report, DTIC
Document (2001)
12. Huffmire, T., Brotherton, B., Sherwood, T., Kastner, R., Levin, T., Nguyen, T., Irvine, C.:
Managing security in FPGA-based embedded systems. IEEE Des. Test Comput. 25(6), 590–
598 (2008)
13. Huffmire, T., Irvine, C., Nguyen, T., Levin, T., Kastner, R., Sherwood, T.: Handbook of FPGA
Design Security. Springer (2010)
14. ITRS: International Technology Roadmap for Semiconductors 2011 Edition (2011). http://
www.itrs.net/
15. Karam, R., Liu, R., Chen, P.Y., Yu, S., Bhunia, S.: Security primitive design with nanoscale de-
vices: a case study with resistive RAM. In: ACM Great Lakes Symposium on VLSI (GLVLSI),
pp. 299–304 (2016)
16. Knoth, S.: Supercaps Can Be a Good Choice Over Batteries for Backup Applications (2012).
http://www.eetimes.com/document.asp?doc_id=1280982
17. Kumar, S.S., Guajardo, J., Maes, R., Schrijen, G.J., Tuyls, P.: The butterfly PUF protecting IP
on every FPGA. In: IEEE International Workshop on Hardware-Oriented Security and Trust
(HOST), pp. 67–70 (2008)
18. Kuon, I., Tessier, R., Rose, J.: FPGA architecture: survey and challenges. Found. Trends Elec-
tron. Des. Autom. 2(2), 135–253 (2008)
19. Lattice: FPGA Design Security Issues: Using the ispXPGA Family of FPGAs to Achieve High
Design Security (2003). http://www.latticesemi.com
20. Liauw, Y., Zhang, Z., Kim, W., Gamal, A., Wong, S.: Nonvolatile 3D-FPGA with monolithi-
cally stacked RRAM-based configuration memory. In: IEEE International Solid-State Circuits
Conference Digest of Technical Papers (ISSCC), pp. 406–408 (2012)
21. Microsemi: Axcelerator Family FPGAs (2012a). http://www.actel.com
22. Microsemi: IGLOO Low Power Flash FPGAs (2012b). http://www.actel.com
23. Microsemi: Introduction to the SmartFusion2 and IGLOO2 Security Model (2013a). http://
www.microsemi.com
24. Microsemi: Overview of Data Security Using Microsemi FPGAs and SoC FPGAs (2013b).
http://www.microsemi.com
25. Minkovich, K.: MCNC benchmark (2007). http://cadlab.cs.ucla.edu/~kirill/
26. Nechvatal, J., Barker, E., Bassham, L., Burr, W., Dworkin, M.: Report on the Development of
the Advanced Encryption Standard (AES). Technical report, DTIC Document (2000)
27. Paul, S., Mukhopadhyay, S., Bhunia, S.: A circuit and architecture codesign approach for a
hybrid CMOS-STTRAM nonvolatile FPGA. IEEE Trans. Nanotechnol. (TNANO) 10(3), 385–
394 (2011)
28. Potkonjak, M., Goudar, V.: Public physical unclonable functions. Proc. IEEE 102(8), 1142–
1156 (2014)
29. Rose, G.S., Rajendran, J., McDonald, N., Karri, R., Potkonjak, M., Wysocki, B.: Hardware
security strategies exploiting nanoelectronic circuits. In: IEEE Asia and South Pacific Design
Automation Conference (ASP-DAC), pp. 368–372 (2013)
30. Suh, G., Clarke, D., Gasend, B., Van Dijk, M., Devadas, S.: Efficient memory integrity verifi-
cation and encryption for secure processors. In: Annual IEEE/ACM International Symposium
on Microarchitecture (MICRO), pp. 339–350 (2003)
31. Wang, Y., Wen, W., Li, H., Hu, M.: A novel true random number generator design leveraging
emerging memristor technology. In: ACM Great Lakes Symposium on VLSI (GLVLSI), pp.
271–276 (2015)
32. Xilinx: 7 Series FPGAs Overview (2011a). http://www.xilinx.com
33. Xilinx: Partial Reconfiguration of Xilinx FPGAs Using ISE Design Suite (2011b). http://www.
xilinx.com
34. Xilinx: Virtex-5 FPGA Configuration User Guide (2012). http://www.xilinx.com
Chapter 12
Security Standards for Embedded
Devices and Systems
12.1 Introduction
V. Kowkutla (✉)
Texas Instruments, Mail Station E4000 12500 TI Blvd, Dallas, TX 75243, USA
e-mail: [email protected]
S. Ravi
Texas Instruments India, 66/3 Bagmane Tech Park, C V Raman Nagar,
Bangalore 560093, India
e-mail: [email protected]
regulatory standards such as ISO 26262 [11] and IEC 61508 [12]. While there is no uniform automotive security standard in place yet, security has largely been defined by the general-purpose standard Common Criteria [13], which has been largely concerned with the protection of assets against malicious attackers. This standard will also be reviewed later in this chapter.
In the rest of this chapter, we describe the two layers of a hierarchical security model that the embedded system designer typically uses to achieve his or her application-specific security needs.
• Foundation-level security, targeted at basic security services such as privacy, authentication, and integrity, is achieved predominantly using powerful mathematical functions called cryptographic algorithms. Cryptographic algorithms are the subject of excellent books on data and network security [14, 15]. In Sect. 12.2, we attempt to give a brief insight into these building blocks and their coverage through standards.
• Security protocols like TLS or SSL [16], IPSec [17], etc., have been the classical overlay atop foundational cryptographic algorithms to achieve computer communication and network security. In the embedded world, both general-purpose and end-equipment or application-level security standards are now becoming niche overlays that encompass not just foundational cryptographic algorithms but also a “basket of cryptographic and non-cryptographic solutions” to achieve system security goals. One example of such a security goal is “zeroize secure memory in a payment terminal chip upon detection of an attack.” In Sects. 12.3 and 12.4, we look at examples of security standards that are actively used by chip and system vendors in the embedded space.
It is also worth noting that both these layers are effectively implemented using a combination of hardware and software techniques in embedded SoCs. Examples of the various applicable solutions themselves are not covered in this chapter and can be found in [7] and other chapters of this book.
FIPS-197 [19] is the specification for the Advanced Encryption Standard (AES),
which is a symmetric block cipher that can be used to encrypt and decrypt data.
The AES algorithm (Rijndael was selected as the NIST AES standard in November 2001)
encrypts (decrypts) data in blocks of 128 bits. Encryption (decryption) can be
performed using cryptographic keys of 128, 192, or 256 bits.
A representative block diagram of AES is shown in Fig. 12.1, where encryption and decryption of a block of data happen through an iterative application of four basic transformations. Each iteration, called a round, works on a block of data generated from the previous iteration and a round key that is derived from the AES key through a key expansion cycle. The four basic transformations in an encryption round are SubBytes, ShiftRows, MixColumns, and AddRoundKey; decryption applies the corresponding inverse transformations (InvSubBytes, InvShiftRows, InvMixColumns) with the round keys in reverse order.
Fig. 12.1 Representative block diagram of AES encryption and decryption operations
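As a small illustration of the block-cipher interface described above (a sketch that assumes the third-party Python package cryptography is available; in practice an approved mode of operation such as CBC, CTR, or GCM should be used rather than raw single-block ECB):

import os
from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

key = os.urandom(32)                 # 256-bit key; 128- and 192-bit keys are also allowed
plaintext = b"16-byte block!!!"      # AES always operates on 128-bit (16-byte) blocks

# Raw single-block encryption, shown only to expose the block transformation itself.
cipher = Cipher(algorithms.AES(key), modes.ECB())
ciphertext = cipher.encryptor().update(plaintext)
recovered = cipher.decryptor().update(ciphertext)
assert recovered == plaintext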
FIPS 46-3 [21] is the standard specification of the Data Encryption Algorithm (DEA), or Data Encryption Standard (DES). DES/DEA has become outdated due to the inherent weakness of its 56-bit key against brute-force attacks as computational power has evolved, and it was withdrawn as a standard in 2005 [22]. Note that the 56-bit key, together with 8 additional error detection bits, forms the 64-bit key that one commonly associates with DEA.
Triple DES, or Triple DEA, at its simplest level involves applying the DEA algorithm three times. When the keys of each application of DEA are independent, the key strength is equivalent to 168 key bits. Though DES is no longer a NIST-approved standard, Triple DEA with DEA as the core engine is an approved standard as specified by NIST Special Publication (SP) 800-67 [23].
With the origins of AES influenced by the DEA algorithm, the structure of the DEA algorithm is in many ways similar. The DEA algorithm is an iterative application of mathematical transformations on an input data stream that is divided into blocks. The block size is 64 bits, and each iteration in the DEA algorithm is called a round. There are 16 rounds, and each round processes the 64-bit block in two 32-bit halves. One half is processed by a transformation function called the F-function, while the other half is XORed with the result of the F-function. The halves are then swapped before the next round. The F-function in each round is a sequential application of four transformations: expansion, key mixing, substitution, and permutation. Decryption is again a sequence of inverse transformations with the same key.
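The round structure just described can be sketched generically as a Feistel network; the following Python sketch is only the skeleton (the toy_f round function and the key schedule are stand-ins, not the actual DEA expansion, S-box, and permutation logic):

def feistel_encrypt(left, right, round_keys, f):
    """Generic 16-round Feistel skeleton (stand-in for DEA's round structure)."""
    for k in round_keys:
        left, right = right, left ^ f(right, k)   # XOR one half with F(other, key), then swap
    return left, right

def feistel_decrypt(left, right, round_keys, f):
    """Decryption runs the same rounds with the round keys in reverse order."""
    for k in reversed(round_keys):
        right, left = left, right ^ f(left, k)
    return left, right

def toy_f(half, key):
    """Toy round function on 32-bit halves; the real DEA F-function applies
    expansion, key mixing, substitution (S-boxes), and permutation."""
    return ((half * 0x9E3779B1) ^ key) & 0xFFFFFFFF

keys = [0x0BADF00D + i for i in range(16)]
l, r = feistel_encrypt(0x01234567, 0x89ABCDEF, keys, toy_f)
assert feistel_decrypt(l, r, keys, toy_f) == (0x01234567, 0x89ABCDEF)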
FIPS 180-4 [24] specifies the approved algorithms to compute the condensed digest, or hash, of a message. A cryptographic hash can conceptually be thought of as similar to a checksum representation of a given piece of data, but it is mathematically more secure. The security comes from two angles: for a given secure hash algorithm, it is computationally infeasible to (a) find a message that corresponds to a given message digest, or (b) find two different messages that produce the same message digest.
The FIPS 180-4 standard specifies SHA-1, SHA-224, SHA-256, SHA-384, SHA-512, SHA-512/224, and SHA-512/256 as the approved secure hash algorithms. For these algorithms, the message digests range in length from 160 to 512 bits. As per NIST's 2012 policy on hash functions [25], recently discovered weaknesses in SHA-1 have limited its usage to certain applications. The other approved SHA algorithms today are collectively called SHA-2. There is a plan to standardize the next-generation hash algorithms (Keccak) as SHA-3 in the future. Secure hash algorithms are also used in combination with other cryptographic algorithms, such as digital signature algorithms and keyed-hash message authentication codes, or in the generation of random numbers (bits).
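For instance, a SHA-2 digest can be computed with Python's standard hashlib module (a minimal illustration of the properties discussed above; the messages are made up):

import hashlib

message = b"transfer 100 to account 42"
digest = hashlib.sha256(message).hexdigest()   # 256-bit digest, 64 hex characters
print(digest)

# Changing even one character yields an unrelated digest, and recovering a
# message that maps to a chosen digest is computationally infeasible.
tampered = b"transfer 900 to account 42"
print(hashlib.sha256(tampered).hexdigest())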
In this section, we will present a brief overview of the generic security standards
which are also employed in the embedded space.
Table 12.3 Sample comparison of FIPS 140-2 security levels along various dimensions

FIPS 140-2 security level | Level of security | Physical security | Security against environmental-condition exploits | Operating system security
Security Level 1 | Lowest | Not required | Not required | Unevaluated operating system
Security Level 2 | – | Tamper evidence | Not required | Evaluated at the CC evaluation assurance level EAL2 (or higher)
Security Level 3 | – | Tamper detection and response (high probability) | Not required | Evaluated at the CC evaluation assurance level EAL3 (or higher)
Security Level 4 | Highest | Tamper detection and response (highest probability); zeroize security parameters | Required | Evaluated at the CC evaluation assurance level EAL4 (or higher)
If we review any dimension of FIPS 140-2 in detail, we will see that the security requirements increase from one level to the next. Picking physical security as an example, we can see that
• Physical security mechanisms are not required for Security Level 1.
• Security Level 2 enhances the physical security aspects of Security Level 1. Per FIPS 140-2, it includes requirements for tamper evidence, such as the use of tamper-evident coatings/seals or pick-resistant locks on removable module covers or doors, to protect against unauthorized physical access.
• Security Level 3 provides tamper-resistant physical security, preventing the intruder from gaining access to critical security parameters held within the cryptographic module. This level has a high probability of detecting and responding to physical tampering of the cryptographic module. Physical security mechanisms include tamper detection/response circuitry that erases all critical security parameters when the removable covers/doors of the cryptographic module are opened.
• Security Level 4 requires physical security mechanisms to detect and respond to all unauthorized accesses. This level offers a high probability of detecting all physical tamper attacks, resulting in the immediate erasure of critical security parameters. This level of security is appropriate for applications deployed in physically unprotected environments needing high security.
This is true for the other security dimensions as well (in some cases, the requirements could be the same as at the previous level). Additional observations on each security level are listed below for further illustration.
• In Security Level 1, the security requirements for cryptographic modules are very basic, e.g., using one approved algorithm or security function. This level is appropriate for low-cost applications where the security requirements are very low and physical/network security is not needed. From an operating system perspective, cryptographic software and firmware can be executed on a general-purpose computing system using an unevaluated operating system. An example of a Security Level 1 module is a personal computer (PC) encryption board.
• Security Level 2 requires role-based authentication to authorize an operator to assume a specific role and perform the corresponding set of tasks. Cryptographic software and firmware can be executed on a general-purpose computing system using an operating system that has been evaluated at the Common Criteria (CC) evaluation assurance level EAL2 (or higher). We will be reviewing CC in the next section.
• Security Level 3 requires identity-based authentication mechanisms to authenticate the identity of an operator and verify that the identified operator is authorized to assume a specific role and perform the corresponding set of tasks. Security Level 3 requires encryption of critical secure parameters while they are written into or read from the cryptographic module. These writing/reading ports must be physically separated from ports on other interfaces. The underlying operating system should have been evaluated at the Common Criteria (CC) evaluation assurance level EAL3 (or higher).
• Security Level 4 provides a complete envelope of protection around the cryptographic module. This level provides protection against all types of tamper attacks, including security vulnerabilities due to environmental conditions or fluctuations outside the module's normal operating ranges for voltage and temperature.
Overall, these security levels provide a cost-effective way to rate a cryptographic module covering the wide range of products and application environments in which it may be deployed. Users of cryptographic modules can make their selection based on the individual area ratings and the overall rating, depending on the environment in which the cryptographic module will be implemented.
requirements which meet the needs of products. The product security specification, named “Security Targets” in the chart, is used to define the security objectives, the attacks under consideration, the specification of the security functions, and the assurance measures. This then forms the input to the evaluation phase, where the expected result is a confirmation that the Security Targets meet the desired assurance levels. The results from the evaluation phase help both the product developer and the end system user. Finally, when the system is in operation, it is quite possible that new vulnerabilities will surface or that the deployment environment assumptions will change. Reports are then made to the developer for changes back to the specification and development, and the product cycle is repeated.
The Payment Card Industry (PCI) council is responsible for developing and managing the security standards for payment card transactions. The PCI security standards define the technical and operational requirements mandated by the Payment Card Industry Security Standards Council (PCI SSC) for two primary objectives: (a) security of credit, debit, and cash card transactions, and (b) protection of cardholders against misuse of their personal information. These standards cover end-to-end security requirements, starting from the point of entry of card data into a system, to how the data is processed, through secure payment applications. Compliance with the PCI set of standards is enforced by the Council's five founding global payment brands: American Express, Discover Financial Services, JCB International, MasterCard Worldwide, and Visa Inc.
Fig. 12.3 An overview of the payment card industry security standards [26]
• Secure memory and key protection: Critical secure parameters or data must be
stored in protected area(s) of the device and must be protected from modifica-
tion. There should be no feasible way to determine any PIN-security-related
cryptographic key resident in the device or make any modifications to secure
information.
• Secure interfaces: Data entering or leaving the secure device must always be
encrypted. One strong requirement is that there should be no feasible way to
determine any entered and internally transmitted PIN digit by any means.
• Tamper Reaction/Response: The device must implement tamper response
mechanisms. Upon tamper detection, one of the key responses must be to erase
any sensitive data that may be stored in the device and immediately make the
device inoperable.
The Europay Mastercard Visa (EMV) standard is a security standard for payment cards and terminals with embedded smart cards (also known as chip cards). Chip cards store data on an embedded secure microchip rather than on magnetic stripes. The embedded microchip provides enhanced transaction security features and other application capabilities not possible with traditional magnetic stripe cards. The transaction protection benefits offered by the EMV standard cover the following.
• Protection against counterfeit fraud: The standard employs stronger authenti-
cation methods and unique transaction elements such as advanced encryption to
verify that the card is genuine. This is applicable to both online and offline
transactions.
• Embedded card risk analysis capabilities: The standard defines the conditions
under which the issuer will permit the transaction to be conducted offline and the
conditions that force transactions online for authorization if offline limits have
been exceeded.
• Card/data tamper protection: Offers protection against tampering of card and
POS data during online transaction processing by attaching a dynamic cryp-
togram to each authorization and clearing transaction.
• Cardholder verification: Robust cardholder verification methods protect against lost and stolen card fraud.
Thus, the levels of protection available against chip card account data thefts and
counterfeit fraud are significantly enhanced.
EMVCo manages, maintains, and enhances the EMV Integrated Circuit Card Specifications for chip-based payment cards and acceptance devices, including point-of-sale (POS) terminals and ATMs. The EMV standard is currently owned by American Express, JCB, MasterCard, and Visa. The EMV specifications ensure interoperability between chip-based payment cards and terminals, and they encompass both contact and contactless cards. Contact cards must be physically inserted into a card reader for transactions; contactless cards are capable of transmitting data wirelessly over short distances using radio-frequency communication technology. The EMV Chip Specifications are based on existing International Organization for Standardization (ISO) standards as follows:
• ISO/IEC 7816: Identification Cards: Integrated Circuit(s) Cards
• ISO/IEC 14443: Identification Cards: Contactless Integrated Circuit(s) Cards—Proximity Cards
SoC designs targeting chip-based payment cards and acceptance devices must meet the EMV security standards as defined in the EMV Integrated Circuit Card Specifications [9]. EMVCo also establishes and administers testing and approval processes to evaluate compliance with the EMV Specifications.
EMV and PCI standards together offer enhanced security for payment transaction data in the following ways. The EMV chip provides an additional level of authentication at the point of sale that increases the security of a payment transaction and reduces the chances of fraud. However, once the card is entered into the merchant's system, the cardholder's confidential information is transmitted and stored on the merchant's network in the clear (unencrypted), exposing it to a variety of frauds. This is where the PCI standards come in. On top of the EMV chip at the POS, they offer protections for the POS device itself and provide layers of additional security controls, covering end-to-end security requirements from the point of entry of card data into a system, to how the data is processed, through secure payment applications.
Systems or chips that need to comply with various standards must go through a rigorous compliance assessment with the standards consortia or with third-party labs that are approved to evaluate them.
For example, the PCI Security Standards Council is responsible for managing the security standards, while compliance with the PCI Security Standards is enforced by the payment card brands. The PCI Security Standards Council operates a number of programs to train, test, and certify organizations and individuals who can then assess and validate adherence to the PCI Security Standards. The PCI Security Standards Council maintains an up-to-date list of Qualified Security Assessors: internal and external organizations that have been qualified by the Council to assess compliance with PCI Standards.
Similarly, the EMV Standards Committee has set up a Security Evaluation Process based on a complete set of published EMV Security Standard documents (specifications, requirements, and security guidelines). These documents are made available to product providers and security evaluation laboratories for the development and security evaluation of their products. The Security Evaluation Process is intended to provide organizations with valuable and practical information relating to the general security performance characteristics and the suitability of use of smart-card-related products, platforms, and ICs.
EMVCo uses independent security evaluation laboratories to perform security evaluations. An up-to-date list of these organizations is maintained on its website [9]. The EMV security guidelines support product providers in developing and testing their products, and test laboratories in performing security evaluations.
The Security Evaluation Process evaluates the security features of IC, platform, and integrated chip card products. The evaluation includes:
• Integrated circuit hardware with its dedicated software, operating system, and real-time environment
• Firmware and software routines required to access the security functions of the IC
• Payment application software running on the platform
Once a product passes the evaluation assessment, a certificate is released indicating the level of security compliance of the product with respect to the security standard requirements.
12.6 Summary
Security standards such as FIPS and Common Criteria have so far been the bulwark
of general purpose systems—software and hardware—that need to offer security
services. In this chapter, we first looked at the need for embedded security, fun-
damental cryptographic algorithms and associated standards, as well as general
purpose security standards. We then saw how market-specific end-to-end security concerns have led to the evolution of application-specific standards.
A specific application we looked at in detail is payment transactions. We saw how
security standards such as PCI and EMV are building on the general security
principles and defining a custom set of security requirements for the payment
ecosystem. We hope the overview provided in this chapter and the references for
further reading will give a good launchpad for any embedded SoC/system devel-
oper to understand their security requirements better and define the next steps in
addressing them.
References
8. Payment Card Industry (PCI) PIN Transaction Security (PTS) Point of Interaction (POI),
Modular Security Requirements Version 4.0, June 2013. https://www.pcisecuritystandards.
org/documents/PCI_PTS_POI_SRs_v4_Final.pdf
9. EMV Security Requirements. http://www.emvco.com
10. Checkoway, S., et al.: Comprehensive experimental analyses of automotive attack surfaces. In: Proceedings of the USENIX Security Symposium (2011)
11. ISO 26262-1:2011 “Road vehicles—Functional safety”. www.iso.org
12. International Electrotechnical Commission (IEC)—Functional Safety and IEC 61508. http://
www.iec.ch/functionalsafety/
13. Common Criteria for Information Technology Security Evaluation Part 1: Introduction and
general model, Sept 2012, Version 3.1, Revision 4, CCMB-2012-09-001
14. Stallings, W.: Cryptography and Network Security: Principles and Practice, 6th edn. Pearson
(2013)
15. Schneier, B.: Applied Cryptography: Protocols, Algorithms and Source Code in C. Wiley
(2015)
16. Transport Layer Security (TLS), https://datatracker.ietf.org/wg/tls/documents/
17. IP Security Maintenance and Extensions (ipsecme). http://datatracker.ietf.org/wg/ipsecme/
documents/
18. NIST, Cryptographic Toolkit. http://csrc.nist.gov/groups/ST/toolkit/index.html
19. Federal Information Processing Standards Publication 197 Nov 26, 2001 “Specification for
the Advanced Encryption Standard (AES)”
20. Advanced Encryption Standard, https://en.wikipedia.org/wiki/Advanced_Encryption_
Standard
21. Federal Information Processing Standards Publication 46-3, 1999 Oct 25 “Data Encryption
Standard (DES)”
22. NIST Withdraws Outdated Data Encryption Standard. http://www.nist.gov/itl/fips/060205_
des.cfm
23. NIST Special Publication 800-67, Recommendation for the Triple Data Encryption Algorithm
(TDEA) Block Cipher. http://csrc.nist.gov/publications/nistpubs/800-67-Rev1/SP-800-67-
Rev1.pdf (2012)
24. Federal Information Processing Standards Publication 180-4, March 2012, Secure Hash
Standard (SHS). http://csrc.nist.gov/publications/fips/fips180-4/fips-180-4.pdf
25. NIST’S Policy on Hash Functions. http://csrc.nist.gov/groups/ST/hash/policy.html
26. PCI Quick Reference Guide. https://www.pcisecuritystandards.org/pdfs/pci_ssc_quick_guide.
pdf
27. Federal Information Processing Standards Publication 140-2, May 25, 2001 “Standard for
Security Requirements for Cryptographic Modules”
28. Safety & security architecture for automotive ICs, Yash SainiArun Jain, 25 Sept 2013
Chapter 13
SoC Security: Summary and Future
Directions
S. Bhunia (✉)
University of Florida, 216 Larsen Hall, 968 Center Dr., Gainesville, FL 32611, USA
e-mail: [email protected]
S. Ray
NXP, Austin, USA
S. Sur-Kolay
Indian Statistical Institute, Kolkata, India
new challenge. These IPs are often obtained from untrusted third-party vendors, which creates trust concerns: in particular, the IPs may potentially come with malicious implants or Trojans. Similarly, third-party design tools used in the SoC integration process may be untrusted and can cause malicious design changes. Designing trusted SoCs with untrusted components has emerged as a major challenge in the modern SoC design flow. Secure SoC design methodologies need to be accompanied by appropriate post-silicon validation approaches, which aim at ensuring that the fabricated SoC works in a system in a secure, trustworthy fashion and is impervious to known attack vectors.
Design and validation of SoCs require addressing two broad classes of security
and trust issues: (1) protection against hardware security threats, which include
all forms of side-channel attacks (power analysis, timing, electromagnetic, and fault injection),
hardware IP piracy, hardware Trojan attacks, micro-probing attacks, and scan-based
attacks; and (2) protection of the multitude of on-chip security assets against malicious
access by software running on the SoC. With respect to the first class of protection,
it is worth noting that existing IP-level solutions do not scale well to the SoC level. For
example, an IP-level Trojan detection solution may not work at the SoC level; it would
almost certainly miss a Trojan instance in the interconnect fabric. Similarly, analysis
of and protection against side-channel attacks for crypto IPs in isolation may
not remain as precise or effective once the IPs are integrated into an SoC. Accurate
security analysis therefore requires considering the other IP blocks and the interactions
among them, so that countermeasures can provide the right level of
protection for the SoC at acceptable overhead. Furthermore, even though IP-level
solutions are effective at protecting individual IPs, they must be integrated at the
SoC level to enable holistic protection; this typically calls for a centralized
infrastructure in the SoC that controls the protection mechanisms inside the individual IPs.
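As a purely illustrative sketch, and not a description of any specific SoC, the C fragment below shows one way such a centralized infrastructure could be organized at the firmware level: each IP's security wrapper registers its local protection mechanisms with a single controller, which can then enable and audit them uniformly. The names (ip_protection_t, soc_register_protection, and the example mechanisms) are hypothetical.

#include <stdbool.h>
#include <stddef.h>

/* Hypothetical descriptor for one IP-level protection mechanism,
 * e.g., a scan-chain lock or a masking switch in a crypto IP.
 * Names here are illustrative only, not an actual SoC interface. */
typedef struct {
    const char *ip_name;        /* owning IP block                  */
    const char *mechanism;      /* e.g., "scan_lock"                */
    bool (*enable)(void);       /* turn the protection on           */
    bool (*is_active)(void);    /* query whether it is in effect    */
} ip_protection_t;

#define MAX_PROTECTIONS 32

static ip_protection_t registry[MAX_PROTECTIONS];
static size_t registry_count;

/* Called by each IP's security wrapper during SoC boot. */
bool soc_register_protection(ip_protection_t p)
{
    if (registry_count >= MAX_PROTECTIONS)
        return false;
    registry[registry_count++] = p;
    return true;
}

/* The centralized controller enables every registered mechanism and
 * reports whether the SoC reached its intended protected state. */
bool soc_enable_all_protections(void)
{
    bool all_ok = true;
    for (size_t i = 0; i < registry_count; i++) {
        if (!registry[i].enable() || !registry[i].is_active())
            all_ok = false;     /* some IP was left unprotected     */
    }
    return all_ok;
}

The point of the sketch is the single point of control: an SoC-level view of which protections exist and whether they are active, which isolated IP-level solutions cannot provide on their own.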
Modern SoC designs contain a number of critical security assets, including keys,
firmware, cryptographic modules, fuses, and private user data, which are sprinkled
across multiple IPs and often cut across hardware and software boundaries. Conse-
quently, developing an SoC design critically requires: (1) specifying, implementing,
and validating security policies that govern the access to and interaction of these assets
during field operation; (2) developing on-chip security architectures that enable
effective validation of SoC resiliency against security attacks and vulnerabilities;
and (3) comprehensively validating these security policies both before and after
fabrication to ensure that the assets are protected against undesired leakage or manipulation.
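To make the notion of such a security policy concrete, the following minimal sketch (in C, with hypothetical requester, asset, and lifecycle identifiers that are not drawn from any particular design) illustrates how an access-control policy over on-chip assets can be encoded as a rule table and checked on every access request. Production SoCs implement comparable checks in hardware or trusted firmware, and the actual assets and rules are design-specific.

#include <stdbool.h>
#include <stddef.h>

/* Hypothetical identifiers; a real SoC defines its own sets of
 * requesters, assets, operations, and lifecycle states. */
typedef enum { IP_CPU, IP_CRYPTO, IP_DEBUG, IP_DMA } requester_t;
typedef enum { ASSET_DRM_KEY, ASSET_FUSES, ASSET_FIRMWARE } asset_t;
typedef enum { OP_READ, OP_WRITE } op_t;
typedef enum { STATE_MANUFACTURING, STATE_PRODUCTION } lifecycle_t;

/* One policy rule: who may perform which operation on which asset,
 * and in which lifecycle state of the device. */
typedef struct {
    requester_t who;
    asset_t     what;
    op_t        op;
    lifecycle_t when;
} policy_rule_t;

/* Example policy: the crypto engine may read the DRM key in the
 * field; fuses may be written only during manufacturing. */
static const policy_rule_t policy[] = {
    { IP_CRYPTO, ASSET_DRM_KEY, OP_READ,  STATE_PRODUCTION    },
    { IP_CPU,    ASSET_FUSES,   OP_WRITE, STATE_MANUFACTURING },
};

/* Default-deny check consulted on every asset access request. */
bool access_allowed(requester_t who, asset_t what, op_t op, lifecycle_t now)
{
    for (size_t i = 0; i < sizeof(policy) / sizeof(policy[0]); i++) {
        if (policy[i].who == who && policy[i].what == what &&
            policy[i].op == op && policy[i].when == now)
            return true;
    }
    return false;   /* anything not explicitly allowed is denied */
}

The default-deny structure reflects requirement (3) above: any access not explicitly permitted by the validated policy is treated as potential leakage or manipulation and rejected.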
Several chapters of this book have been dedicated to examining the state of
research and practice in the important areas of design and validation of SoC security,
and to emphasizing the cooperation and trade-offs between the two. We have
also presented an overview of industrial practices in security assurance and
validation of modern SoC designs. The goal has been to provide an understanding
of the current state of practice, and to describe the different pieces of a highly
complex ecosystem that must interact and cooperate to ensure the security and trust-
worthiness of our computing devices.
Recent years have seen an explosion of deeply embedded, smart, and highly
connected computing devices in diverse form factors. In particular, wearable and
implant technologies, cyber-physical systems (CPS), and the Internet of Things
(IoT) have made significant forays into nearly all aspects of our lives. With advances
in technology, the design of new and more capable sensors, pervasive connectivity, and the
business trend towards cloud-driven, data-centric solutions, the future is projected
to see an even higher proliferation of systems comprising such devices, which
coordinate through the cloud to solve complex, distributed tasks. Commensurate with
computing capability, applications have also scaled greatly in complexity,
e.g., from smartphones to smart cities. This evolving paradigm of com-
puting systems and their widespread deployment in diverse fields will impose
increasingly strong demands on the security and trust of the SoCs used in these systems.
The problem is accentuated by the smartness and ubiquitous connec-
tivity of these systems, which give rise to new opportunities for attack. Hence, SoC
security designers and validators will face ever-growing challenges in meeting
the security demands of future systems. Meeting these demands will require major
research activity and innovation in architecture, design methodologies, security analysis,
and pre- and post-silicon validation. One related research direction that we believe will
gain strong momentum, and that would complement the design and validation of
high-assurance SoCs, is the development of software that can ensure trusted
system operation in the presence of untrusted hardware, in particular an untrusted
SoC.
Given the broad spectrum of potential vulnerabilities and corresponding miti-
gation strategies, the subject of SoC security today is highly fragmented. Different
research groups focus on different aspects of the problem, often without a good
understanding of the trade-offs and synergies involved in applying the different
approaches to the same artifact. For example, there has been little work on inte-
grating techniques for supply-chain security with architectural initiatives for
design-level security implementation. Consequently, security research in the different
communities runs the danger of reinventing a "wheel" already employed in
another context, or of creating a solution that conflicts with the fundamental
requirements of another. Hence, effective collaboration between researchers
working in the various fields of SoC design and validation is urgently needed.
Although we have covered a broad spectrum of activities on SoC security in this
book and have tried to provide a comprehensive overview of SoC security and trust
challenges and solutions, we have depicted only a small but important part of the
SoC design and validation process. The overall process involves further complexities,
including trade-offs with power management, physical design, and testing,
as well as complex supply-chain issues, which we have touched on only periph-
erally. Readers interested in deeper exploration are encouraged to consult
the references, which include challenge papers and surveys on specific compo-
nents, and to use the discussions in this book as the glue connecting the different
pieces. The editors sincerely hope that the book provides adequate background,
stimulates the readers' interest in exploring this field further, and encourages
research towards the innovations that are critically needed.