Security Evaluation at Design Time For Cryptographic Hardware
UCAM-CL-TR-665
ISSN 1476-2986
Number 665
Computer Laboratory
April 2006
15 JJ Thomson Avenue
Cambridge CB3 0FD
United Kingdom
phone +44 1223 763500
http://www.cl.cam.ac.uk/
© 2006 Huiyun Li
This technical report is based on a dissertation submitted
December 2005 by the author for the degree of Doctor of
Philosophy to the University of Cambridge, Trinity Hall.
Technical reports published by the University of Cambridge
Computer Laboratory are freely available via the Internet:
http://www.cl.cam.ac.uk/TechReports/
Abstract

Consumer security devices are becoming ubiquitous, from pay-TV through mobile phones, PDAs and prepayment gas meters to smart cards. Many ongoing research efforts aim to keep these devices secure from opponents who try to retrieve key information by observation or manipulation of the chip's components. In common industrial practice, security evaluation is performed only after the chip has been manufactured. Due to design-time oversights, however, weaknesses are often revealed in fabricated chips. Furthermore, post-manufacture security evaluation is time consuming, error prone and very expensive. This creates the need for design-time security evaluation techniques that identify avoidable mistakes in design.

This thesis proposes a set of design-time security evaluation methodologies covering the well-known non-invasive side-channel analysis attacks, such as power analysis and electromagnetic analysis attacks, as well as the recently published semi-invasive optical fault injection attacks. These security evaluation techniques examine the system under test by reproducing attacks through simulation and observing its subsequent response.

The proposed design-time security evaluation methodologies can be easily incorporated into the standard integrated circuit design flow, requiring only commonly used EDA tools. They therefore add little non-recurring engineering (NRE) cost to the chip design, but help identify security weaknesses at an early stage, avoid costly silicon re-spins, and ease industrial evaluation for faster time-to-market.
Acknowledgements

This project would not have been possible without the support of many people. I would like to thank my supervisor Dr. Simon Moore for his valuable help and encouragement throughout my research. A. Theodore Markettos and Jacques Fournier provided considerable help with experiments. This work could not have been completed without EDA tools, for which I have to thank Robert Mullins. Thanks are also due to Scott Fairbanks for helpful discussions on HSPICE coding, and Sergei Skorobogatov for discussions on optical fault injection. I would like to give thanks to my husband, who endured this long process with me, always offering support and love. The Engineering and Physical Sciences Research Council (EPSRC) funded this research project.
Contents

1 Introduction
  1.1 Motivation
  1.2 Approaches
  1.3 Outline of the Thesis

2 Background
  2.1 Overview of Smart Card Technologies
    2.1.1 Types of Smart Card Interface
    2.1.2 Smart Card Architecture
    2.1.3 Applications
  2.2 Smart Card Security Mechanisms
    2.2.1 Authentication
    2.2.2 Confidentiality, Integrity and Non-repudiation
    2.2.3 “Security through Obscurity” vs. “Kerckhoffs’ Principle”
  2.3 Smart Card Attack Technologies
    2.3.1 Non-invasive Attacks
    2.3.2 Invasive Attacks
    2.3.3 Semi-invasive Attacks
  2.4 Defence Technologies
    2.4.1 Countermeasures to non-invasive attacks
    2.4.2 Countermeasures to semi-invasive and invasive attacks
  2.5 Summary
Chapter 1
Introduction
1.1 Motivation
Cryptographic devices, such as secure microcontrollers and smart cards, are widely used in security applications across a wide range of businesses. These devices generally have an embedded cryptographic processor running cryptographic algorithms such as triple DES, AES or RSA, together with a non-volatile memory to store the secret key. Although the algorithms themselves are considered secure, the system can be broken if the keys can be extracted from smart cards or terminals by side-channel analysis attacks, such as timing analysis [35], power consumption analysis [37], or electromagnetic radiation analysis [54] attacks. Timing and power analysis have been used for years to monitor the processes taking place inside microcontrollers and smart cards. It is often possible to figure out which instruction is currently being executed and how many bits are set or reset in an arithmetic operation, as well as the states of the carry, zero and negative flags. However, as chips become more complex, with instruction/data caches and pipelining mechanisms inside their CPUs, it becomes increasingly difficult to observe their operation through direct power analysis. A statistical technique has more recently been used to correlate the data being manipulated with the power being consumed. This technique works effectively, and extends easily from the power side-channel to the electromagnetic side-channel.
As attack techniques advance, it is no longer sufficient for cryptographic processors to withstand the above passive attacks; they must also endure attacks that inject faults into the devices and thus cause exploitable abnormal behaviour. The abnormal behaviour may be a data error setting part of the key to a known value, or a missed conditional jump reducing the number of rounds in a block cipher. Optical fault injection [58] appears to be a powerful and dangerous attack. It involves illuminating a single transistor or a group of adjacent transistors, causing them to conduct transiently and thereby introducing a transient logic error.
Many designs are contrived to keep cryptographic devices secure against these attacks. To evaluate these designs, it is common industrial practice to test the design post manufacture. This post-manufacture analysis is time consuming, error prone and very expensive. This has driven my study of design-time security evaluation, which aims to examine data-dependent characteristics of secure processors so as to assess their security level against side-channel analysis attacks. This design-time security evaluation should also cover optical fault injection attacks, which have recently aroused interest in the security community.
This design-time security evaluation should be easy to employ within the framework of an integrated circuit (IC) design flow. It should be systematic and exhaustive, and should be performed in a relatively short time while providing relatively accurate and practical results (compared to commercial post-manufacture testing).
1.2 Approaches

1.3 Outline of the Thesis
Chapter 2
Background
2.1 Overview of Smart Card Technologies
Smart cards were first introduced in Europe in 1976 in the form of memory cards, used to store payment information for the purpose of reducing theft from pay phones. Since then smart cards have evolved into a much more advanced form, with both a microprocessor and memory in a single chip. They are now widely used for secure processing and storage, especially in security applications that use cryptographic algorithms.

The Joint Technical Committee 1 (JTC1) of the International Standards Organisation (ISO) and the International Electrotechnical Commission (IEC) defined an industry standard for smart card technology in 1987. This series of international standards, ISO/IEC 7816 [6], started in 1987 with its latest update in 2003, defines various aspects of a smart card, including physical characteristics, physical contacts, electronic signals and transmission protocols, commands, security architecture, application identifiers, and common data elements [2].
ISO/IEC 7816 describes a smart card as an Integrated Circuit Card (IC card), which encompasses all those devices where an integrated circuit is contained within an ISO ID1 identification card piece of plastic [6]. The standard card is 85.6mm × 53.98mm × 0.76mm, the same size as a credit card. When used as a Subscriber Identity Module (SIM) card, the plastic card is smaller, just big enough to fit inside a cellphone.
[Figure: smart card form factors — a contact card with the chip under a contact plate (front and back of the card body), and a contactless card with the chip connected to an embedded antenna]
A card that supports both contact and contactless interfaces is referred to as a combi card.
[Figure: smart card chip architecture — CPU, cryptographic coprocessor, RAM, ROM, EEPROM with charge pump, I/O, busses and bonding pads]
2.1.3 Applications

Smart cards are entering a dramatically growing number of service applications, taking the place of money, tickets, documents and files. Credit cards, cash-less pay phones, road toll systems, logical access control devices, health care files and pay TV are just a few current examples. Some of these applications are discussed below [5].
1. Transportation
With billions of transport transactions occurring each day, smart cards have found a place in this rapidly growing market. For example, contactless smart cards allow a passenger to ride several buses and trains during a daily commute without having to worry about complex fare structures or carrying change. In Singapore and London, for example, buses and underground railways use contactless smart cards to collect fares. Each time passengers enter a bus or underground train, they pass their card in front of a reader, which deducts the fare from the credit stored on the card.
2. Communication
Prepaid Telephone Cards: Although various forms of magnetic and optical card have been used for public telephone services, most telephone operators choose smart cards as the most effective card form due to their small overhead. Currently about 80 countries throughout the world use smart cards in public telephone services.
Securing Mobile Phones: The Global System for Mobile communications (GSM) is a digital cellular communication system used in over 90 countries worldwide. A GSM phone uses a SIM card which stores all the personal information of the subscriber. Calls to the subscriber's mobile number are directed accordingly, and bills are charged to the subscriber's personal account. Secure data concerning the GSM subscription is held in the smart card, not in the telephone. A secret code, known as a PIN (Personal Identification Number), also protects the subscriber from misuse and fraud.
3. Electric Utilities
Electric utility companies in the United Kingdom, France and other countries are using smart cards to replace meter reading for prepayment. Customers purchase electricity at authorised payment centres and are issued with a smart card. Customers can also use the card to access information about their account, such as the amount remaining, the amount consumed yesterday or last month, and the amount of remaining credit. An emergency threshold is built in to allow customers to use electricity and pay at a later time. Once the emergency threshold is consumed, electricity is shut off.
4. Computer Security
Boot Integrity Token System (BITS): The boot integrity token system was developed to protect computer systems from the large number of viruses that affect the booting system, and to enforce access control [18]. BITS is designed so that the computer boots from a boot sector stored on the smart card, bypassing the boot sector on the computer, which can easily be infected by a virus. The card can also be configured to allow access to the computer only by authorised users.
Authentication in Kerberos: In an open distributed computing environment (DCE), a workstation cannot be trusted to identify its users, because the workstation may not be located in a well-controlled environment and may be far away from the central server. A user can be an intruder who may try to attack the system, or pretend to be someone else to extract information from the system which he/she is not entitled to.
Kerberos [60] is one of the systems which provides trusted third-party authentication services to authenticate users in a distributed network environment. Basically, when a client requests access to a particular service from the server, the client has to obtain a ticket or credential from the Kerberos authentication server (AS). The client then presents that credential to the ticket granting server (TGS) and obtains a service ticket. The user can then request the service by submitting the service ticket to the desired server.
Using this protocol, the server can be assured that it is offering services to a client authorised to access them, because Kerberos assumes that only the correct user can use the credential, as others do not have the password to decrypt it. However, a user can actually request the credential of others, because the user is not authenticated initially. In this way, an attacker can obtain the credential of another user and perform an off-line password-guessing attack, as the ticket is sealed by the password only. This security weakness of Kerberos is identified in [26], and some implementations integrate a smart card into the Kerberos system to overcome this problem. The security of Kerberos is enhanced by authenticating the user via a smart card before granting the initial ticket, so that one user cannot obtain the ticket of another [26].
5. Medical / Health
Smart cards can also carry medical information such as details of medical insurance coverage, drug sensitivities, medical records, the name and phone number of doctors, and other information vital in an emergency.
In the United States, Oklahoma City has had a smart card system called MediCard since 1994. This smart card is able to selectively control access to a patient's medical history, which is recorded on his/her MediCard. However, essential information, including the family physician and a close relative to contact, is available to emergency personnel in extreme circumstances. Smart card readers are installed at hospitals, pharmacies, ambulance services, physicians' offices and even with the fire department, allowing the MediCard to be used in both ordinary and emergency circumstances [5].
Germany has issued cards to all its citizens that carry their basic health insurance information. In France and Japan, kidney patients have access to cards that contain their dialysis records and treatment prescriptions. These cards are designed with security features to restrict access to the information to authorised doctors and personnel only.
6. Personal Identification
Several countries, including Spain and South Korea, have begun trials with smart cards that provide identification (ID) for their citizens. An ID document in the form of a smart card can hold digitised versions of the holder's signature, photograph and possibly his/her biometric information. In an ID system that combines smart card and biometric technologies, a "live" biometric image (e.g., a scan of a fingerprint or iris) is captured at the point of interaction and compared to a stored biometric image that was captured when the individual enrolled in the ID system. Smart cards provide a secure, convenient and cost-effective ID technology that stores the enrolled biometric template and compares it to the "live" biometric template. This kind of personal ID system is designed to solve the fundamental problem of verifying that individuals are who they claim to be [9].
7. Payment Card
The payment card has been in existence for many years. It started in the form of a card embossed with details of the card-holder, such as account number, name and expiration date, which could be used at a point of sale to purchase goods or services. The magnetic stripe was soon introduced to cut the cost and errors involved in keying in vouchers for embossed cards. The magnetic stripe also allowed card-holder details to be read electronically in a suitable terminal and allowed automated authorisation. As the criminal fraternity found ways of producing sufficiently good counterfeit cards, magnetic stripe cards have now been developed to the point where there is little or no further scope for introducing more anti-crime measures. An improvement over traditional magnetic stripes is Watermark Magnetics technology [39], where a unique watermark pattern is encoded for each card. Watermark encoding relies on changes in particle orientation; it differs from traditional magnetic stripe encoding, which relies on polarity reversals. Together with an active reading technology, the watermark pattern encoded into each card is secure against fraudulent attempts at duplication. However, although possessing the merits of low cost and high security, Watermark Magnetics does not have the memory capacity of the widely publicised smart cards. This has caused the card association of Europay, MasterCard and Visa (EMV) to announce an extensive commitment to include a microprocessor chip on all credit and debit cards distributed worldwide [23].
From the anti-crime perspective, there are a number of benefits in adopting the smart card. The card itself (or in conjunction with the terminal) can make decisions about whether or not a transaction can take place. Secret values can be stored on the card which are not accessible to the outside world, allowing, for example, the card to check the cardholder's PIN without having to go online to the card issuer's host system. Also, there is the possibility of modifying the way the card works while it is inserted in a point-of-sale terminal, even to the point of blocking the card from further transactions if it has been reported lost or stolen.
2.2 Smart Card Security Mechanisms
The previous section repeatedly stated that smart cards provide security in various kinds of applications. But what does "security" actually mean in the context of information technology? Generally speaking, there are four primary properties or requirements that security addresses here:
◦ Confidentiality is the assurance that information is not disclosed to unauthorised individuals or processes.
◦ Integrity is the assurance that information retains its original level of accuracy.
◦ Authentication is the process of recognising/verifying valid users or processes, and determining which system resources a user or process is allowed to access.
◦ Non-repudiation provides assurance to senders and receivers that a message cannot subsequently be denied by the sender.
To fulfil these four basic requirements of security, various security mechanisms are available to the designers of cryptographic devices. The most important mechanisms are based on the use of cryptographic algorithms, which encrypt/decrypt sensitive information using secret keys.

Smart card security mechanisms are based on the use of cryptographic algorithms. Let us consider an application environment to illustrate a typical smart card security mechanism. In this environment, shown in Figure 2.4, we have a personal computer with an attached smart card reader (the terminal). The terminal provides the remote interface that allows the smart card to communicate with the authentication center (e.g., via the Internet).
2.2.1 Authentication

Consider the environment illustrated in Figure 2.4. There are four entities involved in the act of authentication:
• the card-holder
• the smart card
• the terminal system
• the remote authentication center
Card-holder Authentication
Authenticating the identities involved requires two separate actions [33]. First, the card-holder must authenticate himself to the smart card. This step prevents fraudulent use by some person other than the real card-holder. Normally, the mechanism used to authenticate identity is proof of knowledge of a secret shared between the card and the holder. In this case, the card-holder is usually asked to enter a PIN, typically a four- to eight-digit number that can be entered through a PIN pad or a terminal keyboard. The PIN is passed over to the card, which verifies that it matches a PIN value stored on the card (e.g. in the EEPROM). It should be noted that the card-holder must trust the host computer when entering the PIN. If the terminal is not trustworthy, then the PIN could be compromised, and an impostor could use the PIN to authenticate himself to the card and use the card on behalf of someone other than the true card-holder.
[Figure 2.4: the application environment — a card-holder with a smart card, a terminal system comprising a card reader and personal computer, and a remote authentication center]
Authentication Between the Card and the Authentication Center
The next process is the mutual authentication of the card with the authentication center (AuC), or in some cases, only the authentication of the card to the AuC. The authentication between the AuC and the card is also based on proving knowledge of a shared secret. However, the secret should not appear on the communication channel linking the card and the terminal. As an example, consider the naïve protocol illustrated in Figure 2.5. The left part shows the operations performed by the AuC via the terminal in the middle, whilst the right part shows the operations performed by the card in response to commands issued from the terminal.
[Figure 2.5: the AuC generates a nonce N; both the AuC and the card compute F(N, Akey); the card returns M, the AuC compares it with its own M′, and accepts the card as authentic only if M′ = M]
Figure 2.5: The process of the card authenticating itself to the AuC
First, the AuC generates a "number used once", or nonce, N. Then the AuC issues a command via the terminal for the card to authenticate itself, along with the nonce N. The card encrypts N using the secret key Akey, generating M by computing M = F(N, Akey). M is returned to the AuC, which compares the result with its own computation M′ = F(N, Akey), where Akey is its copy of the key. If M′ = M, the card knows the true key Akey, i.e., the card is authenticated. If M′ ≠ M, then the card is fake and will be rejected. In some protocols, the AuC authenticates itself to the card in a similar manner.

This "challenge-response" authentication method prevents attackers from intercepting the conversation and obtaining the key Akey, since Akey never passes through the communication channel and the challenge is unique for each transaction. The scheme presented, however, must be refined to prevent "man-in-the-middle", replay or other attacks [11].
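The exchange of Figure 2.5 can be sketched as follows. The text leaves the keyed function F abstract, so HMAC-SHA256 stands in for it here; the key value and function names are purely illustrative:

```python
# Hypothetical sketch of the challenge-response protocol of Figure 2.5.
# HMAC-SHA256 plays the role of the keyed function F (an assumption; the
# text does not fix F). Note that Akey never crosses the channel: only the
# nonce N and the response M do.
import hashlib
import hmac
import os

AKEY = b"shared-secret-akey"  # provisioned in both the AuC and the card

def card_respond(nonce: bytes, key: bytes = AKEY) -> bytes:
    """Card side: compute M = F(N, Akey)."""
    return hmac.new(key, nonce, hashlib.sha256).digest()

def auc_authenticate(respond) -> bool:
    """AuC side: generate nonce N, compare M' = F(N, Akey) with the reply M."""
    nonce = os.urandom(16)                  # the "number used once"
    m = respond(nonce)                      # sent to the card via the terminal
    m_prime = hmac.new(AKEY, nonce, hashlib.sha256).digest()
    return hmac.compare_digest(m, m_prime)  # equal => card knows Akey

assert auc_authenticate(card_respond)                                # genuine card
assert not auc_authenticate(lambda n: card_respond(n, b"wrong-key")) # fake card
```

Because a fresh nonce is used per transaction, replaying an old M fails, but as the text notes, a real deployment needs further refinement against man-in-the-middle attacks.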
It should be noted that the illustrated authentication process makes use of two very important characteristics of smart cards [33]. First, the shared secret Akey is stored on the smart card in a secure manner. Even when an impostor gains control of a card (e.g., via a rogue terminal), he cannot easily extract this secret information from the card. In fact, it takes a great deal of effort to extract information from the card; attacks on smart cards will be examined later in detail. The second useful characteristic of smart cards is their capability to perform complex cryptographic algorithms, as the cards contain microprocessors and are in essence computer platforms.
publicly-known algorithms, although ciphers used to protect classified government or military information are still often kept secret.

Another advantage of keeping the key, rather than the algorithm, secret is that disclosure of a cryptographic algorithm would lead to major logistic headaches in developing, testing and distributing implementations of a new algorithm, whereas if only the secrecy of the keys matters, generating and distributing new keys is a much less arduous process. In other words, the fewer the things one needs to keep secret in order to ensure the security of the system, the easier it is to maintain that security.
2.3 Smart Card Attack Technologies
According to the above description, the security of a smart card system must not depend on keeping the cryptographic algorithm secret, but on keeping the key secret. Attack approaches thus mainly focus on how to retrieve secret keys. Depending on the extent of physical intrusion, and thus on the amount of evidence left on the target device, attacks can be categorised into three types: non-invasive, invasive or semi-invasive attacks.
On the other hand, absolute Hamming weight information may leak through the data bus, for example when a precharged bus is used in a design where the data bus is precharged to "1". The number of "0"s driven onto the precharged bus determines the amount of current discharged from the capacitive load Cload.
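As a toy illustration of this leakage model (a sketch of the idea, not the thesis's simulation flow), the discharge activity of an 8-bit bus precharged to all-ones can be modelled by counting the zero bits driven onto it:

```python
# Toy leakage model for a bus precharged to "1": the charge drawn on a
# transfer is proportional to the number of 0 bits driven onto the bus,
# since each 0 discharges one line's load capacitance Cload.
def zeros_on_bus(value: int, width: int = 8) -> int:
    """Number of lines discharged when `value` is driven on a precharged bus."""
    return width - bin(value & ((1 << width) - 1)).count("1")

assert zeros_on_bus(0xFF) == 0   # all lines stay precharged: minimal current
assert zeros_on_bus(0x00) == 8   # every line discharges: maximal current
assert zeros_on_bus(0xA5) == 4   # 0xA5 = 10100101: four zero bits
```

An observer who can resolve this per-transfer current thus learns the absolute Hamming weight of each value on the bus.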
Power analysis attacks exploit these two data-dependent information leakages in an attempt to extract secret keys. Power analysis can be performed in two ways: Simple Power Analysis (SPA) and Differential Power Analysis (DPA). The former uses pattern matching to identify relevant power fluctuations, while the latter uses statistical analysis to extract information correlated with secret keys [37]. For example, Figure 2.6 shows the first round of the Data Encryption Standard (DES) cryptographic algorithm. The 64-bit input block is divided into left and right halves L0 and R0, which are swapped. The left 32-bit half is expanded into 48 bits and then XORed with the 48-bit secret key of the first round (K1). Take K1 as eight 6-bit subkeys: K1 = [K1_1 ... K1_8]. Each subkey is then XORed with 1/8 of the expanded L0. For example, K1_1 (6 bits) is XORed with the first 6 bits of the expanded L0, resulting in the 6-bit S1_input going to the substitution box S1. DPA begins by running the DES algorithm N times for N random values of plaintext input. For each run, the power consumption trace is collected. Then the attacker hypothesises all 2^6 = 64 possible values of the subkey K1_1. For each guessed subkey, the attacker calculates the corresponding intermediate output S1_output (4 bits). He then divides the power traces into two groups according to one bit (e.g., the least significant bit) of S1_output. The attacker averages each partition to remove noise, and finally computes a differential trace (the difference between the averages of the two partitions). If the subkey hypothesis is false, the two partitions are randomly grouped and the differential trace should be flat; if the subkey hypothesis is true, noticeable peaks will occur in the differential trace, indicating points where the subkey was manipulated.
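The steps above can be condensed into a toy single-bit DPA simulation. The sketch below uses the real DES S-box S1, but everything around it is assumed for illustration: the "power trace" is a single sample per run, modelled as the least significant S1 output bit plus Gaussian noise, and the key value and trace count are arbitrary:

```python
# Illustrative single-bit DPA against one DES S-box. The "device" leaks one
# power sample per run: the LSB of the S1 output plus Gaussian noise (an
# assumed leakage model, not measured data).
import random

random.seed(0)

S1 = [  # DES substitution box S1 (4 rows x 16 columns)
    [14, 4, 13, 1, 2, 15, 11, 8, 3, 10, 6, 12, 5, 9, 0, 7],
    [0, 15, 7, 4, 14, 2, 13, 1, 10, 6, 12, 11, 9, 5, 3, 8],
    [4, 1, 14, 8, 13, 6, 2, 11, 15, 12, 9, 7, 3, 10, 5, 0],
    [15, 12, 8, 2, 4, 9, 1, 7, 5, 11, 3, 14, 10, 0, 6, 13],
]

def sbox1(x):
    """DES S1: the outer two input bits select the row, the middle four the column."""
    return S1[((x >> 4) & 0b10) | (x & 1)][(x >> 1) & 0xF]

TRUE_SUBKEY = 0b101100   # the 6-bit round-key chunk K1_1 the attacker wants

# Step 1: run the device many times, recording (input, power sample) pairs
# (here, 64 full passes over all 6-bit inputs).
runs = [(pt, (sbox1(pt ^ TRUE_SUBKEY) & 1) + random.gauss(0, 1))
        for _ in range(64) for pt in range(64)]

# Step 2: for each of the 2^6 subkey hypotheses, partition the traces on the
# predicted S1_output LSB and compute the differential of the two averages.
def differential(guess):
    g = [[], []]
    for pt, power in runs:
        g[sbox1(pt ^ guess) & 1].append(power)
    return abs(sum(g[1]) / len(g[1]) - sum(g[0]) / len(g[0]))

# Step 3: the correct hypothesis shows a clear peak; wrong ones average out.
best = max(range(64), key=differential)
assert best == TRUE_SUBKEY
assert differential(TRUE_SUBKEY) > 0.8
```

Averaging over many runs is what suppresses the noise term: wrong guesses split the traces into two statistically similar groups, so their differential shrinks toward zero, while the correct guess separates the traces exactly by the leaked bit.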
Timing Attack
Smart cards take slightly different amounts of time to perform different operations [36]. Attackers can garner this leaked timing information to obtain secret keys, just as they do through power analysis attacks. For example, cryptographic algorithms based on modular exponentiation, such as Diffie-Hellman and RSA, consist of computing R = y^x mod n. The goal is to find the w-bit secret key x. The exponentiation processes x bit by bit: if a particular bit x_k is 1, the running result is updated as R_k = (R_{k-1} · y) mod n; if the bit x_k is 0, then R_k = R_{k-1}. The extra modular multiplication performed when x_k = 1 takes a long time to process, thus leaking information about x_k.
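This data-dependent cost can be seen in a minimal square-and-multiply sketch (a generic textbook implementation of modular exponentiation, not code from the thesis), where a multiply counter stands in for execution time:

```python
# Why square-and-multiply leaks timing: computing R = y^x mod n processes
# x bit by bit, and each 1 bit of x costs one extra modular multiply.
def modexp_with_cost(y, x, n):
    """Left-to-right square-and-multiply; returns (result, multiply count)."""
    r, cost = 1, 0
    for bit in bin(x)[2:]:
        r = (r * r) % n          # square: performed for every bit of x
        cost += 1
        if bit == "1":           # extra multiply only when the key bit is 1
            r = (r * y) % n
            cost += 1
    return r, cost

r, cost = modexp_with_cost(7, 0b101101, 1000003)
assert r == pow(7, 0b101101, 1000003)
# 6 squares + 4 multiplies: total time reveals the Hamming weight of x, and
# per-iteration timing differences can reveal the individual bits.
assert cost == 10
```

A constant-time implementation would have to perform the same work regardless of each bit, which is exactly the trade-off the countermeasures below wrestle with.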
Masking timing characteristics was suggested as a countermeasure [36]. This could be done either by making all operations take exactly the same amount of time or by adding random delays. However, both are difficult for the following reasons:
• Fixed-time implementations are slow, since the speed of the whole system has to be governed by the slowest operation.
• Making software run in fixed time is hard, because compiler optimisations and other factors can introduce unexpected timing variations. If a timer is used to delay returning results until a pre-specified time, power consumption may in turn change detectably.
• Random delays can be filtered out by collecting more measurements. The number of samples required increases roughly with the square of the timing noise. For instance, if a modular exponentiator whose timing characteristics have a standard deviation of 10
[Figure 2.6: the first round of DES — L0 (32 bits) and R0 (32 bits); L0 is expanded and XORed with the round key K1 (8 × 6 bits); each 6-bit S-box input (e.g. S1_input) feeds one of the substitution boxes S1 ... S8, producing 8 × 4-bit outputs (e.g. S1_output)]
2.4 Defence Technologies
This section discusses defence technologies that can be used to improve smart card security, and how they can be evaluated through simulation.

All of the above defences can be used in isolation or in combination in system design, and their effect can be evaluated by the simulation methodologies proposed in Chapters 3 and 4.
2.5 Summary

This chapter reviewed smart card technologies, including the structure and applications of smart cards. Security mechanisms such as authentication, confidentiality, integrity and non-repudiation were also discussed. Existing attack technologies were surveyed and classified into non-invasive, invasive and semi-invasive attacks, depending on the level of physical destruction and tamper evidence on the card. Power analysis attacks and electromagnetic analysis attacks in the class of non-invasive attacks, and optical fault induction attacks in the class of semi-invasive attacks, were introduced in detail, as they are the subjects of Chapters 3, 4 and 5 respectively.
Chapter 3
Simulating Power Analysis Attacks

As introduced in Section 2.3.1, CMOS circuits consuming data-dependent power during an operation may leak information in the form of Hamming weight or transition count. Someone analysing this data-dependent power carefully could deduce sensitive information that a cryptographic device such as a smart card strives to protect. There are two kinds of power analysis attack: Simple Power Analysis (SPA) and Differential Power Analysis (DPA). The former primarily uses pattern matching to identify relevant power fluctuations. It helps attackers to observe macro properties of an algorithm, but it remains very difficult to pinpoint individual instructions, let alone individual bits of data. DPA, on the other hand, uses statistical techniques to detect variations in power consumption so small that individual key bits can be identified. Compared to SPA, DPA is more dangerous, as it does not require the attacker to know implementation details of the target code.

To keep cryptographic devices secure against power analysis attacks, a huge amount of research has been undertaken to hide or remove the correlation between the data being manipulated and the power being consumed. However, in common industrial practice, design evaluation of secure devices can only be performed after chips are manufactured. This post-manufacture analysis is time consuming, error prone and very expensive. This has driven the study of design-time security evaluation against DPA, which aims to examine the data-dependent power characteristics of secure processors.
3.1
Commercial power estimation tools are already widely used in integrated circuit (IC) design to provide the power consumption details needed to meet power budgets and specifications, to select the proper packaging, to determine cooling requirements, and to estimate battery life for portable applications. For example, Synopsys® delivers a complete solution to verify power consumption at different levels of the design process. These products include PrimePower, PowerMill®/NanoSim® and RailMill®.

Synopsys PrimePower is a dynamic, full-chip power analysis tool for complex multimillion-gate ASICs (Application-Specific ICs). PrimePower builds a detailed power profile of the design based on the circuit connectivity, the switching activity, the net capacitance and the cell-level power behaviour data in the Synopsys .db library. It then calculates the power behaviour for a circuit at the cell level and reports the power consumption at the chip, block and cell levels [3].
Synopsys NanoSim is a transistor-level circuit simulation and analysis tool with simulation speeds orders of magnitude higher than SPICE. NanoSim has the capacity for multimillion-transistor designs and SPICE-like accuracy for designs at 0.13 micron and below. NanoSim uses intelligent partitioning techniques along with a combination of event-based and time-based simulation. A typical SPICE engine treats the entire design as one monolithic block and evaluates all nodes at each time step. NanoSim, on the other hand, uses a "divide and conquer" approach where the design is automatically partitioned into smaller stages based on channel connectivity, so that any given stage or partition is evaluated only when an input controlling node is triggered.
There are power analysis tools from other EDA (Electronic Design Automation) tool
vendors. The list below is not an exhaustive inventory, but may provide an overview
for those
interested in DPA simulation.
• Synopsys power solution (www.synopsys.com)
– RTL-level: Power Compiler, mainly for power optimisation
– Gate-level: PrimePower, mainly for power analysis
– Transistor-level: PowerMill/Nanosim, mainly for power analysis
• Apache power solution (www.apache-da.com)
– From design to verification: RedHawk-SDL, a full-chip physical power analysis
tool
• Sequence power solution (www.sequencedesign.com)
– Architectural/RTL/Gate-level: PowerTheater, a comprehensive set of power analysis
tools
[Figure 3.1: The proposed DPA simulation flow within the standard IC design flow. Logic synthesis produces a gate-level netlist for functional/gate-level power simulation with a testbench; floorplanning, place & route and extraction yield parasitics and a transistor-level netlist for functional/transistor-level power simulation with technology models; either path feeds the DPA simulation.]
plaintext inputs are encrypted with a key. With a guessed subkey, the power traces are partitioned into two groups and averaged. Only the power difference caused by repeatedly processing a "1" (or a "0") at some fixed points in time during the computation is not smoothed out. The two averaged power traces of each partition therefore ultimately reveal a data dependency of the processor operations. With two runs with different operands, this simulation methodology is able to examine the data-dependent power characteristics of secure processor designs, which are the fundamental weakness a real DPA attack exploits.
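The partition-and-average step can be illustrated with a short sketch (this is not the code used in this work; the traces, the selection bits and the leaky sample are invented for illustration):

```python
# Difference-of-means DPA: partition the traces by a predicted bit, average
# each group, and subtract the averages; a consistent spike marks a sample
# whose power depends on that bit.

def dpa_difference_of_means(traces, predicted_bits):
    """traces: equal-length power traces; predicted_bits: one 0/1 per trace."""
    n = len(traces[0])
    ones = [t for t, b in zip(traces, predicted_bits) if b == 1]
    zeros = [t for t, b in zip(traces, predicted_bits) if b == 0]
    mean = lambda group, i: sum(t[i] for t in group) / len(group)
    return [mean(ones, i) - mean(zeros, i) for i in range(n)]

# Toy data: only sample 2 leaks the predicted bit; all other samples match.
traces = [[0.0, 1.0, 0.5 + 0.1 * b, 0.2] for b in (0, 1, 0, 1)]
diff = dpa_difference_of_means(traces, [0, 1, 0, 1])
# diff has a spike (about 0.1) at sample 2 and is zero elsewhere
```

A real attack repeats this for every subkey guess; only the correct guess partitions the traces consistently and produces the spike.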
[Figure 3.2: DPA simulation data flow. Functional/power simulation of the netlist with technology models and parasitics produces Idd(t); the two data sets are re-synchronised, re-sampled (according to the measurement setup) and low-pass filtered (due to the on-chip power grid effect and the measurement setup) to yield DPA traces.]
Once the two sets of current data Idd(t) are collected, they are passed to MATLAB™ programs to implement the DPA simulation, as illustrated in Figure 3.2. The DPA simulation is mainly processing of the Idd(t) data, involving:
• re-synchronising the two sets of data for 'differential' analysis
• re-sampling the data according to the measurement setup. This step is optional: if the simulation time step is unnecessarily small (for example 1 ps, compared to the nanosecond scale of normal measurement sampling), the data can be decimated for faster simulation
• low-pass filtering the data, considering the load resistance of the measurement instrument and the on-chip parasitic capacitance, inductance etc. More detail is presented in the next subsection.
Finally, DPA is performed by subtracting one power trace from another. Security
weakness will be manifested as pulses in the DPA trace, revealing data-dependent
power characteristics of the design under test.
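The processing chain above can be sketched as follows — a simplified stand-in for the MATLAB programs, in which the trigger-based alignment, the decimation factor and the one-pole filter (standing in for the RLC package filtering discussed next) are illustrative assumptions:

```python
# Sketch of the DPA post-processing applied to two simulated Idd(t) traces:
# re-synchronise, optionally decimate, low-pass filter, then subtract.

def synchronise(trace, trigger_index):
    # Align a trace by dropping samples before a common trigger point.
    return trace[trigger_index:]

def decimate(trace, factor):
    # Re-sample: keep every factor-th point when the simulation time step
    # is much smaller than the measurement sampling interval.
    return trace[::factor]

def lowpass(trace, alpha):
    # One-pole IIR filter standing in for the on-chip/package RLC filtering.
    out, y = [], trace[0]
    for x in trace:
        y += alpha * (x - y)
        out.append(y)
    return out

def dpa_trace(idd1, idd2, trig1, trig2, factor=2, alpha=0.3):
    a = lowpass(decimate(synchronise(idd1, trig1), factor), alpha)
    b = lowpass(decimate(synchronise(idd2, trig2), factor), alpha)
    n = min(len(a), len(b))
    # Pulses in the returned differential trace expose data-dependent power.
    return [x - y for x, y in zip(a[:n], b[:n])]
```

Identical input traces produce an all-zero differential trace; any residual pulse therefore indicates a data dependency.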
LC Resonance Effect
In Figure 3.2, the current data Idd(t) obtained from power estimation tools involves only the core circuitry, shown in the dashed box in Figure 3.3. It considers neither on-chip parasitics, such as the power grid capacitance (Cpowergrid) and on-chip decoupling capacitance (Cdecoupling), nor the package inductance (Lpackage). In measurement they all matter, and they should therefore be considered in the simulation methodology.
[Figure 3.3: Power consumption of the core logic circuitry alone: the logic draws current ilogic from a DC supply between the VDD and VSS pins.]
Figure 3.4: Measuring power consumption of a chip with on-chip parasitics and package inductance: the logic circuitry sits behind Cpowergrid, Cdecoupling and Lpackage, and the current imeasured is observed as vscope across R1
Transforming the circuit into a Norton equivalent structure and replacing the current source with ilogic obtained from the logic circuitry power simulation (as shown in Figure 3.3), we get Figure 3.5, where the on-chip capacitance Conchip = Cpowergrid + Cdecoupling, and Cpowergrid is derived from [30] as the lumped capacitor between the power and ground network.
Figure 3.5: RLC low-pass filter for input current ilogic obtained from logic
circuitry power simulation
The RLC circuit in Figure 3.5 forms a low-pass filter for the input current ilogic, with the 3 dB cutoff frequency¹ of the output current imeasured at fcutoff = 1/(2π√(LC)).
¹ The frequency at which the output current is 70.7% of the input current.
Take the Springbank test chip as an example. This chip was fabricated in the UMC 0.18 µm 6-layer metal process as part of the G3Card project [19, 24]. The chip is packaged in PGA120 (Pin Grid Array, 120 pins) and mounted in a ZIF (Zero Insertion Force) socket on the evaluation board. The package inductance (Lpackage), here including bond wire inductance, trace inductance, pin inductance and socket inductance, is about 10 nH. The power-grid and on-chip decoupling capacitance together amount to about 400 pF. The 3 dB cutoff frequency fcutoff is calculated to be 79.6 MHz, and this is used for the simulation later.
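The cutoff calculation is straightforward to reproduce with the Springbank values quoted above:

```python
import math

def rlc_cutoff_hz(l_henry, c_farad):
    # 3 dB cutoff of the RLC low-pass filter: f = 1 / (2*pi*sqrt(L*C))
    return 1.0 / (2.0 * math.pi * math.sqrt(l_henry * c_farad))

# Springbank values: 10 nH package inductance, 400 pF on-chip capacitance
f_cutoff = rlc_cutoff_hz(10e-9, 400e-12)
print(round(f_cutoff / 1e6, 1))   # -> 79.6 (MHz)
```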
3.2
Results
DPA simulation has been carried out on the Springbank test chip. Figure 3.6 shows a
picture
of the test chip which contains five 16-bit microprocessors with different design
styles. This
experiment addresses the dual-rail asynchronous processor (DR-XAP) only (in the
middle of
the chip).
Figure 3.6: Springbank test chip; the microprocessor under DPA test (DR-XAP) is in the middle
I target simple instructions (e.g. XOR (exclusive OR), shift, load, store) which can give a good indication of how the hardware reacts to the operations of cryptographic algorithms.
give a good indication of how the hardware reacts to operations of cryptographic
algorithms.
A short instruction program runs twice with operands of different Hamming weight.
The first
run computes #H’11 XOR #H’22, while the second computes #H’33 XOR #H’55. Figure 3.7
shows a fragment of the instruction program.
        ld    x, #H'FFF0     ; initialise stack
        ld    al, #H'0011    ; load the 2 operands for first run
        st    al, @(0,x)
        ld    al, #H'0022
loop:
        nop                  ; 5 'no-operation' instructions to
        nop                  ; ease synchronisation in measurement
        nop
        nop
        nop
        xor   al, @(0,x)     ; instruction to be analysed
        nop                  ; On first run: #H'11 xor #H'22
        nop                  ; On second run: #H'33 xor #H'55
        nop
        nop
        nop
        ld    al, #H'0033
        st    al, @(0,x)
        ld    al, #H'0055
        nop
        nop
        bra   loop
Figure 3.7: Fragment of the instruction program used for the DPA evaluation
[Figure 3.8: Power simulation: DR-XAP executing XOR with different operands. Plot title: "DRXAP Current Comparison over XOR (H#11 xor H#22 vs. H#33 xor H#55)"; current (A, up to about 0.08) against time (s); the annotated instruction sequence (nop, nop, xor, nop, ...) spans roughly 2.8665×10⁻⁴ s to 2.87×10⁻⁴ s.]
Then I perform second-order low-pass filtering on the original power curves, as described in the previous section. Figure 3.9 shows the filtered power traces and their differential trace.
[Plot legend: current 1 (H#11 xor H#22); current 2 (H#33 xor H#55); difference. The differential pulse lasts about 28 ns and reaches about 5.5% of the current peak.]
Figure 3.9: Power simulation: DR-XAP executing XOR, low-pass filter applied
It takes about 3 minutes to run the power simulation with Synopsys PrimePower over the 10,000 gates of the processor DR_XAP. The data processing with MATLAB takes about 2 minutes. All the simulation work is done on a 1.6 GHz AMD Athlon processor with 2 GB memory.
Figure 3.10: Differential Power Analysis of the DR-XAP processor on the Springbank
chip (experimental graph)
The simulated power traces do not cover the power used by memory accesses – we had no memory power model available. This in turn raises the ratio of differential power to operation power. The upper power curves for the XOR operations also differ in shape from those measured, again because memory access power is absent: there is a significant drop in the simulated power at the point where one operand of the XOR operation is fetched from memory.
Using caches can reduce the number of power-hungry memory fetches. However, frequent cache misses, e.g. when a cipher's S-box references many different data, lengthen the encryption time [63]. Obtaining key differences by observing the encryption time can reduce the key search space. This so-called cache attack requires careful use of caches in more complex processors.
3.3
Summary
Chapter 4
Simulating EMA Attacks
As introduced in Chapter 2, cryptographic devices could be broken through analysis of the electromagnetic radiation [54, 8, 25] emitted during computation, so as to extract information about the secret key. Like the DPA attacks described in Chapter 3, differential electromagnetic analysis (DEMA) attacks deploy similar sophisticated statistical techniques in order to detect variations in EM emission so small that individual key bits can be identified.
DEMA followed DPA in posing a real threat to smart card security, and a serious research effort has been made to counter DEMA attacks. These countermeasures generally endeavour to hide or avoid the correlation between the data being manipulated and the EM side-channel information. To evaluate these techniques, I propose design-time security evaluation of their effectiveness against EMA attacks. This aims to examine the data-dependent EM characteristics of secure processors, so as to assess their security level against EM side-channel analysis attacks.
4.1
Background
∮ E · dS = q/ε₀                              (4.1)

∮ B · dS = 0                                 (4.2)

∮ E · dl = −dΦB/dt                           (4.3)

∮ B · dl = µ₀ (ε₀εr dΦE/dt + i)              (4.4)

where
E = electric field strength, V/m
B = magnetic flux density, tesla or N/(A·m)
ε₀ = 8.85418782 × 10⁻¹² F/m, the permittivity of a vacuum
εr = relative permittivity, the ratio of the permittivity of a dielectric to that of a vacuum
µ₀ = 4π × 10⁻⁷ H/m, the permeability of a vacuum
Maxwell's Equations explain the origin of EM radiation: waves of interrelated changing electric and magnetic fields propagate through space. Referring to the third and fourth equations, we know that in an integrated circuit it is the changing current flowing in a closed loop that produces a changing magnetic field, which in turn produces a changing electric field.
The magnetic and electric fields radiated by a dipole of length l carrying current I are, in spherical coordinates:

H = (Il/(4πr²)) (1 + jβr) e^(−jβr) sin θ ~Φ                                      (4.5)

E = (1/(jωε)) ∇ × H                                                              (4.6)

E = (Il e^(−jβr)/(jωε4πr³)) [2 cos θ (1 + jβr) ~r + sin θ (1 + jβr − β²r²) ~θ]   (4.7)

where ~r and ~θ indicate that the electric field E has two components, along the r and θ directions in spherical coordinates.
Let us consider approximations for the electric and magnetic fields in the near and far fields as βr changes:
• Case I. Near Field. When βr ≪ 1, i.e. r ≪ λ/2π,

E ≅ (Il e^(−jβr)/(jωε4πr³)) [2 cos θ ~r + sin θ ~θ]    (4.8)

H ≅ (Il e^(−jβr)/(4πr²)) sin θ ~Φ                      (4.9)

• Case II. Far Field. When βr ≫ 1, i.e. r ≫ λ/2π,

E ≅ jωµ (Il e^(−jβr)/(4πr)) sin θ ~θ                   (4.10)

H ≅ jβ (Il e^(−jβr)/(4πr)) sin θ ~Φ                    (4.11)

Note that Eθ/HΦ = ωµ/β = √(µ/ε). E and H are orthogonal to each other and are both orthogonal to the direction of propagation. The relative strength of the electric and magnetic fields is fixed, and is defined as the wave impedance. Electric and magnetic fields are jointly referred to as an electromagnetic field in the far field.
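As a concrete check of the βr boundary, consider an illustrative 100 MHz clock harmonic:

```python
import math

C_LIGHT = 3.0e8  # free-space propagation speed, m/s

def near_far_boundary_m(freq_hz):
    # r = lambda / (2*pi): below this distance beta*r << 1 holds and the
    # near-field approximations apply.
    return C_LIGHT / freq_hz / (2.0 * math.pi)

r = near_far_boundary_m(100e6)
print(round(r, 3))   # -> 0.477 (m); millimetre-range chip probing is near-field
```

So a probe held millimetres above a chip sits deep inside the near field for clock-rate harmonics, which is why the near-field approximations are the relevant ones for this work.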
Time-variant Magnetic Circuit
Circuits can also generate time-variant magnetic emissions, which are the dual of circuits generating time-variant electric emissions. A current loop excited by an AC source carrying current I generates electric and magnetic fields. In spherical coordinates, as shown in Figure 4.2 (a current loop of area A in the x-y plane), the magnetic and electric fields generated by the current loop mirror those for the dipole:

H = (IA e^(−jβr)/(4πr³)) [2 cos θ (1 + jβr) ~r + sin θ (1 + jβr − β²r²) ~θ]   (4.12)

E = (IAβ e^(−jβr)/(jωµ4πr²)) (1 + jβr) sin θ ~Φ                              (4.13)

Let us consider approximations for the electric and magnetic fields in the near and far fields as βr changes:
• Case I. Near Field. When βr ≪ 1, i.e. r ≪ λ/2π,

H ≅ (IA e^(−jβr)/(4πr³)) [2 cos θ ~r + sin θ ~θ]    (4.14)

E ≅ (IAβ e^(−jβr)/(jωµ4πr²)) sin θ ~Φ               (4.15)

• Case II. Far Field. When βr ≫ 1, i.e. r ≫ λ/2π,

H ≅ −(IAβ² e^(−jβr)/(4πr)) sin θ ~θ                 (4.16)

E ≅ (IAβ² e^(−jβr)/(ωµ4πr)) sin θ ~Φ                (4.17)

E and H are orthogonal to each other and are both orthogonal to the direction of propagation. They are now together referred to as an electromagnetic field.
From the above description, EM radiation is determined by two things:
• The source – whether it is open-ended (a dipole) or closed (a current loop). If the source is a current loop, as applies in an IC, measuring H in the near field is more efficient than measuring E.
• The measurement distance – in the near field or the far field.
In each case, however, the measured quantity (E or H) is proportional to the current I. This is the fundamental reason why current is used to represent the EM field (in some cases the rate of change of current is used; the reason is explained in the next section).
• Inductive sensors
These sense the voltage induced by the changing magnetic flux:

V = −∫S (∂B/∂t) · ds    (4.18)

over surface S using area element ds. Let us rewrite it into the following equation, which says the measurement output is proportional to the rate of change of the current which causes the magnetic field:

V = M dI/dt    (4.19)

where M denotes the mutual inductance between the sensor and the concerned circuit. Inductive sensors sense the change of magnetic flux, so I use the rate of change of the current dI/dt to track EM emission. Simulation for this type of sensor involves differential calculation on current consumption data.
• Magnetoresistive sensors
These are used in hard disk drives for reading, and are made of materials whose resistance is linear in the magnetic field (H) [53]. The magnetoresistive probe output is proportional to the magnitude of the field, rather than to its rate of change as in inductive probes.
• Hall probe
A Hall probe works by way of the Hall effect. Any charged particle moving
perpendicular to a magnetic field will have a Lorentz force upon it, given by F =
q(v × B). However the moving electrons accumulate an electric field which gives the
electrons an electric force in the other direction by F = qE, where E =
Vmeasured /d. Thus, Vmeasured ∝ B.
The detectable field range of Hall-effect sensors is above 10 gauss [17], too coarse to discern the EM emanation of a chip through ambient noise.
There are also far-field electromagnetic field sensors such as log-periodic
antennas. They
generally measure far-field electromagnetic field and often work with other
equipment to
harness modulated emissions. For example, an AM receiver tuned to a clock harmonic
can
perform amplitude demodulation and extract useful information leakage from
electronic devices [8].
This is not an exhaustive list of field sensors, but illustrates that different
types of sensors measure different types of field, so different approaches are
required to conduct EM
simulations.
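The sensor-dependent processing can be sketched as follows — a toy model in which the mutual inductance, the magnetoresistive constant and the sample waveform are invented for illustration:

```python
# Sensor-dependent processing of a simulated current waveform I(t):
# an inductive probe outputs V = M * dI/dt, while a magnetoresistive probe
# outputs a signal proportional to the field magnitude (here taken
# proportional to |I|).

def inductive_trace(current, dt, mutual_inductance=1.0):
    # Finite-difference approximation of M * dI/dt.
    return [mutual_inductance * (current[i + 1] - current[i]) / dt
            for i in range(len(current) - 1)]

def magnetoresistive_trace(current, k=1.0):
    return [k * abs(i) for i in current]

i_t = [0.0, 0.2, 0.2, 0.1]
v_ind = inductive_trace(i_t, dt=1e-9)    # spikes where the current *changes*
v_mr = magnetoresistive_trace(i_t)       # follows the current *magnitude*
```

The contrast between the two outputs shows why the simulation must pick the processing to match the assumed sensor: the inductive model highlights switching edges, the magnetoresistive model the current envelope.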
4.2
Simulation Methodology
approximately 1.5 × 10⁸ m/s. The rule of thumb is that we usually need to consider the transmission-line effect when the edge length is shorter than three times the longest dimension of a device. Fast signal edges in smart card chips, with an edge rate of under 1 ns, have to be considered as "high speed" only when the longest chip dimension is beyond 50 mm. Smart card chips are typically < 5 mm, so wires are never longer than 10 mm, and even that is unlikely.
[Figure 4.3: The proposed EMA simulation flow within the IC design flow. HDL (Verilog or VHDL) design code and the standard cell library pass through logic synthesis to a gate-level netlist; floorplanning, place & route and extraction yield parasitics and a transistor-level netlist; Verilog/HSPICE co-simulation with a testbench, technology models and package lumped elements feeds the EM analysis.]
[Figure 4.4: EMA data processing. Functional/power simulation of the Verilog and SPICE netlists with a testbench produces I(t)/Idd(t); the data are processed for EM analysis — according to which type of field sensor senses the emissions, and whether direct or modulated emissions are examined — then low-pass filtered by the chip parasitics and package lumped elements, yielding DEMA traces.]
Low-pass Filtering Effect of EM Sensors
Since the EM sensors low-pass filter the EMA traces, the two sets of processed current consumption data have to be low-pass filtered at the end of the EMA data processing procedure. Considering the inductance in inductive sensors and the load resistance from the connected instruments (e.g. an amplifier or an oscilloscope), an RL low-pass filter is formed, as shown in Figure 4.5. Its 3 dB cutoff frequency is fcutoff = R/(2πL).
Finally, DEMA is performed by subtracting one EMA trace from another. Security
weaknesses will manifest as pulses in the DEMA trace, revealing data-dependent EM
characteristics of the design under test. The term DEMA here refers to the
variation (difference) in the
EM emissions, instead of statistical treatment correlating the variation to
hypothetical data
being manipulated as in a real DEMA attack [54]. This is because the proposed
methodology
is to evaluate data-dependent EM characteristics of secure processor designs, which
are the
fundamental weakness a real DEMA attack exploits and can be identified with
deterministic
data.
4.3
Evaluation Results
distances between individual current paths are much shorter than the distance from the circuit to the sensor. The approximation also neglects the effect of the different orientations of branch currents, assuming they flow in parallel so that the produced fields add as scalars rather than as vectors. This approximation may result in a quantitative magnitude difference from the real emission, but it is effective in simulating differential analysis, where the qualitative difference is crucial.
Figure 4.6: EMA simulation over the S-XAP processor executing XOR with different operands (note the variation in the differential trace)
Figure 4.7: EMA measurement over the S-XAP processor executing XOR with different operands (experimental graph; note the variation in the differential trace)
Both the measurement and the simulation results show peaks in the differential trace when the processor is executing the XOR logic operation. This means data-dependent EM emission is leaking information related to key bits at that point, which means vulnerability to EMA attacks. The agreement between the measurement and the simulation results confirms the validity of the proposed EMA simulation approach. The simulated EM traces in Figure 4.6 are lower than those measured around the circled places, as the simulation includes no power contribution from memory accesses.
To compare the DPA attack and the DEMA attack, Figure 4.8 shows a DPA measurement over the S-XAP processor performing the same code. Although we did only 4 measurement runs to average out noise, the data-dependent power consumption clearly identifies when the processor is executing the XOR logic operation. The peak-to-peak of the differential trace (DPA) is about 6% of the peak-to-peak of the original signals (Power Analysis 1 and Power Analysis 2). As a comparison, the peak-to-peak DEMA is about the same level as the peak-to-peak of the original signals (EMA 1 and EMA 2) in Figures 4.6 and 4.7, indicating the same level of information leakage in the EM side-channel as in the power channel.
Figure 4.8: DPA measurement over the S-XAP processor executing XOR with different operands (experimental graph; note the variation in the power traces)
Figure 4.9: EMA simulation over the DR-XAP (asynchronous dual-rail) processor executing XOR with different operands (insignificant variation in the differential trace)
Figure 4.10: EMA measurement over the DR-XAP (asynchronous dual-rail) processor executing XOR with different operands (experimental graph; insignificant variation in the differential trace)
Figure 4.11: EMA simulation over the DR-XAP (asynchronous dual-rail) processor executing XOR with different operands, examining modulated emissions (largest peak in the differential trace)
4.4
Summary
A simulation methodology for EMA has been proposed on the basis of an analytical investigation of EM emissions in CMOS circuits. This simulation methodology involves simulation of current consumption with circuit simulators and extraction of IC layout parasitics with extraction tools. Once collected, the current consumption data are processed with MATLAB to simulate EMA. The proposed simulation methodology can easily be employed in the framework of an integrated circuit design flow.
Testing has been performed on synchronous and asynchronous processors, and the results have demonstrated that DPA and DEMA of direct emissions reveal about the same level of leakage, while DEMA of amplitude-demodulated emissions reveals greater leakage, suggesting better chances of success in differential EM analysis attacks. The comparison between EMA on the synchronous and asynchronous processors indicates that the synchronous processor has data-dependent EM emissions, while the asynchronous processor has data-dependent timing which is visible in DEMA.
Chapter 5
Simulating Optical Fault Injection
As introduced in Chapter 2, secure microcontrollers and smart cards are cryptographic devices widely used for applications demanding confidentiality and integrity of sensitive information. They are also used for services requiring mutual authentication and non-repudiation of transactions. These devices generally have an embedded cryptographic processor running cryptographic algorithms such as triple DES, AES or RSA. The algorithms encrypt data using secret keys, which should be kept safe in the devices so that attackers cannot directly read out the key value or deduce it from side-channels [35, 37, 54].
However, it is not sufficient for the cryptographic processors to withstand the
above passive attacks. They should also endure attacks that inject faults into the
devices and thus cause
exploitable abnormal behaviour. The abnormal behaviour may be a data error setting
part
of the key to a known value, or a missed conditional jump reducing the number of
rounds
in a block cipher. A glitch inserted on the power or clock line was the most widely
known
fault injection technique [10], but many chips nowadays are designed to detect
glitch attacks.
Optical fault injection introduced by Skorobogatov [58] in 2002 appears to be a
more powerful and dangerous attack. It involves illumination of a target transistor
which causes the
transistor to conduct transiently, thereby introducing a transient logic error.
Such attacks are
practical as they do not require the expensive equipment that is needed in invasive
attacks1 .
This threat has become increasingly relevant as transistor dimensions and supply
voltages
are constantly scaling down. In deep submicron technologies2 , it is easier to
introduce and
propagate transient voltage disturbances as the capacitance associated with
individual circuit
nodes is very small, and large voltage disturbances can be produced from relatively
small
amounts of ionised charge. Also, due to the high speed of deep submicron circuits,
the voltage disturbances can propagate more easily.
To keep cryptographic devices secure against optical fault induction attacks,
various ideas
have been proposed for the design of cryptographic devices. To evaluate this
research effort,
a design-time security evaluation methodology is proposed to exhaustively examine
the response of secure processors under optical illumination by simulation, so as
to assess their
security level against optical fault injection attacks at design time.
1 Invasive attacks require decapsulation and deprocessing to get direct access to
the internal components of
the device.
2 Gate lengths below 0.35 µm are considered to be in the deep submicron region.
5.1
Background
Optical fault injection is not entirely new. After semiconductor devices were
invented, they
were found to be sensitive to ionising radiation in space, caused by protons,
neutrons, alpha
particles or other heavy ions [13]. Pulsed lasers were then used to simulate the
effects of
ionising radiation on semiconductors [15]. Depending on several factors, laser
illumination
may cause: no observable effect, a transient disruption of circuit operation, a
change of logic
state, or even permanent damage to the device under test [21].
where αn and αp are the cross sections of laser radiation interaction with electrons and holes, in cm²; n and p are the concentrations of free carriers, in cm⁻³; αiz (= α0iz + αbn · Nd) is the interzoned absorption factor of the semiconductor for laser radiation, in cm⁻¹; α0iz is the interzoned absorption factor in lightly doped semiconductor, in cm⁻¹; and αbn is the band narrowing effect factor caused by the high doping concentration Nd (αbn in cm², Nd in cm⁻³).
Equation (5.1) can give us the free carrier generation rate [49] as:

G(x) = η · αiz · (I1/hν) · (1 − R) · e^(−αx)    (5.3)
where η is the photo-ionisation quantum efficiency (the number of free carrier pairs generated per absorbed quantum), with a value of about 1 near the main absorption band edge; hν is the laser quantum energy, in joules; and R is the reflection coefficient (0.3 for silicon substrates when radiation is performed from the back side; 0.1 ∼ 0.3 for various oxide thicknesses when radiation is performed from the top side).
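Equation (5.3) is easy to evaluate numerically; the laser wavelength, intensity and absorption factor below are invented placeholders, while η ≈ 1 and R = 0.3 (back-side) follow the text:

```python
import math

PLANCK = 6.62607015e-34   # J*s
C_LIGHT = 3.0e8           # m/s

def generation_rate(x_cm, eta, alpha_iz, intensity_w_cm2, wavelength_m, refl):
    # Equation (5.3): G(x) = eta * alpha_iz * (I1/h*nu) * (1 - R) * exp(-alpha*x)
    photon_energy = PLANCK * C_LIGHT / wavelength_m   # h*nu, in joules
    return (eta * alpha_iz * (intensity_w_cm2 / photon_energy)
            * (1.0 - refl) * math.exp(-alpha_iz * x_cm))

# Illustrative: 1064 nm laser, 1 W/cm^2, alpha_iz = 10 cm^-1, back-side (R = 0.3)
g_surface = generation_rate(0.0, eta=1.0, alpha_iz=10.0,
                            intensity_w_cm2=1.0, wavelength_m=1064e-9, refl=0.3)
# the rate decays as exp(-alpha_iz * x) with depth x into the substrate
```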
When the excited charge reaches the critical charge Qcrit, the charge necessary to flip a binary "1" to a "0" or vice versa, a single event upset (SEU) occurs. Device immunity is determined by its threshold linear energy transfer (LET). The threshold LET (LETth) is defined as the minimum LET required to produce a voltage change (∆V) sufficient for an SEU; mathematically:

LETth ∝ ∆V = Qcrit/C    (5.4)

where C is the capacitance of the struck node.
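The scaling in Equation (5.4) explains why deep submicron nodes are easier to upset; with made-up node values:

```python
def voltage_change(q_coulomb, c_farad):
    # Delta V = Q / C (Equation 5.4): the same collected charge disturbs a
    # small node far more than a large one, which is why deep submicron
    # nodes are easier to upset.
    return q_coulomb / c_farad

# Invented node values: 10 fC of laser-induced charge on a 2 fF node...
dv_small_node = voltage_change(10e-15, 2e-15)    # about 5 V: flips the node
# ...versus a 20 fF node in an older process.
dv_large_node = voltage_change(10e-15, 20e-15)   # about 0.5 V: may be absorbed
```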
Pem(x) = Pe(x) · (1 − Km)    (5.5)

where Pe(x) is the incident energy without the metal shielding effect; Pem(x) is the incident energy with the metal shielding effect; and Km = Sm/S, where S is the total top surface area under illumination and Sm is the metallisation area within it.
A way to bypass metal shielding is to attack the chip from the back, if the target
device
allows this.
5.2
Simulation Methodology
The flow of designing and evaluating a test chip against optical fault injection attacks is outlined in Figure 5.1. A major concern with this traditional approach is that security evaluation occurs too late in the design cycle to allow for efficient repair. The deficiencies in the design often result in costly and frequent design re-spins. In comparison, the procedure with evaluation incorporated in the design flow is shown in Figure 5.2. This design flow can spot design oversights or errors at an early stage and so avoid costly silicon re-spins.
[Figure content: both flows proceed HDL design → synthesis → place and route → circuit layout → manufacture of test chip → security evaluation on the test chip against laser radiation, looping back through "redesign/modify circuit" on failure at large cost; the improved flow inserts a security evaluation through simulation against laser radiation after layout, so failures loop back before manufacture at small cost.]
Figure 5.1: Flow chart exhibiting the traditional iterative process to design and evaluate a test chip against optical fault injection attacks, after [45]
Figure 5.2: Flow chart exhibiting the iterative process to design and evaluate a test chip against optical fault injection attacks with the aid of design-time security evaluation
[Figure content: a testbench and stimulus drive HDL/SPICE co-simulation of the Verilog netlist and the SPICE netlist, the latter derived from the layout with technology models.]
Figure 5.3: Simulation procedure for optical fault injection attack
The layout can be scanned with any size of laser illumination spot, which can target from a single transistor to hundreds of transistors, depending on the equipment used by the attackers, as described in Section 5.1.3. The scans can be performed over a particular area such as the ALU or register file, or even the whole processor. Figure 5.4 illustrates scanning in simulation, where each scan (S11, S12 ... Smn) generates a list of logic cells under attack. For example, in a particular scan, the exposed cells are listed as follows:
m/datapath/U355
m/datapath/alu/U33
m/datapath/fi_reg_4
m/U1458
m/U1490
FC_299
m/U1506
m/U1223
Among the selected cells, FC_299 is a filler cell and the rest are logic cell instances. We first discard the filler cells, then check the standard cell library, mapping the logic cells to their internal nodes, especially the nodes connected to n-type transistors³. In addition to what may be considered a useful attack mechanism, negative effects are also possible. These include the possibility that latch-up may be induced by the generation of photocurrents in the bulk (the substrate and well). Of less concern when using readily available infra-red and visible laser light sources is the ionisation of gate and field oxides, due to the large band gap energy of silicon dioxide (which would require a laser with a wavelength in the UV-C range). Ionisation of this type is common when higher energy forms of radiation are absorbed; the subsequent accumulation of positive charge results in a long-term shift in transistor characteristics.
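The scan-to-cell mapping can be sketched as a bounding-box intersection test — the coordinates and the spot are invented, and the instance names merely echo the example list above:

```python
# Sketch of the layout-scanning step: collect the cell instances whose
# bounding boxes intersect the current laser spot. The coordinates are
# invented; the instance names echo the example list above.

def cells_under_spot(cells, spot):
    """cells: {name: (x1, y1, x2, y2)}; spot: (x1, y1, x2, y2)."""
    sx1, sy1, sx2, sy2 = spot
    return sorted(name for name, (x1, y1, x2, y2) in cells.items()
                  if x1 < sx2 and sx1 < x2 and y1 < sy2 and sy1 < y2)

layout = {
    "m/datapath/alu/U33": (0, 0, 4, 2),
    "m/datapath/fi_reg_4": (5, 0, 9, 2),
    "FC_299": (0, 3, 4, 5),   # filler cell, discarded before fault injection
}
exposed = cells_under_spot(layout, (0, 0, 6, 4))
# -> ['FC_299', 'm/datapath/alu/U33', 'm/datapath/fi_reg_4']
```

Sweeping the spot over a grid of positions then reproduces the scans S11 ... Smn of Figure 5.4.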
³ Or connect the nodes to p-type transistors depending on the process technologies, especially the substrate
[Figure 5.4: scanning the layout in simulation — the scans S11, S12 ... S1n through Sm1 ... Smn tile the layout area, each generating a list of exposed cells.]
Based on the fact that an optical attack is substantially more effective at turning on n-type transistors than their p-type counterparts⁴, the laser radiation will result in one of three behaviours in a given logic gate:
• The laser radiation is not strong enough to cause either the n-type or the p-type CMOS transistors to conduct, so no state change occurs at the logic cell output.
• The laser radiation switches on the n-type but not the p-type transistors, so abnormal behaviour may occur.
• The laser radiation is strong enough to cause both n-type and p-type CMOS transistors to conduct in a logic gate. This results in a large leakage current or even a strong VDD-to-GND short circuit, which may eventually damage the circuit if no current-limit protection is provided.
Of the three behaviours, only the second is considered a successful attack, as opposed to sabotage, and it is therefore the focus of this simulation methodology. This allows us to focus simply on n-type transistors in the simulation of security evaluation targeting Type I attackers.
Clearly, in the case where the laser can target a single p-type transistor and successfully switch it on, the attacker is able to manipulate the circuit more capably. This situation falls into the category of Type II and III attacks. The corresponding simulation requires layout scans over every single transistor.
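The three behaviours can be encoded as a simple threshold model (the threshold energies are invented placeholders, not process data):

```python
# Threshold model of the three behaviours above; the energies are invented
# placeholders, not process data.

N_THRESHOLD = 1.0   # energy at which n-type transistors start to conduct
P_THRESHOLD = 3.0   # higher energy needed to also switch on p-type

def gate_response(pulse_energy):
    if pulse_energy < N_THRESHOLD:
        return "no effect"
    if pulse_energy < P_THRESHOLD:
        return "transient fault"    # n-type only: the exploitable case
    return "VDD-to-GND short"       # both conduct: sabotage, not an attack

assert gate_response(0.5) == "no effect"
assert gate_response(2.0) == "transient fault"
assert gate_response(5.0) == "VDD-to-GND short"
```

Only pulse energies falling in the middle band are worth simulating in detail, since only they produce an exploitable logic error.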
After obtaining the list of exposed cells for each scan, we then drive the internal nodes with transient voltage pulses via tri-state buffers. The enable signals of the tri-state buffers are synchronised with the target instruction execution during a cryptographic program operation. The co-simulation shown in Figure 5.3 integrates the voltage pulses and illuminated cells in SPICE, whilst the rest of the circuit remains in Verilog. By analysing the response and comparing it to that of normal operation, we can evaluate the security of the circuit design against optical fault injection attacks. If it fails, modification or even redesign of the circuit is required, as demonstrated in Figure 5.2. If it passes, designers can proceed to have the chip manufactured.

4 Or more effective at turning on p-type transistors than n-type, depending on the process technology, especially the substrate type and the well type; see Appendix for details.
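The scan-inject-compare flow described above can be sketched in a few lines. This is a minimal illustration of the control flow only: the helper names (`exposed_cells`, `run_cosim`, and the canned responses) are hypothetical stand-ins for the real Silicon Ensemble / NanoSim / VCS steps, stubbed here so the loop can be exercised.

```python
def classify(response, golden):
    """Bin one faulted run against the fault-free (golden) run."""
    if response is None:
        return "deadlock"      # processor hung: acceptable if nothing leaks
    if response == golden:
        return "normal"        # fault fell into a "don't care" state
    return "failure"           # observable corruption: a successful attack

def evaluate(scan_positions, exposed_cells, run_cosim, golden):
    """One co-simulation per laser spot position; returns a verdict map."""
    return {pos: classify(run_cosim(exposed_cells(pos)), golden)
            for pos in scan_positions}

# Stubbed demonstration: three spot positions with canned responses.
canned = {0: "0xBEEF", 1: None, 2: "0xDEAD"}   # golden result is "0xBEEF"
verdicts = evaluate(
    scan_positions=[0, 1, 2],
    exposed_cells=lambda pos: [f"cell_{pos}"],     # cells under the spot
    run_cosim=lambda cells: canned[int(cells[0].split("_")[1])],
    golden="0xBEEF",
)
# verdicts == {0: "normal", 1: "deadlock", 2: "failure"}
```

In the real flow, `run_cosim` is the HDL/SPICE co-simulation of Figure 5.3 and `golden` is the recorded response of the unilluminated circuit.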
5.3 Results
    ah,@(1,x)
    ah,@(2,x)    ; XOR operation
    ah,@(3,x)    ; save result

Figure 5.5: Fragment of the instruction program used for the evaluation
Figure 5.6: Screen shot of the scanning procedure over the layout of the ALU and decoder of the S-XAP processor; the region within the small square is being illuminated
The exhaustive examination of the 120 simulation runs shows three kinds of result:

1. The processor deadlocks in many cases, which is desirable in terms of security, provided this does not leak secret information.

2. Some other cases show normal program execution. This implies the introduced fault may be part of the "don't care" state of the subsequent operation of the circuit [21].

3. Two failures are also revealed:

(a) The first disrupts the XOR operation by changing the value in the AH register.

(b) The second failure causes a memory dump. Instead of executing a data write to memory, the processor keeps reading the contents of the whole memory. We suspect the memory dump occurred when the decoder was struck in the test, resulting in the opcode being modified from "1101" (standing for XOR) to "0001" (standing for LOAD).
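Since a switched-on n-type transistor pulls the struck node towards GND, this opcode change is consistent with bits being cleared rather than set. A quick arithmetic check of the two 4-bit opcodes quoted above:

```python
XOR_OP  = 0b1101   # opcode for XOR, as quoted above
LOAD_OP = 0b0001   # opcode for LOAD

flipped = XOR_OP ^ LOAD_OP      # bit positions that differ
assert flipped == 0b1100        # bits 3 and 2 only

# Clearing exactly those two bits (nodes pulled low by conducting n-MOS)
# turns the XOR opcode into the LOAD opcode:
assert XOR_OP & ~flipped == LOAD_OP
```

Both differing bits go from 1 to 0, which is what laser-induced n-type conduction would produce.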
Modifying register values implies that setting part of the key to a known value becomes feasible for attackers. Dumping memory can be dangerous for designs implemented with an architecture where a single storage structure holds both instructions and data. If the memory contains passwords and decryption keys, then by carefully analysing the dumped memory, one can break the cryptographic device. In contrast, a design implemented with a Harvard architecture [1] could offer better protection against microprobing attacks, as it uses physically separate storage for instructions and data. The same trick applied to a Harvard microcontroller would reveal only the program code, whereas the data memory containing the sensitive information would not be available.
It takes about 10 minutes to run the scanning process (comprising 120 scans) with Cadence Silicon Ensemble™. It then takes about 4 hours to complete the 120 runs of HDL/SPICE co-simulation, with each run simulating 14,000 transistors in Synopsys NanoSim™ and the remaining tens of logic gates in Synopsys VCS™. All the simulation work was done on a 1.6 GHz AMD Athlon processor with 2 GB of memory.
5.4 Summary
Figure 5.8: Cross-section of an n-MOS in a p-well + p-epitaxial + p-substrate process [22]
bipolar effect for the p-well case is simply because in p-well, the parasitic bipolar is pnp rather than npn. For identical structures, a pnp bipolar will have lower current gain (∼1/3) than an equivalent npn, due to the lower mobility of holes compared to electrons.
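The ∼1/3 figure is consistent with textbook low-field carrier mobilities in lightly doped bulk silicon. The numerical values below are illustrative assumptions, not taken from the thesis, and the gain model is only a first-order proportionality:

```python
mu_n = 1350.0   # electron mobility in bulk Si, cm^2/(V·s), textbook low-field value
mu_p = 480.0    # hole mobility under the same conditions

# To first order, bipolar current gain scales with the minority-carrier
# mobility in the base, so a pnp built in the same structure as an npn
# has a gain ratio of roughly:
gain_ratio = mu_p / mu_n
assert 0.3 < gain_ratio < 0.4   # consistent with the ~1/3 figure above
```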
Figure 5.10: Simulated threshold LET (MeV·cm²/mg) vs. gate length (µm) in n-substrate technologies, after [22]
Figure 5.11: Simulated threshold LET (MeV·cm²/mg) vs. gate length (µm) in p-substrate technologies, after [22]
According to the trends shown in Figures 5.10 and 5.11, a rule of thumb is:

• for p-substrate, either p-substrate + n-well or p-substrate + twin-well: n-MOS is easier to switch on;

• for n-substrate, either n-substrate + p-well or n-substrate + twin-well: above the 1 µm technology node, p-MOS is easier to switch on; below 1 µm, device simulation or experiment is required to determine the minimum upset LET for n- and p-MOS respectively, before the proposed simulation methodology is applied to a large-scale IC.
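The rule of thumb above can be encoded directly as a small decision function, for use when configuring the scan simulation for a given process. This is a sketch of the rule as stated, not a substitute for the device-level simulation it defers to below 1 µm:

```python
def easier_to_switch_on(substrate, gate_length_um):
    """Which transistor type laser illumination switches on more easily,
    per the rule of thumb above. Returns None when device simulation or
    experiment is required instead."""
    if substrate == "p":                 # p-substrate + n-well or twin-well
        return "n-MOS"
    if substrate == "n":                 # n-substrate + p-well or twin-well
        if gate_length_um > 1.0:         # above the 1 µm technology node
            return "p-MOS"
        return None                      # below 1 µm: measure, don't guess
    raise ValueError("substrate must be 'p' or 'n'")

assert easier_to_switch_on("p", 0.35) == "n-MOS"
assert easier_to_switch_on("n", 2.0) == "p-MOS"
assert easier_to_switch_on("n", 0.5) is None
```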
With silicon-on-insulator the situation would be different, but this is not discussed here, as all microcontrollers and smart cards nowadays use a bulk-silicon design style.
Chapter 6

Conclusion and Future Work

6.1 Conclusion
This thesis has introduced the security hazards facing consumer devices such as smart cards. Traditional industrial practice has been to evaluate the security of hardware post manufacture. This is an expensive and error-prone process. I therefore proposed a set of design-time security evaluation methodologies which provide systematic and exhaustive simulation at design time to evaluate the security of the design under test against various attacks.

The main contribution of this thesis is the design-time security evaluation methodology against differential power analysis (DPA) attacks, electromagnetic analysis (EMA) attacks and optical fault injection attacks.
• The simulation methodology for DPA of secure processors includes power simulation of the logic circuitry and the low-pass filtering caused by on-chip parasitics and package inductance.

• The simulation methodology for EMA involves simulation of current consumption with circuit simulators and extraction of IC layout parasitics with extraction tools. Once collected, the current consumption data is processed with MATLAB to implement differential EMA (DEMA) according to various sensor types and emission types.

• The simulation methodology for optical fault injection attacks involves exhaustive scans over the layout with a laser spot size chosen according to the attack scenario. Once the exposed cells for each scan are identified, they are mapped to their internal nodes, especially the n-transistor or p-transistor output nodes, depending on the process technology. These nodes are then driven by transient voltage sources via tri-state buffers, to mimic the effect of transistor conduction caused by laser illumination. Finally, the response of the circuit is examined and compared to that of the normal circuit without a laser attack.
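The differential step shared by the DPA and DEMA methodologies is a difference-of-means test over the simulated traces. The sketch below runs it on synthetic data (the trace values, leak position and selection bit are all made up for illustration); in the real flow the traces come from the circuit simulator and the selection bit from a key-dependent intermediate:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-ins: 200 simulated power/EM traces of 50 samples each,
# with a small data-dependent leak injected at sample index 30.
n_traces, n_samples, leak_t = 200, 50, 30
sel = rng.integers(0, 2, n_traces)            # key-dependent selection bit
traces = rng.normal(0.0, 1.0, (n_traces, n_samples))
traces[:, leak_t] += 0.8 * sel                # leakage proportional to the bit

# Difference of means: partition the traces by the selection bit and subtract.
diff = traces[sel == 1].mean(axis=0) - traces[sel == 0].mean(axis=0)

# The largest peak in the differential trace marks the leaky instant.
assert int(np.argmax(np.abs(diff))) == leak_t
```

The same partition-and-subtract step applies unchanged whether the traces are simulated supply currents (DPA) or simulated sensor outputs (DEMA).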
These simulation methodologies cover the side-channel analysis attacks that have been threatening the smart card industry. Although the simulation examples demonstrated in this thesis are on simple microprocessors, the methodologies are applicable to evaluating more complex processors, including multiple-pipeline, multiple-core and multithreaded implementations. They are also applicable to evaluating advanced defence techniques, such as out-of-order execution, random-delay insertion and cryptographic algorithm transformation, by writing proper test benches to verify these countermeasures. The DPA and DEMA simulation methodologies can easily be extended to other variants of side-channel analysis attack, such as the second-order differential power analysis suggested by Messerges [47] to defeat random masking [29]. Second-order differential power analysis requires the attacker to know the time instants before and after the random masking operation, and to compute the difference in power consumption between these two instants within the same power trace. This process can easily be performed in the proposed simulation flow.
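The within-trace combination step can be illustrated on synthetic masked leakage (all values below are fabricated for illustration; t1 and t2 stand for the two time instants, and the leakage model is a simple value-plus-noise assumption):

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic masked leakage: the mask m hides the sensitive bit b at t1
# and appears alone at t2, so neither instant leaks b on its own.
n = 500
b = rng.integers(0, 2, n)                  # sensitive (key-dependent) bit
m = rng.integers(0, 2, n)                  # random mask, fresh per trace
t1 = (b ^ m) + 0.3 * rng.normal(size=n)    # leakage of the masked value
t2 = m + 0.3 * rng.normal(size=n)          # leakage of the mask itself

# Second-order combination within the same trace:
combined = np.abs(t1 - t2)

d_first  = t1[b == 1].mean() - t1[b == 0].mean()               # ~0: mask works
d_second = combined[b == 1].mean() - combined[b == 0].mean()   # leak restored

# Combining the two instants recovers a dependence on b that the
# first-order test cannot see.
assert abs(d_first) < abs(d_second)
```

In the proposed simulation flow, t1 and t2 are simply two sample indices of the simulated trace, so this post-processing drops straight into the existing MATLAB stage.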
Final comments

The proposed simulation methodologies have laid the cornerstones for building a complete suite of design-time security evaluation tools. Our design-time evaluation methodology is able to simulate all known circuit-level attacks and defence techniques. Such techniques are often complemented by barrier technologies, such as refractory chip coatings or top-level defence grids; these must still be evaluated by post-manufacture testing. However, our techniques can replace the most tedious and expensive part of the security test process.
6.2 Future Work

Finally, we suggest some directions for further research into design-time security evaluation.
List of Papers

The research work in this thesis was presented and published in the official proceedings of rigorously refereed conferences through the following research papers:

• Huiyun Li and Simon Moore, "Security Evaluation at Design Time Against Optical Fault Injection Attacks", accepted by IEE Proc. Information Security.

• Huiyun Li, A. Theodore Markettos and Simon Moore, "A Security Evaluation Methodology for Smart Cards Against Electromagnetic Analysis", in Proceedings of the 39th IEEE International Carnahan Conference on Security Technology, pages 208-211, 2005.

• Huiyun Li, A. Theodore Markettos and Simon Moore, "Security Evaluation Against Electromagnetic Analysis at Design Time", in Proceedings of the Workshop on Cryptographic Hardware and Embedded Systems (CHES 2005), LNCS volume 3659, pages 280-292, 2005.

• J. Fournier, H. Li, S.W. Moore, R.D. Mullins and G.S. Taylor, "Security Evaluation of Asynchronous Circuits", in Proceedings of the Workshop on Cryptographic Hardware and Embedded Systems (CHES 2003), LNCS volume 2779, pages 137-151, 2003.
Bibliography

[1] The Free Dictionary encyclopedia: Harvard architecture. http://encyclopedia.thefreedictionary.com/harvard%20architecture