On Code Execution Tracking via Power Side-Channel

Yannan Liu (1, 2), Lingxiao Wei (1), Zhe Zhou (3), Kehuan Zhang (3), Wenyuan Xu (4) and Qiang Xu (1, 2)
(1) CUhk REliable Computing Laboratory (CURE)
(2) CUHK Shenzhen Research Institute
(1, 2, 3) The Chinese University of Hong Kong
(4) Department of Electronic Engineering, Zhejiang University
(1) {ynliu, lxwei, qxu}@cse.cuhk.edu.hk, (3) {zz113, khzhang}@ie.cuhk.edu.hk, (4) [email protected]

ABSTRACT

With the proliferation of the Internet of Things, there is a growing interest in embedded system attacks, e.g., key extraction attacks and firmware modification attacks. Code execution tracking, as the first step to locate vulnerable instruction pieces for key extraction attacks and to conduct control-flow integrity checking against firmware modification attacks, is therefore of great value. Because embedded systems, especially legacy embedded systems, have limited resources and may not support software or hardware updates, it is important to design low-cost code execution tracking methods that require as little system modification as possible. In this work, we propose a non-intrusive code execution tracking solution via the power side-channel, wherein we represent the code execution and its power consumption with a revised hidden Markov model and recover the most likely executed instruction sequence with a revised Viterbi algorithm. By observing the power consumption of the microcontroller unit during execution, we are able to recover the program execution flow with high accuracy and detect abnormal code execution behavior even when only a single instruction is modified.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].
CCS'16, October 24-28, 2016, Vienna, Austria
© 2016 ACM. ISBN 978-1-4503-4139-4/16/10...$15.00
DOI: http://dx.doi.org/10.1145/2976749.2978299

1. INTRODUCTION

Embedded devices controlled by microcontroller units are deployed everywhere. They are not only widespread in our daily life with the proliferation of the Internet of Things (IoT), but also extensively used in global IT environments and critical infrastructures. Consequently, there is a growing interest in embedded system attacks and defense mechanisms. What makes both, especially defense, difficult is the limited capability of code execution monitoring on embedded systems, mainly caused by limited I/O interfaces and constrained resources. This situation is unlikely to be alleviated any time soon by adding extra features, since updating embedded systems, especially legacy systems, is hindered by safety or cost concerns. Thus, in this paper, we design a method for code execution tracking of embedded systems without requiring software or hardware modification. Such a method enables us to answer two important security-related questions: (1) At a given time, which instruction in the code is being executed? (2) Given the source code, has it been modified, and is the microcontroller unit (MCU) executing malicious code?

Here, we illustrate how to utilize code execution tracking with two examples, but its applications are not limited to these two. (1) Locate the vulnerable code section for extracting private information of a system during execution. For instance, key extraction attacks [1, 2] assume that adversaries are aware of the code of the cryptographic algorithms. They analyze the source code to find vulnerable code sections, and locate these code sections during execution for private information extraction. Typically, prior work assumes that locating code sections during execution is achievable and focuses on the code analysis part. Our work fills this gap. (2) Detect attacks that intend to hijack the MCU's control flow to execute malicious code [3–5]. One effective countermeasure to these attacks is to enforce control-flow integrity (CFI) [6], which tracks code execution and prevents it from deviating from the control-flow graph (CFG) of the program. Over the last decade, a large number of CFI techniques [7–12] have been proposed. These techniques, despite their effectiveness, are inapplicable to many embedded systems, because the imposed overhead will overwhelm the resource-constrained devices and they typically require software and/or hardware modification, which is impossible for most embedded devices, especially legacy devices. Our work makes it possible to apply CFI to embedded systems.

Code execution tracking via the power side-channel is promising yet challenging. The advantage is that the power side-channel leaks information about instructions being executed on the MCU, and obtaining such information needs no modification of the MCU itself. However, power measurement traces are quite noisy and it is difficult to extract useful information from them. Prior work on recovering the type of executed instruction in an MCU via the power side-channel [13] showed a rather low accuracy (about 60% in the best case). We manage to track code execution at a much higher accuracy by leveraging the control transfer information from the CFG and using frequency analysis to reduce the noise in the power side-channel. To be specific, the main contributions of this work include the following.

• We propose to recover the instruction sequence by hidden Markov model (HMM). To increase the identification accuracy, we take advantage of the fact that for a given program, instructions should be executed in sequences obeying CFG, and identifying these sequences
can increase noise resilience compared with identifying every single instruction independently. Thus, we represent the CFG as the state machine in the HMM. To efficiently utilize control transfer information from the CFG, we use basic blocks in the CFG as states in the HMM. Correspondingly, we revise the classic HMM and Viterbi algorithm to cope with the challenge that basic blocks contain different numbers of instructions and hence have different lengths.

• We propose signal extraction techniques to design the observation symbols in the HMM. By extracting high-quality signals from power measurement traces, the impact of power measurement noise is dramatically reduced and we are able to further improve instruction recognition accuracy.

• We apply our proposed code execution tracking techniques to control-flow integrity checking, i.e., we obtain the likelihood of the reported instruction sequence from the power side-channel, and its value reflects whether there exists abnormal execution behavior.

We evaluate the proposed code execution tracking solution on an 8051 MCU, a popular choice for IoT, wearable devices, industrial sensors, etc., because of its ease of software development, royalty-free licensing, low cost, and small silicon footprint. We select nine programs as our benchmark suite. We demonstrate that our method can significantly improve the tracking accuracy. For the benchmark programs, we are able to achieve 99.94% accuracy in recovering the type of executed instruction, which is 42.55% higher than that of the previous method. Besides recovering instruction type, our method can identify which instruction in the code is executed at a given moment, with an average accuracy of 98.56%. In addition, we demonstrate that our method is able to detect abnormal execution behavior effectively, even when only a single instruction in the original code has been changed.

The remainder of the paper is organized as follows. In Section 2, we present background knowledge and our problem formulation. Next, we give an overview of our method in Section 3, and we discuss our revised HMM and Viterbi algorithm in Section 4. Section 5 describes how to design the observation symbol and emission distribution function. In Section 6, we explore how our method can facilitate detecting abnormal execution. We evaluate our method with an STC89C52 MCU in Section 7 and discuss the limitations of our method in Section 8. Finally, we introduce related work in Section 9 and conclude this work in Section 10.

2. BACKGROUND AND PROBLEM FORMULATION

In this section, we first discuss the importance of code execution tracking for key extraction attacks and CFI. Then, we briefly describe CFG and HMM. Finally, we formulate the problem to be investigated in this work.

2.1 Key Extraction Attack

When cryptographic algorithms are implemented in software, for security reasons, designers usually choose implementations from open source libraries (e.g., OpenSSL [14]). Hence, adversaries are knowledgeable about the code, and typically extract keys via side-channel attacks or fault injection attacks. Both attacks require precise tracking of code execution before they are launched.

Side-channel attacks analyze the MCU's behavior on physical side-channels, e.g., acoustic emission [15] and power consumption [16], when vulnerable instruction pieces are executed during encryption or decryption. Such vulnerable instruction pieces are carefully chosen from the code, so that their operations are correlated to the key bits. For example, differential power analysis (DPA) [1] on the Data Encryption Standard requires the power traces for the 16th encryption round; correlation power analysis (CPA) [17] on the Advanced Encryption Standard (AES) requires the power traces just after the first AddRoundKey operation.

Fault injection attacks attempt to disturb the cipher's operations and extract keys by analyzing the cipher's faulty output [18, 19]. For each fault attack method, attackers must inject faults into the MCU at the precise time when vulnerable instruction pieces are executed. Take extracting the round key of the AES-128 encryption algorithm [20] as an example. The vulnerable sections can be the code between the 6th and the 7th MixColumns operations [21], in the last encryption round but before its SubBytes operation [2], in the rounds preceding the targeted round [22, 23], and in the penultimate round but before its MixColumns operation [24].

Thus, all these key extraction attacks need to locate the vulnerable instruction pieces during execution. Consequently, code execution tracking is an essential step in such attacks, and it is unfortunate that previous work in this domain often assumes that such a method is readily available.

2.2 Control Flow Integrity

Code execution tracking is also the foundation for CFI techniques, which are effective against control flow hijacking attacks, e.g., return oriented programming [25], jump oriented programming [26], buffer overflow [27] and firmware modification [3]. CFI tracks code execution and prevents any attempt to deviate the execution flow from the CFG [28].

Fine-grained CFI checking [7] requires adding a piece of CFI guard code before every control-flow instruction (e.g., indirect jump) and allocating an additional shadow stack to track and validate every function call, return, and exception during execution. To improve performance, many CFI techniques require hardware support. For instance, ROPecker [8] relies on Last Branch Recording (LBR), a hardware unit introduced in Intel's Nehalem architecture, for CFI, and hardware modification is needed when applying it to other processors. Similarly, HAFIX [10] depends on special hardware designs to track and verify function returns during execution, and other studies [11, 12] introduce dedicated hardware modules to track the instruction execution sequence, calculate a signature for this sequence, and compare it with golden values.

Thus, the aforementioned CFI techniques require software or MCU modification and impose non-trivial overhead on the system. For embedded systems with limited resources, especially legacy embedded systems, such intrusive solutions are not applicable.

2.3 Basics for CFG and HMM

A Control Flow Graph [29] is a directed graph that represents how a program can transit between basic blocks during execution. A basic block is a sequence of instructions
that has only one entry point at the beginning and one exit point at the end. That is, a basic block can be considered as an execution primitive, and its instructions always run in the same order. Figure 1(a) shows an example of a CFG: each node in the CFG represents a basic block and each edge represents a valid control transfer between basic blocks.

A hidden Markov model [30] consists of three parts: state machine, emission distribution, and observation symbol. The visible observation depends on hidden states, and the hidden state transition is a Markov process. Each state has a probability distribution over the possible observations. Therefore, the sequence of observations provides some information about the sequence of hidden states. The Viterbi algorithm [30] is often used to find the most probable state sequence for a given observation sequence in an HMM. Figure 1(b) shows an HMM example, which consists of three states ($\{s_1, s_2, s_3\}$) and four possible observation symbol values ($\{v_1, v_2, v_3, v_4\}$). At time $t$, one can make an observation $o_t$, where $o_t \in \{v_1, v_2, v_3, v_4\}$. By continuously observing the HMM, an observation sequence $O$ is obtained. $O$ is generated by the HMM going through a state sequence $Q$, where $q_t$ ($q_t \in \{s_1, s_2, s_3\}$) in $Q$ is the state of the HMM at time $t$. If the HMM is in state $s_i$ at time $t$, it will jump to state $s_j$ at time $t + 1$ with probability $a_{i,j}$. The probability of observing $v_k$ is $e_{k,i} = \Pr[v_k \mid s_i]$, which depends on the hidden state $s_i$.

Figure 1: (a) An example Control Flow Graph. (b) Illustration of a classic HMM. (c) Illustration of a revised HMM.

2.4 Problem Formulation

Term definition. For the sake of clarity, we define the following terms used throughout the paper before formulating the problem.

Instruction Instance. We use an instruction instance to indicate a specific instruction, including both its machine code and its location in the code. If two instructions in the code have the same machine code but different PC values, we treat them as different instruction instances. For the sake of simplicity, an instruction sequence in this paper refers to a sequence of instruction instances.

Instruction Type. The instruction type of an instruction is determined only by its operation code. We treat instruction instances with the same operation code as belonging to the same instruction type.

Formulation. Although both key extraction and CFI rely on code execution tracking, their requirements are different. Key extraction attacks only need to accurately track the normal execution of the given code. In normal execution, the actual execution flow always obeys the CFG. In contrast, control flow hijacking attacks usually introduce invalid control transfers [3, 25–27], so the actual execution flow deviates from the CFG; we call this abnormal execution. The objective of CFI is therefore to detect whether there is abnormal execution in the system.

Thus, we formulate two sub-problems in this work.

1. Normal Execution Tracking: Given the source code and the power measurement traces during code execution, we would like to recognize which instruction instance is executed at each moment within the power traces.
2. Abnormal Execution Tracking: Given the source code and the power measurement traces during code execution, we would like to detect whether abnormal execution is performed.

3. OVERVIEW

In this section, we introduce the overall flow of our execution tracking method and discuss the main challenges. To simplify the discussion, we assume that every instruction costs one unit of time. In practice, some instructions may cost multiple units of time. In that case, we treat them as multiple one-unit instructions.

3.1 Overall Flow

While we formulated two problems to tackle in this work, both of them share the same code execution tracking framework. We model code execution on the MCU and its power side-channel behavior as an HMM. To be specific, we consider a basic block as an individual state, control transfer between basic blocks as state transition, and the power consumption of the MCU as observation. Then, tracking code execution is equivalent to recognizing how the underlying state transitions happen for a given power trace.

Workflow of Code Execution Tracking. The overall flow of our execution tracking framework is illustrated in Figure 2, which contains an HMM construction phase and an execution tracking phase. The final output of our framework includes two sequences: an instruction sequence and the corresponding likelihood of each instance in the sequence.

The HMM construction phase determines the parameter values of the HMM. We obtain substate (i.e., instruction instance), state and state transition information from the CFG of the given code, which can be derived by analyzing the disassembled binary [31]. Based on a set of power traces recorded while executing various instructions, the observation symbols are obtained by performing signal extraction and
dimension reduction, and the emission distribution is modeled with a Gaussian distribution, as detailed in Section 5.

In the execution tracking phase, we first obtain the observation sequence from the power trace, and then identify the most probable instruction sequence. In particular, we first divide power traces into chunks that map to individual instructions. This is a straightforward procedure because the power trace exhibits periodic characteristics with periods mapping to instructions [13, 32]. Then, to obtain the observation symbol value for each chunk, we conduct filtering and linear transformation on the raw power trace within the chunk, which correspond to signal extraction and dimension reduction, respectively. With our revised Viterbi algorithm, we can recover the most probable instruction sequence from the obtained observation sequence. Then, based on the recovered instruction sequence and the observation sequence, we can calculate the likelihood sequence.

Figure 2: Workflow of the proposed code execution tracking framework.

Normal and Abnormal Execution Tracking. The reported instruction sequence directly addresses the normal execution tracking problem. To approach the abnormal execution tracking problem, we can examine the likelihood sequence, as detailed in Section 6.

3.2 Challenges

To track code execution with an HMM, we could define an individual instruction type or an individual instruction instance as a state in the HMM, but both have limitations. Using instruction type as state only recovers the instruction type sequence instead of the instruction sequence, which cannot solve the normal execution tracking problem. Using instruction instance as individual state can solve this problem, but its computational complexity is prohibitive because the given code usually contains a large number of instruction instances, which creates a large number of states. For instance, the space complexity of the Viterbi algorithm is proportional to the number of states and hence becomes inefficient. In addition, every state requires an emission distribution function, and building an individual emission distribution function for every instruction instance is impractical for large programs. To reduce the number of states without sacrificing recognition accuracy, we define a basic block in the CFG as a state in the HMM. Because the instruction instances in a basic block always run in the same order, if we know how basic block transitions occur during execution, the instruction sequence can be determined as well.

The above state definition, however, raises several challenges. The classic HMM defines the emission distribution function on the entire state, and it needs to divide the given power trace into chunks, where each chunk corresponds to one unknown state. However, basic blocks may contain varying numbers of instruction instances and hence different states have unequal lengths in our case, which makes dividing the power trace for unknown states non-trivial. Moreover, the classic Viterbi algorithm assumes the length of every state is 1, and it cannot work for states with unequal lengths. To tackle these problems, we introduce substates, which represent instruction instances (see Figure 1(c)) in basic blocks, and define the emission distribution function on substates. By doing so, we only need to divide the power trace into chunks that correspond to instructions. We also revise the Viterbi algorithm to work with unequal-length states and substates. By combining states and substates, we are able to simultaneously preserve the CFG information in the HMM and dramatically reduce computational complexity.

To reduce the cost of building an emission distribution function for every instruction instance, we build an individual emission distribution function for each instruction type, and instruction instances of the same type use the same distribution function. Such an emission distribution function design suffices to track code execution, because different parts of the code usually have different instruction type sequences and accurately recognizing instruction type enables us to recover the underlying state. To reduce the noise in instruction type recognition, we try to extract high-quality signals from power traces and use them for observation symbols. To further reduce the computational overhead of distribution function construction, we also exploit dimension reduction when designing observation symbols.

4. REVISED HMM AND VITERBI ALGORITHM

In this section, we discuss how to revise the HMM and the Viterbi algorithm for the problems investigated in this work.

4.1 HMM Parameters

Figure 1(c) shows an example of our revised HMM, and formally, our HMM is characterized by the following parameters:

States, state lengths and substates. States are given by $S = \{s_i \mid 1 \le i \le N\}$, where $N$ is the number of basic blocks in the CFG, and state $s_i$ corresponds to the $i$th basic block in the CFG. We use $l_i$ to represent the number of substates, i.e., instruction instances, in state $s_i$. Then $s_i$ can be further represented as a sequence of $l_i$ substates, i.e., $s_i = \{s_i^1, s_i^2, \ldots, s_i^{l_i}\}$, where $s_i^m$ is the $m$th substate in $s_i$.

State transition probability and initial substate distribution. In our model, state transition represents control transfer between basic blocks. However, such a transition probability distribution can vary significantly with different inputs to the program, and the exact input for the targeted execution is not available to us. Hence, we use $a(s_i, s_j)$ to indicate whether there is a valid control transfer in the CFG from $s_i$ to $s_j$, that is

$$a(s_i, s_j) = \begin{cases} 1 & \text{if the transition is valid} \\ 0 & \text{otherwise} \end{cases} \tag{1}$$

Consequently, $a(s_i, s_j)$ is no longer a probability value, because, for any $i$, $\sum_{1 \le j \le N} a(s_i, s_j)$ could be larger than 1.

The first chunk in the examined power trace can correspond to any instruction instance in any basic block. To obtain the prior probability of an instruction sequence, we also need the probability of an instruction instance, i.e., substate, being the initial one. This probability distribution also varies with different input data, and we simply assume all substates have equal probability of being the initial one.

Observation symbols and emission distribution. We use $V$ to represent the set of all possible observation symbol values. As mentioned above, our emission distribution is defined on substates. The emission distribution for substate $s_i^m$ is given by $\{e(v, s_i^m) = p(v \mid s_i^m) \mid v \in V\}$. We detail how to design the observation symbol and emission distribution function in Section 5.

4.2 Likelihood Estimation

Finding the most probable instruction sequence is then equivalent to finding the most probable substate sequence. Next, we discuss how to estimate the probability of a substate sequence given the observation sequence.

When calculating the probability of a substate sequence $Q$, we also need its corresponding state sequence $R(Q)$. $R(Q)$ explicitly represents the state transitions in $Q$. For example, in Figure 3, for substate sequence $\{s_5^3, s_2^1, s_2^2, s_8^1, s_8^2, s_8^3\}$, its corresponding state sequence is $\{s_5, s_2, s_8\}$.

Figure 3: Example for Q and R(Q).

Formally, a substate sequence of length $T$ can be written as $Q = \{q_1, q_2, \ldots, q_T\}$, where $q_t \in \bigcup_{1 \le i \le N} s_i$. $R(Q)$ can be written as $R(Q) = \{r_1, r_2, \ldots, r_K\}$, where $r_k \in S$. Note that $q_1 \in r_1$ and $q_T \in r_K$. We name $r_1$ the initial state and $r_K$ the final state. Because $q_1$ and $q_T$ can be any intermediate substates of $r_1$ and $r_K$, the corresponding substate sequence parts for $r_1$ and $r_K$ in $Q$ can be incomplete. Nevertheless, substate sequence parts for states between the initial state and the final state must be complete in $Q$.

Then, given an observation sequence $O = \{o_1, o_2, \ldots, o_T\}$ of length $T$, a candidate substate sequence $Q$ and its corresponding state sequence $R(Q)$, the probability of $Q$ given $O$ is $p(Q \mid O) = p(Q, O)/p(O)$. Because $p(O)$ is the same for all candidate $Q$s, to find the most probable substate sequence, we only need to compare the $p(Q, O)$ part, given as

$$p(Q, O) = p(O \mid Q) \cdot p(Q) = \prod_{1 \le t \le T} e(o_t, q_t) \cdot \Big[ b(q_1) \cdot \prod_{2 \le k \le K} a(r_{k-1}, r_k) \Big],$$

where $b(q_1)$ is the probability that $q_1$ is the initial substate.

As indicated by Equation 1, $\prod_{2 \le k \le K} a(r_{k-1}, r_k)$ equals 0 if $R(Q)$ contains any invalid state transition; otherwise it equals 1 and the bracketed term reduces to $b(q_1)$, whose value is the same for different $q_1$. Consequently, the likelihood value $J$ for a substate sequence $Q$ given $O$ can be estimated as

$$J(Q, O) = \begin{cases} 0, & R(Q) \text{ contains an invalid transition} \\ \prod_{1 \le t \le T} e(o_t, q_t), & \text{otherwise} \end{cases} \tag{2}$$

Then, the most probable substate sequence is simply the one with the maximum $J$ value.

4.3 Revised Viterbi Algorithm

Next, let us discuss how to efficiently find the most probable substate sequence, given an observation sequence $O$ of length $T$. Our algorithm follows the idea of the classic Viterbi algorithm. We first calculate the $J$ value of the most probable substate sequence by recurrence, and then reconstruct the most probable substate sequence by backtracking.

For each state $s_j$ and each time $t$, our revised Viterbi algorithm calculates a quantity denoted by $\delta_t(j)$. When $1 \le t \le T$, $\delta_t(j)$ represents the maximal $J$ for a substate sequence that starts at time 1 and terminates at time $t$ with $s_j^{l_j}$ (i.e., the last substate of $s_j$). So at time $T$, the calculated $\delta_T(j)$ corresponds to the maximal $J$ value of a substate sequence that ends exactly at the last substate of $s_j$. Given that the observation can also end at any substate inside $s_j$ (e.g., the case shown in Figure 3), we also calculate $s_j$'s $\delta$ at times $T + 1$ to $T + l_j - 1$. When $T + 1 \le t \le T + l_j - 1$, $\delta_t(j)$ represents the maximal $J$ for a substate sequence which starts at time 1 and terminates at time $T$ with $s_j^{l_j - (t - T)}$ as the last substate.

Let us first give the basic idea of how to calculate $\delta$ by recurrence. For a given substate sequence, we can divide it into two parts: one part corresponds to its final state and the other part is the substate sequence before its final state. For instance, we can divide the $Q$ shown in Figure 3 into $\{s_8^1, s_8^2, s_8^3\}$ and $\{s_5^3, s_2^1, s_2^2\}$. Then, according to Equation 2, the $J$ value of a valid substate sequence is the product of the $J$ value for its final state part and the $J$ value for the former part (i.e., the substate sequence part before the final state). This means that if a substate sequence terminates with $s_j^{l_j}$ at time $t$ ($t > l_j$) and its $J$ value is $\delta_t(j)$, the $J$ value of its former part must be one of the $\delta$ values at time $t - l_j$. Otherwise, there must be another substate sequence that also terminates at time $t$ with $s_j^{l_j}$ and has a larger $J$. Consequently, we can calculate $\delta_t(j)$ based on the $\delta$ values at time $t - l_j$, and all $\delta$ values can be obtained by recurrence. When recurrently calculating $\delta_t(j)$, we also use a quantity $\phi_t(j)$ to record which previous state's $\delta$ at time $t - l_j$ maximizes $\delta_t(j)$.

Next, we give the formal recurrence relation and initialization step for state $s_j$. For the sake of simplicity, we use $\Omega(s_j, m, n)$ to represent the partial substate sequence in state $s_j$, starting with $s_j^m$ and terminating with $s_j^n$, i.e.,

$$\Omega(s_j, m, n) = \{s_j^m, s_j^{m+1}, \ldots, s_j^n\},$$

where $1 \le m \le n \le l_j$.

Recurrence. When $t \ge 1 + l_j$ and $t \le T$, according to Equation 2, we have

$$\begin{aligned} \delta_t(j) &= \big[\max_i \delta_{t - l_j}(i)\big] \cdot J(s_j, \{o_{t+1-l_j}, \ldots, o_t\}) \\ \phi_t(j) &= \operatorname*{argmax}_i \delta_{t - l_j}(i) \\ &\text{s.t. } a(s_i, s_j) = 1. \end{aligned} \tag{3}$$
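The likelihood of Equation 2 can be checked on small cases with a direct, non-optimized computation. The sketch below is illustrative only and rests on several assumptions that are not in the paper: substates are encoded as `(state, m)` tuples with `m` counted from 1, the block lengths and CFG edges are made up so that the Figure 3 sequence is valid, a substate sequence that enters or leaves a block in the middle is treated as having zero likelihood, and the discrete `emission` table stands in for the Gaussian emission model of Section 5.

```python
import math


def state_sequence(Q, lengths):
    """Collapse substates [(j, m), ...] into the state sequence R(Q); None if Q is malformed."""
    if not all(1 <= m <= lengths[j] for j, m in Q):
        return None
    R = [Q[0][0]]
    for (j, m), (j2, m2) in zip(Q, Q[1:]):
        if j2 == j and m2 == m + 1:
            continue                  # next instruction inside the same basic block
        if m == lengths[j] and m2 == 1:
            R.append(j2)              # block finished, control transfers to block j2
        else:
            return None               # jump into or out of the middle of a block
    return R


def likelihood(Q, O, edges, lengths, emission):
    """J(Q, O): product of e(o_t, q_t), or 0 when R(Q) contains an invalid transition."""
    R = state_sequence(Q, lengths)
    if R is None or any((r, r2) not in edges for r, r2 in zip(R, R[1:])):
        return 0.0
    return math.prod(emission(o, q) for o, q in zip(O, Q))


# Hypothetical model matching the Figure 3 example: s5 -> s2 -> s8.
lengths = {5: 3, 2: 2, 8: 4}
edges = {(5, 2), (2, 8)}
expected = {(5, 3): "x", (2, 1): "y", (2, 2): "x", (8, 1): "y", (8, 2): "y", (8, 3): "x"}


def emission(o, q):
    # Toy emission: 0.8 when the symbol matches the substate's nominal symbol, else 0.2.
    return 0.8 if o == expected[q] else 0.2


Q = [(5, 3), (2, 1), (2, 2), (8, 1), (8, 2), (8, 3)]
O = ["x", "y", "x", "y", "y", "x"]
J_valid = likelihood(Q, O, edges, lengths, emission)
J_invalid = likelihood([(5, 3), (8, 1), (8, 2)], ["x", "y", "y"], edges, lengths, emission)
```

Here `state_sequence(Q, lengths)` returns `[5, 2, 8]`, and the second call yields zero because the toy CFG has no edge from state 5 to state 8. The revised Viterbi recurrence reaches the same maximum without enumerating every candidate sequence.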

When $t \ge 1 + l_j$ and $T < t \le T + l_j - 1$, we tackle substate sequences terminating in the middle of state $s_j$ at time $T$. In this case, $\delta$ is calculated by

$$\begin{aligned} \delta_t(j) &= \big[\max_i \delta_{t - l_j}(i)\big] \cdot J(s'_j, \{o_{t+1-l_j}, \ldots, o_T\}) \\ \phi_t(j) &= \operatorname*{argmax}_i \delta_{t - l_j}(i) \\ &\text{s.t. } a(s_i, s_j) = 1, \end{aligned}$$

where $s'_j = \Omega(s_j, 1, T - t + l_j)$.

Initialization. $\delta_t(j)$ and $\phi_t(j)$ for $1 \le t \le l_j$ are set by initialization. Because $l_j$ may be greater than $T$, we have

• when $1 \le t \le l_j$ and $t \le T$,
$\delta_t(j) = J(s'_j, \{o_1, \ldots, o_t\})$, $\phi_t(j) = 0$,
where $s'_j = \Omega(s_j, l_j - t + 1, l_j)$.

• when $1 \le t \le l_j$ and $t > T$,
$\delta_t(j) = J(s'_j, \{o_1, \ldots, o_T\})$, $\phi_t(j) = 0$,
where $s'_j = \Omega(s_j, l_j - t + 1, l_j - t + T)$.

$\phi_t(j) = 0$ indicates that $s_j$, terminating at $t$, is the initial state.

With the above, if a substate sequence is of length $T$ and uses $s_j$ as its final state, its maximal $J$ value is given by

$$\max\{\delta_T(j), \ldots, \delta_{T + l_j - 1}(j)\}.$$

Hence, the $J$ value of the most probable substate sequence is given by

$$\max_j \{\max\{\delta_T(j), \ldots, \delta_{T + l_j - 1}(j)\}\}.$$

Once the $J$ value of the most probable substate sequence is located, we can reconstruct the most probable substate sequence by backtracking the $\phi$ values accordingly, similar to the classic Viterbi algorithm.

4.4 Complexity Analysis

In this subsection, we analyze our method's complexity by comparing it to the naive method that treats each instruction instance as an individual state and uses the classic Viterbi algorithm to solve it. Because the instructions inside a basic block always run in the same order, most instruction instances in the code have only a single possible previous instruction. However, the classic Viterbi algorithm updates the $\delta$ value for a state at time $t$ by enumerating all states' $\delta$ values at time $t - 1$ and records the $\phi$ value for every state at every moment, which is unnecessary for most instructions.

Suppose a program has $X$ instruction instances and $Y$ basic blocks. Because we usually observe MCU execution for a long time, the observation sequence length, denoted by $T$, should be much larger than the length of the longest basic

82 − 1) × 55 ≈ $3.9 \times 10^5$. Hence, we can reduce the memory overhead by about 96.1%.

The time complexity is mainly determined by the cost of updating elements in the array. In both methods, the cost of updating one element consists of two parts: one is evaluating the likelihood of the state given the observation, e.g., $J(s_j, \{o_{t+1-l_j}, \ldots, o_t\})$ in Equation 3, and the other is enumerating previous states, e.g., $\max_i \delta_{t - l_j}(i)$ in Equation 3. Assume the complexity of calculating the likelihood for one instruction instance is $O(1)$. In our method, let us consider the worst case, in which every element is updated with Equation 3. Then at time $t$, $Y$ states need to evaluate the likelihood for $X$ instructions in total and each state needs to enumerate $Y$ previous states. Hence, the total time complexity is $O(T \times (X + Y^2))$. With the naive method, there are $X$ states. At time $t$, it also needs to evaluate $X$ instructions, but each state needs to enumerate $X$ states. Therefore, the total time complexity is $O(T \times (X + X^2))$, which is much larger than ours.

5. OBSERVATION SYMBOL AND EMISSION DISTRIBUTION FUNCTION

A good observation symbol design should enable us to recover the instruction sequence accurately and, at the same time, reduce the overhead of building the emission distribution function. To achieve these objectives, we need to solve the following two problems.

First, because we build an individual emission distribution function for each instruction type, we should design the observation symbol in such a manner that it facilitates recognizing instruction type. Consequently, signal extraction techniques are employed to increase the correlation between observation symbol and instruction type.

Second, because a chunk of power trace that corresponds to one instruction instance could contain hundreds of sample points, there is significant overhead in modeling the distribution of such a high-dimensional variable. Therefore, a dimension reduction technique is used to reduce computational complexity.

In the following, we first discuss our signal extraction technique, and then present the overall design flow of our observation symbol.

5.1 Signal Extraction

From the viewpoint of the frequency domain, the power signal is synthesized from different frequency components. The objective of signal extraction in this work is to select those frequency components that are highly correlated to instruction type and filter out the other components.

Frequency Components Selection. The raw power signal represents the total power consumption of the MCU,
block, denoted by lmax . Both classic Viterbi algorithm and and there are at least four factors that affect MCU power
our revised one can be implemented in a dynamic program- consumption when an instruction is executed. First, instruc-
ming manner. tion type affects power consumption by designating the mi-
The space complexity is mainly determined by the size cro operations of the processor. Next, when executing an
of the array used in dynamic programming, which records instruction, the instruction operands and the instruction ex-
φ and δ for every state at every moment. Then, the space ecuted prior to it affect the low-level switching activities of
complexity of the naive solution is O(T × X), and ours is the circuit. Finally, environment noise would also have some
O((T + lmax − 1) × Y ) = O(T × Y ). Let us take aes case impact on the obtained power trace. The last three factors
shown in Table 1 as an example. lmax in aes is 82. If T would interfere with instruction type recognition, and their
is 7065, then the size of the array with the naive method impact can be mitigated by increasing the correlation be-
is 7065 × 1472 ≈ 107 and that with our method is (7065 + tween observation symbol and instruction type.

Usually, a frequency component can be represented by its amplitude value $A_{com}$. As the raw power signal is determined by four factors, we simply model $A_{com}$ as a linear combination of two parts, given by

$$A_{com} = A_{type} + A_{other}, \tag{4}$$

where $A_{type}$ is determined by instruction type, and $A_{other}$ represents the part determined by instruction operand, previously executed instruction and environmental noise together.

We assume $A_{type}$ and $A_{other}$ in Equation 4 are independent. Ideally, different instruction types have different $A_{type}$ values and the same type of instruction has the same $A_{type}$ value. In this case, to evaluate the correlation between a frequency component and instruction type, we can use the correlation between $A_{type}$ and $A_{com}$ instead.

We use Pearson's correlation coefficient to evaluate the correlation. Then the correlation between $A_{type}$ and $A_{com}$ is given by

$$\rho(A_{com}, A_{type}) = \frac{\mathrm{cov}(A_{com}, A_{type})}{\sqrt{D_{com} D_{type}}},$$

where $\mathrm{cov}(A_{com}, A_{type})$ is the covariance between $A_{type}$ and $A_{com}$, $D_{com}$ is the variance of $A_{com}$, and $D_{type}$ is the variance of $A_{type}$.

Because $A_{type}$ and $A_{other}$ are independent, we have

$$\rho(A_{com}, A_{type}) = \frac{\mathrm{cov}(A_{type}, A_{type}) + \mathrm{cov}(A_{other}, A_{type})}{\sqrt{D_{com} D_{type}}} = \frac{D_{type} + 0}{\sqrt{D_{com} D_{type}}} = \sqrt{\frac{D_{type}}{D_{com}}}.$$

Therefore, we should select those frequency components with the following two characteristics,

1. $D_{type}/D_{com}$ should be as large as possible in order to obtain larger correlation $\rho(A_{com}, A_{type})$.

2. In addition, the magnitude of $D_{type}$ should be as large as possible. Larger $D_{type}$ means the difference in $A_{type}$ among different instruction types is more significant, which is easier to capture.

Evaluating $D_{com}$ and $D_{type}$. First, evaluating $D_{com}$ is simple, because the $A_{com}$ value can be obtained by transforming the raw power signal from time domain to frequency domain. Second, although we cannot measure $A_{type}$ directly, $D_{type}$ can be evaluated as follows. Because $A_{other}$ and $A_{type}$ are independent, if $A_{other}$ keeps constant during sampling, the conditional variance of $A_{com}$ in this case is equal to $D_{type}$, according to Equation 4. As a result, to evaluate $D_{type}$, we can sample $A_{com}$ by randomly changing instruction types while keeping instruction operand and previously executed instruction fixed. To make environmental noise constant, we can measure the power trace for every instruction instance multiple times and use the averaged power trace instead. To further improve the accuracy, we can calculate $D_{type}$ multiple times with different configurations of instruction operands and previously executed instruction, and use the averaged value when comparing different components.

Filtering. Once the appropriate frequency components are selected, it is straightforward to obtain the filtered power signal. That is, we can obtain the frequency amplitude spectrum of one power trace with Fast Fourier Transform, zero out the amplitude values of those inappropriate frequency components, and generate the filtered power trace by Inverse Fast Fourier Transform.

5.2 Overall Design Flow

Figure 4 shows our observation symbol design flow. The input is a set of power traces with various instruction instances. Among these instruction instances, the instruction type, the instruction operand and the instruction executed prior to the sampled instruction are all randomly changed.

Figure 4: Observation Symbol Design Flow.

First, we conduct signal extraction according to the given set of power traces and generate the filtered power traces.

Next, we conduct dimension reduction with principal component analysis (PCA) [33]. PCA can generate a linear transformation function that maps the high-dimensional power signal to a lower-dimensional signal, while the transformation preserves useful information as much as possible. When applying PCA, we need to decide the dimensionality of the obtained lower-dimensional signal. To solve this problem, we evaluate how dimensionality affects instruction type recognition rate with statistical classifiers, e.g., Naive Bayes classifier, and use the smallest dimensionality contributing to the highest recognition rate.

Finally, we use the low-dimensional signal obtained after applying PCA as our observation symbol. For each instruction type, we fit its emission distribution with Multivariate Gaussian Distribution Model, based on the above power trace set.

6. ABNORMAL EXECUTION TRACKING

Till now, we have shown how to recover the instruction sequence, and solve the normal execution tracking problem. In this section, we discuss how to detect abnormal execution, based on the fact that in abnormal execution cases, the most probable sequences typically have a reduced likelihood compared to the normal execution cases. Then, we discuss the possibility that attackers can evade our tracking method.

6.1 Detection via Likelihood Sequence

When invalid control transfers are introduced by control flow hijacking attack, our revised Viterbi algorithm would recognize the actual instruction sequence, deviating from CFG, as another valid instruction sequence that obeys CFG and has the largest probability to generate the observation sequence. Because the actual instruction sequence, containing abnormal execution, intends to implement a malicious function that does not exist in the original CFG, the actual instruction sequence's instruction type sequence is usually different from that of any valid instruction sequence defined by the CFG. Therefore, the type sequence of the actual

instruction sequence is different from that of the reported instruction sequence. With the above, when tracking abnormal execution, some instruction instances in the actual sequence would be incorrectly recognized as the wrong type of instruction instances in the reported sequence. This phenomenon can thus be used for abnormal execution tracking.

Ideal Case. Ideally, when tracking normal execution, instruction instances should be all correctly recognized. Hence, if we could distinguish between correctly recognized instruction instances and incorrect ones, we are able to detect abnormal execution. In order to achieve this objective, we examine the likelihood of the reported instruction instance $m$ given the corresponding observation $v$, i.e., $e(v, m)$, which is also the conditional probability of $v$ given $m$. When $m$ is reported with incorrect recognition, the corresponding observation, denoted by $v_{inc}$, is actually generated by another instruction of different type. Because different instruction types usually generate different observations, $m$ is not likely to generate $v_{inc}$. If $m$ is reported with correct recognition, the corresponding observation, denoted by $v_c$, is generated by $m$ itself. Hence, we have $e(v_{inc}, m) < e(v_c, m)$.

Calibrated Likelihood. Motivated by the above, for each instruction instance in the code, we record its average likelihood value in normal execution. When detecting abnormal execution, for each instruction instance in the reported instruction sequence, we subtract the recorded average likelihood for this instruction instance from its current likelihood value, and we name the obtained difference as calibrated likelihood. Then, the calibrated likelihood sequence is given by

$$\{e(o_1, q_1) - h(q_1),\ e(o_2, q_2) - h(q_2),\ \ldots,\ e(o_T, q_T) - h(q_T)\},$$

where $h(q_t)$ is the average likelihood value of the instruction instance $q_t$ in normal execution.

If the instruction instance is correctly recognized, its calibrated likelihood should be around zero. Otherwise, the calibrated likelihood for incorrectly recognized instruction instance should be biased to be negative.

Note that, although an additional average likelihood number is recorded for each instruction instance, this overhead is much smaller than that of building and recording individual emission distribution function for each instruction instance.

6.2 Security Analysis

Given different instruction types may have the same power emission model, an attacker may try to launch a mimic attack, which evades our detection by constructing an adversarial instruction sequence whose power consumption fingerprint happens to be valid. In this section, we discuss the probability of such an attack.

Threat Model. To construct a malicious sequence to fulfill an adversary's hidden agenda from scratch is challenging. We imagine that an adversary will utilize the well-known attack (i.e., Call-Preceded Return-Oriented Programming, in short CPROP [25]) to reuse the existing code to accomplish this goal. Thus, for illustration purpose, we analyze the likelihood of mimic attackers that utilize CPROP. CPROP maliciously redirects the target of the ret instruction to a wrong instruction whose preceding instruction is a call instruction. Without loss of generality, suppose all control transfers in a program are caused by function calls or returns. Then CPROP constructs an adversarial instruction sequence by introducing invalid transitions among basic blocks. We assume attackers know the power fingerprint of every basic block in the code.

We assume that the target program has $m$ basic blocks, each basic block has $n$ instructions (excluding the final control transfer instruction), and each basic block has $v$ ($v \le m$) valid next basic blocks in CFG (i.e., outdegree of any node in CFG is $v$). Given different instruction types may have the same power emission, we assume that the instruction set contains $a$ instruction types in total and can be divided into $b$ groups, where each group contains $a/b$ instruction types on average and the instruction types in the same group have the same power emission. We can only distinguish instruction types from different groups via power side-channel.

When CPROP wants to insert a malicious basic block after a legitimate basic block, she creates an invalid transition. To evade detection, CPROP should create the malicious basic block so that its power fingerprint is the same as one of the original $v$ valid ones. In the worst case, CPROP can use one of the $m - v$ basic blocks (i.e., the ones that create invalid transition) as the malicious one and the resulting adversarial instruction sequence only deviates from the valid instruction sequence by a single basic block. Let us denote the probability that such adversarial sequence evades detection by $P_{evade}$. If the instructions in basic blocks are randomly and independently distributed,

$$P_{evade} = 1 - \left[1 - \left(\frac{1}{b}\right)^{n}\right]^{v(m-v)}. \tag{5}$$

The second term in Equation 5 gives the probability that any of the $m - v$ malicious candidates has a different power fingerprint from those of $v$ valid ones, i.e., the probability of detecting the adversarial sequence. We can expand this term in Binomial series, then

$$P_{evade} = kx - \sum_{t=2}^{k} \left[ (-1)^t \frac{k(k-1)\cdots(k-t+1)\,x^t}{t!} \right], \tag{6}$$

where $x = \left(\frac{1}{b}\right)^{n}$ and $k = v(m - v)$.

When $kx < 1$, the absolute value of $(-1)^t \frac{k(k-1)\cdots(k-t+1)\,x^t}{t!}$ decreases as $t$ increases. Because the terms alternate in sign and start with a positive term at $t = 2$, the second term on the right hand side of Equation 6 is always positive. Then we have

$$P_{evade} < kx = \frac{v(m - v)}{b^n}.$$

For the code size $m$ and the basic block size $n$ that are typical for mid-size embedded devices, the magnitude of $kx$ is small and $P_{evade}$ is close to 0 for the following reasons. First, $b^n$ grows exponentially with $n$. Second, different instruction types' power emissions usually provide sufficient diversity and $b$ is large. For example, $b \approx 10^2$ for the MCU used in our experiments, because $a = 152$ and the accuracy of classifying instruction types can be estimated by $b/a$, which is 70% in our case as shown in Section 7.2.3. Third, $k \le 0.25m^2$, so $kx \le m^2/(4b^n)$. Hence, $kx$ is small for a typical code size $m$ and a basic block size $n$. For instance, for a target code of $10^6$ basic blocks, we only need $n > 7$ to guarantee $P_{detect} = 1 - P_{evade} > 99.75\%$ with the MCU used in our experiments.

Thus, the probability of constructing an adversarial instruction sequence with a valid power consumption fingerprint is close to 0 in general.
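
As a numerical check of Equation 5 and its upper bound $kx$, the following minimal Python sketch evaluates both for the example quoted in the text ($b \approx 10^2$, $m = 10^6$). The function names are illustrative and not part of the paper, and choosing $v = m/2$ is an assumption made here only because it maximizes $k = v(m - v)$.

```python
import math


def p_evade(b: float, n: int, m: int, v: int) -> float:
    """Equation 5: 1 - (1 - (1/b)**n) ** (v * (m - v))."""
    x = b ** (-n)   # chance that one n-instruction block matches a given fingerprint
    k = v * (m - v)  # number of (valid block, malicious candidate) pairs
    # log1p/expm1 keep precision when x is tiny and k is huge.
    return -math.expm1(k * math.log1p(-x))


def p_evade_bound(b: float, n: int, m: int, v: int) -> float:
    """Upper bound k * x = v(m - v) / b**n derived from Equation 6."""
    return v * (m - v) / b ** n


if __name__ == "__main__":
    b, m = 100, 10**6   # b ~ 10^2 groups, m = 10^6 basic blocks (values from the text)
    v = m // 2          # assumption: worst case, k = m^2 / 4
    for n in (6, 7, 8):
        print(n, p_evade(b, n, m, v), p_evade_bound(b, n, m, v))
```

Under these assumptions the bound equals 0.0025 at $n = 7$, which corresponds to the 99.75% detection probability quoted above, and it shrinks by a factor of $b$ for every additional instruction per basic block.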

7. EVALUATION

In this section, we conduct various experiments to evaluate the proposed solution. First, we describe the hardware and software platforms used for evaluation and introduce the performance metrics used in this work. The evaluation results are divided into four parts: designing observation symbol, tracking normal execution, tracking abnormal execution, and tracking execution on different chips.

7.1 Experimental Setup

MCU under test. The method proposed in this paper can be applied to any MCU model, as long as the execution time of every instruction is a constant. Many MCU architectures in the current market satisfy this requirement, such as PIC12 [13], 8-bit AVR [32] and Intel's 8051 [34]. STC89C52, an implementation of the 8051 architecture, is used in this evaluation. Since there is no external RAM on its evaluation board, instructions relevant to external RAM (e.g., MOVX) are excluded from evaluation. Most instructions in this MCU cost only one machine cycle. For instructions costing 2 or 4 machine cycles, we treat them as 2 or 4 different single-cycle instructions. As a result, the effective instruction set contains 152 different single-cycle instruction types. This MCU is clocked at 11.0592 MHz using an external oscillator.

Power measurement. To measure the power consumption of the MCU under test, a resistor of 46.7 Ω is placed between the VCC pin of the MCU and its power supply, and the voltage drop over it is measured using a Tektronix MDO3034 oscilloscope with a sampling rate of 1.25 GS/s.

Benchmark programs. Our benchmark suite consists of 9 programs, eight of which are from the Dalton Project [35] that is used to evaluate the performance of 8051 MCUs. The remaining one is an implementation of the AES-128 encryption algorithm migrated to our MCU. The details of these nine programs are shown in Table 1, including the number of instruction instances (# of Inst.), the number of basic blocks (# of BB.), and the length of the instruction sequence tracked during program execution (Measured Inst.). For the programs matrix, aes, pid and dct, we only measure 7065 executed instructions, because their power traces for a complete execution will go beyond the maximal length that can be measured with our experimental setup. For all the other programs, power traces of a complete execution are recorded.

Name    Description                     # of Inst.  # of BB.  Measured Inst.
aes     AES-128                         1427        55        7065
sqroot  Square root                     1002        98        3800
sort    Bubble sort                     233         37        4430
matrix  Matrix multiplication           413         30        7065
pid     Simulate cruise control in car  1572        199       7065
dct     Discrete cosine transform       560         51        7065
gcd     Euclidean algorithm             69          11        135
fib     Fibonacci sequence              159         24        782
csumex  Cumulative sum chart            89          12        665

Table 1: Benchmark suite.

Evaluation Metrics. We evaluate our method with two metrics. The first one is Instruction Sequence Accuracy (ISA), which measures the accuracy of the recognized instruction instances in the reported instruction sequence. The second metric is Type Sequence Accuracy (TSA), which only measures the accuracy of the recognized instruction types. TSA is a more important metric, because the performance of some configurations (to be introduced below) cannot be measured by ISA and our method relies on the instruction type information to track program execution.

We compare our work with the method proposed in [13], which recovers the instruction type sequence by treating every instruction type as a state in classic HMM, and instruction type transition probabilities are extracted from the code under test. Their observation symbol is obtained by conducting dimension reduction on the raw power signal directly.

We denote the HMM defined in [13] with prefix TYPE and our proposed HMM with prefix BB (meaning Basic Block). We use suffix E and NOE to indicate whether the signal extraction technique is used or not in designing the observation symbol. Therefore, there are four configurations to be evaluated: TYPE NOE, TYPE E, BB NOE and BB E, where TYPE NOE corresponds to the method in [13]. All four configurations run on the same server with an Intel Xeon E5-2609 CPU and 16 GB RAM.

7.2 Observation Symbol Design

In this section, we demonstrate how to design the observation symbol and its impact on instruction type recognition accuracy. In our experiment, the set of power traces used for designing the observation symbol consists of about 180,000 power traces from various instruction instances, measured on the same chip.

Figure 5: (a) Normalized $D_{type}/D_{com}$ and $D_{type}$ for different frequency components. (b) Classifying instruction type after PCA, when signal extraction is used (E) and not used (NOE).

7.2.1 Signal Extraction

Let us first estimate $D_{type}/D_{com}$ and $D_{type}$. We obtain the frequency amplitude spectrum of the power traces with Fast Fourier Transform. Figure 5(a) shows $D_{type}/D_{com}$ and $D_{type}$ within the frequency range 0 to 100 MHz, where the values of $D_{type}/D_{com}$ and $D_{type}$ are normalized for presentation. Based on our discussion in Section 5.1, we only select frequency components within the range (0 MHz, 11.38 MHz), because both $D_{type}/D_{com}$ and $D_{type}$ within this range are larger than those outside of this range and they are used for instruction type recognition.
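
The band selection above, combined with the filtering step of Section 5.1 (transform, zero out unselected components, inverse transform), can be sketched in Python as follows. This is only an illustration under stated assumptions: a slow O(N²) discrete Fourier transform stands in for the FFT so that the sketch needs only the standard library, the function names are hypothetical, and the toy trace (a DC offset plus 5 MHz and 50 MHz tones) is synthetic rather than measured data.

```python
import cmath
import math


def dft(samples):
    """Naive O(N^2) discrete Fourier transform (stand-in for an FFT)."""
    n = len(samples)
    return [sum(samples[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spectrum):
    """Inverse of dft(); returns complex samples."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]


def band_filter(trace, sample_rate_hz, low_hz, high_hz):
    """Keep only components with low_hz < |f| < high_hz and zero out the rest."""
    n = len(trace)
    spectrum = dft(trace)
    for k in range(n):
        # Bins above n/2 represent negative frequencies.
        freq = (k if k <= n // 2 else k - n) * sample_rate_hz / n
        if not (low_hz < abs(freq) < high_hz):
            spectrum[k] = 0
    return [value.real for value in idft(spectrum)]


if __name__ == "__main__":
    fs = 1.25e9   # oscilloscope sampling rate from Section 7.1
    n = 250       # toy trace length, giving 5 MHz per frequency bin
    times = [i / fs for i in range(n)]
    wanted = [math.sin(2 * math.pi * 5e6 * t) for t in times]
    trace = [0.8 + w + 0.5 * math.sin(2 * math.pi * 50e6 * t)
             for w, t in zip(wanted, times)]
    filtered = band_filter(trace, fs, 0.0, 11.38e6)
    # Largest deviation between the filtered trace and the 5 MHz tone alone.
    print(max(abs(f - w) for f, w in zip(filtered, wanted)))
```

Because the selected range is open at 0 MHz, the DC component (static power) is removed together with everything at or above 11.38 MHz. In practice an FFT routine would replace the naive transform for traces with many sample points.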

TSA (%)
Configuration  aes     csumex  dct     fib     gcd     matrix  pid    sort    sqroot  Average
TYPE NOE       48.09   99.52   73.26   56.59   81.18   86.69   37.04  92.20   56.42   70.11
TYPE E         57.72   99.61   70.77   93.48   78.53   95.62   56.57  97.89   57.62   78.65
BB NOE         99.89   100.00  100.00  100.00  98.24   99.97   93.43  100.00  99.75   99.03
BB E           99.88   100.00  100.00  100.00  100.00  99.98   99.80  100.00  99.78   99.94

ISA (%)
Configuration  aes     csumex  dct     fib     gcd     matrix  pid    sort    sqroot  Average
BB NOE         90.55   100.00  100.00  100.00  98.24   99.97   92.81  100.00  97.04   97.62
BB E           90.49   100.00  100.00  100.00  100.00  99.98   99.48  100.00  97.09   98.56

Table 2: TSA and ISA in normal execution tracking. The last column shows the average value for each row.
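
The percentage gains quoted in Section 7.3 appear to be relative improvements computed from the Average column of Table 2, rather than differences in percentage points. The short Python sketch below reproduces them under that assumption; the dictionary and function names are illustrative only.

```python
# Average TSA (%) per configuration, copied from the last column of Table 2.
AVERAGE_TSA = {
    "TYPE NOE": 70.11,
    "TYPE E": 78.65,
    "BB NOE": 99.03,
    "BB E": 99.94,
}


def relative_gain(new: float, old: float) -> float:
    """Relative improvement of `new` over `old`, in percent."""
    return 100.0 * (new - old) / old


if __name__ == "__main__":
    a = AVERAGE_TSA
    print(round(relative_gain(a["BB E"], a["TYPE NOE"]), 2))   # 42.55
    print(round(relative_gain(a["TYPE E"], a["TYPE NOE"]), 2))  # 12.18
    print(round(relative_gain(a["BB E"], a["BB NOE"]), 2))      # 0.92
```

Read this way, BB E improves average TSA over TYPE NOE by about 29.8 percentage points, which is 42.55% in relative terms.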

7.2.2 Dimension Reduction with PCA

To decide the dimensionality of the final low-dimensional signal, we examine the instruction type recognition rate with Naive Bayes classifier and Gaussian Bayes classifier.

Figure 5(b) shows the instruction type recognition rates for different classifiers after applying PCA, when signal extraction is used and not used, respectively. For both classifiers that we have tested, cases with signal extraction require only about 10 dimensions to achieve the maximum recognition rate, while the non-filtered cases require many more dimensions (about 35 shown in the figure) to achieve the same value. Given this, if signal extraction is used, we use signals consisting of the first 10 dimensions after applying PCA as observation symbol, otherwise we use signals consisting of the first 35 dimensions after applying PCA as observation symbol. For both cases, the emission distribution function of each instruction type is built with Multivariate Gaussian Model.

7.2.3 Effectiveness of Signal Extraction

We have another two observations from Figure 5(b). First, as fewer dimensions are required when signal extraction is used, it means our signal extraction technique can reduce the complexity in building the emission distribution function with Multivariate Gaussian Model. Second, with Naive Bayes classifier, the maximal recognition rate for the case with signal extraction is larger than that of the case without signal extraction. This means, by selecting frequency components of larger $D_{type}/D_{com}$ and $D_{type}$, we can recognize instruction types more accurately. But the improvement on the maximum recognition rate almost disappears for the Gaussian Bayes classifier case, where the maximum recognition rate is about 70% for both cases. One possible reason is that the Gaussian Bayes classifier considers the dependency among different dimensions that can help instruction type recognition, and the improvement introduced by signal extraction is thus much smaller.

7.3 Normal Execution Tracking

Table 2 lists the accuracy of normal execution tracking for different configurations with different programs. For each configuration and each program, we track its execution five times and the average accuracy value is reported in the table. From Table 2, we have the following observations.

First, using BB model can always achieve higher TSA than using TYPE model. No matter whether the signal extraction technique is used or not, with BB model, the TSA for tracking all 9 programs is over 93%. In particular, when BB E configuration is used, the TSA is always over 99.7%. For configurations with TYPE model, TSA varies a lot and is quite low in some cases. For example, TYPE NOE only achieves 37.04% TSA for pid, while it achieves 99.52% TSA for csumex. On average, our most powerful method BB E can outperform TYPE NOE by 42.55% in relative terms (99.94% versus 70.11% average TSA). This is consistent with our expectation, because BB model preserves more knowledge from CFG and hence it has a higher probability to track the execution correctly.

Second, on average, the signal extraction technique can improve TSA by 12.18% (relative) for TYPE model, and 0.92% (relative) for BB model. Earlier we observed that the maximum recognition rate with Gaussian classifier almost keeps unchanged, no matter whether signal extraction is used or not. This is not contradictory to our observation here. The difference lies in experimental setup, i.e., in the instruction type classification experiment, the instruction type is uniformly distributed, which is different from the distributions in actual programs.

Third, configurations with BB model can also achieve very high ISA, which is over 97% on average. It demonstrates that, by precisely recovering the executed instruction's type, it is sufficient for our method to track which instruction instance in the code is executed. On some programs, ISA is lower than TSA, such as aes. We manually check the results of the aes case, and find there are two basic blocks in the program which only differ at one location in their instruction type sequences. At this location, one block uses XRL A,R0 instruction and the other one uses XRL A,direct instruction. These two instructions both implement the exclusive-or function, differ in addressing mode, and belong to different types. When one of them is incorrectly recognized as the other one, TSA only treats this XRL instruction as incorrectly recognized while ISA regards all the instructions in the basic block as incorrectly recognized. Therefore, ISA is much smaller than TSA in this case.

To sum up, our BB model outperforms the original TYPE model significantly. The signal extraction technique further improves execution tracking accuracy.

7.4 Abnormal Execution Tracking

Because designing a full-fledged CFI method is beyond the scope of this work, in this subsection, we mainly demonstrate that abnormal execution could decrease the reported calibrated likelihood values, compared to that of normal execution cases.

We use firmware modification attack as an example. Intuitively, less modification on the original code is more difficult to detect. Given this, we first study single instruction replacement, insertion and deletion cases on the aes program, which do not change the control transfers after modification. In these three cases, we respectively replace one NOP instruction with an ADD A,0x00 instruction, insert a new NOP instruction, and delete an existing NOP instruction. All these modifications are conducted at the beginning of

Figure 6: Calibrated log likelihood of the first 4000 instruction instances in reported instruction sequence for (a) normal
execution, (b) single instruction replacement, (c) single instruction insertion, and (d) multi-instruction modification.
SubByte function in aes program, which is called 16 times tion flow may be incorrectly recognized as one state, which
within one measurement. Next, to study multi-instruction affects the following basic block.
modification case, we simulate an attack that replaces aes For the multi-instruction modification case (Figure 6(d)),
with dct during execution. Each power measurement covers the calibrated likelihood for most instruction instances is bi-
7065 instructions and BB E configuration is used as execu- ased to be negative, and the mean value is -87.7743, which
tion tracking method. is much smaller than that in the normal execution case. The
Figure 6 shows the calibrated log likelihood sequences in degradation of the calibrated likelihood here is more signifi-
the attack cases and the normal execution case. Because cant than the single instruction modification cases, which is
the magnitude of the original likelihood value is sometimes consistent with the intuition that multi-instruction modifi-
quite small, we perform calibration on log likelihood instead. cation is easier to be detected.
From the results, we have the following observations.
For the normal execution case (Figure 6(a)), the cali- 7.5 Execution Tracking on Different Chips
brated likelihood for most instruction instances are close
to zero and the mean value is -0.0284. Although several Chip TSA(%)
calibrated likelihood values in the sequence deviate from No. TYPE NOE TYPE E BB NOE BB E
zero a lot, indicated by green circle, all these values cor- Chip1 44.84 79.04 93.66 99.94
Chip2 75.28 79.57 99.62 99.93
respond to the same instruction instance whose type is MOVC
Chip3 67.94 79.51 99.60 99.93
A,@A+DPTR, and their mean value is -43. Most registers Chip4 73.26 71.50 99.62 99.92
in 8051 are 8-bit registers, but DPTR consists of 16 bits.
Avg. 65.33 77.40 98.13 99.93
Hence, the power consumption of MOVC is more sensitive to STD 14.007 3.943 2.976 0.007
operands than other instructions, and its calibrated likeli-
hood varies more significantly. Table 3: Average TSA for four configurations on different
For the replacement case (Figure 6(b)), the distribution of chips. The last two rows show the average and standard
the calibrated likelihood is significantly biased with a large deviation for each column.
negative mean value. Based on the reported instruction se-
quence, we can group the multiple occurrences of the same In this subsection, we demonstrate that emission distri-
instruction instance and observe its calibrated likelihood’s bution function built with power traces from one chip (e.g.,
distribution. The ADD A,0x00 instruction after replacement Chip0) can be used to track code executions on other chips
is incorrectly recognized as NOP. All sixteen occurrences of (e.g., Chips 1∼4) from the same architecture family. Results
the replaced NOP, indicated by blue cross in the figure, have for normal execution tracking on Chip0 are listed in Table 2,
negative calibrated likelihood value, and their mean is - and the TSA results on Chip1∼4 using the same emission
287. This replaced NOP can be easily distinguished from distribution model derived from Chip0 are shown in Table 3.
the above-mentioned MOVC instruction. The calibrated likelihood values of the MOVC instruction instance can be both negative and positive, as indicated by the green circle, and their mean value is -56.

For the insertion and deletion cases, the observations are similar, and the result for the insertion case is given in Figure 6(c) as an example. We observe that the calibrated likelihood for many instruction instances around the inserted (or deleted) NOP, indicated by the blue dashed line, is biased to a large negative value. This is because instruction insertion/deletion can cause multiple instruction instances around it to be incorrectly recognized. For example, if we delete the first instruction from a 4-instruction basic block, the second instruction in this basic block can be incorrectly considered as the start of this state during tracking, and the remaining three instructions in this basic block together with one instruction from the next basic block in the actual execu-

These data lead us to the following observations. First, the emission distribution model derived from Chip0 works very well on Chips 1∼4. When comparing the average TSA accuracy of all chips (e.g., the Avg. row in Table 3 and the last column of Table 2), we found that they are very close to each other, even though applying the model to different chips can still introduce a small accuracy loss (the largest TSA degradation is 6.82% with TYPE NOE, and the degradation for BB E is almost zero). There are two possible reasons for such similarity. First, the power consumption of each instruction is largely determined by its instruction type, because instructions of the same type share many on-chip hardware modules that contribute most of the overall power consumption, as shown in Figure 5. Second, the different chips used in this experiment have the same architecture and similar layouts, so the power consumption of each instruction type is very close among different chips. Although they may still have some unique features, the variances introduced by such differences are small.
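The cross-chip comparison described here can be sketched in a few lines. This is only a minimal illustration and not the evaluation code used in this work: the function name `summarize_tsa`, the dictionary layout, and the use of the population standard deviation are assumptions, and the numbers in the usage example are placeholders rather than values taken from Table 2 or Table 3.

```python
from statistics import mean, pstdev


def summarize_tsa(tsa_by_chip, reference_chip):
    """Summarize TSA (in percent) of the non-reference chips for one configuration.

    tsa_by_chip: hypothetical mapping from chip name to TSA in percent.
    reference_chip: the chip whose traces were used to build the model.
    """
    reference = tsa_by_chip[reference_chip]
    # TSA values measured on the chips the model was NOT trained on.
    others = [v for chip, v in tsa_by_chip.items() if chip != reference_chip]
    return {
        "mean": mean(others),
        # Spread across chips; population standard deviation is an assumption.
        "std": pstdev(others),
        # Largest drop relative to the reference chip.
        "max_degradation": max(reference - v for v in others),
    }


if __name__ == "__main__":
    # Placeholder values for illustration only, not measured results.
    example = {"Chip0": 99.0, "Chip1": 98.0, "Chip2": 97.0,
               "Chip3": 98.0, "Chip4": 97.0}
    print(summarize_tsa(example, "Chip0"))
```

A small mean degradation together with a small standard deviation would correspond to a model that transfers well across chips of the same type.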

Another observation is that the TSA values for configurations with signal extraction techniques are more stable. This is shown by the standard deviation (i.e., the STD row) in Table 3, where BB E and TYPE E have much smaller standard deviations than BB NOE and TYPE NOE do. This is because signal extraction helps eliminate certain frequency components that are more sensitive to the differences between chips (e.g., the static power corresponding to frequency 0).

8. LIMITATIONS AND FUTURE WORK

Though our method has significantly reduced the computational complexity compared to the naive solution, it can still induce undesired overhead when the target program is large and contains a large number of instructions and basic blocks. The situation is exacerbated when interrupts are enabled during execution, because interrupts can be triggered at any time during code execution and create valid control transfers from every instruction instance to the beginning of the interrupt service routines. Under such circumstances, every instruction instance in the code becomes an individual basic block, which results in a larger number of states. To tackle this problem, a hierarchical code execution tracking method can be used. For instance, when an interrupt is triggered, a processor needs to perform special operations, e.g., context switching. It is possible to first identify such operations from power traces, determine the power traces corresponding to the execution of interrupt service routines, and remove them. Then, we can concatenate the remaining power trace segments, and conduct code execution tracking on the newly-constructed power trace.

In our experiments, the execution of the benchmark programs does not use peripheral devices, and hence the measured power trace is mainly contributed by the MCU itself. When peripheral devices are used, however, the correlation between instruction type and measured power trace would decrease, which may result in reduced accuracy in code execution tracking. As a direction for our future work, one can increase the number of measuring points for power traces. That is, instead of measuring the overall power consumption of the system, we would collect power traces from multiple power pins on the MCU and investigate their correlations with the executed instructions, thereby mitigating the impact of the peripheral devices on the proposed code execution tracking method.

9. RELATED WORKS

In this section, we briefly discuss related works, including execution tracking via digital channels, normal execution tracking with side-channels, abnormality detection, and code reverse engineering.

Execution Tracking via Digital Channel. Some ARM-based MCUs (e.g., Cortex-M3) contain a dedicated hardware unit for code execution tracking, namely the embedded trace macrocell (ETM) [36]. However, it is usually not practical to cycle-accurately track code execution at normal CPU speed with ETM. Moreover, many MCUs do not have such hardware support for execution tracking.

Normal Execution Tracking via Side-channel. Eisenbarth et al. [13] utilized HMM to recover the instruction type sequence during code execution. It treats an instruction type as a state, and extracts transition probabilities between instruction types. However, recovering the instruction type alone is not sufficient to locate the instruction instance. Msgna et al. [32] tried to track execution flow by modeling one basic block in the CFG as a state with classic HMM. However, their method cannot tackle the general case where basic blocks have unequal lengths. Compared to the above works, our code execution tracking method can accurately locate the exact instruction instance during execution.

Abnormality Detection via Side-Channel. Some works detect abnormal execution by calculating the cross correlation between the examined execution's side-channel trace, e.g., power trace [37, 38] and RF trace [39, 40], and the corresponding side-channel trace of the golden execution. In practice, however, it is difficult to determine the exact golden execution flow, because an embedded system's execution usually interacts with a changing environment and varies a lot between runs. By contrast, the detection technique based on our tracking method has no such requirement. WattsUpDoc [41] uses statistical tools to classify the execution corresponding to every 5-second power trace chunk as normal or abnormal, where features such as mean and variance are used for classification. We have shown that calibrated likelihood is a good feature for abnormal execution detection, and it can be used to enhance WattsUpDoc.

Side-channel Based Code Reverse Engineering. These methods focus on recovering the code in the system, instead of tracking the execution flow for a given code. Vermoen et al. [42] recovered the bytecodes running on a Java smart card. However, this method requires calculating the average power trace of the targeted sequence of bytecodes, which is impractical for the general case. Novak [43] and Clavier [44] showed how to recover the substitution tables of the secret A3/A8 algorithm, but their methods are limited to recovering the look-up table part. Goldack and Paar [45] proposed to recover the type of a single instruction instance by building power consumption templates for every instruction type. However, their template models the distribution of the raw power signal after simple dimension reduction. We have demonstrated that dimension reduction by itself does not lead to high recognition accuracy, but it can be improved by our signal extraction technique.

10. CONCLUSION

This paper proposes a non-intrusive yet highly-accurate code execution tracking method for embedded systems utilizing the power side-channel. This is achieved with a signal extraction scheme to improve instruction type recognition and a revised Viterbi algorithm for effective instruction sequence extraction. Experimental results show that our method is able to track code execution accurately in normal execution tracking and effectively capture code modification in abnormal execution tracking.

11. ACKNOWLEDGMENTS

This work was supported in part by the Chinese University of Hong Kong internal grant No. 4055049, Hong Kong S.A.R. Research Grants Council (RGC) under Early Career Scheme No. 24207815, in part by National Natural Science Foundation of China (NSFC) under Grant No. 61432017, 61532017, 61572415, and 61472358, and in part by National Science Foundation (CNS-1513107).

12. REFERENCES

[1] P. C. Kocher, et al. Differential power analysis. In Proc. of Advances in Cryptology (CRYPTO), 1999.
[2] P. Dusart, et al. Differential fault analysis on A.E.S. In Proc. of Applied Cryptography and Network Security (ACNS), 2003.
[3] A. Cui, et al. When firmware modifications attack: A case study of embedded exploitation. In NDSS, 2013.
[4] A. Francillon and C. Castelluccia. Code injection attacks on harvard-architecture devices. In Proc. of Conference on Computer and Communications Security (CCS), 2008.
[5] T. Goodspeed. Exploiting wireless sensor networks over 802.15.4. In Texas Instruments Developer Conference, 2008.
[6] M. Abadi, et al. Control-flow integrity. In Proc. of Conference on Computer and Communications Security (CCS), 2005.
[7] Ú. Erlingsson, et al. XFI: software guards for system address spaces. In Proc. of Symposium on Operating Systems Design and Implementation (OSDI), 2006.
[8] Y. Cheng, et al. Ropecker: A generic and practical approach for defending against ROP attacks. In Proc. of Network and Distributed System Security Symposium (NDSS), 2014.
[9] V. Pappas, et al. Transparent ROP exploit mitigation using indirect branch tracing. In Proc. of USENIX Security Symposium (USENIX Security), 2013.
[10] L. Davi, et al. HAFIX: hardware-assisted flow integrity extension. In Proc. of Design Automation Conference (DAC), 2015.
[11] M. Milenkovic, et al. Hardware support for code integrity in embedded processors. In Proc. of International Conference on Compilers, Architecture, and Synthesis for Embedded Systems (CASES), 2005.
[12] F. A. T. Abad, et al. On-chip control flow integrity check for real time embedded systems. In Proc. of Cyber-Physical Systems, Networks, and Applications (CPSNA), 2013.
[13] T. Eisenbarth, et al. Building a side channel based disassembler. Transactions on Computational Science, 2010.
[14] OpenSSL. https://www.openssl.org/.
[15] D. Genkin, et al. RSA key extraction via low-bandwidth acoustic cryptanalysis. In Proc. of Advances in Cryptology (CRYPTO), 2014.
[16] N. Benhadjyoussef, et al. The research of correlation power analysis on a aes implementations. Journal of Intelligent Computing Volume, 2011.
[17] E. Brier, et al. Correlation power analysis with a leakage model. In Proc. of Cryptographic Hardware and Embedded Systems (CHES), 2004.
[18] J. Balasch, et al. An in-depth and black-box characterization of the effects of clock glitches on 8-bit mcus. In Proc. of Workshop on Fault Diagnosis and Tolerance in Cryptography (FDTC), 2011.
[19] A. Dehbaoui, et al. Electromagnetic transient faults injection on a hardware and a software implementations of AES. In Proc. of Workshop on Fault Diagnosis and Tolerance in Cryptography (FDTC), 2012.
[20] NIST FIPS Pub. Advanced encryption standard (AES). Federal Information Processing Standards Publication, 2001.
[21] P. Derbez, et al. Meet-in-the-middle and impossible differential fault analysis on AES. In Proc. of Cryptographic Hardware and Embedded Systems (CHES), 2011.
[22] Y. Liu, et al. DERA: yet another differential fault attack on cryptographic devices based on error rate analysis. In Proc. of Design Automation Conference (DAC), 2015.
[23] R. Lashermes, et al. A DFA on AES based on the entropy of error distributions. In Proc. of Workshop on Fault Diagnosis and Tolerance in Cryptography (FDTC), 2012.
[24] A. Moradi, et al. A generalized method of differential fault attack against AES cryptosystem. In Proc. of Cryptographic Hardware and Embedded Systems (CHES), 2006.
[25] N. Carlini and D. Wagner. ROP is still dangerous: breaking modern defenses. In Proc. of USENIX Security Symposium (USENIX Security), 2014.
[26] T. K. Bletsch, et al. Jump-oriented programming: a new class of code-reuse attack. In Proc. of Symposium on Information, Computer and Communications Security (ASIACCS), 2011.
[27] A. One. Smashing the stack for fun and profit. Phrack magazine, 1996.
[28] N. Carlini, et al. Control-flow bending: On the effectiveness of control-flow integrity. In Proc. of USENIX Security Symposium (USENIX Security), 2015.
[29] F. E. Allen. Control flow analysis. In ACM Sigplan Notices, 1970.
[30] L. R. Rabiner. A tutorial on hidden markov models and selected applications in speech recognition. Proceedings of the IEEE, 1989.
[31] C. Zhang, et al. Practical control flow integrity and randomization for binary executables. In Proc. of Symposium on Security and Privacy (SP), 2013.
[32] M. Msgna, et al. The b-side of side channel leakage: Control flow security in embedded systems. In Proc. of Security and Privacy in Communication Networks (ICST), 2013.
[33] I. Jolliffe. Principal component analysis. 2002.
[34] I. S. MacKenzie. The 8051 microcontroller. 1998.
[35] UCR Dalton Project. http://www.cs.ucr.edu/~dalton/.
[36] Embedded Trace Macrocells. http://www.arm.com/products/system-ip/debug-trace/trace-macrocells-etm/.
[37] C. R. A. González and J. H. Reed. Detecting unauthorized software execution in sdr using power fingerprinting. In MILITARY COMMUNICATIONS CONFERENCE, 2010-MILCOM 2010, 2010.
[38] C. R. A. Gonzalez and J. H. Reed. Power fingerprinting in sdr integrity assessment for security and regulatory compliance. Analog Integrated Circuits and Signal Processing, 2011.
[39] S. Stone and M. Temple. Radio-frequency-based anomaly detection for programmable logic controllers in the critical infrastructure. International Journal of Critical Infrastructure Protection, 2012.
[40] S. J. Stone, et al. Detecting anomalous programmable logic controller behavior using rf-based hilbert transform features and a correlation-based verification process. International Journal of Critical Infrastructure Protection, 2015.
[41] S. S. Clark, et al. Wattsupdoc: Power side channels to nonintrusively discover untargeted malware on embedded medical devices. In 2013 USENIX Workshop on Health Information Technologies, HealthTech '13, 2013.
[42] D. Vermoen, et al. Reverse engineering java card applets using power analysis. In Proc. of Information Security Theory and Practices (WISTP), 2007.
[43] R. Novak. Side-channel attack on substitution blocks. In Proc. of Applied Cryptography and Network Security (ACNS), 2003.
[44] C. Clavier. Side channel analysis for reverse engineering (SCARE) - an improved attack against a secret A3/A8 GSM algorithm. IACR Cryptology ePrint Archive, 2004.
[45] M. Goldack and I. C. Paar. Side-channel based reverse engineering for microcontrollers. Master's thesis, Ruhr-Universität Bochum, Germany, 2008.
