
Your Firmware Has Arrived: A Study of Firmware Update Vulnerabilities

Yuhao Wu, Jinwen Wang, Yujie Wang, Shixuan Zhai, and Zihan Li, Washington University in St. Louis; Yi He, Tsinghua University; Kun Sun, George Mason University; Qi Li, Tsinghua University; Ning Zhang, Washington University in St. Louis

https://www.usenix.org/conference/usenixsecurity24/presentation/wu-yuhao

This paper is included in the Proceedings of the 33rd USENIX Security Symposium.
August 14–16, 2024 • Philadelphia, PA, USA
978-1-939133-44-1

Open access to the Proceedings of the 33rd USENIX Security Symposium is sponsored by USENIX.

Abstract

Embedded devices are increasingly ubiquitous in our society. Firmware updates are one of the primary mechanisms to mitigate vulnerabilities in embedded systems. However, the firmware update procedure also introduces new attack surfaces, particularly through vulnerable firmware verification procedures. Unlike memory corruption bugs, numerous vulnerabilities in firmware updates stem from incomplete or incorrect verification steps, to which existing firmware analysis methods are not applicable. To bridge this gap, we propose ChkUp, an approach to Check for firmware Update vulnerabilities. ChkUp can resolve the program execution paths during firmware updates using cross-language inter-process control flow analysis and program slicing. With these paths, ChkUp locates firmware verification procedures, examining and validating their vulnerabilities. We implemented ChkUp and conducted a comprehensive analysis on 12,000 firmware images. Then, we validated the alerts in 150 firmware images from 33 device families, leading to the discovery of both zero-day and n-day vulnerabilities. Our findings were disclosed responsibly, resulting in the assignment of 25 CVE IDs and one PSV ID at the time of writing.

1 Introduction

The rapid increase of embedded devices, ranging from portable devices, such as smartwatches, to large machines like transport vehicles, brings more connectivity and convenience to our lives. The market size of embedded devices is expected to reach 116.2 billion dollars in 2025 [10]. Firmware update plays an important role in fixing vulnerabilities and improving functionalities. However, a poorly implemented firmware update mechanism can diminish the benefit or even introduce new attack surfaces. In fact, vulnerabilities related to software update have been recognized as one of the top five security risks for embedded devices [12].

Firmware Update Security. Software or firmware updates are currently one of the most effective techniques against cyber attacks. However, with the increasing connectivity and complexity in modern embedded systems, there is an increasing number of cyber attacks targeting specifically the firmware update procedures, allowing the adversary to execute arbitrary code or roll back the firmware version to expose prior vulnerabilities [21, 70]. Recent vulnerabilities in update mechanisms of Jeep Cherokee [52], Samsung SmartThings Hub [8], and Asus Router [9] have raised significant concerns, highlighting the need for automatic identification of firmware update vulnerabilities. Existing firmware vulnerability detection methods often focus on identifying invocations to unsafe sinks from user-controlled inputs [19, 25, 27, 59, 65] or finding bugs using common vulnerable patterns or deviations from known specifications [24, 33, 49, 50, 64]. However, firmware update vulnerabilities pose a unique challenge as they often arise from a combination of issues across multi-stage update procedures, exacerbated by the absence of comprehensive specifications or systematic categorizations of vulnerable patterns. To gain a better understanding of the landscape of firmware update vulnerabilities, we categorize and systematize firmware update-related CVEs in the past decade. Each type of vulnerability is placed in different abstract phases of a general firmware update procedure, which is detailed in Section 2.

Our Solution - ChkUp. In this paper, we propose ChkUp, a novel approach to Check for firmware Update vulnerabilities, including missing verification (e.g., a lack of version check) and improper verification (e.g., use of MD5 for integrity check). Intuitively, ChkUp extracts the program execution paths for a firmware update procedure and then identifies the chain of verification steps in the update procedure. We then summarize the vulnerable patterns across multiple firmware update phases for vulnerability detection. While working on this, we found that existing implementations of firmware update mechanisms present unique challenges that require new techniques. Specifically, there are three main technical challenges:

C1. Diverse System Components Supporting Firmware Update: In many Linux-based firmware images, various types of
programs such as front-end programs, scripts, and binaries are invoked in the software update execution path that spans both the front end and back end of the web server. Additionally, the diversity of components involved in firmware updates leads to the heterogeneity of inter-process communication (IPC) mechanisms. To tackle these challenges, we develop techniques to generate the inter-process update flow graph (UFG). The entry program is first identified using common code patterns and semantic information that interconnects the front end and the back end in a firmware update procedure. Then, cross-language control flows are extracted by connecting the control flows of individual programs (i.e., front-end programs, scripts, and binaries) and resolving the corresponding IPCs. With such a cross-language cross-program control flow graph, firmware update execution paths are resolved using backward program slicing with firmware update-specific semantics.

C2. Verification Procedure Recognition: A firmware update is a complex procedure that includes a variety of checks and verifications, from the verification of cryptographic signatures to the comparison of versions and device IDs. An update procedure is considered secure only when every verification step (e.g., signature verification or compliance with version update policies) and their composition are properly implemented. However, firmware updates do not have standardized specifications and are often implemented in multiple programming languages. This leads to diverse verification implementations, making it challenging to identify them from numerous functions in execution paths. To tackle this challenge, ChkUp recognizes firmware verification procedures using data flow graph (DFG) isomorphism-based semantic similarity matching. To reduce the overhead of this process, functions in execution paths are first ranked by a similarity score, which is calculated using both syntactic and structural features. Then, functions with higher similarity scores are prioritized for DFG isomorphism-based analysis.

C3. Vulnerability Validation: Static analysis can produce a number of false positives (FP), which require further validation. However, a firmware update procedure often involves a chain of verification steps, each with its unique functions and invocation parameters. To test a step later in the chain, it is necessary to create an input and an environment that are capable of passing the first several steps. To simplify validation, we present a semi-automated dynamic method to validate alerts for emulatable firmware images. Specifically, we employ firmware patching to ensure the execution of potentially vulnerable procedures. The validity of the corresponding alerts is then checked based on the update behaviors after inputting malicious firmware images.

Evaluation and Findings. To gain a deeper understanding of vulnerabilities in the wild, we ran ChkUp¹ on 12,000 firmware images. We found that weak verification algorithms, such as the use of MD5 for integrity verification, were prevalent. Then, we performed vulnerability validation on 150 randomly selected firmware images: For those emulatable firmware images, we performed dynamic validation by creating PoCs; for the remaining firmware images, we conducted manual analysis to validate their alerts. Our results showed a true positive rate (TPR) of 86.7% and a false positive rate (FPR) of 5.3%, leading to the discovery of both zero-day and n-day vulnerabilities in firmware images from 33 device families. These findings were responsibly disclosed, and 25 CVE IDs and one PSV ID have been assigned. Finally, to demonstrate the exploitability of the vulnerabilities, we showcased firmware downgrade and firmware modification attacks.

¹ Our artifacts are available at https://fw-chkup.github.io

Contributions. Our contributions are outlined as follows:

• A systematization of firmware update security: We frame general firmware update procedures into four phases and examine security issues in each phase by analyzing and categorizing 381 firmware update-related CVE reports.
• A new approach for update vulnerability detection: We propose a new firmware update vulnerability identification approach, ChkUp, which addresses three technical challenges: diverse components in update paths, verification procedure recognition, and vulnerability validation.
• Vulnerabilities on real-world firmware: We ran ChkUp on 12,000 firmware images and validated alerts for 150 of them using a combination of proof-of-concept (PoC) generation and manual analysis. The results demonstrate ChkUp’s capability to identify zero-day and n-day vulnerabilities. Following responsible disclosure, 25 CVE IDs and one PSV ID have been assigned.

2 Firmware Update Security Systematization

2.1 Challenges of Firmware Updates

To gain a better understanding of the threat landscape, 381 firmware update-related CVEs from the past decade are analyzed and then systematized. While most CVEs stem from vulnerable firmware update mechanisms, others simply have an impact on the security of the update. Figure 1 shows the annual distribution of Common Vulnerability Scoring System (CVSS) v3 metrics since 2015. There is a steady upward trend, with the number of CVEs doubling from 2020 to 2021. High and critical vulnerabilities are nearly

Figure 1: CVSSv3 scores of firmware update-related CVEs. (Bar chart: number of CVEs per year, 2015 to 2022, grouped by CVSS v3 score: Low (0.1 - 3.9), Medium (4.0 - 6.9), High (7.0 - 8.9), Critical (9.0 - 10.0).)
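
The severity bands in the Figure 1 legend follow the CVSS v3 qualitative rating scale. As a minimal sketch, the Python function below maps a base score to its band and tallies a few scores per year. The function name and the sample (year, score) pairs are hypothetical and are not data from the paper.

```python
from collections import Counter


def cvss_v3_severity(score: float) -> str:
    """Return the CVSS v3 qualitative severity band for a base score."""
    if not 0.0 <= score <= 10.0:
        raise ValueError("CVSS v3 base scores range from 0.0 to 10.0")
    if score == 0.0:
        return "None"  # defined by CVSS v3, but not a band shown in Figure 1
    if score <= 3.9:
        return "Low"
    if score <= 6.9:
        return "Medium"
    if score <= 8.9:
        return "High"
    return "Critical"


# Hypothetical (year, base score) pairs, for illustration only.
samples = [(2020, 9.8), (2020, 5.3), (2021, 7.5), (2021, 8.8), (2021, 3.1)]

# Count CVEs per (year, severity band), the grouping that Figure 1 plots.
per_year = Counter((year, cvss_v3_severity(score)) for year, score in samples)
```

Because CVSS base scores carry one decimal place, the inclusive upper bounds (3.9, 6.9, 8.9) leave no gap between bands.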



four times as numerous as low and medium ones. This increase can be attributed to not only the general trend of increasing CVEs but also the newly arising challenges specific to embedded systems [15, 30, 62, 72–74, 84, 86]. These challenges include 1) an expanding attack surface due to increased connectivity, 2) increased complexity of embedded systems, 3) long product life cycle, and 4) limited resources on embedded devices. These factors contribute to diverse firmware update mechanisms and increased vulnerabilities.

2.2 Update Workflow and Vulnerabilities

A typical firmware update workflow has four phases as shown in Figure 2: generation, delivery, verification, and installation.

Figure 2: A common firmware update workflow. (Diagram labels: Generation Phase: Author, Sign, Supply Chain, Firmware Server; Delivery Phase: Delivery Interfaces, Device User/Maintainer; Verification Phase: Signature, Digest, Version, Device ID; Installation Phase: Write, Reboot.)

Generation Phase. The goal of the generation phase is to create firmware images and make them available. Specifically, an author first develops a new firmware image and a manifest. The manifest contains firmware metadata, including the firmware digest, version, and device ID. Subsequently, the firmware image and manifest are signed with a digital signature and then transferred to a firmware server through the software supply chain. Usually, the author uploads firmware to the server through trusted parties in the supply chain.

During this phase, external attackers can conduct supply chain attacks to steal certificates and compromise software development tools or infrastructure. The root cause of such attacks is inappropriate access control for critical assets and infrastructures. There has been an increasing number of supply chain-related vulnerabilities reported in recent years, demonstrating their security impact in the real world. Reports indicate that the U.S. Government and more than 30,000 public and private organizations such as Microsoft, Intel, and FireEye suffered from a large-scale software supply chain attack known as the SolarWinds hack in 2020 [55]. Specifically, cybercriminals compromised Orion IT management software and then distributed malicious software updates containing backdoors to users through the supply chain.

Takeaway 1: Supply chain vulnerabilities pose a significant risk, often from inadequate access controls. Without proper on-device verification, compromised firmware can be installed in devices, leading to a loss of control over them.

Delivery Phase. The delivery phase involves transmitting the new firmware image from the server to the target device. A firmware component, known as the update agent [46], handles downloading, verifying, and storing the new image in persistent memory. Typically, new firmware can be delivered in three ways: 1) The firmware server directly pushes the firmware and manifest to the device’s update agent; 2) The update agent polls for updates and downloads them when they become available; 3) The firmware server notifies the device user/maintainer, who manually downloads and uploads updates to the update agent. In terms of communication channels, common methods include application-layer protocols (e.g., HTTP, FTP), wireless media (e.g., Wi-Fi, Bluetooth Low Energy), and physical interfaces (e.g., USB, removable memory cards). For some low-end, bare-metal devices, companion apps on smartphones can assist with firmware updates. The new firmware can either be bundled within or fetched by the app from the firmware server through application-layer protocols. Then, these apps generally communicate with the device via wireless media for notification, polling, and downloading. It is worth noting that while the apps act as intermediaries for transferring firmware, they may also pre-verify updates. Such early verification can filter out invalid updates, to avoid unnecessary, subsequent on-device processing.

A secure communication channel is important for the confidentiality and integrity of the delivered firmware images. Insecure delivery mainly arises from a lack of cryptographic protocols or the use of hard-coded keys, exposing systems to machine-in-the-middle (MITM) attacks. For example, CVE-2020-9544 involves plain HTTP without authentication, while CVE-2020-25233 derives from the use of a hard-coded RSA key for communication. Mobile apps can also have insecure communication with either firmware servers or devices, as shown by CVE-2018-3928, where insufficient communication security checks can lead to code execution vulnerabilities. Importantly, the primary concern is not just the communication channel but also the lack of proper security verification. For example, even with a leaked communication key, if a robust firmware verification mechanism is in place, malicious firmware replacements during updates can be prevented.

Takeaway 2: Firmware delivery security mainly relies on the communication channel and device user/maintainer. If either is insecure, the device may receive compromised firmware unless proper on-device verification is in place.

Verification Phase. The verification phase ensures the authenticity, integrity, freshness, and compatibility of the received



Table 1: Top ten firmware update-related CWE vulnerability types.

CWE ID | Name | Example | Proportion
CWE-345 | Insufficient Verification of Data Authenticity | CVE-2020-10831, CVE-2020-24395, CVE-2022-36360 | 5.47%
↳ CWE-347 | Improper Verification of Cryptographic Signature | CVE-2020-27540, CVE-2021-37160, CVE-2022-21134 | 10.16%
CWE-287 | Improper Authentication | CVE-2018-6294, CVE-2020-27488, CVE-2022-2503 | 6.25%
↳ CWE-306 | Missing Authentication for Critical Function | CVE-2019-16243, CVE-2020-9544, CVE-2020-29379 | 3.91%
↳ CWE-295 | Improper Certificate Validation | CVE-2018-15476, CVE-2020-15498, CVE-2021-22909 | 3.13%
CWE-20 | Improper Input Validation | CVE-2018-3891, CVE-2019-11103, CVE-2021-25437 | 10.94%
CWE-434 | Unrestricted Upload of File with Dangerous Type | CVE-2019-10959, CVE-2021-37160, CVE-2022-28372 | 5.47%
CWE-798 | Use of Hard-coded Credentials | CVE-2019-5158, CVE-2019-14926, CVE-2020-24215 | 2.34%
CWE-119 | Improper Restriction of Operations within the Bounds of a Memory Buffer | CVE-2017-11082, CVE-2017-14444, CVE-2017-14445 | 1.56%
CWE-78 | Improper Neutralization of Special Elements used in an OS Command | CVE-2018-3890, CVE-2019-5157, CVE-2019-15310 | 0.78%

Note: ↳ marks a child class. CWE-347 is the child class of CWE-345; CWE-306 and CWE-295 are the child classes of CWE-287.
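
Several rows of Table 1 (CWE-345, CWE-347, CWE-20) concern the checks an update agent applies to a received image against its manifest. The following Python sketch shows one possible shape for the integrity, freshness, and compatibility checks. The manifest field names (`digest`, `version`, `device_id`), the choice of SHA-256, the dotted numeric version format, and the "strictly newer" policy are assumptions made for illustration and are not taken from the paper. Signature (authenticity) verification is deliberately left out; a real update agent would verify the manifest's digital signature before trusting any of these fields.

```python
import hashlib
import hmac
from typing import Dict, List, Tuple


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn '1.10.0' into (1, 10, 0) so that versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


def verify_update(image: bytes, manifest: Dict[str, str],
                  installed_version: str, device_id: str) -> List[str]:
    """Return the names of the checks that fail (an empty list means all pass)."""
    failures = []
    # Integrity: recompute the digest and compare it in constant time.
    actual = hashlib.sha256(image).hexdigest()
    if not hmac.compare_digest(actual, manifest["digest"]):
        failures.append("integrity")
    # Freshness: assumed policy, accept only strictly newer versions.
    if parse_version(manifest["version"]) <= parse_version(installed_version):
        failures.append("freshness")
    # Compatibility: the image must target this device.
    if manifest["device_id"] != device_id:
        failures.append("compatibility")
    return failures


# Hypothetical image and manifest, for illustration only.
image = b"example firmware bytes"
manifest = {
    "digest": hashlib.sha256(image).hexdigest(),
    "version": "1.10.0",
    "device_id": "dev-A",
}
result = verify_update(image, manifest, "1.9.0", "dev-A")
```

Parsing versions into integer tuples matters here: compared as raw strings, "1.10.0" sorts before "1.9.0", so a string comparison would misjudge which image is newer.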

firmware. Specifically, the update agent performs a series of verification procedures before storing the image in persistent memory: firmware authenticity is ensured by verifying the digital signature of the firmware; firmware integrity is verified by checking the digest contained in the manifest; and the freshness and compatibility are confirmed by examining the metadata, along with version and device ID in the manifest.

Table 1 lists the top ten firmware update-related vulnerabilities in the Common Weakness Enumeration (CWE) category, based on our CVE analysis. The top eight categories, accounting for 47.67%, predominantly involve either missing verification or improper verification methods for firmware updates. These issues can enable attackers to replace the benign firmware with a malicious one during updates. For instance, the issue with CVE-2018-10988 stems from a lack of digital signature verification in the shell script used for firmware updates. Missing or improper integrity verification may lead to firmware corruption. For example, using easily bypassable internal checksums for firmware integrity checks is problematic (e.g., CVE-2018-5441). Missing or improper freshness verification can lead to firmware downgrade attacks, while inadequate compatibility verification can expose the device to DoS attacks. For instance, the root cause of CVE-2018-3891 is a logic flaw in performing version verification, where integer comparison operators are incorrectly used for string comparison. Similarly, in the case of CVE-2020-10831, arbitrary firmware can be installed due to insufficient verification.

Takeaway 3: Either missing or improper implementation of any steps in the verification procedure can lead to the installation of unintended firmware on the embedded device.

Installation Phase. The installation phase is a process of installing and executing the new firmware. After verification, the new firmware is stored in the persistent memory of the device and is activated upon reboot. Specifically, a bootloader first moves the new firmware image to the right offset in the device memory when the device is starting up. Then, the bootloader executes the new firmware image after conducting a firmware inspection. However, this inspection is often incomplete and insecure, commonly relying on internal checksums [46].

Most vulnerabilities in this phase are typical software bugs such as command injection and memory corruption bugs. Specifically, firmware update-related commands executed during this phase may accept parameters from user inputs. If an attacker manipulates these parameters and they are subsequently used by vulnerable functions (e.g., system, strcpy), it can lead to command injection (e.g., CVE-2019-5155) or memory corruption (e.g., CVE-2021-22675) attacks.

Takeaway 4: Incomplete firmware inspection procedures in bootloaders during firmware installation are common, thus making the security of firmware updates dependent on the verification mechanisms in the update agent.

Summary. Security vulnerabilities can arise during any phase of the firmware update process. Nevertheless, robust firmware verification mechanisms by the device’s update agent can mitigate the majority of vulnerabilities originating from other phases. Hence, our research primarily focuses on identifying vulnerabilities within the verification phase.

3 Threat Model and Overview

Threat Model. ChkUp aims to uncover firmware update vulnerabilities in OS-based firmware (integrated with file systems), particularly in the dominant Linux-based firmware [69]. It can detect the most prevalent firmware update vulnerabilities including missing or improper verification of authenticity, integrity, freshness, and compatibility. Aligned with existing research [19, 34, 36, 59, 83], we assume no firmware source code access, making ChkUp a binary-based vulnerability detection approach. Potential users of ChkUp could be security researchers seeking to notify vendors, or end-users trying to obtain additional security information about their devices. Even vendors with access to the source code can benefit, especially in investigating the exploitability of vulnerabilities, since source code analysis can overlook binary and runtime-level details. It is worth noting that, similar to intrusion detection systems or malware detectors, in-depth domain expertise proves valuable in further refining alerts.

Overview of ChkUp. The high-level idea of ChkUp is to statically extract the firmware update program execution paths from the firmware codebase and to pinpoint potential vulnerabilities along these paths based on summarized vulnerability



Figure 3: Overview of ChkUp. (Pipeline stages: Execution Path Recovery, Verification Procedure Recognition, Vulnerability Discovery, Vulnerability Validation. Diagram labels: unpacked firmware (HTML, JS), UFG with an entry node, edge types 0/1/2, and reboot; execution paths; corpus of function signatures; verification procedures; alerts for missing or improper verification; emulator and FW PoC; vulnerabilities.)
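
The UFG box in Figure 3 labels its edges 0, 1, and 2, which Section 4.1 defines as intra-process control flow, IPC, and program invocation edges between (basic block, program) pairs. As a rough illustration of how execution paths could be recovered from such a graph, the Python sketch below stores typed edges and walks backward from a reboot node to the entry node. The class, the node and program names, and the plain depth-first enumeration are assumptions for illustration; the sketch omits the strategies the paper uses against path explosion and is not ChkUp's implementation.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Node = Tuple[str, str]  # (basic block label, program name)

# Edge types as defined for the UFG in Section 4.1.
INTRA, IPC, INVOKE = 0, 1, 2


class UFG:
    """Directed graph that records, for each node, its typed predecessors."""

    def __init__(self) -> None:
        self.preds: Dict[Node, List[Tuple[Node, int]]] = defaultdict(list)

    def add_edge(self, src: Node, dst: Node, kind: int) -> None:
        if kind not in (INTRA, IPC, INVOKE):
            raise ValueError("unknown edge type")
        self.preds[dst].append((src, kind))

    def backward_paths(self, target: Node, entry: Node) -> List[List[Node]]:
        """Enumerate acyclic paths entry -> ... -> target, walking backward."""
        paths: List[List[Node]] = []

        def walk(node: Node, suffix: List[Node], seen: frozenset) -> None:
            if node == entry:
                paths.append([node] + suffix)
                return
            for pred, _kind in self.preds.get(node, []):
                if pred not in seen:  # skip cycles on the current path
                    walk(pred, [node] + suffix, seen | {pred})

        walk(target, [], frozenset([target]))
        return paths


# Hypothetical update flow: one branch checks a digest, one branch skips it.
g = UFG()
entry = ("recv_fw", "httpd")
reboot = ("reboot", "fw_write")
g.add_edge(entry, ("check_md5", "httpd"), INTRA)
g.add_edge(("check_md5", "httpd"), ("set_status", "httpd"), INTRA)
g.add_edge(entry, ("set_status", "httpd"), INTRA)
g.add_edge(("set_status", "httpd"), ("main", "fw_write"), INVOKE)
g.add_edge(("main", "fw_write"), reboot, INTRA)

paths = g.backward_paths(reboot, entry)
unchecked = [p for p in paths if ("check_md5", "httpd") not in p]
```

In this toy graph, `unchecked` holds the single path that reaches the reboot without passing the digest check, the kind of path one might inspect further for missing verification.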

patterns. Then, dynamic vulnerability validation is performed to reduce false alerts. However, as discussed in Section 1, three primary challenges need to be addressed to implement this idea: C1. Diverse Programs in Update Paths during the extraction of firmware update execution paths, C2. Verification Procedure Recognition for matching vulnerability patterns, and C3. Vulnerability Validation to reduce false alerts.

To address these challenges, we propose ChkUp (illustrated in Figure 3). Specifically, to address C1, we first create a UFG that captures the control flow information across programs written in different programming languages. Next, we perform backward program slicing to determine the firmware update execution paths (Section 4.1). To tackle C2, we extract syntactic and structural features for function matching, then employ more sophisticated DFG isomorphism to recognize the verification chains in the firmware update execution paths (Section 4.2). With the execution paths and the associated verification procedures, we examine them to discover vulnerabilities based on defined criteria (Section 4.3). Finally, we address C3 by a patching-based method where the vulnerable procedure is tested using the generated PoCs after its execution dependencies are bypassed via patching (Section 4.4).

4 Design of ChkUp

4.1 Execution Path Recovery

UFG Definition. The binary dependency graph (BDG) in Karonte [59] can model data dependencies between binaries within firmware images, which is crucial for firmware update vulnerability detection. However, accurate firmware update execution path recovery requires additional information, including control flows, IPCs, and program invocations across programs in various languages to determine the intra- and inter-process control flows of firmware update-related programs. Therefore, building upon BDG, we introduce UFG to accommodate these requirements. A UFG, represented by G, is a directed graph that captures the intra- and inter-process control flow information at the basic block (BB) level of firmware update-related programs. The UFG is defined as G = (V, E), where V is a set of BBs extracted from front-end programs, scripts, and binaries involved in the firmware update procedure. E represents the directed edges between the BBs, and each edge e ∈ E is represented as e = ([v_1, p_1], [v_2, p_2], c). This indicates that BB v_1 in program p_1 either transfers execution flow or shares data with BB v_2 in program p_2, and c is a flag indicating the type of edge: c is 0 for intra-process control flow edges, c is 1 for IPC relations, and c is 2 for program invocation relations.

Update Entry Finding. Receiving firmware is typically the first step in an update procedure on the device side. Therefore, the entry node in a UFG is the node responsible for this task, and the program that includes this entry node is referred to as the entry program. For firmware containing a web-based update interface, the entry program could be a front-end firmware upload utility. To identify the entry program, we use a static pattern-matching approach. This is non-trivial as firmware update mechanisms vary greatly among different vendors. To address this issue, we manually analyze numerous firmware images (details are provided in Appendix B.2) and identify distinct patterns that differentiate firmware update entry programs from others. Specifically, such an entry program always contains recognizable code patterns. For example, front-end programs to upload firmware images might use the <input type="file"...> pattern, while scripts or binaries that download firmware images might use the wget ... pattern. Additionally, the entry program often displays prompt messages containing common informative words, as well as function and variable names (e.g., fw_version and fw_upload) related to firmware updates. Built upon these observations, we identify the program that matches the highest number of predefined patterns as the entry program.

Cross-language Control Flow Analysis. After identifying the entry program, the next step is to locate the programs for processing the received firmware image. These programs can take different forms, such as binaries and shell scripts [48]. To gain complete insight into the control flow of the programs executed during a firmware update, cross-language control flow analysis is necessary. However, current path exploration-based vulnerability detection methods [19, 59] lack this capacity. To address this challenge, we interpret the control flow logic, IPC paradigms, and program invocation paradigms of various program types (i.e., HTML paired with JavaScript, shell script, and binary) commonly used in firmware updates to construct UFGs. Specifically, we build the call graph (CG) and inter-procedural control flow graph (CFG) of the entry
program and identify IPC (i.e., sockets, files, signals, environment variables, NVRAM, and shared memory) and program invocation paradigms in the entry program. Note that the supported program types and IPC paradigms are determined by our preliminary manual firmware analysis, and their generalizability is empirically measured using a large number of firmware images. Then, dependency edges representing control dependencies between the entry program and other related programs are created. The procedure is repeated in a recursive manner, where the CFG of each newly added program is built, and IPC and program invocation paradigms are identified. Listing 1 illustrates the connection of two programs through the discovery of IPC paradigms (nvram) and program invocation paradigms (eval) between them. The UFG construction is completed once no more update-related programs are found.

    FUN_0001aa68 in binary_a

    void FUN_0001aa68(undefined4 param_1,FILE *param_2,uint param_3){...
      // IPC set function invocation
      nvram_set_int("upgrade_fw_status",0);...
      // library function invocation
      pcStack60 = "/sbin/ejusb";...
      // program execution function invocation
      _eval(&pcStack60,0,0,0);...

    [IPC] nvram_get in script_a

    // IPC get function invocation
    var upgrade_fw_status = '<% nvram_get("upgrade_fw_status"); %>';...

    [Program Invocation] _start in binary_b

    void _start(undefined4 param_1){...
      // main function invocation
      __uClibc_main(FUN_0000e998,in_stack_00000000,&stack0x00000004,_init,_fini,param_1);...}

Listing 1: IPC and program invocation examples.

Function-level Backward Slicing. With the constructed UFG, the next step is to determine the possible program execution paths for a firmware update procedure using function-level backward program slicing. Typically, a firmware update procedure concludes with a reboot to execute the new firmware. Thus, we locate calls to the reboot function, especially those that trigger the reboot binary, in the UFG and set them as the target of backward slicing. The backward slicing follows the inter-procedural control flow within and across programs to determine all possible execution paths starting from the firmware receiving function and ending with the reboot function. These paths represent potential firmware update execution paths. However, as with any path-based exploration analysis, this execution path recovery method may encounter the issues of path explosion and path missing. We tackle path explosion with five strategies, including skipping standard libraries, skipping built-in utilities, using a timeout, filtering paths with prompts, and merging paths. Further details on these strategies can be found in Appendix A.1.

4.2 Verification Procedure Recognition

Two-stage Approach Overview. Key functions are those used in firmware verification, such as hash functions for integrity verification. Our manual analysis reveals that verification procedures often use a similar set of key functions since they all try to accomplish a similar set of functions. Moreover, values from functions, either as return variables or pass-by-reference arguments, often flow into conditional expressions. These expressions then influence which conditional branch is taken. An example can be seen when verifying firmware integrity: the return value of a hash function is used in a conditional expression to determine the verification result, as demonstrated in Listing 2. To identify verification procedures, we construct a corpus of function signatures from common key functions. Next, we recognize verification procedures from the execution paths in two stages: efficient function similarity matching, which quickly filters out irrelevant functions; and verification procedure chain identification, where we use advanced semantic analysis for precise identification.

    int divideUpLoadFile(char *file_name,UPGRADE_FILE_HEADER *pFileHeader){...
      iVar2 = md5_verify_digest(fileMd5Checksum,input,uVar4);
      // a return value of 0 means the digests differ
      if (iVar2 == 0){
        pcVar5 = "[%s:%d]md5 error\n"; ...}...}

    // returns 1 when the computed MD5 digest equals the expected one, 0 otherwise
    int md5_verify_digest(uchar *digest,uchar *input,int len){
      int iVar1;
      uchar digst [16];
      md5_make_digest(digst,input,len);
      iVar1 = memcmp(digst,digest,0x10);
      return (uint)(iVar1 == 0);}

Listing 2: An integrity verification procedure example.

Efficient Function Similarity Matching. The goal of the first stage is to improve the efficiency of the identification of verification procedure chains by ranking functions with syntactic and structural features. Kim et al. [39] show that the use of numeric syntactic and structural features can efficiently achieve comparable accuracy to more complex deep learning-based methods. Therefore, a combination of numeric features (see Table 4 in Appendix A.2) is selected for two reasons: 1) These features can be efficiently extracted without requiring complex semantic code analysis; 2) The combination of these features can achieve high discriminative capability while being robust across different architectures, compiler types, and code stripping. It is worth noting that the impact of compiler optimizations on these features becomes less significant in embedded systems, where stable and industry-standard compiler options are commonly used, thereby ensuring the robustness of features. Specifically, we obtain syntactic features from the intermediate representation (IR) of code, including attributes from its abstract syntax tree (AST) such as the number of function calls and key strings. Structural features come from CFGs, including the number of BBs, edges, and branches. Using these extracted features, similarity scores between functions in execution paths and those in the corpus are calculated based on the relative difference between their feature values [39]. These scores improve efficiency in the second stage by prioritizing functions with higher similarity scores. Function pairs with similarity scores below a specific threshold (0.5 in this work) are filtered due to the low like-

5632 32nd USENIX Security Symposium USENIX Association


lihood of similarity. More details on the extracted features, engineering of firmware images from various major vendors.
mathematical construction of similarity scores, and similarity We categorize all functions into two: those employing proper
score threshold determination can be found in Appendix A.2. cryptographic algorithms or assessing protected verification
information and those that do not (see Section 4.3 for details).
Verification Procedure Chain Identification. A data-flow
The next stage involves extracting function signatures, which
graph (DFG) can represent the relationship between the data
consist of feature vectors and graph representations. The fea-
and the arithmetic and logical operations, and it is used in
ture vector is derived through syntactic and structural analysis,
cryptographic primitive identification techniques [47, 51]. To
while the graph representation is generated after constructing,
further identify key functions with accurate semantics anal-
normalizing, and pruning the DFG. Normalization serves to
ysis, we employ DFG subgraph isomorphism. In addition
eliminate redundant nodes and edges, followed by pruning to
to function semantics, the usage pattern of the return vari-
remove elements irrelevant to the verification process. This
ables and pass-by-reference arguments of key functions is
approach is commonly used in existing research [47, 51, 78],
also considered to reduce the FPs of verification procedure
ensuring accurate function matching across a wide array of
matching. The insight is that the return variables or arguments
possible implementations. Specifically, the process requires
from a key function often feed into a conditional expression.
pinpointing the variables or arguments holding the essential
This expression then determines the conditional branch to
verification data. After setting these as targets, program slic-
execute and indicates the pass or fail status of the verification
ing is executed to maintain all relevant nodes and edges. For
procedure. An example can be seen in Listing 2, where the
more detailed corpus statistics, please refer to Appendix A.3.
return variable of the function md5_verify_digest is used to
determine whether the integrity verification has passed.
We search for verification procedure chains from all exe- 4.3 Vulnerability Discovery
cution paths. For each execution path, various types of ver-
ification procedures are identified successively. To identify Vulnerability Discovery Criteria. We focus on inspecting
verification procedures, such as authenticity verification, we the implementation deviation of proper verification proce-
first select a function pair ( f , f ′ ) with the highest similarity dures for the four properties, i.e., authenticity, integrity, fresh-
score, where f is from the execution path and f ′ is a key func- ness, and compatibility. Despite the lack of verification ac-
tion for authenticity verification in the corpus. Then, the DFG counts for most verification stage vulnerabilities, improper
of the function f is constructed and normalized to preserve its implementation of the verification procedure can also lead
underlying semantics while eliminating variations introduced to an insecure or exploitable verification procedure. In our
by developers, compiler optimizations, or machine code trans- work, the primary concern of improper authenticity verifi-
lation. With the DFG, we can verify whether the functions f cation arises from the use of symmetric cryptographic algo-
and f ′ are functionally equivalent by searching for subgraphs rithms (e.g., HMAC, CMAC, Poly1305). Likewise, we iden-
in the DFG of f that are isomorphic to the graph signature tify that the root cause of improper integrity verification often
of f ′ , using Ullmann’s algorithm [71]. If a match is found, lies in the usage of weak digest verification algorithms such
function f is considered semantically equivalent to function as CRC, SHA1, and MD5. Finally, for freshness and compati-
f ′ . A reaching-definition analysis is then performed on the re- bility verification, our work emphasizes that the verification
turn variable and pass-by-reference arguments of f to obtain can be insecure when the verification information, including
data-flow slices. If the return variable or arguments flow into firmware version and compatible device ID, is extracted from
conditional expressions, the verification procedure is consid- unprotected data sources (e.g., filenames of firmware images).
ered identified. The process continues with the next function
Vulnerability Discovery Process. We identify vulnerabilities
pair having the next highest similarity score if no match is
by examining both execution paths and associated verifica-
found unless there are no remaining function pairs exceeding
tion procedures. If a firmware image contains only one execu-
the similarity score threshold. Upon completion of the second
tion path, we focus on that path. For firmware with multiple
stage, verification procedure chains in all execution paths are
paths, we select the path with the most proper verification
identified and ready to be further examined.
procedures, as this often indicates thorough verification. This
Corpus Creation. Key functions are either standard library strategy can reduce FPs, ensuring more conservative vulnera-
functions or proprietary functions. Functions responsible for bility identification. Next, vulnerability detection is performed
authenticity and integrity checks often encapsulate standard on the sole or chosen path based on the previously defined
cryptographic routines from well-established libraries [85]. criteria. Specifically, to identify missing verification vulnera-
To this end, we collect open-source cryptographic functions bilities, we check for the absence of verification procedures
from libraries commonly used in embedded systems. Though for authenticity, integrity, freshness, or compatibility in the
functions used for freshness and capability checks are usually execution path. To detect improper verification vulnerabilities,
proprietary, they share similarities within the same device fam- we examine the use of improper functions in the correspond-
ily based on our analysis. These are obtained through reverse ing verification procedures in the execution path. It is worth

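The discovery criteria and process of Section 4.3 can be illustrated with a minimal Python sketch. It is not ChkUp's implementation: the `Procedure` record, its field names, and the helper functions are hypothetical, and only the algorithm names (HMAC, CMAC, Poly1305; CRC, SHA1, MD5) and the missing/improper distinction come from the text. The rule that a weak quick check is tolerated when a signature or MAC with a secure digest follows is only approximated here: a property is flagged as improper when every procedure recorded for it is improper.

```python
from dataclasses import dataclass
from typing import List, Optional

# Algorithm names taken from the criteria in Section 4.3.
SYMMETRIC_AUTH = {"HMAC", "CMAC", "Poly1305"}
WEAK_DIGEST = {"CRC", "SHA1", "MD5"}
PROPERTIES = ("authenticity", "integrity", "freshness", "compatibility")


@dataclass
class Procedure:
    """One recognized verification procedure (hypothetical record layout)."""
    prop: str                         # one of PROPERTIES
    algorithm: Optional[str] = None   # e.g. "RSA", "MD5"; None if not applicable
    protected_source: bool = True     # False if data comes from, e.g., the filename


def is_improper(p: Procedure) -> bool:
    """Apply the per-property criteria to a single procedure."""
    if p.prop == "authenticity":
        return p.algorithm in SYMMETRIC_AUTH
    if p.prop == "integrity":
        return p.algorithm in WEAK_DIGEST
    # Freshness and compatibility: insecure if the data source is unprotected.
    return not p.protected_source


def select_path(paths: List[List[Procedure]]) -> List[Procedure]:
    """Pick the path with the most proper verification procedures."""
    return max(paths, key=lambda path: sum(not is_improper(p) for p in path))


def alerts_for_path(path: List[Procedure]) -> List[str]:
    """Report missing or improper verification for each of the four properties."""
    alerts = []
    for prop in PROPERTIES:
        procs = [p for p in path if p.prop == prop]
        if not procs:
            alerts.append(f"missing {prop}")
        elif all(is_improper(p) for p in procs):
            alerts.append(f"improper {prop}")
    return alerts
```

For a path containing only an MD5 check and a filename-based version check, `alerts_for_path` reports missing authenticity, improper integrity, improper freshness, and missing compatibility.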


Table 2: Firmware modification or selection for PoCs.

Type               Firmware Modification or Selection Method
Miss.    Auth.*    Select a firmware image without signature from the same vendor
         Integ.    Alter bits in an update firmware image
         Fresh.    Select an outdated firmware image for the same device
         Comp.*    Select an incompatible image from the same device family
Improp.  Auth.     Alter bits in an update image and re-sign it with the same key
         Integ.*   Alter bits in an update image and replace its digest
         Fresh.    Select an outdated image and alter its version field
         Comp.*    Select an incompatible image and alter its device ID field
Note: Miss.: Missing; Improp.: Improper; Auth.: Authenticity; Integ.: Integrity; Fresh.: Freshness; Comp.: Compatibility; Vulnerability types marked with an asterisk (*) most likely require patching during test environment generation.

noting that some firmware images perform a quick integrity check using a weak algorithm, followed by a more thorough verification for authenticity and integrity based on digital signature/MAC algorithms. We only report an alert for improper integrity verification when the digest algorithm used in the corresponding digital signature/MAC is also insecure.

4.4 Vulnerability Validation

PoC Creation. We dynamically validate an alert by feeding a PoC input that specifically violates a security property under test and observing whether the firmware update procedure still reaches the reboot stage. If it does, a vulnerability exists; if not, the alert is an FP. The PoC image varies based on alert type, as shown in Table 2. For example, to validate a missing compatibility verification, we replace the benign firmware with an incompatible version from the same device family. However, this process may also alter other properties, complicating root cause analysis. For example, the selected firmware image may also contain an incorrect version number. To mitigate the issue, ChkUp patches and repacks the testing firmware to align the execution path of the verification procedure with the property under test.

Patch Generation. We patch to skip verification procedures that may hinder the vulnerability validation of a specific property under test. The patching is conducted either at the source code or binary level, depending on the program type. Typically, we invert conditional expressions without adding extra code, essentially altering comparisons to their opposites (e.g., equal to not equal). Listing 4 in Appendix A.4 presents the disassembly and decompiled code of the compatibility verification procedure of a TP-Link firmware image. This procedure fetches the current product ID using the getProductVer() function and extracts the compatible product ID of the new firmware from its file body. The ID comparison is conducted by a bne (branch if not equal) instruction. To bypass this verification with the PoC firmware, we can substitute bne with a beq instruction. Although most verification procedures employ straightforward conditional expressions, some include complex conditions involving various logical operators and multiple variables, some of which are only known at runtime. This complexity makes simple methods like negating comparisons ineffective. To address this, we first conduct a successful firmware update with a benign image and record the values of variables involved in conditional expressions. Next, we perform static value-flow analysis, tracing backward from these variables to identify the instructions where their values are defined. Our investigation indicates that such instructions are typically within the same function, obviating the need for complex control-flow recovery [66]. Finally, with the recorded values and identified instructions, we employ in-place binary rewriting techniques [43] to make use of the previously recorded values, while including conditionals to ensure that the replacement applies only to the firmware execution phase of interest.

5 Implementation

ChkUp is prototyped to support both Linux-based and other embedded OS-based firmware images (equipped with file systems) across various architectures, including ARM, MIPS, and PowerPC. Specifically, the Execution Path Recovery module efficiently constructs a UFG for each firmware image within a 300s timeout, representing each UFG as a di-graph using NetworkX [13]. The control flow of various programs is analyzed with tools [22,61,66] like angr [66] for binaries. IPC and program invocation paradigms are identified based on the CPF module from KARONTE [59], extended to support both JavaScript and shell scripts. After constructing UFGs, execution path recovery is conducted using the Simple Paths module of NetworkX at the function level. In the Verification Procedure Recognition module, numerical features are extracted from source code (specifically for JavaScript and shell scripts), CFG, and both disassembled and decompiled code (leveraging Ghidra [14]). Upon constructing and normalizing DFGs using the normalization rules by Meijer et al. [51], Ullmann’s algorithm is employed for DFG subgraph isomorphism to identify verification procedures. The Vulnerability Discovery module is based on the previous two modules and performs vulnerability discovery using the described criteria and process. In the Vulnerability Validation module, Firmadyne [18] is initially used for dynamic analysis, and if emulation is unsuccessful, the more advanced FirmAE [42] is employed. Additionally, Ghidra and Firmware-mod-kit [11] are used to patch and repack firmware images.

6 Evaluation

In this section, we evaluate the effectiveness of three key modules of ChkUp, i.e., the Execution Path Recovery (Section 6.1), Verification Procedure Recognition (Section 6.2), and Vulnerability Validation (Section 6.3).

Datasets. We collected 157,141 firmware images from the websites of IoT vendors and successfully unpacked 111,958 of them. To enable large-scale analysis, a dataset, DL, was



[Figure 4 plots omitted. Panel (a): ratio of correct entries (%) per vendor under the settings W/O P1, W/O P2, W/O P3, and W/ All. Panel (b): number of procedures (TP, FN, FP) for authenticity, integrity, freshness, and compatibility. Panel (c): analysis time (s) of Execution Path Recovery and Verification Procedure Recognition per vendor.]

(a) Accuracy of update entry identification (b) Metrics of procedure recognition (c) Performance overhead
Figure 4: Evaluation results (NG, TPL, DL, TN, AS, ZX, LS, and UI represent Netgear, TP-Link, D-Link, TRENDnet, Asus,
Zyxel, Linksys, and Ubiquiti, respectively).
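Section 5 states that execution paths are recovered at the function level over the UFG, which is stored as a directed graph in NetworkX and explored with its simple-paths functionality. The sketch below shows the same idea in plain Python so that it stays self-contained: it walks predecessor edges backward from the reboot call to the firmware-receiving function and yields every cycle-free path. The function names in the usage example and the `max_len` bound are hypothetical, and none of the five path-explosion strategies is modeled.

```python
from typing import Dict, Iterator, List, Set


def backward_simple_paths(
    ufg: Dict[str, Set[str]], receive: str, reboot: str, max_len: int = 32
) -> Iterator[List[str]]:
    """Yield simple paths receive -> ... -> reboot by walking edges backward.

    `ufg` maps each function to the set of functions it calls or invokes.
    `max_len` is an illustrative bound on path length, not a value from the paper.
    """
    # Invert the call edges once so the walk can start at the slicing target.
    preds: Dict[str, Set[str]] = {}
    for src, dsts in ufg.items():
        for dst in dsts:
            preds.setdefault(dst, set()).add(src)

    stack = [(reboot, [reboot])]
    while stack:
        node, suffix = stack.pop()
        if node == receive:
            yield list(reversed(suffix))
            continue
        if len(suffix) >= max_len:
            continue
        for pred in sorted(preds.get(node, ())):
            if pred not in suffix:  # keep paths simple (no repeated function)
                stack.append((pred, suffix + [pred]))


# Hypothetical UFG: one path runs the MD5 check, one bypasses it, and
# apply_config reaches reboot without being part of the update procedure.
EXAMPLE_UFG = {
    "recv_firmware": {"check_md5", "do_upgrade"},
    "check_md5": {"do_upgrade"},
    "do_upgrade": {"reboot"},
    "apply_config": {"reboot"},
}
```

In this example only the two paths that start at `recv_firmware` are returned. The `apply_config` edge into `reboot` is discarded because it never reaches the receiving function, which mirrors why reboot invocations from other device-management features do not form update paths on their own.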

created by randomly sampling 12,000 firmware images from eight major vendors, including Netgear, TP-Link, D-Link, TRENDnet, Asus, Ubiquiti, Zyxel, and Linksys. To evaluate the effectiveness of ChkUp and validate its alerts, a ground-truth dataset, DG, was created by sampling from DL. The construction of the ground truth for the firmware images in DG was undertaken by four security experts through manual analysis. For details on the dataset construction and manual analysis, refer to Appendix B.1 and Appendix B.2.

Experimental Environment. The evaluation was conducted on a server with an Ubuntu 18.04 LTS OS and an AMD EPYC 7302P CPU with 64GB of RAM.

6.1 Effectiveness of Execution Path Recovery

The effectiveness of the Execution Path Recovery module is assessed by the accuracy of the update entry finding as well as the correctness of recovered execution paths.

Accuracy of Update Entry Finding. An update entry program is identified using three types of patterns: prompt message (P1), variable and function name (P2), and common code (P3). To evaluate the effectiveness of each type of pattern, we performed an ablation study, assessing the correctness of the identified entry programs under different settings, specifically, without P1, without P2, without P3, and with all patterns. Evaluation results for firmware from each vendor in DG are shown in Figure 4a. The highest accuracy is achieved when all patterns are used, and the lowest when only P1 and P3 are employed. This indicates that P2 is the most essential for this process, as its absence often leads to significant performance drops across firmware images from most vendors. The impact of P1 and P3 varies by vendor. For instance, P1 significantly impacts firmware images from TP-Link, TRENDnet, and Zyxel, while P3 is vital for those from Netgear and Asus. Importantly, when using all patterns, the identification demonstrates its robustness by accurately identifying update entry programs of most firmware images. While 9 Netgear firmware images had their update entries misidentified due to limited semantic information, most of these misidentified programs are still related to firmware updates but handle update roles other than firmware delivery.

Correctness of Execution Path Recovery. Upon executing the Execution Path Recovery module on firmware images from DG, ChkUp takes an average of 126.0 seconds for each image (see Figure 4c). As a result, of the 150 generated UFGs, 122 are both firmware-update sound and complete. 136 UFGs are sound since every edge in UFGs represents control flows or IPC paradigms in the update procedures. However, UFGs from 7 Asus, 4 D-Link, and 3 Zyxel firmware images are unsound, yielding unrelated IPC paradigms. 133 UFGs are complete, containing all related control flows and IPC paradigms. However, UFGs from 9 Netgear firmware images are incomplete due to misidentified update entries. Note that the 3 above-mentioned UFGs from Zyxel firmware images are also incomplete due to a mismatch between update entries and back-end handlers. The rest, from 5 TP-Link firmware images, are incomplete due to timeouts during UFG construction.

We found that sound and complete UFGs always lead to the recovery of correct paths, while unsound or incomplete UFGs might introduce incorrect paths or overlook the correct ones during backward slicing. For example, the 3 unsound and incomplete UFGs of Zyxel firmware only contain reboot function invocations that were intended for other device management functionalities (e.g., applying new configurations). Despite this, most incorrect paths do not influence the vulnerability discovery process, as they contain relatively incomplete verification procedures and are filtered out as long as correct paths are also found during the vulnerability discovery. Also, all the overlooked correct paths still contain a reboot step and can be identified once the complete UFGs are constructed.

6.2 Effectiveness of Procedure Recognition

Upon evaluating the Verification Procedure Recognition module on DG, ChkUp recognizes verification procedures for each firmware image in an average of 216.1 seconds (see Figure 4c). The results of recognizing different categories of verification procedures are shown in Figure 4b. Note that the eight columns in each category represent the results of firmware images from Netgear, TP-Link, D-Link, TRENDnet, Asus, Zyxel, Linksys, and Ubiquiti, respectively. In summary, there are 461 true positives (TPs), 45 false negatives (FNs), and 17 FPs. Fewer authenticity verification procedures are recognized because some execution paths indeed lack a firmware authenticity verification procedure, based on our analysis.

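The counts reported in Sections 6.1 and 6.2 can be reconciled with a few lines of arithmetic. The sketch below uses only numbers stated in the text. The precision and recall values are derived here for illustration and are not figures reported by the paper.

```python
# UFG soundness and completeness tally (Section 6.1).
TOTAL_UFGS = 150
unsound = {"Asus": 7, "D-Link": 4, "Zyxel": 3}
incomplete = {"Netgear": 9, "Zyxel": 3, "TP-Link": 5}
both = 3  # the same 3 Zyxel UFGs are unsound and incomplete

sound = TOTAL_UFGS - sum(unsound.values())        # 136
complete = TOTAL_UFGS - sum(incomplete.values())  # 133
# Inclusion-exclusion: do not subtract the 3 Zyxel UFGs twice.
flawed = sum(unsound.values()) + sum(incomplete.values()) - both
sound_and_complete = TOTAL_UFGS - flawed          # 122

# Verification procedure recognition (Section 6.2).
tp, fn, fp = 461, 45, 17
precision = tp / (tp + fp)  # about 0.964 (derived, not reported in the paper)
recall = tp / (tp + fn)     # about 0.911 (derived, not reported in the paper)
```

The tally confirms that the 28 flawed UFGs (14 unsound plus 17 incomplete, minus the 3 counted in both groups) leave the 122 sound and complete UFGs stated in the text.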


ChkUp has the highest integrity verification accuracy, as vendors often use standard functions such as MD5_Update with minor or even no customization.

False Result Discussion. The FPs and FNs primarily arise from inaccurate execution paths and misidentification of key functions. Specifically, our analysis reveals that inaccurate execution paths lead to 23 FNs and 6 FPs since either verification procedures are not included or unrelated code is mistakenly identified as verification procedures. For instance, device management functionalities, different from firmware updates, are included in the UFGs of 2 D-Link DAP firmware images and are falsely recognized as authenticity verification procedures. Other FNs and FPs mainly stem from using uncommon key functions or from including functions with similar semantics in execution paths. Notably, 6 FNs arise from deviating from the heuristics guiding the identification of verification procedures: 3 for Zyxel NBG-series images because return variables and arguments of their digest calculation functions do not feed into conditional expressions. A similar issue is seen in the version parsing of Netgear R-series images, resulting in 3 FNs. Further analysis shows that these images display the digests or parsed firmware versions on user interfaces, requiring manual verification. Despite some FNs and FPs, the module remains effective in most cases.

6.3 Effectiveness of Vulnerability Validation

To evaluate the effectiveness of the Vulnerability Validation module, we assessed its success rate on DG and analyzed firmware patching results. Moreover, we measured its scalability on 1,200 additional firmware images from DL.

Success Rate of Vulnerability Validation. After running the Vulnerability Discovery module on DG, a total of 271 alerts were raised. Of these, 119 alerts were raised for 72 emulatable firmware images in DG, which are compatible with the Vulnerability Validation module. Among the 119 alerts, 90 require patching to create a testing environment for conducting PoCs. Applying the Vulnerability Validation module on the corresponding firmware images resulted in 69 successful generations of patched firmware images. Obstacles in creating patched firmware images stem mainly from diverse firmware implementations, including the use of uncommon file systems, which are not supported by the state-of-the-art firmware repacking tool. We then emulated these repacked firmware images and found only 10 failed to run due to violations of runtime firmware signature checks. In total, 88 testing environments from both patched and original firmware images were created successfully. After undertaking PoC creation, 73 PoCs were successfully conducted and the corresponding alerts are considered TPs. Investigation of the failure cases reveals that 6 are indeed FPs and 9 are supposed to be TPs.

Patching Result Analysis. Unsuccessful PoC generation can result from either the alert being an FP or issues encountered during firmware patching. Specifically, patch generation in ChkUp involves three steps: 1) identification of verification procedures to bypass, 2) selection of code segments to modify, and 3) deployment of patched firmware. If the identification of the first step is incorrect, the subsequent patch might not enable further exploration of the program space, leading to an FN. Of the 15 failed PoC cases, 9 were due to this problem. Although not implemented in our current prototype, this issue can be mitigated by monitoring program execution to differentiate between known correct and incorrect running firmware. Even with the correct verification procedure identified, issues like heuristically negating logic in an incorrect code location can arise. Similar to the issue of misidentification, this can be addressed. Yet, we did not encounter any of these cases, likely due to the use of heuristics that work well for known firmware types. Lastly, the deployment of the patched firmware is not always feasible due to challenges with firmware repacking and emulation. A deeper analysis of these challenges is provided later.

Scalability of Vulnerability Validation. The scalability of the Vulnerability Validation module is closely tied to its ability to emulate and repack firmware images. To assess this, we analyzed a random sample of 1,200 firmware images, which represent 10% of DL. Initial emulation tests showed that 44.1% of these images were successfully emulated and thus applicable to this module. Notably, D-Link firmware exhibited the highest emulation success rate, while Ubiquiti firmware had the lowest. For the emulatable images, we achieved a repacking success rate of 72.2%. The most significant factor hindering successful repacking was the presence of file systems that were not as widely supported as common ones like SquashFS and CramFS. Finally, of the repacked firmware images, 82.7% were successfully emulated. The primary reason for most failures was runtime firmware signature checks.

7 Vulnerability Discovery Results

7.1 Alerts on Real-world Firmware

Alert Analysis. We used ChkUp to identify potential vulnerabilities on DL and analyzed the alert distribution. In terms of performance, ChkUp processed 93.4% of firmware images within 600 seconds each. Timeouts during UFG construction contributed to the extended analysis time observed in 3.4% of the firmware images. ChkUp resolved the execution paths for 10,670 firmware images and generated 15,132 alerts. Figure 5 displays the distribution of alerts by vulnerable verification type for major device types in DL. Notably, a significant portion of alerts arise from firmware of various network devices (e.g., routers and switches) and cameras due to two primary reasons: 1) Network devices and cameras, which often have publicly accessible firmware images, dominate a substantial share of the IoT market; 2) Many devices reuse vulnerable



Table 3: Vulnerability discovery results of ChkUp on DG dataset.

                 Authenticity            Integrity               Freshness               Compatibility
                 Missing     Improper    Missing     Improper    Missing     Improper    Missing     Improper
Vendor     # FW  TP  FN  FP  TP  FN  FP  TP  FN  FP  TP  FN  FP  TP  FN  FP  TP  FN  FP  TP  FN  FP  TP  FN  FP
Netgear      40  27   5   0   0   0   0   0   0   3  29   3   0   0   0   4   6   0   1   0   0   3   0   0   1
TP-Link      17   0   0   3  10   3   0   0   0   0  17   0   0   0   0   2   0   0   0   0   0   3   0   0   0
D-Link       15   9   2   0   0   0   0   0   0   0   9   0   0   0   2   0   9   0   0   0   2   4   5   4   0
TRENDnet     10  10   0   0   0   0   0   0   0   2   8   2   0   0   0   0   0   0   0   0   0   0   0   0   0
Asus         23   0   0   0 13*   0   0   0   0   0   0   0   0   0   0   0  4*   0   3   0   0   0  4*   0   4
Zyxel         6   3   0   3   0   0   0   0   0   3   3   3   0   0   0   0   0   0   0   0   0   0   0   0   0
Linksys      21  12   4   0   0   0   0   0   0   2  13   2   2   3   0   0   0   0   0   0   0   3   0   0   0
Ubiquiti     18  11   2   2   0   0   0   0   0   0  16   0   0   0   0   0   0   0   0   0   0   2   0   0   0
Summary     150  72  13   8  23   3   0   0   0  10  95  10   2   3   2   6  19   0   4   0   2  15   9   4   5
Note: Numbers marked with an asterisk (*) indicate that the TPs correspond to n-day vulnerabilities.
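The overall TPR and FPR quoted in Section 7.2 can be recomputed from the Summary row of Table 3. The sketch below is a worked check under one assumption that the paper does not state explicitly: each of the 150 images contributes one decision per property and vulnerability kind (8 per image), so the negatives are all decisions that are neither TPs nor FNs. Under that assumption the computed rates round to the reported figures.

```python
# (TP, FN, FP) per category, copied from the Summary row of Table 3.
SUMMARY = {
    ("authenticity", "missing"): (72, 13, 8),
    ("authenticity", "improper"): (23, 3, 0),
    ("integrity", "missing"): (0, 0, 10),
    ("integrity", "improper"): (95, 10, 2),
    ("freshness", "missing"): (3, 2, 6),
    ("freshness", "improper"): (19, 0, 4),
    ("compatibility", "missing"): (0, 2, 15),
    ("compatibility", "improper"): (9, 4, 5),
}
N_IMAGES = 150

tp = sum(v[0] for v in SUMMARY.values())  # 221
fn = sum(v[1] for v in SUMMARY.values())  # 34
fp = sum(v[2] for v in SUMMARY.values())  # 50

# Assumption: one decision per image and category, i.e. 150 * 8 in total.
decisions = N_IMAGES * len(SUMMARY)
negatives = decisions - (tp + fn)         # 945

tpr = tp / (tp + fn)   # 221 / 255
fpr = fp / negatives   # 50 / 945
```

Rounded to one decimal place, `tpr` is 86.7% and `fpr` is 5.3%, matching the values reported in the text.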

[Figure 5 plot omitted: for each device type, the alert distribution by verification procedure (%) across authenticity, integrity, freshness, and compatibility, annotated with the total number of alerts.]
Figure 5: Distribution of alerts for various device types.

code for their firmware foundation and have seen a lack of effective security enhancements, resulting in persistent vulnerabilities across both new and old versions.

The most prevalent security issue reported by ChkUp involved vulnerable integrity verification procedures. For instance, of the alerts for network switch firmware, over half relate to this issue. The majority of these alerts highlight the use of weak algorithms like CRC and MD5. Vulnerable authenticity verification also threatens various device types, notably routers. The primary concern is the lack of verification, but the use of symmetric digital signatures is also widespread. While alerts related to freshness and compatibility verification are less common, cameras and network extenders exhibit more of these alerts than other devices. The most frequent issue involves using unprotected data, such as filenames, to verify versions or device IDs in the firmware update interfaces.

Invalid and Outlier Result Discussion. In the results from ChkUp, the invalid cases primarily consisted of 1,330 firmware images for which ChkUp could not resolve the execution paths. The outlier cases were those 589 firmware images that raised four alerts, which was uncommon as most images typically triggered up to three alerts. We delved deeper into these results by examining a random 10% sample. We found that two primary reasons emerged for the 133 failure cases in execution path recovery: 32.3% of cases stemmed from misidentification of the entry program, and 29.3% from timeouts during UFG construction. The remaining issues were mainly due to unusual IPC implementations or unsupported programming languages, like PHP used for specific firmware update steps. We found no cases where, despite correct UFG construction, the use of reboot functions as slicing targets caused path identification to fail. Moreover, we inspected the verification procedures of 59 firmware images with four alerts to check their validity. Out of these, 39.0% were found to have insecure update mechanisms, while the remaining firmware contained FPs that primarily resulted from misidentified execution paths and verification procedures. Importantly, the key reason for the misidentification of verification procedures was the mismatch of functions between firmware images and the corpus. While most of the procedures that ChkUp overlooked do adhere to the value-to-condition heuristic, some do not. For these outliers that we confirmed as genuine, we provided exceptions, exempting them from the heuristic matching.

7.2 Real-world Vulnerabilities

Validating alerts is both time-consuming and labor-intensive, so our focus was on 150 firmware images from DG. We employed the Vulnerability Validation module for emulatable firmware and manually validated the rest with our ground truth. To pinpoint FNs, all firmware images were assessed using the ground truth. The results are displayed in Table 3. Overall, the TPR is 86.7% and the FPR is 5.3%, demonstrating the effectiveness of ChkUp in detecting vulnerabilities. Most FPs and FNs arose from the Execution Path Recovery and Verification Procedure Recognition modules as we discussed in our evaluation. TPs for Asus firmware images are n-day vulnerabilities (i.e., CVE-2014-2718, CVE-2020-15498, and CVE-2021-3166), while those for 29 other device families were undisclosed. We have also reported our findings to vendors and received acknowledgment. At the time of writing, 25 CVE IDs and one PSV ID have been assigned. It is worth noting that the majority fall into the categories of CWE-345 and CWE-20. These findings align with our systematization, indicating that these two categories are the most prevalent. More CVE/PSV details are available at our project website [76].

Vulnerability Analysis. Regarding the vulnerability type, improper integrity verification emerged as the most frequent issue, with 39.0% using CRC and 23.8% using MD5. Issues

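The front-end checks in Listing 3 derive both the device module and the firmware version from the uploaded filename. The Python sketch below mirrors that logic to show why such a check depends only on the name of the file and never on its contents. It is an illustration, not the vendor's code: the filename layout after the `-V` separator, the `.img` extension, the model name `ROUTERX`, and all version numbers are hypothetical, and the confirmation dialog of the original script is simplified to a rejection.

```python
from typing import Tuple


def parse_filename(file_name: str) -> Tuple[str, Tuple[int, ...]]:
    """Split '<module>-V<a.b.c.d>.<ext>' into module name and version tuple.

    The layout after '-V' is an assumption made for this sketch.
    """
    module, _, rest = file_name.partition("-V")
    version_text = rest.rsplit(".", 1)[0]  # drop the assumed file extension
    return module, tuple(int(part) for part in version_text.split("."))


def frontend_accepts(
    file_name: str, device_module: str, device_version: Tuple[int, ...]
) -> bool:
    """Mimic the filename-only compatibility and freshness checks."""
    module, version = parse_filename(file_name)
    if module.upper() != device_module.upper():
        return False  # compatibility decided from the filename alone
    # Freshness decided from the filename alone. The original script asks the
    # user to confirm an older version; this sketch treats that as a rejection.
    return version >= device_version


# A legacy image is rejected under its real name but accepted once renamed,
# because the image bytes are never inspected by this check.
legacy_name = "ROUTERX-V1.0.0.28.img"
renamed_legacy = "ROUTERX-V1.0.0.99.img"
installed = (1, 0, 0, 70)
```

With these hypothetical values, `frontend_accepts(legacy_name, "RouterX", installed)` is false while `frontend_accepts(renamed_legacy, "RouterX", installed)` is true, which is the condition the downgrade scenario in the case study relies on.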


1. upgrade.js (front-end)
01. function clickUpgrade(form){...
02.   var file_array=file_name.split('-V');...
03.   // check the firmware compatibility with filename
04.   var file_module=file_array[0];
05.   if(file_module.toUpperCase()!=netgear_module.toUpperCase()){
06.     alert(error_module); return false;}...
07.   // check the firmware freshness with filename
08.   if(netgear_num>file_num){
09.     if(!confirm(oldver1+file_version+oldver2+netgear_version+oldver3))
10.       return false;}... return true;}

2. webupgrade.sh (back-end)
01. imginstall(){...
02.   # check the firmware compatibility with firmware header
03.   module_name=$(cat /module_name)
04.   new_name=$(sed -n '1{p; q}' $INFO_HEAD | sed 's/.*://')
05.   if [ "$module_name" != "$new_name" ]; then...
06.     giveup_webupgrade_in_imginstall $STATUS_CHKINFO_ERROR fi
07.   # check the firmware integrity using CRC checksum
08.   CHECKSUM=$(/sbin/mychecksum -o $offset -i $IMPORT_FILE_NAME|sed 's/,.*$//')
09.   if [ "$CHECKSUM" != "checksum = 0x00" ]; then...
10.     giveup_webupgrade_in_imginstall $STATUS_CHKSUM_ERROR... fi...}

3. mychecksum (decompiled code)
01. bool calcsum(undefined4 param_1,__off_t param_2){...
02.   while (sVar3 = read(__fd,local_a8,0x80), 0 < sVar3){...
03.     // calculate the CRC checksum
04.     iVar6 = iVar6 + sVar3;}
05.   printf("checksum = 0x%02X, len = %d\n",~uVar5 & 0xff,iVar6);...}

Listing 3: Firmware update code in the case studies.

update mechanism susceptible to real-world exploits.

Real-world Exploits. We showcase the exploitability of these vulnerabilities by crafting two exploits, specifically a firmware downgrade attack and a firmware modification attack, targeting a Netgear WNR-series router. In these attack scenarios, both attackers and the target device share the same network environment. Attackers can either directly access the firmware update interface or sniff the network and initiate MITM attacks, given that the communication for firmware delivery uses unencrypted HTTP. Notably, during our preliminary manual firmware analysis, we found a significant number of public firmware images lack TLS for update interfaces, highlighting the feasibility of these attacks in the real world.

A1. Firmware Downgrade Attack: Since the firmware freshness is determined by checking the filename of the uploaded file, attackers can craft a malicious firmware image using a
legacy firmware image by changing its version field in the
filename to match a legitimate one. Then, they can either up-
with missing and improper authenticity verification are preva- load the malicious firmware image through the web interface
lent. For example, in the firmware from TP-Link WR-series or substitute the benign firmware image through MITM. Sub-
devices, although there exists an RSA signature for authen- sequently, the firmware is replaced with an insecure legacy
ticity verification in firmware updates, the signature does not version with vulnerabilities, which can be further exploited.
protect the firmware header. Moreover, the MD5 sum value
A2. Firmware Modification Attack: As the firmware integrity
is also stored in the header. Therefore, to craft a malicious
verification is based on CRC checksum, attackers can cre-
firmware image that can bypass the verification, an attacker
ate a modified firmware image while keeping the checksum
could modify header content and recompute the hash value
value consistent. However, some fields, such as device ID
to replace the original one. Furthermore, fewer vulnerabil-
in the header for compatibility verification, need to remain
ities were identified in freshness and compatibility, mostly
unchanged to ensure other verification procedures proceed
rooted in the use of unprotected information. For instance,
successfully. Attackers can introduce the malicious firmware
the firmware version verification of D-Link DAP-series de-
to the device in the same way mentioned in firmware down-
vices is to extract the version from the filename of the new
grade attack. With this attack, if the malicious firmware is
firmware image and compare it with the current version. As
carefully crafted, various further attacks can be introduced,
the filename lacks authenticity and integrity protections, at-
including backdoors, malware, and DoS attacks.
tackers could alter the filename to bypass the verification and
introduce vulnerable firmware images to devices.
8 Discussion
7.3 Case Studies Security Impacts of Heuristics. Heuristic approaches cap-
Listing 3 shows the firmware verification flows in a firmware ture common patterns discovered via manual reverse engineer-
image from Netgear WNR-series routers. These devices pro- ing. While effective for firmware images with similar patterns,
vide a web interface for manual firmware updates. In the front they cannot capture the foundational problem that can be gen-
end of the interface, as seen in line 3 to line 10 of upgrade.js, eralized across different systems. In the context of ChkUp,
both compatibility and freshness are verified by examining our approach that attempts to heuristically identify the en-
the filename of the uploaded file, a method we have already try and end of firmware update paths can fail on customized
identified as vulnerable. After passing these verifications, the firmware, preventing the analysis from starting. Furthermore,
back-end shell script webupgrade.sh proceeds with further ver- our heuristic approach that uses information flow from the
ification. From line 2 to line 6 of webupgrade.sh, the firmware crypto function to identify the update phase could fail to ex-
header is examined to ensure compatibility. Therefore, com- tract the correct update phase code. From the evaluation, we
patibility can be ensured as long as firmware integrity is en- found that such heuristics can fail on unconventional designs.
sured. Integrity verification occurs in webupgrade.sh (see line Limitations of Static and Dynamic Analysis. There is a
7 to line 10) through the binary mychecksum. Upon inspection, trade-off between static analysis and dynamic analysis. How-
mychecksum uses the weak CRC algorithm for this verifica- ever, when used appropriately in combination, they provide a
tion. Notably, authenticity verification is absent. As a result, good balance between completeness and soundness of analy-
these vulnerable verification procedures make the firmware sis. In the context of ChkUp, a key advantage of static analysis



is the elimination of the need to emulate firmware, which remains an open research challenge. For example, out of 150 firmware images in D_G, only 72 are emulatable using state-of-the-art firmware emulators. The emulation rate is even lower for bare-metal firmware. Another advantage of static analysis is the ability to detect deeper bugs where crafting an input to reach the bug is extremely challenging. For example, 39 firmware images check the cryptographic signature before proceeding to the buggy verification procedures. Yet, static analysis often results in many false alarms. To mitigate this, ChkUp leverages dynamic analysis to attempt to provide a level of validation, albeit a basic one. Besides the challenge of emulation, dynamic analysis only allows the validation of one path at a time, limiting the breadth of exploration. For ChkUp, success also depends on the patching being performed correctly, which is not always true as discussed.
Extensibility of ChkUp. Beyond firmware update vulnerabilities, our techniques can detect vulnerabilities like faulty firmware function implementations (e.g., in cryptographic functions). By adding faulty function signatures to the corpus, we ensure that even if there are correctly implemented counterparts, the vulnerable function will match its signature based on a higher similarity score, enabling us to pinpoint the use of flawed implementations. ChkUp is tailored for multi-architecture Linux and other embedded OS firmware with file systems. While bare-metal firmware analysis poses challenges, like addressing base resolution, most do not require cross-language control/data flow analysis. By incorporating methods from FirmXRay [77] and updating the corpus, ChkUp could handle bare-metal firmware too. Our technique is not vendor-specific; if ChkUp supports the firmware type, it can analyze it. Though FPs can occur, our system can still offer insights like firmware update paths. Extending the corpus can further improve the accuracy of ChkUp.

9 Related Work
Firmware Update Security. Recent studies have revealed security concerns in firmware or software update mechanisms [15, 17, 21, 57, 63, 70]. Notably, the prevalent use of insecure protocols like HTTP can expose update processes to MITM and backdoor attacks [17, 63]. Moreover, there are demonstrated firmware modification attacks by exploiting update procedure weaknesses [21, 70]. Both academics and the Internet Engineering Task Force (IETF) are addressing these concerns by developing secure firmware update strategies [46, 53] and efficient hotpatching solutions [35, 54].
Vulnerability Detection in Firmware. Firmware vulnerability detection is broadly divided into three categories. The first group [37, 40, 68, 77, 82, 87] detects vulnerabilities by identifying discrepancies between specifications and actual implementations. For example, FirmXRay [77] uncovers Bluetooth layer vulnerabilities using specification knowledge. The second category uses methods like taint analysis [19, 27, 59, 83, 85] and symbolic execution [25, 32, 34, 36, 65] to explore vulnerable execution paths. For instance, Karonte [59] targets memory-corruption vulnerabilities by pinpointing unsafe user-controlled input sinks. The third category involves pattern matching [20, 38, 44, 81] and code similarity checks [23, 24, 28, 33, 39, 41, 49, 50, 64, 75, 79, 88], detecting vulnerabilities by matching known patterns or vulnerable code, such as FirmUP [24] that uses procedure similarity. Our research specifically targets firmware update vulnerabilities, which are distinct due to their multi-stage nature and the unique semantics of firmware updates.
Cryptographic Misuse Detection: Cryptographic API misuse detection is a technique for identifying potential vulnerabilities stemming from incorrect usage of cryptographic primitives [16]. The general idea is to verify that parameters passed to cryptographic APIs meet pre-defined rules [26, 29, 31, 45, 56, 58, 67, 85]. ChkUp differs from existing cryptographic misuse detection studies in terms of system goals, security impact and technical standpoints. Regarding security goals, ChkUp focuses on firmware update security, providing a fundamentally distinct focus. Instead of analyzing individual cryptographic vulnerabilities, our approach involves systematizing the firmware update ecosystem and conducting a security analysis to map the attack surface, uncovering a significant amount of non-crypto attack vectors. From the perspective of security impact, ChkUp provides new ways to automate the search for firmware update vulnerabilities, covering both crypto misuses and non-crypto-related vulnerabilities such as downgrade attacks. From the technical perspective, ChkUp tackles new challenges, including identifying long sequences of inter-process invocations as well as discovering and validating firmware update-specific semantic bugs.

10 Conclusion
In this paper, we present ChkUp, a novel approach for detecting firmware update vulnerabilities, including missing and improper verification during updates. Specifically, ChkUp resolves firmware update execution paths through cross-language inter-process control flow analysis and program slicing. Then, firmware verification procedures are identified through syntactic, structural, and semantic program analysis. These procedures along with the corresponding execution paths are further examined based on our defined criteria to detect vulnerabilities. To reduce false positives, alerts for emulatable firmware images are validated dynamically with a patching-based method, while others are validated manually. ChkUp is implemented and employed to analyze 12,000 firmware images, with subsequent validation of alerts for 150 firmware images from 33 device families. The results show that ChkUp can identify zero-day and n-day vulnerabilities, leading to the assignment of 25 CVE IDs and one PSV ID.
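
To make the idea of patching-based validation more concrete, the following is a minimal sketch of the kind of branch inversion shown in Listing 4 (Appendix A.4), where a `bne` is replaced by a `beq`. It is not ChkUp's implementation. It assumes a MIPS32 target, that the file offset of the branch is already known, and that the caller supplies the image's byte order; the helper names `encode_branch`, `invert_branch`, and `patch_image` are hypothetical.

```python
import struct

# MIPS32 I-type primary opcodes; they differ only in the lowest opcode bit.
BEQ = 0b000100
BNE = 0b000101
OPCODE_SHIFT = 26


def encode_branch(opcode: int, rs: int, rt: int, offset: int) -> int:
    """Assemble a 32-bit MIPS I-type branch word (illustrative helper)."""
    return (opcode << OPCODE_SHIFT) | (rs << 21) | (rt << 16) | (offset & 0xFFFF)


def invert_branch(word: int) -> int:
    """Turn bne into beq (or beq into bne) by flipping the lowest opcode bit."""
    if (word >> OPCODE_SHIFT) not in (BEQ, BNE):
        raise ValueError("instruction is not beq/bne")
    return word ^ (1 << OPCODE_SHIFT)


def patch_image(image: bytes, file_offset: int, endian: str = "<") -> bytes:
    """Return a copy of the image with the branch at file_offset inverted.

    endian is "<" for little-endian or ">" for big-endian images (assumption:
    the caller knows which one applies to the target binary).
    """
    (word,) = struct.unpack_from(endian + "I", image, file_offset)
    patched = bytearray(image)
    struct.pack_into(endian + "I", patched, file_offset, invert_branch(word))
    return bytes(patched)
```

In Listing 4 the compared registers are `s0` (register 16) and `v0` (register 2), so `encode_branch(BNE, 16, 2, offset)` models the original instruction and `invert_branch` yields the patched one. If the patched check lets a previously rejected image through, that suggests the alert concerns the procedure that was actually guarding the update.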



References

[1] LibCRC – Open Source CRC Library in C. https://www.libcrc.org/.
[2] Libcrypto API. https://wiki.openssl.org/index.php/Libcrypto_API.
[3] LibTom. https://www.libtom.net/LibTomCrypt/.
[4] Mbed Crypto. https://os.mbed.com/docs/mbed-os/v6.16/apis/mbed-crypto.html.
[5] Nettle - a low-level cryptographic library. https://www.lysator.liu.se/~nisse/nettle/.
[6] wolfSSL. https://www.wolfssl.com/doxygen/.
[7] zlib. https://www.zlib.net/.
[8] CVE-2018-3926. https://nvd.nist.gov/vuln/detail/CVE-2018-3926, 2018.
[9] CVE-2021-3166. https://nvd.nist.gov/vuln/detail/CVE-2021-3166, 2021.
[10] Embedded devices market size report. https://www.marketsandmarkets.com/Market-Reports/embedded-system-market-98154672.html, 2022.
[11] Firmware-mod-kit. https://github.com/rampageX/firmware-mod-kit, 2022.
[12] Internet of things (iot) top 10 2018. https://wiki.owasp.org/index.php/OWASP_Internet_of_Things_Project#tab=IoT_Top_10, 2022.
[13] Network analysis in python. https://github.com/networkx/networkx, 2022.
[14] National Security Agency. Ghidra software reverse engineering framework. https://github.com/NationalSecurityAgency/ghidra, 2022.
[15] Omar Alrawi et al. Sok: Security evaluation of home-based iot deployments. In S&P, 2019.
[16] Amit Seal Ami et al. Why crypto-detectors fail: A systematic evaluation of cryptographic misuse detection techniques. In S&P, 2022.
[17] Anthony Bellissimo et al. Secure software updates: Disappointments and new challenges. In HotSec, 2006.
[18] Daming D Chen et al. Towards automated dynamic analysis for linux-based embedded firmware. In NDSS, 2016.
[19] Libo Chen et al. Sharing more and checking less: Leveraging common input keywords to detect bugs in embedded systems. In USENIX Security, 2021.
[20] Andrei Costin et al. A Large-Scale analysis of the security of embedded firmwares. In USENIX Security, 2014.
[21] Ang Cui et al. When firmware modifications attack: A case study of embedded exploitation. In NDSS, 2013.
[22] Piotr Dabkowski. Js2py: Javascript to python translator. https://github.com/PiotrDabkowski/Js2Py, 2022.
[23] Yaniv David et al. Similarity of binaries through re-optimization. In PLDI, 2017.
[24] Yaniv David et al. Firmup: Precise static detection of common vulnerabilities in firmware. In ASPLOS, 2018.
[25] Drew Davidson et al. FIE on firmware: Finding vulnerabilities in embedded systems using symbolic execution. In USENIX Security, 2013.
[26] Manuel Egele et al. An empirical study of cryptographic misuse in android applications. In CCS, 2013.
[27] Mohamed Elsabagh et al. FIRMSCOPE: Automatic uncovering of Privilege-Escalation vulnerabilities in Pre-Installed apps in android firmware. In USENIX Security, 2020.
[28] Sebastian Eschweiler et al. discovre: Efficient cross-architecture identification of bugs in binary code. In NDSS, 2016.
[29] Sascha Fahl et al. Why eve and mallory love android: An analysis of android ssl (in) security. In CCS, 2012.
[30] Andrew Fasano et al. Sok: Enabling security analyses of embedded systems via rehosting. In ASIACCS, 2021.
[31] Johannes Feichtner et al. Automated binary analysis on ios: a case study on cryptographic misuse in ios applications. In WiSec, 2018.
[32] Farhaan Fowze et al. Proxray: Protocol model learning and guided firmware analysis. IEEE Trans. Softw. Eng., 2019.
[33] Jian Gao et al. Vulseeker: A semantic learning based vulnerability seeker for cross-platform binary. In ASE, 2018.
[34] Fabio Gritti et al. Heapster: Analyzing the security of dynamic allocators for monolithic firmware images. In S&P, 2022.
[35] Yi He et al. Rapidpatch: Firmware hotpatching for real-time embedded devices. In USENIX Security, 2022.



[36] Grant Hernandez et al. Firmusb: Vetting usb device firmware using domain informed symbolic execution. In CCS, 2017.
[37] Grant Hernandez et al. BigMAC: Fine-Grained policy analysis of android firmware. In USENIX Security, 2020.
[38] Grant Hernandez et al. Firmwire: Transparent dynamic analysis for cellular baseband firmware. In NDSS, 2022.
[39] Dongkwan Kim et al. Revisiting binary code similarity analysis using interpretable feature engineering and lessons learned. IEEE Trans. Softw. Eng., 2022.
[40] Eunsoo Kim et al. Basespec: Comparative analysis of baseband software and cellular specifications for l3 protocols. In NDSS, 2021.
[41] Geunwoo Kim et al. Improving cross-platform binary analysis using representation learning via graph alignment. In ISSTA, 2022.
[42] Mingeun Kim et al. Firmae: Towards large-scale emulation of iot firmware for dynamic analysis. In ACSAC, 2020.
[43] Taegyu Kim et al. Revarm: A platform-agnostic arm binary rewriter for security applications. In ACSAC, 2017.
[44] Taegyu Kim et al. PASAN: Detecting peripheral access concurrency bugs within Bare-Metal embedded applications. In USENIX Security, 2021.
[45] Stefan Krüger et al. Crysl: An extensible approach to validating the correct usage of cryptographic apis. IEEE Trans. Softw. Eng., 2019.
[46] Antonio Langiu et al. Upkit: An open-source, portable, and lightweight update framework for constrained iot devices. In ICDCS, 2019.
[47] Pierre Lestringant et al. Automated identification of cryptographic primitives in binary code with data flow graph isomorphism. In ASIACCS, 2015.
[48] Wen Li et al. Understanding language selection in multi-language software projects on github. In ICSE-Companion, 2021.
[49] Bingchang Liu et al. αdiff: cross-version binary code similarity detection with dnn. In ASE, 2018.
[50] Andrea Marcelli et al. How machine learning is solving the binary function similarity problem. In USENIX Security, 2022.
[51] Carlo Meijer et al. Where’s crypto?: Automated identification and classification of proprietary cryptographic primitives in binary code. In USENIX Security, 2021.
[52] Charlie Miller et al. Remote exploitation of an unaltered passenger vehicle. Black Hat USA, 2015.
[53] Brendan Moran et al. A firmware update architecture for internet of things. Internet Requests for Comments, RFC Editor, RFC 9019, 2021.
[54] Christian Niesler et al. Hera: Hotpatching of embedded real-time applications. In NDSS, 2021.
[55] U.S. Government Accountability Office. Solarwinds cyberattack demands significant federal and private-sector response (infographic). https://www.gao.gov/blog/solarwinds-cyberattack-demands-significant-federal-and-private-sector-response-infographic, 2021.
[56] Luca Piccolboni et al. Crylogger: Detecting crypto misuses dynamically. In S&P, 2021.
[57] Vijay Prakash et al. Inferring software update practices on smart home iot devices through user agent analysis. In SCORED, 2022.
[58] Sazzadur Rahaman et al. Cryptoguard: High precision detection of cryptographic vulnerabilities in massive-sized java projects. In CCS, 2019.
[59] Nilo Redini et al. Karonte: Detecting insecure multi-binary interactions in embedded firmware. In S&P, 2020.
[60] ReFirmLabs. binwalk. https://github.com/ReFirmLabs/binwalk/, 2022.
[61] Yann Régis-Gianas et al. Morbig: A static parser for posix shell. In SLE, 2018.
[62] Michael Rushanan et al. Sok: Security and privacy in implantable medical devices and body area networks. In S&P, 2014.
[63] Justin Samuel et al. Survivable key compromise in software update systems. In CCS, 2010.
[64] Paria Shirani et al. Binarm: Scalable and efficient detection of vulnerabilities in firmware images of intelligent electronic devices. In DIMVA, 2018.
[65] Yan Shoshitaishvili et al. Firmalice-automatic detection of authentication bypass vulnerabilities in binary firmware. In NDSS, 2015.
[66] Yan Shoshitaishvili et al. Sok:(state of) the art of war: Offensive techniques in binary analysis. In S&P, 2016.



[67] David Sounthiraraj et al. Smv-hunter: Large scale, automated detection of ssl/tls man-in-the-middle vulnerabilities in android apps. In NDSS, 2014.
[68] Lin Tan et al. Autoises: Automatically inferring security specification and detecting violations. In USENIX Security, 2008.
[69] EE Times. 2019 embedded markets study. https://www.embedded.com/wp-content/uploads/2019/11/EETimes_Embedded_2019_Embedded_Markets_Study.pdf, 2019.
[70] Ryan Tsang et al. Fandemic: Firmware attack construction and deployment on power management integrated circuit and impacts on iot applications. In NDSS, 2022.
[71] Julian R Ullmann. An algorithm for subgraph isomorphism. J. ACM, 1976.
[72] Jinwen Wang et al. Rt-tee: Real-time system availability for cyber-physical systems using arm trustzone. In S&P, 2022.
[73] Jinwen Wang et al. Ari: Attestation of real-time mission execution integrity. In USENIX Security, 2023.
[74] Jinwen Wang et al. IP Protection in TinyML. In DAC, 2023.
[75] Shuai Wang et al. In-memory fuzzing for binary code similarity analysis. In ASE, 2017.
[76] [Website]. ChkUp. https://fw-chkup.github.io.
[77] Haohuang Wen et al. Firmxray: Detecting bluetooth link layer vulnerabilities from bare-metal firmware. In CCS, 2020.
[78] Seongil Wi et al. Hiddencpg: large-scale vulnerable clone detection using subgraph isomorphism of code property graphs. In WWW, 2022.
[79] Yueming Wu et al. Detecting semantic code clones by building ast-based markov chains model. In ASE, 2022.
[80] Yuhao Wu et al. Work-in-progress: Measuring security protection in real-time embedded firmware. In RTSS, 2022.
[81] Fabian Yamaguchi et al. Modeling and discovering vulnerabilities with code property graphs. In S&P, 2014.
[82] Yuqing Yang et al. Detecting and measuring misconfigured manifests in android apps. In CCS, 2022.
[83] Jiawei Yin et al. Finding smm privilege-escalation vulnerabilities in uefi firmware with protocol-centric static analysis. In S&P, 2022.
[84] Zhiyuan Yu et al. Security and privacy in the emerging cyber-physical world: A survey. IEEE Commun. Surv. Tutor., 2021.
[85] Li Zhang et al. CryptoREX: Large-scale analysis of cryptographic misuse in IoT devices. In RAID, 2019.
[86] Ruide Zhang et al. Augauth: Shoulder-surfing resistant authentication for augmented reality. In ICC, 2017.
[87] Yue Zhang et al. When good becomes evil: Tracking bluetooth low energy devices via allowlist-based side channel and its countermeasure. In CCS, 2022.
[88] Binbin Zhao et al. A large-scale empirical analysis of the vulnerabilities introduced by third-party components in iot firmware. In ISSTA, 2022.

A Additional Design Details

A.1 Path Explosion Reduction
Five path explosion reduction strategies are utilized in ChkUp. The initial three are applied during UFGs construction, and the last two during backward slicing. 1) Excluding non-crypto standard libraries: Standard libraries are omitted from UFG construction to reduce complexity, with an exception for cryptographic libraries for vulnerability identification without significantly increasing UFG complexity. 2) Omitting built-in utilities: During firmware updates, known built-in utility programs (e.g., mtd, reboot) are executed, negating the need for control flow analysis. 3) Implementing a timeout: If UFG generation takes excessive time, a timeout strategy limits UFG complexity and the inclusion of FP paths. 4) Applying path filtering: Execution paths are refined by filtering based on error messages from unsuccessful firmware updates causing device reboots. 5) Path merging: Post backward slicing, paths with identical verification procedures and nodes are merged for their equivalent semantic meanings.

Table 4: Syntactic and structural features of functions.
Category | Feature
--- | ---
Data Constant | # constants, # strings
Instruction | # all instructions, # operands, # each type of instructions (1)
CFG | # BBs, # edges, # loops, avg. # edges per BB, * BBs, * loops
Function Call | # imported calls, # incoming calls, # outgoing calls
Misc. | # arguments, # API callees, # library references, # code references
Note: #: The number of; *: The size of; avg.: average. (1) Instruction type: arithmetic, branch, data transfer, logic, and bit-oriented instructions.

A.2 Function Similarity Matching
Similarity Score Calculation. With extracted features (see details in Table 4), we can calculate the similarity between a function $f$ and a function in the corpus $f'$ based on the relative difference between their feature values [39]. Given



$M$ features, the feature vectors for $f$ and $f'$ are represented as $\mathbf{f} = [x_1, x_2, \cdots, x_M]$ and $\mathbf{f}' = [x_1', x_2', \cdots, x_M']$, respectively. The relative difference $\delta$ between $x_i$ and $x_i'$ is

$$\delta(x_i, x_i') = \frac{|x_i - x_i'|}{\max(|x_i|, |x_i'|)}. \tag{1}$$

From this, the similarity score between the two features is $1 - \delta(x_i, x_i')$. Then, the overall similarity score of $f$ and $f'$ is defined as the average similarity score for all features by

$$\gamma(\mathbf{f}, \mathbf{f}') = 1 - \frac{1}{M}\left(\delta(x_1, x_1') + \delta(x_2, x_2') + \cdots + \delta(x_M, x_M')\right). \tag{2}$$

The value ranges of $\delta(x_i, x_i')$ and $\gamma(\mathbf{f}, \mathbf{f}')$ are 0 to 1. The higher the $\gamma(\mathbf{f}, \mathbf{f}')$, the more similar $f$ and $f'$ are considered to be.
Based on this similarity score definition, we assess the similarity between each function in the execution paths and each function in the corpus. The corpus contains $Q$ key functions divided into four sets: $S_n^a$, $S_n^i$, $S_n^f$, and $S_n^c$. Given $N$ recovered execution paths of a firmware image, we calculate four similarity matrices for each path including $S_n^a$, $S_n^i$, $S_n^f$, and $S_n^c$. Each matrix represents the similarity scores between functions in the $n$-th path and those in the corpus for a specific verification procedure. For instance, the similarity score matrix $S_n^a$ for authenticity verification of the $n$-th path is defined as

$$S_n^a = \begin{bmatrix} \gamma(\mathbf{f}_1, \mathbf{f}_1') & \gamma(\mathbf{f}_1, \mathbf{f}_2') & \cdots & \gamma(\mathbf{f}_1, \mathbf{f}_Q') \\ \gamma(\mathbf{f}_2, \mathbf{f}_1') & \gamma(\mathbf{f}_2, \mathbf{f}_2') & \cdots & \gamma(\mathbf{f}_2, \mathbf{f}_Q') \\ \vdots & \vdots & \ddots & \vdots \\ \gamma(\mathbf{f}_P, \mathbf{f}_1') & \gamma(\mathbf{f}_P, \mathbf{f}_2') & \cdots & \gamma(\mathbf{f}_P, \mathbf{f}_Q') \end{bmatrix}, \tag{3}$$

where $P$ is the number of functions in the execution path and $\gamma(\mathbf{f}_p, \mathbf{f}_q')$ is the similarity score between the $p$-th function in the execution path and the $q$-th function in the corpus, as calculated using Equation 2. Note that when there is function overlap in the execution paths, the similarity score between each function pair is only calculated once.
Similarity Score Threshold Selection. In the first stage of the Verification Procedure Recognition, the similarity score threshold should effectively eliminate a substantial number of irrelevant functions while preserving the majority of essential key functions. This approach can ensure efficient and accurate recognition of the verification procedure in the second stage. To establish this threshold, we calculated numerous similarity scores between key functions found in the firmware images from D_G and their counterparts in the corpus. Moreover, we selected an irrelevant function at random for each key function in every firmware image and calculated its similarity score with the corresponding function in our corpus. In total, we assessed 1,012 functions and generated their respective similarity scores. Subsequently, we defined a range of similarity score thresholds, ranging from 0 to 1 in increments of 0.05, and measured the recall and precision at each threshold. As illustrated in Figure 6, a threshold of approximately 0.5 delivers an optimal balance between precision and recall, with both metrics exceeding 90%. Consequently, we selected a similarity threshold of 0.5.

Figure 6: Effect of threshold selection on function matching. (Plot of recall and precision, as metric value in %, against the similarity score threshold from 0.0 to 1.0.)

A.3 Corpus Statistics
Key functions in the corpus include the functions from open-source libraries and proprietary functions obtained during the construction of the ground truth. Overall, the corpus contains 129 functions: 76 from widely used libraries and 53 that are proprietary. Our observations on key functions used for integrity and authentication verification align with previous work [85], showing that executable programs generally utilize either low-level cryptographic APIs from standard libraries or employ self-defined APIs that wrap these low-level APIs. Therefore, our corpus includes common cryptographic functions for digest algorithms (such as SHA family, MD family, and RIPEMD family), digital signature algorithms (such as RSA, DSA, and ECDSA), and MAC algorithms (such as HMAC, CMAC, and Poly1305) from standard libraries, namely Libcrypto [2], wolfCrypt [6], LibTomCrypt [3], Mbed Crypto [4], Nettle [5]. These functions are further classified into two categories (proper and improper) based on their corresponding algorithms (see Table 5 for details). In addition, non-cryptographic digest functions based on CRC, considered weak for integrity verification, from zlib [7] and LibCRC [1] are included. All these functions are compiled for ARM (both 32-bit and 64-bit), MIPS (both 32-bit and 64-bit), and PowerPC (both 32-bit and 64-bit) architectures using the GCC compiler with optimization levels ranging from O0 to O3. Proprietary functions in the corpus include those used for

Table 5: Cryptographic functions in the corpus.
Library | Integrity (Proper) | Integrity (Improper) | Authentication (Proper) | Authentication (Improper)
--- | --- | --- | --- | ---
Libcrypto [2] | SHA256_Update, SHA3_absorb, RIPEMD160_Update, etc. | MD4_Update, MD5_Update, SHA1_Update, etc. | RSA_verify, DSA_verify, ECDSA_do_verify, etc. | HMAC_Update, CMAC_Update, Poly1305_Update, etc.
wolfCrypt [6] | Sha256Update, Sha3_512_Update, RipeMdUpdate, etc. | Md4Update, Md5Update, ShaUpdate, etc. | RsaSSL_Verify, DsaVerify, ecc_verify_hash, etc. | HmacUpdate, CmacUpdate, Poly1305Update, etc.
LibTomCrypt [3] | sha256_process, sha3_process, rmd160_process, etc. | md4_process, md5_process, sha1_process, etc. | rsa_verify_hash, dsa_verify_hash, ecc_verify_hash, etc. | hmac_process, omac_process, poly1305_process, etc.
Mbed Crypto [4] | sha256_update_ret, sha512_process, ripemd160_update_ret, etc. | md4_update_ret, md5_update_ret, sha1_update_ret, etc. | rsa_pkcs1_verify, ecdsa_verify | cipher_cmac_update, md_hmac_update, poly1305_update, etc.
Nettle [5] | sha256_update, sha3_update, ripemd160_update, etc. | md4_update, md5_update, sha1_update, etc. | rsa_sha512_verify, dsa_verify, ecdsa_verify, etc. | hmac_sha1_update, cmac_aes128_update, poly1305_aes_update, etc.
Note: Function names in wolfCrypt, Mbed Crypto, and Nettle are prefixed with wc_, mbedtls_, and nettle_, respectively; Omitted functions implement algorithms from the same family of algorithms implemented by the listed functions, for example, MD2 for improper integrity verification and SHA512 for proper integrity verification.
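
The similarity score of Equations 1 and 2 and the threshold sweep described in Appendix A.2 can be sketched in a few lines. This is an illustrative sketch only, not the authors' code. It assumes non-negative feature values (counts and sizes, as in Table 4), treats a feature that is zero in both functions as identical, counts a score greater than or equal to the threshold as a match, and reports precision or recall as 1.0 when its denominator is empty; the function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def relative_difference(x: float, y: float) -> float:
    """Equation 1. Returns 0.0 when both values are zero (an assumption)."""
    denom = max(abs(x), abs(y))
    return 0.0 if denom == 0 else abs(x - y) / denom


def similarity(f: Sequence[float], g: Sequence[float]) -> float:
    """Equation 2: one minus the mean relative difference over the M features."""
    if not f or len(f) != len(g):
        raise ValueError("feature vectors must be non-empty and equally long")
    return 1.0 - sum(relative_difference(x, y) for x, y in zip(f, g)) / len(f)


def sweep_thresholds(
    scores: Sequence[float], is_key: Sequence[bool], step: float = 0.05
) -> List[Tuple[float, float, float]]:
    """Return (threshold, precision, recall) for thresholds 0, step, ..., 1.

    scores[i] is a similarity score and is_key[i] says whether the scored
    function really is a key function (ground-truth label).
    """
    steps = round(1 / step)
    results = []
    for i in range(steps + 1):
        t = i / steps
        tp = sum(1 for s, k in zip(scores, is_key) if s >= t and k)
        fp = sum(1 for s, k in zip(scores, is_key) if s >= t and not k)
        fn = sum(1 for s, k in zip(scores, is_key) if s < t and k)
        precision = tp / (tp + fp) if tp + fp else 1.0
        recall = tp / (tp + fn) if tp + fn else 1.0
        results.append((t, precision, recall))
    return results
```

With labelled scores for key and irrelevant functions, picking the threshold amounts to scanning the returned list for the point where precision and recall are both acceptably high, which the text above reports to be around 0.5 for its data.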



freshness and compatibility verification and developer-defined functions
for integrity and/or authenticity that significantly differ from
open-source alternatives. These functions were collected through our manual
analysis while constructing the ground truth for DG, which contains
firmware images from eight different vendors. During this process, we
observed that key functions in firmware images from the same device family
or vendor tend to be similar. For example, all firmware images from the
Asus DSL-N families in DG feature similar key functions. As a result, the
proprietary functions in the corpus generalize to numerous firmware images
beyond the ones they were collected from. To facilitate vulnerability
discovery, all functions are classified as proper or improper, based on
whether the information used for verification is protected, as described in
Section 4.3.

A.4 Patching Example

Listing 4 shows the disassembled and decompiled code for the compatibility
verification in a TP-Link firmware image. This procedure can be bypassed
with our patching method.

Disassembly Code
    lw     s0,0x44(s2)
    jalr   t9=>getProductVer
  - bne    s0,v0,LAB_004e9c4c
  + beq    s0,v0,LAB_004e9c4c
    _li    v1,0x4655
    clear  v1
  LAB_004e9c4c:
    lw     ra,local_4(sp)
    move   v0,v1
    jr     ra

Decompiled Code
    iVar5 = *(int *)(param_1 + 0x44);
    iVar1 = getProductVer();
  - if (iVar5 != iVar1) {
  + if (iVar5 == iVar1) {
      return 0x4655;
    }
    return 0;

Listing 4: A compatibility verification patch example.

B Firmware Collection and Manual Analysis

B.1 Evaluation Datasets

Firmware Collection and Unpacking. We developed a web crawler to collect
firmware images primarily from the official websites of major vendors in
areas such as network devices, cameras, and smart home devices [80]. We
then unpacked the 157,141 collected firmware images from 204 vendors with
Binwalk [60] to extract the file system, kernel, and bootloader; 111,958
firmware images were unpacked successfully (statistics can be found on our
website [76]). Given the differences in firmware update mechanisms across
vendors, building a ground truth for evaluating ChkUp with firmware images
from all vendors would demand substantial manual work. Thus, we built a
large-scale dataset, DL, by randomly sampling 12,000 (just over 10%)
firmware images from eight leading vendors (i.e., Netgear, TP-Link, D-Link,
TRENDnet, Asus, Ubiquiti, Zyxel, and Linksys) out of our collection of
unpacked firmware images. The eight vendors have a significant share of the
embedded device market, notably holding over 60% of the wireless router
market.

Ground Truth Dataset. Thoroughly evaluating ChkUp and validating its alerts
requires manual analysis to establish ground truth, a labor-intensive
process even for experts. To establish a ground-truth dataset, DG, we
performed a weighted random sampling of 150 firmware images from DL,
prioritizing those from various device families. The selected images
originate from 33 different device families of the eight leading vendors.
Four security researchers then manually analyzed these images to build the
ground truth for DG (for details, refer to Appendix B.2). With the ground
truth, DG is used to evaluate the effectiveness of ChkUp in Section 6 and
to validate the alerts generated by ChkUp in Section 7.

B.2 Manual Firmware Analysis

Assumption. Since a ground truth dataset is theoretically impossible to
obtain, we assume that our manual analysis and cross-validation among
different team members provide a good approximation of the ground truth.

Approach. A four-member team of experienced security researchers performed
the analysis. One team member is a senior computer security researcher with
almost 20 years of experience, while the other three members have about 7
years of exposure to computer security. Given a firmware image, the
objective is to figure out its firmware update mechanism by resolving all
the invoked programs and their execution sequences, as well as all the
critical verification procedures. To this end, we use a multi-step,
documentation-supported approach. First, we search for all programs
suspected of containing firmware update functionality based on firmware
update-related keywords such as "firmware update" and "firmware upgrade".
We also carefully review any available firmware documentation and README
files. These resources are invaluable for validating hypotheses about the
function of specific binaries, effectively compensating for the absence of
embedded information. For instance, if screenshots of the update interfaces
are provided in the documents, the corresponding front-end programs can be
easily found.

Second, we conduct a control flow analysis to understand the program
execution sequence using the binary analysis tool Ghidra. Data flow
analysis, facilitated by angr, reveals how data moves and changes within
the programs, providing insights into critical verification procedures. For
emulatable firmware images, we also perform emulation to record the
firmware update execution paths. Finally, we perform cross-validation to
mitigate the limitations in analysis accuracy stemming from the varying
experience of the researchers. Specifically, team members independently
assess the same binaries and subsequently reconcile their findings. This
process depends on internal peer reviews and external documentation to
arbitrate discrepancies. Consequently, our approach guarantees a
comprehensive examination, establishing a robust foundation for a reliable
ground truth.
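
As a rough illustration of the first step of the approach above, the
keyword search over an unpacked firmware image could be sketched in Python
as follows. The function name find_update_candidates, the case-insensitive
byte-level matching, and the assumption that the unpacked root file system
is available as a local directory are illustrative choices, not details
taken from the manual analysis itself.

```python
from pathlib import Path

# Keywords named in the text; how they were matched is an assumption here.
UPDATE_KEYWORDS = (b"firmware update", b"firmware upgrade")


def find_update_candidates(rootfs, keywords=UPDATE_KEYWORDS):
    """Return sorted relative paths of regular files under `rootfs`
    whose raw bytes contain any keyword (case-insensitive)."""
    root = Path(rootfs)
    hits = []
    for path in root.rglob("*"):
        # Skip directories, broken links, and symlinks (to avoid duplicates).
        if not path.is_file() or path.is_symlink():
            continue
        try:
            data = path.read_bytes().lower()
        except OSError:
            # Unreadable files are ignored in this sketch.
            continue
        if any(keyword in data for keyword in keywords):
            hits.append(path.relative_to(root).as_posix())
    return sorted(hits)
```

The returned paths are only candidates: as described above, each one still
has to be checked against documentation and confirmed through control flow
and data flow analysis.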
