Usenixsecurity24 Wu Yuhao
1 Our artifacts are available at https://fw-chkup.github.io.
Figure 1: CVSSv3 scores of firmware update-related CVEs.
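[Figure 2: Firmware update workflow (figure not recoverable from extraction; labels include Author, Firmware Server, Delivery Interfaces, Device User/Maintainer, Device, Version (Vxx.xx.xx), and Device ID).]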
four times of low and medium ones. This increase can be attributed not only to the general trend of increasing CVEs but also to newly arising challenges specific to embedded systems [15, 30, 62, 72–74, 84, 86]. These challenges include 1) an expanding attack surface due to increased connectivity, 2) the increased complexity of embedded systems, 3) long product life cycles, and 4) limited resources on embedded devices. These factors contribute to diverse firmware update mechanisms and increased vulnerabilities.

2.2 Update Workflow and Vulnerabilities

A typical firmware update workflow has four phases, as shown in Figure 2: generation, delivery, verification, and installation.

Generation Phase. The goal of the generation phase is to create firmware images and make them available. Specifically, an author first develops a new firmware image and a manifest. The manifest contains firmware metadata, including the firmware digest, version, and device ID. Subsequently, the firmware image and manifest are signed with a digital signature and then transferred to a firmware server through the software supply chain. Usually, the author uploads firmware to the server through trusted parties in the supply chain.

During this phase, external attackers can conduct supply chain attacks to steal certificates and compromise software development tools or infrastructure. The root cause of such attacks is inappropriate access control for critical assets and infrastructures. An increasing number of supply chain-related vulnerabilities have been reported in recent years, demonstrating their security impact in the real world. Reports indicate that the U.S. Government and more than 30,000 public and private organizations, such as Microsoft, Intel, and FireEye, suffered from a large-scale software supply chain attack known as the SolarWinds hack in 2020 [55]. Specifically, cybercriminals compromised the Orion IT management software and then distributed malicious software updates containing backdoors to users through the supply chain.

Takeaway 1: Supply chain vulnerabilities pose a significant risk, often stemming from inadequate access controls. Without proper on-device verification, compromised firmware can be installed on devices, leading to a loss of control over them.

Delivery Phase. The delivery phase involves transmitting the new firmware image from the server to the target device. A firmware component, known as the update agent [46], handles downloading, verifying, and storing the new image in persistent memory. Typically, new firmware can be delivered in three ways: 1) the firmware server directly pushes the firmware and manifest to the device's update agent; 2) the update agent polls for updates and downloads them when they become available; 3) the firmware server notifies the device user/maintainer, who manually downloads the updates and uploads them to the update agent. In terms of communication channels, common methods include application-layer protocols (e.g., HTTP, FTP), wireless media (e.g., Wi-Fi, Bluetooth Low Energy), and physical interfaces (e.g., USB, removable memory cards). For some low-end, bare-metal devices, companion apps on smartphones can assist with firmware updates. The new firmware can either be bundled within the app or fetched by the app from the firmware server through application-layer protocols. These apps then generally communicate with the device via wireless media for notification, polling, and downloading. It is worth noting that while the apps act as intermediaries for transferring firmware, they may also pre-verify updates. Such early verification can filter out invalid updates and avoid unnecessary subsequent on-device processing.

A secure communication channel is important for the confidentiality and integrity of the delivered firmware images. Insecure delivery mainly arises from a lack of cryptographic protocols or the use of hard-coded keys, exposing systems to machine-in-the-middle (MITM) attacks. For example, CVE-2020-9544 involves plain HTTP without authentication, while CVE-2020-25233 derives from the use of a hard-coded RSA key for communication. Mobile apps can also communicate insecurely with either firmware servers or devices, as shown by CVE-2018-3928, where insufficient communication security checks can lead to code execution vulnerabilities. Importantly, the primary concern is not just the communication channel but also the lack of proper security verification. For example, even if a communication key is leaked, a robust firmware verification mechanism can prevent malicious firmware from replacing the legitimate image during updates.

Takeaway 2: Firmware delivery security mainly relies on the communication channel and the device user/maintainer. If either is insecure, the device may receive compromised firmware unless proper on-device verification is in place.

Verification Phase. The verification phase ensures the authenticity, integrity, freshness, and compatibility of the received
firmware. Specifically, the update agent performs a series of verification procedures before storing the image in persistent memory: firmware authenticity is ensured by verifying the digital signature of the firmware; firmware integrity is verified by checking the digest contained in the manifest; and freshness and compatibility are confirmed by examining the metadata in the manifest, namely the version and device ID.

Table 1 lists the top ten firmware update-related vulnerabilities by Common Weakness Enumeration (CWE) category, based on our CVE analysis. The top eight categories, accounting for 47.67%, predominantly involve either missing verification or improper verification methods for firmware updates. These issues can enable attackers to replace the benign firmware with a malicious one during updates. For instance, CVE-2018-10988 stems from a lack of digital signature verification in the shell script used for firmware updates. Missing or improper integrity verification may lead to firmware corruption. For example, using easily bypassable internal checksums for firmware integrity checks is problematic (e.g., CVE-2018-5441). Missing or improper freshness verification can lead to firmware downgrade attacks, while inadequate compatibility verification can expose the device to DoS attacks. For instance, the root cause of CVE-2018-3891 is a logic flaw in version verification, where integer comparison operators are incorrectly used for string comparison. Similarly, in the case of CVE-2020-10831, arbitrary firmware can be installed due to insufficient verification.

Takeaway 3: Missing or improper implementation of any step in the verification procedure can lead to the installation of unintended firmware on the embedded device.

Installation Phase. The installation phase is the process of installing and executing the new firmware. After verification, the new firmware is stored in the persistent memory of the device and is activated upon reboot. Specifically, when the device starts up, a bootloader first moves the new firmware image to the correct offset in device memory. Then, the bootloader executes the new firmware image after conducting a firmware inspection. However, this inspection is often incomplete and insecure, commonly relying on internal checksums [46]. Most vulnerabilities in this phase are typical software bugs such as command injection and memory corruption bugs. Specifically, firmware update-related commands executed during this phase may accept parameters from user inputs. If an attacker manipulates these parameters and they are subsequently passed to vulnerable functions (e.g., system, strcpy), this can lead to command injection (e.g., CVE-2019-5155) or memory corruption (e.g., CVE-2021-22675) attacks.

Takeaway 4: Incomplete firmware inspection procedures in bootloaders during firmware installation are common, making the security of firmware updates dependent on the verification mechanisms in the update agent.

Summary. Security vulnerabilities can arise during any phase of the firmware update process. Nevertheless, robust firmware verification mechanisms in the device's update agent can mitigate the majority of vulnerabilities originating from other phases. Hence, our research primarily focuses on identifying vulnerabilities within the verification phase.

3 Threat Model and Overview

Threat Model. ChkUp aims to uncover firmware update vulnerabilities in OS-based firmware (integrated with file systems), particularly in the dominant Linux-based firmware [69]. It can detect the most prevalent firmware update vulnerabilities, including missing or improper verification of authenticity, integrity, freshness, and compatibility. Aligned with existing research [19, 34, 36, 59, 83], we assume no access to firmware source code, making ChkUp a binary-based vulnerability detection approach. Potential users of ChkUp could be security researchers seeking to notify vendors, or end-users trying to obtain additional security information about their devices. Even vendors with access to the source code can benefit, especially in investigating the exploitability of vulnerabilities, since source code analysis can overlook binary- and runtime-level details. It is worth noting that, similar to intrusion detection systems or malware detectors, in-depth domain expertise is valuable for further refining alerts.

Overview of ChkUp. The high-level idea of ChkUp is to statically extract the firmware update program execution paths from the firmware codebase and to pinpoint potential vulnerabilities along these paths based on summarized vulnerability
patterns. Then, dynamic vulnerability validation is performed to reduce false alerts. However, as discussed in Section 1, three primary challenges need to be addressed to implement this idea: C1. Diverse Programs in Update Paths during the extraction of firmware update execution paths, C2. Verification Procedure Recognition for matching vulnerability patterns, and C3. Vulnerability Validation to reduce false alerts.

To address these challenges, we propose ChkUp (illustrated in Figure 3). Specifically, to address C1, we first create a UFG that captures the control flow information across programs written in different programming languages. Next, we perform backward program slicing to determine the firmware update execution paths (Section 4.1). To tackle C2, we extract syntactic and structural features for function matching, then employ more sophisticated DFG isomorphism to recognize the verification chains in the firmware update execution paths (Section 4.2). With the execution paths and the associated verification procedures, we examine them to discover vulnerabilities based on defined criteria (Section 4.3). Finally, we address C3 with a patching-based method in which the vulnerable procedure is tested using the generated PoCs after its execution dependencies are bypassed via patching (Section 4.4).

4 Design of ChkUp

4.1 Execution Path Recovery

UFG Definition. The binary dependency graph (BDG) in Karonte [59] can model data dependencies between binaries within firmware images, which is crucial for firmware update vulnerability detection. However, accurate firmware update execution path recovery requires additional information, including control flows, IPCs, and program invocations across programs in various languages, to determine the intra- and inter-process control flows of firmware update-related programs. Therefore, building upon BDG, we introduce the UFG to accommodate these requirements. A UFG, represented by $G$, is a directed graph that captures the intra- and inter-process control flow information at the basic block (BB) level of firmware update-related programs. The UFG is defined as $G = (V, E)$, where $V$ is a set of BBs extracted from front-end programs, scripts, and binaries involved in the firmware update procedure. $E$ represents the directed edges between the BBs, and each edge $e \in E$ is represented as $e = ([v_1, p_1], [v_2, p_2], c)$. This indicates that BB $v_1$ in program $p_1$ either transfers execution flow to or shares data with BB $v_2$ in program $p_2$, and $c$ is a flag indicating the type of edge: $c$ is 0 for intra-process control flow edges, 1 for IPC relations, and 2 for program invocation relations.

Update Entry Finding. Receiving firmware is typically the first step in an update procedure on the device side. Therefore, the entry node in a UFG is the node responsible for this task, and the program that includes this entry node is referred to as the entry program. For firmware containing a web-based update interface, the entry program could be a front-end firmware upload utility. To identify the entry program, we use a static pattern-matching approach. This is non-trivial because firmware update mechanisms vary greatly among vendors. To address this issue, we manually analyze numerous firmware images (details are provided in Appendix B.2) and identify distinct patterns that differentiate firmware update entry programs from others. Specifically, such an entry program always contains recognizable code patterns. For example, front-end programs that upload firmware images might use the <input type="file"...> pattern, while scripts or binaries that download firmware images might use the wget ... pattern. Additionally, the entry program often displays prompt messages containing common informative words, as well as function and variable names (e.g., fw_version and fw_upload) related to firmware updates. Building on these observations, we identify the program that matches the highest number of predefined patterns as the entry program.

Cross-language Control Flow Analysis. After identifying the entry program, the next step is to locate the programs that process the received firmware image. These programs can take different forms, such as binaries and shell scripts [48]. To gain complete insight into the control flow of the programs executed during a firmware update, cross-language control flow analysis is necessary. However, current path exploration-based vulnerability detection methods [19, 59] lack this capability. To address this challenge, we interpret the control flow logic, IPC paradigms, and program invocation paradigms of various program types (i.e., HTML paired with JavaScript, shell scripts, and binaries) commonly used in firmware updates to construct UFGs. Specifically, we build the call graph (CG) and inter-procedural control flow graph (CFG) of the entry

[Figure 3: Illustration of ChkUp (figure not recoverable from extraction; labels include Unpacked Firmware, HTML, JS, Entry Node, UFG, Execution Paths, Reboot, Function Corpus, Signatures, Verification Procedures, Missing or Improper FW Verification, Alerts, Emulator, PoC, and Vulnerabilities).]
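The UFG definition and the entry-finding heuristic of Section 4.1 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not ChkUp's implementation: the graph is a plain edge list using the $([v_1, p_1], [v_2, p_2], c)$ encoding; the pattern list, program names, and block labels are hypothetical; and `backward_reachable` performs plain reverse reachability from a reboot node, a simplification of the backward program slicing that ChkUp actually performs.

```python
import re
from collections import defaultdict, deque
from dataclasses import dataclass

# Edge-type flag c from the UFG definition.
INTRA_PROCESS_CF, IPC, INVOCATION = 0, 1, 2


@dataclass(frozen=True)
class Node:
    """A UFG vertex: basic block `bb` inside program `program`."""
    bb: str
    program: str


class UFG:
    """Directed graph G = (V, E); each edge is ([v1, p1], [v2, p2], c)."""

    def __init__(self):
        self.edges = []                # (src, dst, c) triples
        self.pred = defaultdict(set)   # dst -> {(src, c)}

    def add_edge(self, v1, p1, v2, p2, c):
        src, dst = Node(v1, p1), Node(v2, p2)
        self.edges.append((src, dst, c))
        self.pred[dst].add((src, c))

    def backward_reachable(self, sink):
        """All nodes with a path to `sink`, via reverse BFS over every edge type."""
        seen, queue = {sink}, deque([sink])
        while queue:
            node = queue.popleft()
            for src, _c in self.pred[node]:
                if src not in seen:
                    seen.add(src)
                    queue.append(src)
        return seen


# Hypothetical patterns for the three pattern types (not ChkUp's real list).
ENTRY_PATTERNS = [
    r"firmware (upgrade|update)",   # P1: prompt message
    r"\bfw_(version|upload)\b",     # P2: variable/function name
    r'<input\s+type="file"',        # P3: common code (front end)
    r"\bwget\b",                    # P3: common code (script/binary)
]


def pattern_score(text):
    """Number of distinct patterns that occur in a program's text."""
    return sum(bool(re.search(p, text, re.IGNORECASE)) for p in ENTRY_PATTERNS)


def find_entry_program(sources):
    """Pick the program (name -> text) that matches the most patterns."""
    return max(sources, key=lambda name: pattern_score(sources[name]))


if __name__ == "__main__":
    sources = {
        "upgrade.html": '<input type="file" name="fw_upload"> Firmware Upgrade',
        "status.html": "<p>Device status</p>",
    }
    print(find_entry_program(sources))  # upgrade.html

    g = UFG()
    g.add_edge("bb0", "upgrade.html", "bb0", "upgrade.cgi", IPC)
    g.add_edge("bb0", "upgrade.cgi", "bb7", "upgrade.cgi", INTRA_PROCESS_CF)
    g.add_edge("bb7", "upgrade.cgi", "bb0", "reboot", INVOCATION)
    g.add_edge("bb3", "status.html", "bb1", "status.cgi", IPC)  # unrelated edge
    on_path = g.backward_reachable(Node("bb0", "reboot"))
    print(sorted((n.program, n.bb) for n in on_path))
```

Keying nodes by (basic block, program) lets one graph hold blocks from HTML/JavaScript, shell scripts, and binaries side by side, which is the point of the cross-language UFG; the unrelated `status.html` edge is excluded because it has no path to the reboot node.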
Figure 4: Evaluation results: (a) accuracy of update entry identification under the settings W/O P1, W/O P2, W/O P3, and W/ All; (b) metrics of procedure recognition (number of procedures) for authenticity, integrity, freshness, and compatibility verification; (c) performance overhead (analysis time in seconds), including verification procedure recognition. NG, TPL, DL, TN, AS, ZX, LS, and UI represent Netgear, TP-Link, D-Link, TRENDnet, Asus, Zyxel, Linksys, and Ubiquiti, respectively.
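The four procedure categories in Figure 4b correspond to the checks the update agent performs in the verification phase (Section 2.2). The Python sketch below illustrates those checks under explicit assumptions: the JSON manifest layout, key, device ID, and installed version are hypothetical; an HMAC over the manifest stands in for the digital signature (a real update agent would verify an asymmetric signature against a vendor public key); and SHA-256 is assumed as the digest. Freshness is checked by parsing versions into integer tuples, because comparing version strings with the wrong kind of operator, the class of logic flaw behind CVE-2018-3891, gives wrong answers (lexicographically, "1.10.0" sorts before "1.9.0").

```python
import hashlib
import hmac
import json

# Hypothetical values; a real agent would hold a vendor public key instead.
SIGNING_KEY = b"hypothetical-demo-key"
DEVICE_ID = "DEV-1234"
INSTALLED_VERSION = "1.9.0"


def parse_version(v):
    """'1.10.0' -> (1, 10, 0), so versions compare numerically."""
    return tuple(int(part) for part in v.split("."))


def verify_update(image, manifest_bytes, signature):
    """Return the list of failed checks; an empty list means 'accept'."""
    failures = []
    # Authenticity: the (stand-in) signature must cover the manifest.
    expected = hmac.new(SIGNING_KEY, manifest_bytes, hashlib.sha256).digest()
    if not hmac.compare_digest(expected, signature):
        failures.append("authenticity")
        return failures  # no manifest field can be trusted after this
    manifest = json.loads(manifest_bytes)
    # Integrity: the image digest must match the signed manifest.
    if hashlib.sha256(image).hexdigest() != manifest["digest"]:
        failures.append("integrity")
    # Freshness: reject downgrades and re-installs of the same version.
    if parse_version(manifest["version"]) <= parse_version(INSTALLED_VERSION):
        failures.append("freshness")
    # Compatibility: the image must target this device.
    if manifest["device_id"] != DEVICE_ID:
        failures.append("compatibility")
    return failures


def make_update(image, version, device_id=DEVICE_ID):
    """Plays the author's role from the generation phase."""
    manifest = json.dumps({"digest": hashlib.sha256(image).hexdigest(),
                           "version": version,
                           "device_id": device_id}).encode()
    return manifest, hmac.new(SIGNING_KEY, manifest, hashlib.sha256).digest()


if __name__ == "__main__":
    img = b"new firmware image bytes"
    m, sig = make_update(img, "1.10.0")
    print(verify_update(img, m, sig))    # []
    print("1.10.0" > "1.9.0")            # False: string comparison is wrong
    m2, sig2 = make_update(img, "1.8.5")
    print(verify_update(img, m2, sig2))  # ['freshness']
```

Dropping any one of these checks reproduces one of the missing-verification categories that ChkUp looks for; checking authenticity first matters because the digest, version, and device ID are only meaningful once the manifest itself is trusted.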
created by randomly sampling 12,000 firmware images from eight major vendors: Netgear, TP-Link, D-Link, TRENDnet, Asus, Ubiquiti, Zyxel, and Linksys. To evaluate the effectiveness of ChkUp and validate its alerts, a ground-truth dataset, $D_G$, was created by sampling from $D_L$. The ground truth for the firmware images in $D_G$ was constructed by four security experts through manual analysis. For details on the dataset construction and manual analysis, refer to Appendix B.1 and Appendix B.2.

Experimental Environment. The evaluation was conducted on a server running Ubuntu 18.04 LTS with an AMD EPYC 7302P CPU and 64GB of RAM.

6.1 Effectiveness of Execution Path Recovery

The effectiveness of the Execution Path Recovery module is assessed by the accuracy of update entry finding and the correctness of the recovered execution paths.

Accuracy of Update Entry Finding. An update entry program is identified using three types of patterns: prompt messages (P1), variable and function names (P2), and common code (P3). To evaluate the effectiveness of each type of pattern, we performed an ablation study, assessing the correctness of the identified entry programs under four settings: without P1, without P2, without P3, and with all patterns. Evaluation results for firmware from each vendor in $D_G$ are shown in Figure 4a. The highest accuracy is achieved when all patterns are used, and the lowest when only P1 and P3 are employed (i.e., without P2). This indicates that P2 is the most essential pattern type, as its absence often leads to significant performance drops across firmware images from most vendors. The impact of P1 and P3 varies by vendor. For instance, P1 significantly impacts firmware images from TP-Link, TRENDnet, and Zyxel, while P3 is vital for those from Netgear and Asus. Importantly, when all patterns are used, the identification is robust, correctly identifying the update entry programs of most firmware images. Although 9 Netgear firmware images had their update entries misidentified due to limited semantic information, most of these misidentified programs are still related to firmware updates but handle update roles other than firmware delivery.

Correctness of Execution Path Recovery. Upon executing the Execution Path Recovery module on firmware images from $D_G$, ChkUp takes an average of 126.0 seconds per image (see Figure 4c). Of the 150 generated UFGs, 122 are both sound and complete with respect to the firmware update. 136 UFGs are sound, meaning that every edge in them represents control flows or IPC paradigms in the update procedures. However, UFGs from 7 Asus, 4 D-Link, and 3 Zyxel firmware images are unsound, containing unrelated IPC paradigms. 133 UFGs are complete, containing all related control flows and IPC paradigms. However, UFGs from 9 Netgear firmware images are incomplete due to misidentified update entries. Note that the 3 above-mentioned UFGs from Zyxel firmware images are also incomplete due to a mismatch between update entries and back-end handlers. The remaining 5 incomplete UFGs, from TP-Link firmware images, result from timeouts during UFG construction.

We found that sound and complete UFGs always lead to the recovery of correct paths, while unsound or incomplete UFGs might introduce incorrect paths or overlook correct ones during backward slicing. For example, the 3 unsound and incomplete UFGs of Zyxel firmware only contain reboot function invocations intended for other device management functionalities (e.g., applying new configurations). Despite this, most incorrect paths do not influence the vulnerability discovery process, as they contain relatively incomplete verification procedures and are filtered out as long as correct paths are also found during vulnerability discovery. Also, all the overlooked correct paths still contain a reboot step and can be identified once complete UFGs are constructed.

6.2 Effectiveness of Procedure Recognition

Upon evaluating the Verification Procedure Recognition module on $D_G$, ChkUp recognizes verification procedures for each firmware image in an average of 216.1 seconds (see Figure 4c). The results of recognizing different categories of verification procedures are shown in Figure 4b, where the eight columns in each category represent the results for firmware images from Netgear, TP-Link, D-Link, TRENDnet, Asus, Zyxel, Linksys, and Ubiquiti, respectively. In summary, there are 461 true positives (TPs), 45 false negatives (FNs), and 17 false positives (FPs). Fewer authenticity verification procedures are recognized because, based on our analysis, some execution paths indeed lack a firmware authenticity verification procedure.
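As a consistency check on the counts reported in Sections 6.1 and 6.2, the short Python computation below reproduces the UFG totals by inclusion-exclusion (14 unsound and 17 incomplete UFGs, with the 3 Zyxel UFGs in both sets) and derives precision, recall, and F1 from the reported TP, FN, and FP counts. The three metrics are derived here for illustration; they are not figures stated in the text above.

```python
# UFG counts from Section 6.1 (150 UFGs in total).
total = 150
unsound = 7 + 4 + 3            # Asus, D-Link, Zyxel
incomplete = 9 + 3 + 5         # Netgear, Zyxel, TP-Link
both = 3                       # the Zyxel UFGs are unsound and incomplete
sound = total - unsound                                     # 136
complete = total - incomplete                               # 133
sound_and_complete = total - (unsound + incomplete - both)  # 122

# Procedure-recognition counts from Section 6.2.
tp, fn, fp = 461, 45, 17
precision = tp / (tp + fp)     # 461 / 478
recall = tp / (tp + fn)        # 461 / 506
f1 = 2 * precision * recall / (precision + recall)

print(sound, complete, sound_and_complete)       # 136 133 122
print(f"{precision:.3f} {recall:.3f} {f1:.3f}")  # 0.964 0.911 0.937
```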
[Figure: Total alerts per category (bar chart; category labels not recoverable from extraction).]
insecure update mechanisms, while the remaining firmware
9 Related Work

Firmware Update Security. Recent studies have revealed security concerns in firmware and software update mechanisms [15, 17, 21, 57, 63, 70]. Notably, the prevalent use of insecure protocols like HTTP can expose update processes to MITM and backdoor attacks [17, 63]. Moreover, firmware modification attacks that exploit weaknesses in update procedures have been demonstrated [21, 70]. Both academia and the Internet Engineering Task Force (IETF) are addressing these concerns by developing secure firmware update strategies [46, 53] and efficient hotpatching solutions [35, 54].

Vulnerability Detection in Firmware. Firmware vulnerability detection is broadly divided into three categories. The first group [37, 40, 68, 77, 82, 87] detects vulnerabilities by identifying discrepancies between specifications and actual implementations. For example, FirmXRay [77] uncovers Bluetooth link layer vulnerabilities using specification knowl-

10 Conclusion

In this paper, we present ChkUp, a novel approach for detecting firmware update vulnerabilities, including missing and improper verification during updates. Specifically, ChkUp resolves firmware update execution paths through cross-language inter-process control flow analysis and program slicing. Then, firmware verification procedures are identified through syntactic, structural, and semantic program analysis. These procedures, along with the corresponding execution paths, are further examined based on our defined criteria to detect vulnerabilities. To reduce false positives, alerts for emulatable firmware images are validated dynamically with a patching-based method, while others are validated manually. We implemented ChkUp and used it to analyze 12,000 firmware images, then validated the alerts for 150 firmware images from 33 device families. The results show that ChkUp can identify zero-day and n-day vulnerabilities, leading to the assignment of 25 CVE IDs and one PSV ID.
[9] CVE-2021-3166. https://nvd.nist.gov/vuln/detail/CVE-2021-3166, 2021.
[10] Embedded devices market size report. https://www.marketsandmarkets.com/Market-Reports/embedded-system-market-98154672.html, 2022.
[11] Firmware-mod-kit. https://github.com/rampageX/firmware-mod-kit, 2022.
[12] Internet of Things (IoT) Top 10 2018. https://wiki.owasp.org/index.php/OWASP_Internet_of_Things_Project#tab=IoT_Top_10, 2022.
[13] Network analysis in Python. https://github.com/networkx/networkx, 2022.
[14] National Security Agency. Ghidra software reverse engineering framework. https://github.com/NationalSecurityAgency/ghidra, 2022.
[15] Omar Alrawi et al. SoK: Security evaluation of home-based IoT deployments. In S&P, 2019.
[16] Amit Seal Ami et al. Why crypto-detectors fail: A systematic evaluation of cryptographic misuse detection techniques. In S&P, 2022.
[17] Anthony Bellissimo et al. Secure software updates: Disappointments and new challenges. In HotSec, 2006.
[18] Daming D. Chen et al. Towards automated dynamic analysis for Linux-based embedded firmware. In NDSS, 2016.
[19] Libo Chen et al. Sharing more and checking less: Leveraging common input keywords to detect bugs in embedded systems. In USENIX Security, 2021.
[26] Manuel Egele et al. An empirical study of cryptographic misuse in Android applications. In CCS, 2013.
[27] Mohamed Elsabagh et al. FIRMSCOPE: Automatic uncovering of privilege-escalation vulnerabilities in pre-installed apps in Android firmware. In USENIX Security, 2020.
[28] Sebastian Eschweiler et al. discovRE: Efficient cross-architecture identification of bugs in binary code. In NDSS, 2016.
[29] Sascha Fahl et al. Why Eve and Mallory love Android: An analysis of Android SSL (in)security. In CCS, 2012.
[30] Andrew Fasano et al. SoK: Enabling security analyses of embedded systems via rehosting. In ASIACCS, 2021.
[31] Johannes Feichtner et al. Automated binary analysis on iOS: A case study on cryptographic misuse in iOS applications. In WiSec, 2018.
[32] Farhaan Fowze et al. ProXray: Protocol model learning and guided firmware analysis. IEEE Trans. Softw. Eng., 2019.
[33] Jian Gao et al. VulSeeker: A semantic learning based vulnerability seeker for cross-platform binary. In ASE, 2018.
[34] Fabio Gritti et al. Heapster: Analyzing the security of dynamic allocators for monolithic firmware images. In S&P, 2022.
[35] Yi He et al. RapidPatch: Firmware hotpatching for real-time embedded devices. In USENIX Security, 2022.
[37] Grant Hernandez et al. BigMAC: Fine-grained policy analysis of Android firmware. In USENIX Security, 2020.
[38] Grant Hernandez et al. FirmWire: Transparent dynamic analysis for cellular baseband firmware. In NDSS, 2022.
[39] Dongkwan Kim et al. Revisiting binary code similarity analysis using interpretable feature engineering and lessons learned. IEEE Trans. Softw. Eng., 2022.
[40] Eunsoo Kim et al. BaseSpec: Comparative analysis of baseband software and cellular specifications for L3 protocols. In NDSS, 2021.
[41] Geunwoo Kim et al. Improving cross-platform binary analysis using representation learning via graph alignment. In ISSTA, 2022.
[42] Mingeun Kim et al. FirmAE: Towards large-scale emulation of IoT firmware for dynamic analysis. In ACSAC, 2020.
[43] Taegyu Kim et al. RevARM: A platform-agnostic ARM binary rewriter for security applications. In ACSAC, 2017.
[44] Taegyu Kim et al. PASAN: Detecting peripheral access concurrency bugs within bare-metal embedded applications. In USENIX Security, 2021.
[45] Stefan Krüger et al. CrySL: An extensible approach to validating the correct usage of cryptographic APIs. IEEE Trans. Softw. Eng., 2019.
[46] Antonio Langiu et al. UpKit: An open-source, portable, and lightweight update framework for constrained IoT devices. In ICDCS, 2019.
[47] Pierre Lestringant et al. Automated identification of cryptographic primitives in binary code with data flow graph isomorphism. In ASIACCS, 2015.
[48] Wen Li et al. Understanding language selection in multi-language software projects on GitHub. In ICSE-Companion, 2021.
[49] Bingchang Liu et al. αDiff: Cross-version binary code similarity detection with DNN. In ASE, 2018.
[50] Andrea Marcelli et al. How machine learning is solving the binary function similarity problem. In USENIX Security, 2022.
[52] Charlie Miller et al. Remote exploitation of an unaltered passenger vehicle. Black Hat USA, 2015.
[53] Brendan Moran et al. A firmware update architecture for Internet of Things. Internet Requests for Comments, RFC Editor, RFC 9019, 2021.
[54] Christian Niesler et al. HERA: Hotpatching of embedded real-time applications. In NDSS, 2021.
[55] U.S. Government Accountability Office. SolarWinds cyberattack demands significant federal and private-sector response (infographic). https://www.gao.gov/blog/solarwinds-cyberattack-demands-significant-federal-and-private-sector-response-infographic, 2021.
[56] Luca Piccolboni et al. CryLogger: Detecting crypto misuses dynamically. In S&P, 2021.
[57] Vijay Prakash et al. Inferring software update practices on smart home IoT devices through user agent analysis. In SCORED, 2022.
[58] Sazzadur Rahaman et al. CryptoGuard: High precision detection of cryptographic vulnerabilities in massive-sized Java projects. In CCS, 2019.
[59] Nilo Redini et al. Karonte: Detecting insecure multi-binary interactions in embedded firmware. In S&P, 2020.
[60] ReFirmLabs. binwalk. https://github.com/ReFirmLabs/binwalk/, 2022.
[61] Yann Régis-Gianas et al. Morbig: A static parser for POSIX shell. In SLE, 2018.
[62] Michael Rushanan et al. SoK: Security and privacy in implantable medical devices and body area networks. In S&P, 2014.
[63] Justin Samuel et al. Survivable key compromise in software update systems. In CCS, 2010.
[64] Paria Shirani et al. BinARM: Scalable and efficient detection of vulnerabilities in firmware images of intelligent electronic devices. In DIMVA, 2018.
[65] Yan Shoshitaishvili et al. Firmalice - Automatic detection of authentication bypass vulnerabilities in binary firmware. In NDSS, 2015.
[66] Yan Shoshitaishvili et al. SoK: (State of) the art of war: Offensive techniques in binary analysis. In S&P, 2016.
[68] Lin Tan et al. Autoises: Automatically inferring secu- [85] Li Zhang et al. {CryptoREX}: Large-scale analysis of
rity specification and detecting violations. In USENIX cryptographic misuse in {IoT} devices. In RAID, 2019.
Security, 2008.
[86] Ruide Zhang et al. Augauth: Shoulder-surfing resistant
[69] EE Times. 2019 embedded markets study. authentication for augmented reality. In ICC, 2017.
https://www.embedded.com/wp-content/
uploads/2019/11/EETimes_Embedded_2019_ [87] Yue Zhang et al. When good becomes evil: Tracking
Embedded_Markets_Study.pdf, 2019. bluetooth low energy devices via allowlist-based side
channel and its countermeasure. In CCS, 2022.
[70] Ryan Tsang et al. Fandemic: Firmware attack construc-
tion and deployment on power management integrated [88] Binbin Zhao et al. A large-scale empirical analysis of
circuit and impacts on iot applications. In NDSS, 2022. the vulnerabilities introduced by third-party components
in iot firmware. In ISSTA, 2022.
[71] Julian R Ullmann. An algorithm for subgraph isomor-
phism. J. ACM, 1976.
A Additional Design Details
[72] Jinwen Wang et al. Rt-tee: Real-time system availability
for cyber-physical systems using Arm TrustZone. In S&P, 2022.

[73] Jinwen Wang et al. ARI: Attestation of real-time mission execution integrity. In USENIX Security, 2023.

[74] Jinwen Wang et al. IP protection in TinyML. In DAC, 2023.

[75] Shuai Wang et al. In-memory fuzzing for binary code similarity analysis. In ASE, 2017.

[76] [Website]. ChkUp. https://fw-chkup.github.io.

[77] Haohuang Wen et al. FirmXRay: Detecting Bluetooth link layer vulnerabilities from bare-metal firmware. In CCS, 2020.

[78] Seongil Wi et al. HiddenCPG: Large-scale vulnerable clone detection using subgraph isomorphism of code property graphs. In WWW, 2022.

[79] Yueming Wu et al. Detecting semantic code clones by building AST-based Markov chains model. In ASE, 2022.

[80] Yuhao Wu et al. Work-in-progress: Measuring security protection in real-time embedded firmware. In RTSS, 2022.

[81] Fabian Yamaguchi et al. Modeling and discovering vulnerabilities with code property graphs. In S&P, 2014.

[82] Yuqing Yang et al. Detecting and measuring misconfigured manifests in Android apps. In CCS, 2022.

[83] Jiawei Yin et al. Finding SMM privilege-escalation vulnerabilities in UEFI firmware with protocol-centric static analysis. In S&P, 2022.

A.1 Path Explosion Reduction

ChkUp uses five strategies to reduce path explosion. The first three are applied during UFG construction, and the last two during backward slicing. 1) Excluding non-crypto standard libraries: Standard libraries are omitted from UFG construction to reduce complexity. Cryptographic libraries are the exception, because they are needed for vulnerability identification and do not significantly increase UFG complexity. 2) Omitting built-in utilities: Known built-in utility programs executed during firmware updates (e.g., mtd, reboot) do not need control flow analysis. 3) Implementing a timeout: If UFG generation takes excessive time, a timeout bounds UFG complexity and limits the inclusion of FP paths. 4) Applying path filtering: Execution paths are refined by filtering on the error messages of unsuccessful firmware updates that cause device reboots. 5) Path merging: After backward slicing, paths with identical verification procedures and nodes are merged, since they are semantically equivalent.

A.2 Function Similarity Matching

Similarity Score Calculation. With the extracted features (see Table 4 for details), we calculate the similarity between a function $f$ and a function $f'$ in the corpus based on the relative difference between their feature values [39].

Table 4: Syntactic and structural features of functions.
Category | Feature
Data Constant | # constants, # strings
Instruction | # all instructions, # operands, # each type of instructions¹
CFG | # BBs, # edges, # loops, avg. # edges per BB, * BBs, * loops
Function Call | # imported calls, # incoming calls, # outgoing calls
Misc. | # arguments, # API callees, # library references, # code references
Note: #: the number of; *: the size of; avg.: average; BB: basic block. ¹ Instruction types: arithmetic, branch, data transfer, logic, and bit-oriented instructions.
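The relative-difference comparison described above can be sketched in Python as follows. This is a minimal illustration, not ChkUp's implementation: it assumes each function has already been reduced to a vector of M non-negative feature values (such as the Table 4 counts), that f and f′ use the same feature order, and that the function names `relative_difference` and `similarity` are hypothetical. The per-feature relative difference is |x − x′| divided by max(|x|, |x′|), and the overall score is one minus the average of these differences. The sketch also assumes that two zero-valued features count as identical, since that case would otherwise divide by zero.

```python
from typing import Sequence


def relative_difference(x: float, x_prime: float) -> float:
    """Per-feature relative difference |x - x'| / max(|x|, |x'|).

    For non-negative feature values the result lies in [0, 1].
    """
    denom = max(abs(x), abs(x_prime))
    if denom == 0:
        # Assumption (hypothetical choice): two zero-valued features are identical.
        return 0.0
    return abs(x - x_prime) / denom


def similarity(f: Sequence[float], f_prime: Sequence[float]) -> float:
    """Overall score: 1 minus the mean relative difference over all M features."""
    if len(f) != len(f_prime) or not f:
        raise ValueError("feature vectors must be non-empty and of equal length M")
    total = sum(relative_difference(x, xp) for x, xp in zip(f, f_prime))
    return 1.0 - total / len(f)
```

For example, `similarity([10, 4, 0], [8, 4, 0])` yields per-feature differences of 0.2, 0, and 0, so the overall score is 1 − 0.2/3 ≈ 0.933. A function whose score falls below the chosen similarity threshold would then be discarded as irrelevant.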
Given $M$ features, the feature vectors of $f$ and $f'$ are represented as $\mathbf{f} = [x_1, x_2, \cdots, x_M]$ and $\mathbf{f}' = [x_1', x_2', \cdots, x_M']$, respectively. The relative difference $\delta$ between $x_i$ and $x_i'$ is

$$\delta(x_i, x_i') = \frac{|x_i - x_i'|}{\max(|x_i|, |x_i'|)}. \tag{1}$$

From this, the similarity score between the two features is $1 - \delta(x_i, x_i')$. The overall similarity score of $f$ and $f'$ is then defined as the average similarity score over all features:

$$\gamma(\mathbf{f}, \mathbf{f}') = 1 - \frac{1}{M}\left(\delta(x_1, x_1') + \delta(x_2, x_2') + \cdots + \delta(x_M, x_M')\right). \tag{2}$$

Because all features in Table 4 are non-negative, both $\delta(x_i, x_i')$ and $\gamma(\mathbf{f}, \mathbf{f}')$ range from 0 to 1. The higher $\gamma(\mathbf{f}, \mathbf{f}')$, the more similar $f$ and $f'$ are considered to be.

Based on this similarity score definition, we assess the similarity between each function in the execution paths and each function in the corpus. The corpus contains $Q$ key functions divided into four sets, one for each verification procedure. Given $N$ recovered execution paths of a firmware image, we calculate four similarity matrices for the $n$-th path: $S_a^n$, $S_i^n$, $S_f^n$, and $S_c^n$. Each matrix holds the similarity scores between the functions in the $n$-th path and those in the corpus for a specific verification procedure. For instance, the similarity score matrix $S_a^n$ for authenticity verification of the $n$-th path is defined as

$$S_a^n = \begin{bmatrix}
\gamma(\mathbf{f}_1, \mathbf{f}'_1) & \gamma(\mathbf{f}_1, \mathbf{f}'_2) & \cdots & \gamma(\mathbf{f}_1, \mathbf{f}'_Q) \\
\gamma(\mathbf{f}_2, \mathbf{f}'_1) & \gamma(\mathbf{f}_2, \mathbf{f}'_2) & \cdots & \gamma(\mathbf{f}_2, \mathbf{f}'_Q) \\
\vdots & \vdots & \ddots & \vdots \\
\gamma(\mathbf{f}_P, \mathbf{f}'_1) & \gamma(\mathbf{f}_P, \mathbf{f}'_2) & \cdots & \gamma(\mathbf{f}_P, \mathbf{f}'_Q)
\end{bmatrix}, \tag{3}$$

where $P$ is the number of functions in the execution path and $\gamma(\mathbf{f}_p, \mathbf{f}'_q)$ is the similarity score between the $p$-th function in the execution path and the $q$-th function in the corpus, calculated using Equation 2. When execution paths share functions, the similarity score for each function pair is calculated only once.

Similarity Score Threshold Selection. In the first stage of Verification Procedure Recognition, the similarity score threshold should eliminate a substantial number of irrelevant functions while preserving the majority of essential key functions. This ensures efficient and accurate recognition of the verification procedure in the second stage. To establish this threshold, we calculated similarity scores between key functions found in the firmware images from DG and their counterparts in the corpus. In addition, for each key function in every firmware image, we selected an irrelevant function at random and calculated its similarity score with the corresponding function in our corpus. In total, we assessed 1,012 functions and obtained their similarity scores. We then swept the similarity score threshold from 0 to 1 in increments of 0.05 and measured recall and precision at each threshold. As illustrated in Figure 6, a threshold of approximately 0.5 delivers the best balance between precision and recall, with both metrics exceeding 90%. Consequently, we selected a similarity threshold of 0.5.

Note: Function names in wolfCrypt, Mbed Crypto, and Nettle are prefixed with wc_, mbedtls_, and nettle_, respectively. Omitted functions implement algorithms from the same families as the listed functions, for example, MD2 for improper integrity verification and SHA512 for proper integrity verification.

A.3 Corpus Statistics

Key functions in the corpus include functions from open-source libraries and proprietary functions obtained during the construction of the ground truth. Overall, the corpus contains 129 functions: 76 from widely used libraries and 53 that are proprietary. Our observations on key functions used for integrity and authentication verification align with previous work [85], showing that executable programs generally either call low-level cryptographic APIs from standard libraries or use self-defined APIs that wrap these low-level APIs. Therefore, our corpus includes common cryptographic functions for digest algorithms (e.g., the SHA, MD, and RIPEMD families), digital signature algorithms (e.g., RSA, DSA, and ECDSA), and MAC algorithms (e.g., HMAC, CMAC, and Poly1305) from standard libraries, namely Libcrypto [2], wolfCrypt [6], LibTomCrypt [3], Mbed Crypto [4], and Nettle [5]. These functions are further classified into two categories (proper and improper) based on their corresponding algorithms (see Table 5 for details). In addition, the corpus includes CRC-based non-cryptographic digest functions from zlib [7] and LibCRC [1], which are considered weak for integrity verification. All these functions are compiled for ARM, MIPS, and PowerPC (each in both 32-bit and 64-bit variants) using the GCC compiler with optimization levels O0 through O3.
Proprietary functions in the corpus include those used for