Packet Sequence Oriented Fuzzing For Protocol Implementations
Zhengxiong Luo1, Junze Yu1, Feilong Zuo1, Jianzhong Liu1, Yu Jiang1,✉, Ting Chen2, Abhik Roychoudhury3, and Jiaguang Sun1
1 KLISS, BNRist, School of Software, Tsinghua University
2 University of Electronic Science and Technology of China
3 National University of Singapore
troller (PLC) devices and Internet of Things (IoT) devices. Instrumenting the device's firmware is challenging and even infeasible when the implementation is only accessible in a black-box fashion, which necessitates a noninvasive solution to obtain feedback for the protocol fuzzers.
2) Highly complex protocol logic. Protocols feature highly
[Workflow diagram labels: State Trace, Transition, Test Priority Initialization, System State Tracking Graph, Mutation Operators, Abstract Packet Sequence, Packet Pattern Sequence, Oracle Map, Analyze, Bug Report]
[Figure 7 panels: OpenSSH (SSH), libcoap (CoAP), Dnsmasq (DNS), CycloneDDS (RTPS), Mosquitto (MQTT), libiec_iccp_mod (ICCP), GnuTLS (SSL/TLS), LibreSSL (SSL/TLS), OpenBGPD (BGP), rudp (RUDP). x-axis: Time (hh:mm), from 0:00 to 24:00.]
Figure 7: The number of unique branches covered (only on the server side) by different fuzzers on each protocol implementation over ten 24-hour runs. The average number of discovered branches is displayed, along with the minima and maxima over the individual runs.
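The per-fuzzer summary described in the caption (mean, minimum, and maximum over repeated runs) can be sketched as follows. This is only an illustration: the function names `summarize_runs` and `worst_beats_best` and the sample numbers are hypothetical and do not come from the paper's artifact.

```python
from statistics import mean
from typing import Dict, List


def summarize_runs(runs: List[List[int]]) -> Dict[str, List[float]]:
    """Summarize coverage curves; runs[i][t] is the number of unique
    branches covered by run i at sample point t (hypothetical layout)."""
    per_sample = list(zip(*runs))  # group the values of all runs per sample point
    return {
        "mean": [mean(s) for s in per_sample],
        "min": [min(s) for s in per_sample],
        "max": [max(s) for s in per_sample],
    }


def worst_beats_best(ours: List[List[int]], theirs: List[List[int]]) -> bool:
    """True if the lowest final coverage of `ours` exceeds the highest
    final coverage of `theirs`."""
    return min(r[-1] for r in ours) > max(r[-1] for r in theirs)


if __name__ == "__main__":
    # Made-up numbers for two runs of two fuzzers, three sample points each.
    fuzzer_a = [[10, 20, 30], [12, 18, 36]]
    fuzzer_b = [[8, 15, 25], [9, 16, 29]]
    print(summarize_runs(fuzzer_a))
    print(worst_beats_best(fuzzer_a, fuzzer_b))
```

Comparing the worst run of one fuzzer against the best run of another is a stricter check than comparing means, since it requires the two result ranges not to overlap at all.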
6.3 Coverage Analysis

Since the compared fuzzing schemes are pretty diverse, we need a uniform metric for a fair comparison. Branch coverage is a commonly used metric to measure the effectiveness of fuzzers in software testing. Therefore, we use branch coverage as the metric for comparison and utilize LLVM's SanitizerCoverage [36] to count the number of unique branches covered by each fuzzer on a target program.

Figure 7 shows the branches covered on the server side by different fuzzers. On average, BLEEM achieved 40.3%, 35.7%, 23.4%, 48.9%, and 28.5% higher branch coverage than Snipuzz, AFLNet, SGFuzz, BooFuzz, and Peach, respectively, within 24 hours. All results are statistically significant according to the Mann–Whitney U test, as recommended by Klees et al. [35]. In 64 out of 69 comparisons, the minimum number of branches achieved by BLEEM exceeds the maximum number of branches of the prior approaches. In other words, even the worst run of BLEEM performs better than the best run of the prior approaches, except for some runs on Dnsmasq, IEC104, and OpenBGPD. The reason is that these protocols employ simple interactive logic and packet formats. Hence, other fuzzers also did well on them. Even so, BLEEM still outperforms them on the average number of branches achieved. DTLS and SSH employ random nonces to provide replay protection [4, 39], as mentioned in §1. When adapted to their implementations, i.e., OpenSSL and OpenSSH, existing protocol fuzzers have difficulty in completing the handshake, thus covering fewer branches. In contrast, BLEEM can penetrate deeply into the protocol logic by utilizing the intercepted interactive traffic and the proposed SSTG. TLS also employs this mechanism, and the results of other fuzzers on BoringSSL are fine. The reason is that we enabled the fuzzer mode provided by BoringSSL [11] in the experiments. This mode modifies the library to disable randomness and thus is more friendly to traditional fuzzers. Still, BLEEM covers more branches than prior techniques on BoringSSL.

Answer to RQ1. Overall, BLEEM can achieve higher coverage than existing protocol fuzzers, which means that BLEEM can test protocol implementations broadly and deeply.

6.4 Bug-Detection Capability

To measure the bug detection capability, we adapt BLEEM to fuzz real-world protocol implementations, both open-source and closed-source.

Open-Source Targets. We use the number of unique vulnerabilities reported by AddressSanitizer [51] and UndefinedBehaviorSanitizer [13] (a.k.a., ASan and UBSan) as the uniform metric. The reason is that the vulnerability detection methods of BLEEM and other fuzzers vary. For example, the classic black-box fuzzer Peach typically detects vulnerabilities by checking the liveness of the under-test service via port probing. However, not all the vulnerabilities (e.g., some buffer-overflow vulnerabilities) will crash the program. Therefore, we utilize ASan and UBSan to enhance the target program and use the crashes identified by different fuzzers as the metric to represent their vulnerability detection ability. Furthermore, some Sanitizer-reported crashes may result from the same root cause. To eliminate duplicate entries, we utilize the stack traces in the Sanitizer report for bug deduplication and only consider unique vulnerabilities.

BLEEM has detected 15 new vulnerabilities in several extensively used implementations of well-known protocols, with 10 CVE identifiers assigned after a coordinated disclosure. We also tried to reproduce these bugs using the other fuzzers based on the similar configuration construction method mentioned in §6.2. Table 1 summarizes the vulnerabilities exposed
by BLEEM and whether other fuzzers can find them. Specifically, Peach, BooFuzz, AFLNet, SGFuzz, and Snipuzz can only expose 8, 5, 6, 7, and 5 bugs, respectively, each a strict subset of the bugs uncovered by BLEEM. These protocol implementations have been thoroughly tested, and some of them, e.g., GnuTLS [1] and LibreSSL [2], have even been incorporated into OSS-Fuzz [27], which demonstrates the effectiveness of BLEEM in bug detection. Some of these bugs are hard to trigger, and we provide the bug details and a case study in Appendix C.

Table 1: Previously unknown vulnerabilities exposed by BLEEM and the statistics of the compared fuzzers

Subject            Type                       CVE ID
LibreSSL           Stack Buffer Overflow      CVE-2021-41581
GnuTLS             Null Pointer Dereference   CVE-2021-4209
BoringSSL          SIGPIPE                    -
accel-ppp          Stack Buffer Overflow      CVE-2021-42870
accel-ppp          Stack Buffer Overflow      CVE-2021-42054
accel-ppp          Memory Leak                -
IEC104             Stack Buffer Overflow      CVE-2020-20486
IEC104             Segmentation Violation     CVE-2020-18731
rudp               Memory Leak                CVE-2020-20665
libiec_iccp_mod    Heap Buffer Overflow       CVE-2020-20490
libiec_iccp_mod    Heap Buffer Overflow       CVE-2020-20662
libiec_iccp_mod    Heap Buffer Overflow       CVE-2020-20663
OpenBGPD           Undefined Behavior         -
OpenBGPD           Undefined Behavior         -
mvfst              Heap Buffer Overflow       -
SUM (bugs found): AFLNet 6, Snipuzz 5, SGFuzz 7, BooFuzz 5, Peach 8, BLEEM 15 (10 CVEs)

Closed-Source Targets. We collected four firmware images containing vulnerable protocol implementations, as disclosed by the CVE dataset [16], from different mainstream IoT manufacturers to evaluate the performance of the selected black-box fuzzers in discovering severe vulnerabilities. These CVEs seriously threaten various devices and are classified as CRITICAL by CVSS 3.x Severity and Metrics (see Table 4 in Appendix B). We compared BLEEM against the selected black-box fuzzers and used network-related monitors to detect crashes by checking the liveness of under-test services through port probing. We use the time to first crash as the metric to evaluate the bug-detection capability of these fuzzers. As shown in Table 2, BLEEM achieves the best CVE discovery performance compared to other fuzzers. BLEEM and Peach can find all of these CVEs, while BooFuzz and Snipuzz can find only 3 and 1, respectively. On average, BLEEM can find a crash at least 7.5×, 13.3×, and 87.1× faster than Peach, BooFuzz, and Snipuzz, respectively, demonstrating BLEEM's efficiency boost over the state-of-the-art.

Table 2: Average time to expose published CVEs

CVE ID            Protocol   Snipuzz   BooFuzz   Peach     BLEEM
CVE-2018-5767     HTTP       -         25min     34min     6min
CVE-2020-25067    UPnP       26min     36s       47s       33s
CVE-2019-14457    HTTP       -         -         652min    35min
CVE-2019-1663     HTTP       -         501min    307min    72min

Answer to RQ2. BLEEM is capable of finding unknown bugs effectively in real-world protocol implementations.

6.5 Effectiveness of Sequence Generation

To evaluate the effectiveness of the guided sequence generation (§5.3), we implemented BLEEMRand, a variant of BLEEM in which the guided sequence generation is replaced with random sequence selection while the SSTG construction is maintained for comparison. Table 3 shows the mean value of each metric across repetitions. The column "Paths" indicates the number of unique state traces discovered during the SSTG construction, and the column "Len." indicates the average length of these paths. The column "Types" indicates the number of different types of abstract packets (after concatenation), which are the elements of the SUT States. The columns "Nodes" and "Trans." indicate the state and state transition numbers of the SSTG, respectively. The "Branch Coverage" shows the achieved unique branches of the whole SUT, including the coverage achieved on both sides, which is different from §6.3 because both BLEEMRand and BLEEM can test the whole system. Note that "Paths" and "Len." are not necessarily proportional to the complexity of the constructed SSTG, as a different traversal of existing nodes and transitions can trigger a new path while no new nodes or transitions are found. The table shows that the overall unique paths are finite, indicating that our provision for the SSTG construction effectively avoids state space explosion. To further illustrate this, we also tried constructing a long initial sequence using the provided client utility when testing Dnsmasq, libcoap, and Mosquitto. The results demonstrate that the path numbers achieved on these projects are also within an acceptable range.

Table 3: Statistics about the constructed SSTG and the unique branches achieved by BLEEM and BLEEMRand.

                               SSTG Construction    SSTG Metrics              Branch
Subject           Fuzzer       Paths     Len.       Types   Nodes   Trans.    Coverage
BoringSSL         BLEEMRand    42        4.11       32      72      84        4293
                  BLEEM        75        3.82       69      152     183       4549
OpenSSL           BLEEMRand    266       6.84       73      247     397       10512
                  BLEEM        256       5.96       90      267     442       10614
mvfst             BLEEMRand    2352      7.83       492     1494    2806      53942
                  BLEEM        10781     7.99       671     2779    6581      55575
accel-ppp         BLEEMRand    101       6.07       14      33      46        1384
                  BLEEM        30        4.62       11      25      31        1385
IEC104            BLEEMRand    39        5.57       84      113     137       279
                  BLEEM        49        5.86       88      149     164       321
OpenSSH           BLEEMRand    43        5.23       20      59      82        12444
                  BLEEM        97        5.58       30      104     171       14579
libcoap           BLEEMRand    10013     85.08      325     1340    3400      8292
                  BLEEM        10286     83.42      331     1427    4143      8530
Dnsmasq           BLEEMRand    2958      22.60      60      163     511       1271
                  BLEEM        1783      14.62      58      155     413       1292
CycloneDDS        BLEEMRand    55        4.55       18      67      132       22912
                  BLEEM        139       4.26       31      153     303       23710
Mosquitto         BLEEMRand    19522     15.67      215     1037    2815      9284
                  BLEEM        20652     13.09      253     1085    3142      10285
libiec_iccp_mod   BLEEMRand    116       9.05       28      96      155       6059
                  BLEEM        314       9.71       41      139     294       6265
GnuTLS            BLEEMRand    50        4.66       25      70      98        5057
                  BLEEM        57        4.12       38      92      114       5222
LibreSSL          BLEEMRand    209       3.93       67      248     394       5473
                  BLEEM        196       3.78       100     277     385       6157
OpenBGPD          BLEEMRand    222       17.73      35      92      131       2072
                  BLEEM        253       17.21      43      111     169       2086
rudp              BLEEMRand    149       5.86       21      82      151       112
                  BLEEM        154       5.70       30      109     185       115

From each row of Table 3, the complexity of our proposed SSTG is roughly in positive correlation with the packet types and the covered unique branches, indicating that the proposed SSTG can reflect the inner system execution status of the SUT to some degree. With the help of the guided sequence generation strategy, BLEEM achieves 5.7% more unique branches than BLEEMRand on average, and the improvement on the server is typically comparable to that on the client since they are mutually reinforcing. We also note that BLEEMRand performs better on Dnsmasq and accel-ppp in the complexity of the constructed SSTG. Through investigation, we found that the logic of the corresponding SUTs is relatively simple
compared with other subjects. The randomly generated long packet sequences of BLEEMRand can easily trigger more outputs of the SUT parties, resulting in a more complex SSTG. Nonetheless, BLEEMRand can hardly reach deep states in the protocol implementation without guided packet sequence generation, thus covering fewer branches than BLEEM.

Case Study. To intuitively illustrate how BLEEM implements guided fuzzing and its effectiveness, we use the session discovered during fuzzing mvfst as a case study. By executing the selected SUT of mvfst, BLEEM constructed an initial SSTG in Figure 6 as the basis. Directed by Algorithm 1, BLEEM tried to stress q1 with the packet pattern b ⊕ σP, yielding the packet pattern sequence [a ⊕ σ◦, b ⊕ σP]. Then BLEEM instantiated it and triggered the session in Figure 8 (in reality, the client and server interact with BLEEM's packet instantiation sub-module; we omit it to facilitate understanding). The mutated Initial[CRYPTO, ACK] (denoted as Initial[CRYPTO, ACK]*) transmitted at ② triggered a retransmission by the client (③; this Initial[CRYPTO] differs from ① at the concrete level). Through our investigation of the implementation logic, after reading the second Initial[CRYPTO], the server immediately responds with an ACK for ③ (Initial[ACK]) instead of packing ACK and CRYPTO frames into one Initial packet like ②. Then, it invokes CloningScheduler, a packet scheduler designed to clone existing packets that are still outstanding, to derive the following Initial[CRYPTO] and Handshake[CRYPTO]. In this way, the testing procedure covered the logic of CloningScheduler, where a known problem has been exposed [19].

As a result, exercising q1 using the new packet pattern b ⊕ σP achieved a new state trace. Empowered by the feedback collector, BLEEM can monitor this automatically and update it on the initial SSTG with an extended alphabet:

Ω̂ = {e : Initial[ACK]+Initial[CRYPTO]+Handshake[CRYPTO], f : Handshake[CRYPTO,ACK]},

yielding the overall SSTG shown in Figure 9.

Client                                                   Server
① Initial[CRYPTO]
② Initial[CRYPTO, ACK]*, Handshake[CRYPTO]
③ Initial[CRYPTO]
④ Initial[ACK], Initial[CRYPTO], Handshake[CRYPTO]
⑤ Handshake[CRYPTO, ACK]
⑥ 1-RTT[APPLICATION_CLOSE]

Figure 8: The mvfst handshake flow if the packet at ② is mutated (Initial[CRYPTO,ACK]*).

Unfortunately, triggering such logic is non-trivial for other fuzzers because they need to craft the two Initial[CRYPTO] packets carefully: (i) The two packets need to be syntactically correct, which can be easily guaranteed by generation-based fuzzers like Peach but is hard for mutation-based fuzzers like AFLNet. (ii) They need to correctly set some parameters in the packets to ensure semantic correctness. For example, these two packets should carry the same DCID field to identify the same connection, and the PN field value x of the former packet and the value y of the latter should satisfy y = x + 2. If these two packets are semantically incorrect, the server will reject them without a response. In the 24-hour experiments across 10 repetitions on mvfst, we found that all the compared fuzzers failed to trigger this behavior. Instead, BLEEM generates packets by resorting to the packets provided by the protocol parties. Therefore, it can easily guarantee the above conditions and trigger this behavior (q4 − q7) at an early stage. Meanwhile, BLEEM discovered 410 more successor states on average based on q4 − q7. This state region is hard to trigger for other fuzzers, but discovering this region effectively contributes to BLEEM's branch coverage.

q0 ⟨C(a) | S(Ø)⟩ --a⊕σ◦--> q1 ⟨S(b) | C(a)⟩ --b⊕σ◦--> q2 ⟨C(c) | S(b)⟩ --c⊕σ◦--> q3 ⟨S(d) | C(c)⟩
q1 --b⊕σP--> q4 ⟨C(a) | S(b)⟩ --a⊕σ◦--> q5 ⟨S(e) | C(a)⟩ --e⊕σ◦--> q6 ⟨C(f) | S(e)⟩ --f⊕σ◦--> q7 ⟨S(d) | C(f)⟩

Figure 9: The System State Tracking Graph (SSTG) after introducing the new packet pattern (b ⊕ σP) on q1.

Answer to RQ3. The guided sequence generation strategy of BLEEM is able to increase the exposure of different protocol behaviors, thus contributing to fuzzing effectiveness.

7 Related Work

Protocol Fuzzing. Fuzzing has been widely adopted to test protocol implementations [23–25, 38, 61–63]. Existing works focus on individual packet generation and lose sight of contextual correctness between packets in a sequence. Peach [18] and BooFuzz [33] select a state trace in the user-defined state model each time and separately generate packets for these states. Scapy fuzzing API [49] generates individual packets based on given values, which is equivalent to a packet-level mutation operator. Although these approaches provide syntactically valid packets that prevent early parse errors, they are incapable of handling the parameter dependency and generating correct values for the context-sensitive dynamic fields (e.g., handshake configuration of SSL). Meanwhile, they are pretty random and do not use feedback to guide fuzzing. Instead, BLEEM generates packets at a sequence level. It provides a sequence-based feedback mechanism to navigate protocol-logic exploration and a sequence-level manipulation to discover anomalies under out-of-order or duplicated packets. Meanwhile, it accounts for the contextual information by leveraging the intercepted packets between the SUT parties.

Some recent works introduce state awareness for protocol fuzzing. SGFuzz [8] relies on a programmatic intuition that the state variables used in protocol implementations encode fine-grained program processing actions and often appear in enum-type variables. It recognizes these variables, injects instrumentation to monitor their assignment, and uses their different values to identify different server program states.
In comparison, without instrumentation, BLEEM leverages the enum in the packet fields to identify the packet type and constructs the states by combining the packet types with bidirectional communication information. We also provide an example in Appendix D. StateAFL [43] adopts fine-grained compile-time instrumentation to obtain runtime information for protocol state inference. Nyx-Net [48] solves the reproducibility problem mentioned in §1 by ensuring noise-free fuzzing through a snapshot-based approach. These approaches require the source code or the binary of the protocol implementation and thus do not scale to the black-box setting. Instead, BLEEM applies a noninvasive feedback mechanism and models state transitions across packet exchanges, and thus can be scaled to diverse protocols even in the black-box setting.

Some works utilize the server response to optimize fuzzing. AFLNet [45] leverages the status code in the server response as state feedback and uses it to guide packet mutation. Snipuzz [20] infers the grammatical role of each message byte by analyzing the server response using a hierarchical clustering strategy. This strategy works well for IoT protocols whose responses are usually textual and organized in a common format such as JSON. These approaches work well but are tightly coupled with specific protocol formats. In contrast, BLEEM analyzes the semantics conveyed in the output to obtain system feedback and can be applied to both textual and binary protocols. Meanwhile, it analyzes both client and server output and thus can obtain more information.

State Machine Inference. The most closely related works employ learning algorithms to infer protocol state machines, and there are two different technologies.

Active-inference-based approaches [17, 21–23, 52] actively generate packet sequences to query a protocol implementation and infer a state machine using model learning algorithms, such as Angluin's L* algorithm [7]. Applying this algorithm usually requires tailoring a mapper to translate between the abstract alphabets of the model and the concrete packets of the implementation, which is not reusable for different protocols and implementations. Instead, BLEEM abstracts packets by automatic semantic extraction and instantiates packets using the interactive traffic.

Passive-inference-based approaches [14, 26, 28, 29] infer a state machine by analyzing a corpus of packet sequences sampled on the network. Pulsar [26] analyzes the sampled network traces and infers a generative model for message format and protocol states. AutoFuzz [28] analyzes the sampled traffic to infer the server's finite state machine (FSM) and conducts server fuzzing based on this stationary FSM. These approaches perform fuzzing based on the inferred model. Therefore, their fuzzing effectiveness relies on the completeness of the captured network traffic. Instead, BLEEM constructs the initial SSTG based on the SUT dialog and gradually enriches it at runtime with the packets generated during guided fuzzing, which builds a closed loop of vulnerability detection.

Most of all, we do not infer the protocol's state machine but derive the SSTG structure from the observed network traffic to identify the state space and guide fuzzing. Meanwhile, the SSTG encompasses all protocol parties, unlike the traditional state machine that depicts only one protocol party.

8 Discussion

Despite BLEEM's positive results, we briefly discuss limitations and avenues of further improvement.

First, the feedback collector analyzes the output packets based on Scapy's parsing capability. However, some protocols, especially proprietary ones, are not supported by Scapy. Since BLEEM can capture the network traffic and gain additional traffic beyond the initial session through fuzzing, we can recognize unsupported protocols by resorting to traffic-based protocol reverse engineering [12, 57].

Second, the current representation of the SSTG cannot always guarantee reproducibility due to its inherently non-deterministic and coarse-grained transition labeling. We can solve this by transforming the SSTG into a deterministic finite automaton using typical algorithms [41, 55] and recording fine-grained mutation information, such as the detailed subclass and parameters of the mutation operator.

Third, BLEEM now supports detecting crashes and memory-related bugs with ASan/UBSan. BLEEM can also detect semantic bugs if provided with corresponding oracles. For example, if provided with the packet-exchange constraints, BLEEM can detect non-compliance with the protocol specification by analyzing the SUT's state trace. The cause of CVE-2021-40523 [42] is that the server may fail to send a WILL/WONT response for WILL commands, which violates the property required by RFC 854 [46]. To detect this bug, we can provide such a packet-exchange constraint for BLEEM: a WILL packet from the client should be followed by a WILL/WONT packet from the server under normal conditions. With the help of the abstract packets, BLEEM can focus on the packet types, thus facilitating analysis. Then, it can detect the bug by checking, for the SUT State that matches ⟨S(∗) | C(WILL)⟩ ("*" is a wildcard), whether its outgoing edge with label WILL⊕σ◦ transitions to state ⟨C(WILL) | S(WILL)⟩ or ⟨C(WILL) | S(WONT)⟩. If not, a bug exists.

9 Conclusion

In this paper, we present BLEEM, a packet-sequence-oriented protocol fuzzer that applies an evolutionary approach to explore the massive protocol state space: it accesses the system feedback by analyzing the output sequences and dynamically tunes the exploration direction by applying the proposed guided fuzzing strategy. Meanwhile, BLEEM generates highly protocol-logic-aware packet sequences by leveraging the observed interactive traffic. Compared to state-of-the-art fuzzers, BLEEM can achieve higher coverage and detect more bugs in real-world protocol implementations. BLEEM is fully automatic and can be applied to test the implementations of most general protocols in the black-box setting.
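As an illustration of the packet-exchange constraint discussed in §8 (a client WILL should be answered by a server WILL or WONT), the check over an SSTG could look roughly like the sketch below. The graph encoding (a dictionary from state tuples to labelled edges), the label string, and the function name `find_will_violations` are assumptions made for this example; the paper does not specify BLEEM's internal data structures.

```python
from typing import Dict, List, Tuple

# Hypothetical encoding: a SUT State <X | Y> is a pair of strings, and each
# state maps to its outgoing edges as (packet pattern label, successor state).
State = Tuple[str, str]
Edge = Tuple[str, State]

FORWARD_WILL = "WILL⊕σ◦"  # label of the edge that forwards WILL unmodified
ALLOWED_SUCCESSORS = {("C(WILL)", "S(WILL)"), ("C(WILL)", "S(WONT)")}


def find_will_violations(sstg: Dict[State, List[Edge]]) -> List[Tuple[State, State]]:
    """Return (state, successor) pairs that break the constraint."""
    violations = []
    for state, edges in sstg.items():
        # Match <S(*) | C(WILL)>: any server packet on the left, C(WILL) on the right.
        if not (state[0].startswith("S(") and state[1] == "C(WILL)"):
            continue
        for label, successor in edges:
            if label == FORWARD_WILL and successor not in ALLOWED_SUCCESSORS:
                violations.append((state, successor))
    return violations


if __name__ == "__main__":
    # Toy graph: one compliant transition and one non-compliant transition.
    toy_sstg = {
        ("S(DO)", "C(WILL)"): [(FORWARD_WILL, ("C(WILL)", "S(WONT)"))],
        ("S(Ø)", "C(WILL)"): [(FORWARD_WILL, ("C(WILL)", "S(Ø)"))],
    }
    print(find_will_violations(toy_sstg))
```

Because the check operates on abstract packet types rather than concrete bytes, the same pattern would apply to other request/response constraints by changing the matched state and the set of allowed successors.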
Acknowledgments

We thank the anonymous reviewers for their constructive feedback and suggestions. This research is sponsored in part by the National Key Research and Development Project (No. 2022YFB3104000, No. 2021QY0604) and NSFC Program (No. 62022046, 92167101, U1911401, 62021002).

References

[1] OSS-Fuzz/Gnutls. https://github.com/google/oss-fuzz/tree/master/projects/gnutls.
[2] OSS-Fuzz/Libressl. https://github.com/google/oss-fuzz/tree/master/projects/libressl.
[3] RFC 4346. The transport layer security (TLS) protocol. Section F.1.1.2: RSA key exchange and authentication. Website. https://www.rfc-editor.org/rfc/rfc4346#appendix-F.1.1.2.
[4] RFC 6347. Datagram transport layer security version 1.2. Section 4.1.2.6: Anti-replay. https://datatracker.ietf.org/doc/html/rfc6347#section-4.1.2.6.
[5] aflnet. AFLNet: A greybox fuzzer for network protocols. https://github.com/aflnet/aflnet.
[6] Pedram Amini and Aaron Portnoy. Sulley. 2012. https://github.com/OpenRCE/sulley.
[7] Dana Angluin. Learning regular sets from queries and counterexamples. Inf. Comput., 75, 1987.
[8] Jinsheng Ba, Marcel Böhme, Zahra Mirzamomen, and Abhik Roychoudhury. Stateful greybox fuzzing. 2022 USENIX Security Symposium.
[9] Domagoj Babic, Stefan Bucur, Yaohui Chen, Franjo Ivancic, Tim King, Markus Kusano, Caroline Lemieux, László Szekeres, and Wei Wang. FUDGE: Fuzz driver generation at scale. 2019 ACM ESEC/FSE.
[10] bajinsheng. SGFuzz: Stateful greybox fuzzer. https://github.com/bajinsheng/SGFuzz.
[11] BoringSSL. Fuzzer mode. https://github.com/google/boringssl/blob/master/FUZZING.md.
[12] Georges Bossert, Frédéric Guihéry, and Guillaume Hiet. Towards automated protocol reverse engineering using semantic information. 9th ACM Symposium on Information, Computer and Communications Security, 2014.
[13] Clang. Clang 15.0.0 documentation, UndefinedBehaviorSanitizer. https://clang.llvm.org/docs/UndefinedBehaviorSanitizer.html.
[14] Paolo Milani Comparetti, Gilbert Wondracek, Christopher Krügel, and Engin Kirda. Prospex: Protocol specification extraction. IEEE Symposium on Security and Privacy, 2009.
[15] CVE-2014-0160. Heartbleed - a vulnerability in OpenSSL. 2014. http://heartbleed.com.
[17] Joeri de Ruiter and Erik Poll. Protocol state fuzzing of TLS implementations. 2015 USENIX Security Symposium.
[18] Michael Eddington. Peach fuzzing platform. https://gitlab.com/gitlab-org/security-products/protocol-fuzzer-ce.
[19] facebookincubator. A known problem in the CloningScheduler in Facebook mvfst. https://github.com/facebookincubator/mvfst/blob/421196ec98a9abd69c7a4353c555a0c981a69109/quic/api/QuicBatchWriter.cpp#L17.
[20] Xiaotao Feng, Ruoxi Sun, Xiaogang Zhu, Minghui Xue, Sheng Wen, Dongxi Liu, Surya Nepal, and Yang Xiang. Snipuzz: Black-box fuzzing of IoT firmware via message snippet inference. 2021 ACM SIGSAC CCS.
[21] Tiago Ferreira, Harrison Brewton, Loris D'antoni, and Alexandra Silva. Prognosis: Closed-box analysis of network protocol implementations. 2021 ACM SIGCOMM.
[22] Paul Fiterau-Brostean, Ramon Janssen, and Frits W. Vaandrager. Combining model learning and model checking to analyze TCP implementations. 2016 CAV.
[23] Paul Fiterau-Brostean, Bengt Jonsson, Robert Merget, Joeri de Ruiter, Konstantinos Sagonas, and Juraj Somorovsky. Analysis of DTLS implementations using protocol state fuzzing. 2020 USENIX Security Symposium.
[24] Matheus E. Garbelini, Vaibhav Bedi, Sudipta Chattopadhyay, Sumei Sun, and Ernest Kurniawan. Braktooth: Causing havoc on bluetooth link manager via directed fuzzing. 2022 USENIX Security Symposium.
[25] Matheus E. Garbelini, Chundong Wang, Sudipta Chattopadhyay, Sumei Sun, and Ernest Kurniawan. SweynTooth: Unleashing mayhem over bluetooth low energy. 2020 USENIX Annual Technical Conference.
[26] Hugo Gascon, Christian Wressnegger, Fabian Yamaguchi, Dan Arp, and Konrad Rieck. Pulsar: Stateful black-box fuzzing of proprietary network protocols. SecureComm, 2015.
[27] Google. OSS-Fuzz. https://github.com/google/oss-fuzz.
[28] Serge Gorbunov and Arnold Rosenbloom. Autofuzz: Automated network protocol fuzzing framework. IJCSNS, 2010.
[29] Yating Hsu, Guoqiang Shu, and David Lee. A model-based approach to security flaw detection of network protocol implementations. IEEE International Conference on Network Protocols, 2008.
[30] Immor278. Snipuzz-py. https://github.com/Immor278/Snipuzz-py.
[31] Kyriakos K. Ispoglou, Daniel Austin, Vishwath Mohan, and Mathias Payer. FuzzGen: Automatic fuzzer generation. 2020 USENIX Security Symposium.
[32] Iyengar, J., Ed., and M. Thomson, Ed. QUIC: A UDP-based multiplexed and secure transport, 2021. https://www.rfc-editor.org/rfc/rfc9000.html.
[33] jtpereyda. BooFuzz: Network protocol fuzzing for humans. https://github.com/jtpereyda/boofuzz.
[34] Jinho Jung, Stephen Tong, Hong Hu, Jungwon Lim, Yonghwi Jin, and Taesoo Kim. WINNIE: Fuzzing windows applications with harness synthesis and fast cloning. 2021 NDSS.
[35] George Klees, Andrew Ruef, Benji Cooper, Shiyi Wei, and Michael W. Hicks. Evaluating fuzz testing. 2018 ACM SIGSAC CCS.
[36] LLVM. SanitizerCoverage. https://clang.llvm.org/docs/SanitizerCoverage.html#edge-coverage.
[37] Zhengxiong Luo, Feilong Zuo, Yu Jiang, Jian Gao, Xun Jiao, and Jiaguang Sun. Polar: Function code aware fuzz testing of ICS protocol. ACM Trans. Embed. Comput. Syst., 18:93:1–93:22, 2019.
[38] Zhengxiong Luo, Feilong Zuo, Yuheng Shen, Xun Jiao, Wanli Chang, and Yu Jiang. ICS protocol fuzzing: Coverage guided packet crack and generation. ACM/IEEE Design Automation Conference (DAC), 2020.
[39] T. Kohno M. Bellare and C. Namprempre. The secure shell (SSH) transport layer encryption modes. https://datatracker.ietf.org/doc/html/rfc4344.
[40] Kenneth L. McMillan and Lenore D. Zuck. Formal specification and testing of QUIC. ACM Special Interest Group on Data Communication, 2019.
[41] Robert McNaughton and Hisao Yamada. Regular expressions and state graphs for automata. IRE Trans. Electron. Comput., 9, 1960.
[42] MITRE. CVE-2021-40523.
[43] Roberto Natella. StateAFL: Greybox fuzzing for stateful network servers. ArXiv, abs/2110.06253, 2021.
[44] OpenBSD. Libressl. https://www.libressl.org.
[45] Van-Thuan Pham, Marcel Böhme, and Abhik Roychoudhury. AFLNET: A greybox fuzzer for network protocols. 2020 ICST.
[46] J. Postel and J. Reynolds. RFC 854, Telnet protocol specification. https://datatracker.ietf.org/doc/html/rfc854.
[47] Postel, J. and J. Reynolds. File transfer protocol, 1985. https://www.rfc-editor.org/rfc/rfc959.html.
[48] Sergej Schumilo, Cornelius Aschermann, Andrea Jemmett, Ali Reza Abbasi, and Thorsten Holz. Nyx-net: network fuzzing with incremental snapshots. Seventeenth European Conference on Computer Systems, 2022.
[49] secdev. Scapy Fuzzing API. https://scapy.readthedocs.io/en/latest/usage.html#fuzzing.
[50] secdev. Scapy: Packet crafting for python2 and python3. https://scapy.net.
[51] Kostya Serebryany, Derek Bruening, Alexander Potapenko, and Dmitriy Vyukov. AddressSanitizer: A fast address sanity checker. 2012 USENIX Annual Technical Conference.
[52] Juraj Somorovsky. Systematic fuzzing and testing of TLS libraries. 2016 ACM SIGSAC CCS.
[53] Synopsis. Defensics fuzz testing.
[54] Peach Tech. Peach fuzzer configuration file (Peach Pit). Website. https://peachtech.gitlab.io/peach-fuzzer-community/v3/PeachPit.html.
[55] Ken Thompson. Programming techniques: Regular expression search algorithm. Communications of the ACM, 1968.
[56] Twistedmatrix. Twisted: building the engine of your network. https://twistedmatrix.com.
[57] Yapeng Ye, Zhuo Zhang, Fei Wang, X. Zhang, and Dongyan Xu. NetPlier: Probabilistic network protocol reverse engineering from message traces. 2021 NDSS.
[58] Michal Zalewski. American fuzzy lop. 2015.
[59] Cen Zhang, Xingwei Lin, Yuekang Li, Yinxing Xue, Jundong Xie, Hongxu Chen, Xinlei Ying, Jiashui Wang, and Yang Liu. APICraft: Fuzz driver generation for closed-source SDK libraries. 2021 USENIX Security Symposium.
[60] Mingrui Zhang, Jianzhong Liu, Fuchen Ma, Huafeng Zhang, and Yu Jiang. IntelliGen: Automatic driver synthesis for fuzz testing. 2021 ICSE-SEIP.
[61] Yong-Hao Zou, Jia-Ju Bai, Jielong Zhou, Jianfeng Tan, Chenggang Qin, and Shih-Min Hu. TCP-Fuzz: Detecting memory and semantic bugs in TCP stacks with fuzzing. 2021 USENIX Annual Technical Conference.
[62] Feilong Zuo, Zhengxiong Luo, Junze Yu, Ting Chen, Zichen Xu, Aiguo Cui, and Yu Jiang. Vulnerability detection of ICS protocols via cross-state fuzzing. IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2022.
[63] Feilong Zuo, Zhengxiong Luo, Junze Yu, Zhe Liu, and Yu Jiang. PAVFuzz: State-sensitive fuzz testing of protocols in autonomous vehicles. ACM/IEEE Design Automation Conference (DAC), 2021.

A Implementation Details

The implementation is modular. We defined uniform interfaces between the main fuzzing process and each module to facilitate extensibility. In this way, we can easily extend BLEEM with new mutation operators, customized monitors (e.g., a semantically-aware monitor), and new protocol stacks (e.g., a Bluetooth stack) by implementing the corresponding required interface functions.

A.1 Packet-Level Mutation Operators

Scapy has identified five general field types: NumberField, StringField, ListField, EnumField, and LengthField. We devise a mutation operator for each of these field types based on its features. More specifically,
• The NumberField mutation operator either performs a random addition or subtraction on the original value while respecting the valid value range, or returns a number randomly selected from the valid value range.
• The LengthField holds the length of a referenced field. The LengthField mutation operator inherits all the operations defined in the NumberField mutation operator. It also provides an additional operation that replaces the original value with extreme values (e.g., zero, negative values, the maximum, and the minimum) to manifest corner cases that cause memory errors, since length values usually affect memory accesses in the program (as the bug case study given in Appendix C shows).
• The StringField mutation operator conducts a finite combination of string-splicing, substring-duplication, and substring-deletion on the original string.
• The ListField depicts a field holding a list of items of the same type. The ListField mutation operator performs a finite combination of element-duplication, element-deletion, element-addition (based on the Oracle Map corpus), and list-order-shuffle on the original list.
• The EnumField represents a field whose possible values are taken from a given enumeration. For example, in HTTP, the Method field with possible values [“GET”, “POST”, “HEAD”, ...] is an EnumField. The EnumField mutation operator selects a value from the valid enumeration set with high probability. With low probability, it instead provides a value outside the valid set to manifest corner cases.

A.2 Packet Instantiation Sub-Module

For target protocols on different layers, we implement corresponding proxies working on the underlying layers to provide reliable underlying communication, including TCP Proxy, UDP Proxy, IP Proxy, and Ether Proxy. For example, when fuzzing the SSL protocol, we can use a TCP Proxy that provides reliable TCP connections with the client and server so that we can focus on fuzzing the SSL packet instead of the full protocol stack. To this end, for a target protocol on layer ℓ, the supporting proxy working on layer ℓ − 1 should: (i) provide network isolation on layer ℓ − 1 to intercept the exchanged payload of layer ℓ; (ii) maintain the two links with the client and server and synchronize their status; and (iii) support efficient concurrent interaction with the client and server on layer ℓ.

We implement the first requirement by configuring the client with a proxy-provided service address, which differs from the server’s service address. For example, for a target SSL server listening on TCP port 4433, we start the TCP Proxy, bind it to TCP port 4432, and then configure the client with the service address of TCP port 4432. This step is noninvasive since the client is typically configurable regarding the service address to connect to.

The second requirement is tailored for the TCP Proxy since TCP is connection-oriented. For a target protocol running on TCP, the TCP Proxy is responsible for maintaining the two TCP links with the client and server and synchronizing their status, including connection establishment and connection close. We implement it upon Twisted [56], which provides an event-driven programming paradigm for internet applications. Specifically, we design and implement two Protocols based on twisted.internet.protocol.Protocol to handle the data of these two links in an asynchronous manner.

Third, to concurrently handle bidirectional traffic, we implement asynchronous interaction logic to separately manage the traffic of the two directions, i.e., the server-to-client traffic and the client-to-server traffic. The interaction logic of each direction continuously sniffs the traffic, conducts mutation on the received packet (as directed by the proposed strategy), and sends out the mutated packet.
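The per-direction interaction logic described above (receive, mutate, forward) can be sketched in C over plain file descriptors. This is a minimal illustration under stated assumptions, not the BLEEM implementation: BLEEM builds on Twisted's event-driven Protocol classes, whereas this sketch swaps in blocking POSIX read/write calls. The names relay_once, mutate_fn, and set_length_max are hypothetical, and the example mutator assumes a 2-byte length field at offset 3, as in a TLS record header.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <unistd.h>

/* Hypothetical mutation hook: may rewrite buf in place and returns the new
 * length, which must not exceed cap. */
typedef size_t (*mutate_fn)(uint8_t *buf, size_t len, size_t cap);

/* Hypothetical LengthField-style mutator: overwrite the 2-byte big-endian
 * length at offset 3 with the extreme value 0xFFFF. */
static size_t set_length_max(uint8_t *buf, size_t len, size_t cap)
{
    (void)cap;
    if (len >= 5) {
        buf[3] = 0xFF;
        buf[4] = 0xFF;
    }
    return len;
}

/* One step of one traffic direction: read a chunk from from_fd, mutate it,
 * and forward it to to_fd. Returns the number of bytes forwarded, 0 on EOF,
 * or -1 on error. */
static ssize_t relay_once(int from_fd, int to_fd, mutate_fn mutate)
{
    uint8_t buf[4096];
    ssize_t n = read(from_fd, buf, sizeof buf);
    if (n <= 0)
        return n;

    size_t out_len = mutate ? mutate(buf, (size_t)n, sizeof buf) : (size_t)n;
    size_t off = 0;
    while (off < out_len) { /* handle short writes */
        ssize_t w = write(to_fd, buf + off, out_len - off);
        if (w < 0)
            return -1;
        off += (size_t)w;
    }
    return (ssize_t)out_len;
}
```

A proxy would run one such loop per direction (client-to-server and server-to-client), each with its own pair of descriptors; an event-driven framework such as Twisted achieves the same effect without blocking.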
Table 4: Published CVE IDs of protocol implementations in firmware and the emulation setting in our experiment

CVE ID          Device Type  Vendor   Model   Firmware Version  Vulnerable Binary  Protocol  Emulation Platform
CVE-2018-5767   Router       TENDA    AC15    15.03.1.16        bin/httpd          HTTP      QEMU user-mode
CVE-2020-25067  Router       NETGEAR  R8300   1.0.2.130         usr/sbin/upnpd     UPnP      QEMU full-system
CVE-2019-14457  IP Camera    VIVOTEK  CC8160  0100d             usr/sbin/httpd     HTTP      QEMU user-mode
CVE-2019-1663   Router       CISCO    RV130   1.0.3.44          usr/sbin/httpd     HTTP      QEMU full-system
A.3 Monitors

We design and implement a Process Monitor and Network Monitors to detect crashes. These monitors can be used in combination according to different scenarios.

Process Monitor. When the target process is locally accessible, the Process Monitor detects crashes by checking whether the process was terminated by a system signal (e.g., SIGSEGV). We can also enhance the program with ASan to detect memory corruption (in this case, the process will be terminated by SIGABRT when a memory error occurs).

Network Monitor. We implement the following network monitors, which can be employed locally or remotely.
• TCP Monitor. For the TCP-based protocols, we detect crashes by trying to conduct a TCP connection to the listening port since TCP is connection-oriented.
• UDP Echo Monitor. For the UDP-based protocols, this monitor provides an interface that allows users to identify the heartbeat packet of the under-test protocol. Then it works by assembling it with a UDP header, sending it to the listening port, and waiting for a response.

and selected SUT. To facilitate a fair comparison, the selected SUT (cf. the fourth column) all come from off-the-shelf utilities provided in the project, except for the client utility used to test accel-ppp and Dnsmasq, where we resorted to related utilities in the Linux community.

C Bug Case Study and Coordinated Disclosure

Table 6 shows the detailed information on previously unknown vulnerabilities exposed by BLEEM.

    char working[DOMAIN_MAX_LEN + 1] = { 0 };
    size_t i, wi = 0;
    for (i = 0; i < len; i++) {
        char c = candidate[i];
        ... // checking validation of candidate[i]
        if (wi > DOMAIN_MAX_LEN) // bug: the corner case is mistakenly handled
            goto bad;
        working[wi++] = c;
        if (i == len - 1) {
            candidate_domain = strdup(working); // buffer may lack '\0' termination when invoking strdup
        }
    }
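The off-by-one in the listing above can be made concrete with a small, self-contained sketch of one possible fix. It is an illustration only and not the patch applied by the affected project: the helper name copy_domain and the value of DOMAIN_MAX_LEN are hypothetical, the per-character validation is omitted, and malloc plus memcpy stand in for strdup to stay within standard C. The key change is rejecting the input once wi reaches DOMAIN_MAX_LEN, so the last byte of working always remains '\0'.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define DOMAIN_MAX_LEN 8 /* hypothetical limit, kept small for illustration */

/* Hypothetical helper: copies len bytes of candidate into a bounded buffer
 * and returns a heap-allocated, NUL-terminated copy, or NULL if the input
 * is too long or allocation fails. The caller frees the result. */
static char *copy_domain(const char *candidate, size_t len)
{
    char working[DOMAIN_MAX_LEN + 1] = { 0 };
    size_t i, wi = 0;

    for (i = 0; i < len; i++) {
        /* ">=" instead of ">" keeps working[DOMAIN_MAX_LEN] as '\0' */
        if (wi >= DOMAIN_MAX_LEN)
            return NULL; /* plays the role of "goto bad" in the listing */
        working[wi++] = candidate[i];
    }

    char *out = malloc(wi + 1);
    if (out != NULL)
        memcpy(out, working, wi + 1); /* includes the terminating '\0' */
    return out;
}
```

With the original check (wi > DOMAIN_MAX_LEN), an input of exactly DOMAIN_MAX_LEN + 1 bytes would fill the whole buffer and leave no terminator; here the same input is rejected.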
to the packets provided by the SUT parties and applies guided fuzzing, thus can easily satisfy (i) and (iii), allowing for more efforts in the exploration of (ii).

We responsibly disclosed the vulnerabilities we found. Before publicly disclosing our findings, we reported the vulnerabilities to the respective vendors following their security procedures and coordinated appropriate disclosure periods with them, which aligns with industry standards. No vendor required us to redact our results prior to paper submission.

D Enumeration-Type State Variable and Field

For BoringSSL, SGFuzz recognizes several state variables, including tls12_server_hs_state_t and ssl_shutdown_t as follows. These variables encode fine-grained program processing actions, including protocol-state-related (e.g., tls12_server_hs_state_t) and implementation-logic-related (e.g., ssl_shutdown_t). For example, the server program uses the tls12_server_hs_state_t to represent the protocol handshake state. It sets this variable to state12_send_server_hello when generating and sending a Server Hello packet. Meanwhile, it uses the variable ssl_shutdown_t to represent the program shutdown state for the read half of the connection. If the server is configured not to send a close_notify packet, it sets this variable to ssl_shutdown_close_notify, indicating doing nothing.

    enum tls12_server_hs_state_t {
        state12_start_accept = 0,
        state12_read_client_hello,
        state12_read_client_hello_after_ech,
        state12_select_certificate,
        state12_select_parameters,
        state12_send_server_hello,
        ...
    };
    enum ssl_shutdown_t {
        ssl_shutdown_none = 0,
        ssl_shutdown_close_notify = 1,
        ssl_shutdown_error = 2,
    };

[Figure 11: Enumeration fields in an SSL Server Hello packet. The figure marks the value of each enum field in the packet and the other valid values of that field, e.g., Application Data (23) and Certificate (11).]

SGFuzz directly uses the different values of the state variables to identify different program states and captures state transitions by monitoring the assignment of recognized state variables based on instrumentation.

Correspondingly, Figure 11 shows an SSL Server Hello packet. The Content Type and Handshake Type fields are of type enumeration, and the Version and Length fields are of type number and length, respectively. These enumeration-type fields determine the packet type and indicate the protocol state. Specifically, the Content Type field, with four valid values, is set to Handshake (22), and the Handshake Type field is set to Server Hello (2), indicating this is a Server Hello packet exchanged during the handshake phase. Other fields, such as Version and Length, have a low association with the protocol state. Hence, BLEEM abstracts this packet to Handshake[Server_Hello] by omitting other fields. Actually, from the program’s perspective, when a client or server receives a packet, it also first determines the packet type by parsing these fields and then takes corresponding actions (including setting state variables) according to the state machine. Without instrumentation, BLEEM utilizes an on-the-fly approach. Since the enumeration fields typically identify the packet type and thus indicate the protocol state, BLEEM uses this key information to abstract concrete packets and constructs the SUT States by combining the abstract packets with bi-directional communication information, as shown in §4.
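The abstraction of a concrete packet into its enumeration fields can be sketched as follows. This is a minimal illustration rather than BLEEM's implementation (which operates on Scapy field types): the function names are hypothetical, the handshake-type table is deliberately incomplete, and the byte offsets assume the standard TLS record layout (1-byte Content Type, 2-byte Version, 2-byte Length, then a 1-byte Handshake Type for handshake records).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Names for the four valid Content Type values of a TLS record. */
static const char *content_type_name(uint8_t t)
{
    switch (t) {
    case 20: return "Change_Cipher_Spec";
    case 21: return "Alert";
    case 22: return "Handshake";
    case 23: return "Application_Data";
    default: return "Unknown";
    }
}

/* A few Handshake Type values; the list is intentionally not exhaustive. */
static const char *handshake_type_name(uint8_t t)
{
    switch (t) {
    case 1:  return "Client_Hello";
    case 2:  return "Server_Hello";
    case 11: return "Certificate";
    default: return "Other";
    }
}

/* Hypothetical helper: writes the abstract form of one TLS record to out,
 * keeping only enumeration fields and ignoring Version and Length.
 * Returns 0 on success, -1 if the record is too short to parse. */
static int abstract_record(const uint8_t *rec, size_t len, char *out, size_t cap)
{
    if (len < 5)
        return -1; /* the record header is 5 bytes */
    if (rec[0] == 22) { /* Handshake: also keep the Handshake Type */
        if (len < 6)
            return -1;
        snprintf(out, cap, "%s[%s]", content_type_name(rec[0]),
                 handshake_type_name(rec[5]));
    } else {
        snprintf(out, cap, "%s", content_type_name(rec[0]));
    }
    return 0;
}
```

For a record starting with the bytes 22, 3, 3, 0, 4, 2, the helper yields Handshake[Server_Hello] regardless of the Version and Length values, which mirrors how the abstraction omits fields with low association to the protocol state.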