
PREUNN: Protocol Reverse Engineering Using Neural Networks

Valentin Kiechle¹, Matthias Börsig²ᵃ, Sven Nitzsche²ᵇ, Ingmar Baumgart² and Jürgen Becker³

¹AMAI GmbH - AI Experts, Karlsruhe, Germany
²FZI Research Center for Information Technology, Karlsruhe, Germany
³Institute for Information Processing Technology (ITIV), Karlsruhe Institute of Technology (KIT), Karlsruhe, Germany

ᵃ https://orcid.org/0000-0002-6060-6026
ᵇ https://orcid.org/0000-0002-3327-6957

[email protected], {boersig, nitzsche, baumgart}@fzi.de, [email protected]

Keywords: protocol reverse engineering, artificial intelligence, machine learning, neural networks, fuzzing

Abstract: The ability of neural networks to universally approximate any function enables them to learn relationships between arbitrary kinds of data. This offers great potential in information security topics such as protocol reverse engineering (PRE), which has seen little usage of neural networks (NNs) so far. In this paper, we provide a novel approach for implementing PRE solely with NNs, demonstrating a simple yet effective reverse engineering of text-based protocols. This approach is modular by design and allows for the exchange of neural network models at any step with better performing models. The architectures used include a convolutional neural network (CNN), an autoencoder (AE), a generative adversarial network (GAN), a long short-term memory (LSTM), and a self-organizing map (SOM). Together, these models form a new protocol reverse engineering approach. The results show that the widespread application layer protocols HTTP and FTP can successfully be mimicked by artificial intelligence, thereby paving the way for use cases such as fuzzing. A direct comparison to other PRE approaches is not possible due to the black-box nature of neural networks, which represents the main limitation of our work. Our experiments showed that this multi-model approach yields up to 19% better message clustering and improved context distribution, and that the LSTM is the best candidate for generating new messages, with up to 67.6% valid HTTP packages and 100% valid FTP packages.

1 INTRODUCTION

In 2012 deep neural networks (DNN) (Krizhevsky et al., 2012) were introduced, showing great ability in automated feature extraction for classification, which led to a breakthrough in the ImageNet Large Scale Visual Recognition Challenge (ILSVRC)¹. An increasing number of challenges have been tackled by deep learning since then. One exception to that trend has been protocol reverse engineering (PRE), as most of its progress was published from 2004 to 2013, missing the modern artificial intelligence (AI) boom. PRE is a specific information security task that attempts to recreate specifications about an unknown application layer protocol from artifacts of its communications. These inferred specifications can be used in a variety of other security-related applications such as fuzzing. The coverage of bugs and edge cases is generally improved with guiding knowledge about the basic message structure and internal state of the protocol.

The capacity of deep learning to mimic network protocols using reverse engineering, and how to model such an approach, are the core scientific questions that drove this project. This ability could enable fuzzing to better harness future advances of the AI research community. In our initial literature research we selected the two renowned approaches Prospex (Comparetti et al., 2009) and Discoverer (Cui et al., 2007) to represent a large body of ideas and strategies commonly used in PRE. A novel approach is created based on these two PRE designs to serve as an orientation for solvable neural network tasks and a framework for experimentation. We then define several metrics and implement the experimentation to come to a successful conclusion. Our key contributions include the analysis of how to apply deep learning in a modular and extendable way to the problems of protocol reverse engineering, implementing a neural network-based PRE approach on text-based protocols, and showing a promising direction of future research. There are also a few minor contributions of smaller workarounds to solve problems during the experimentation, like the convolutional embedding for the LSTMs.

1 http://www.image-net.org/challenges/LSVRC/

DOI: 10.5220/0010813500003120
ISBN: 978-989-758-553-1; ISSN: 2184-4356
In Proceedings of the 8th International Conference on Information Systems Security and Privacy (ICISSP 2022), pages 345-356
Event website: https://icissp.scitevents.org/
Copyright ©2022 by SCITEPRESS – Science and Technology Publications, Lda. All rights reserved
2 BACKGROUND

As we attempt to combine two areas of modern research in this paper, a brief overview of both areas' theoretical backgrounds is given in this section.

2.1 Protocol Reverse Engineering (PRE)

Communication over the internet and other digital networks requires standardized protocols to ensure uniform behavior. These protocols are generally used in a stack with multiple layers, where each layer has its specific tasks and serves as the operational basis for the layers above. The widely known ISO/OSI-7-layer model works as the standard guideline for such a protocol stack (TC97, 1984).

Protocols can be divided into four types: text-based or binary, and stateful or stateless. PRE aims to learn as much as possible about the specifications of an unknown protocol by analyzing the artifacts that the communication creates. This includes, but is not limited to, message formats, syntax, grammar, constants and keywords, message types, implicit or explicit state machines, and more. The taxonomy shown in Table 1 lists the two possible kinds of artifacts: network traffic messages, or system calls inside the binary of an application that communicates using the target protocol. The two most commonly inferred properties are the general message format/syntax and the underlying state automaton of a stateful protocol. A successfully reverse-engineered protocol allows for further analysis of the communication like deep package inspection (Brook, 2018) and fuzzing (Besic, 2019). Both can work more effectively if they have the detailed specification of the protocol.

Two prominent PRE examples are Prospex (Comparetti et al., 2009) and Discoverer (Cui et al., 2007). They were published in the middle of the major period for PRE research between 2004 and 2013 (Duchene et al., 2018; Narayan et al., 2015). From our initial literature research, we judge both to be good representations of their respective PRE classes, as mentioned in Table 1.

Protocol Specification Extraction (Prospex)
Prospex is a two-part PRE approach basing its analysis on both network messages and execution traces of a binary at runtime, inferring the message format (Wondracek et al., 2008; Caballero et al., 2007) and the state automaton in a second part (Comparetti et al., 2009). The highly distinctive features selected for the clustering step are among the core reasons for its success. They include file system reactions and memory analysis by tainting bytes. Therefore, the similarity between messages is not only computed by their format, but also by the reaction and response they create. Next, an acceptor automaton was designed to identify valid sequences of messages. This automaton was later reduced using the exbar algorithm (Lang, 1999). The final tool was extended to be compatible with the fuzzing tool Peach Fuzzer². As a direct result, fuzzing can be improved on stateful protocols. We learn from Prospex how a separation of tasks such as the extraction of features, clustering with an off-the-shelf method, and finally reverse engineering is key in PRE.

2 http://peachfuzzer.com

Discoverer
Discoverer identifies small clusters of tokens within a message, which are then merged and refined recursively. For this, it relies on the assumption that protocols use common delimiter symbols to separate parts of their message format, such as commas, whitespaces, or line breaks. The interdependence of various fields (e.g., for addresses/variables) in the message format and their data type (text or binary) is then learned by heuristics. We can see the importance of identifying key features and their influence on clustering in the Discoverer approach.

2.2 Neural Network Architectures

All the models we use in this paper are types of neural networks. They consist of layers of interconnected neurons, where each connection has a value to adjust the importance of the connection for the next layer. These weights are adjusted to minimize an error function during training, but remain fixed for testing and thereafter. This concept allows for complicated mappings of input data distributions to output results according to the universal approximation theorem (Hornik et al., 1989). As such, neural network architectures are suitable tools for different data operations such as classification, regression, clustering, and more. In the following paragraphs, we describe the different architectures used in the experimentation of Section 4, what problems they are good at solving, and why we considered them for PRE.

Convolutional Neural Network (CNN)
CNNs became popular for their performance in image recognition tasks (Krizhevsky et al., 2012). The layers in this architecture have a particular property: they are used in a sliding window method over the input data. Their weights for each of the sliding window steps remain the same and can therefore be used to detect small, fixed patterns. Many such "kernels" exist in
parallel to each other in separate "channels" of one layer. This structure allows individual kernels to approximate highly descriptive feature filters, thereby removing the traditional step of manual feature extraction from the classification task. This architecture is beneficial in any task involving images/pixel maps with individual features occupying many adjacent pixels. However, in theory, this concept can be applied to any form of input data with patterns.

Table 1: Taxonomy for classifying PRE approaches by requirements (columns) and by results (rows)

PRE taxonomy requirements and results | Only network messages | Messages and executable binary at runtime
Inferring message format/grammar | Discoverer (Cui et al., 2007) | Wondracek (Wondracek et al., 2008)
Inferring state automaton | PREUNN | Prospex (Comparetti et al., 2009)

Autoencoder (AE)
Autoencoders are neural networks designed to achieve dimensionality reduction for a given input while retaining as much information as possible. In training, the loss function demands that the output equals the input as closely as possible. The architecture itself has a bottleneck in a middle layer to force the network to compress data but keep reconstructable information (Hinton and Salakhutdinov, 2006). The middle layer's size has to balance the compression against keeping relevant information in some unknown encoding. Naturally, this divides the neural network into two components, namely the encoder and the decoder part. After the AE has been trained, the decoder element is removed so that any input data will be returned in its encoded form only.

Generative Adversarial Network (GAN)
GANs were developed to create a generative model for a learned data distribution. The peculiarity of this architecture comes from the use of two competing networks (Goodfellow et al., 2014). The first, known as the generator, uses random noise as input and tries to create an image similar to those found in the training dataset. The second network, known as the discriminator, is randomly given either a true or a fake image and has to learn the distinction between them by merely classifying true from fake. The error correction for the generator is based on the classification result of the discriminator. This combination of two networks causes a setup of competitive learning where each NN tries to outdo the opposing one.

Long Short-Term Memory (LSTM)
Recurrent neural networks (RNNs) describe a type of architecture where some part of the internal hidden or input state is recursively fed into the network again for the next time step, thereby allowing the network to find time dependencies inside the data. LSTMs are a kind of RNN with a strong ability to find sequential dependencies over time while avoiding some common problems with recurrent error correction (Hochreiter and Schmidhuber, 1997). For working with text data (sequential letters), it is common to use a dictionary or alphabet along with a suitable embedding to condense the information into a medium-sized vector for each word or letter. The text is transformed into a matrix of size length_text × length_embedding.

Self-Organizing Map (SOM)
SOMs are architectures with one-dimensional or two-dimensional output maps of neurons that contain the topology of the data (Kohonen, 1982). This means that data with similar properties and distribution of features will be found in the same general area of the output map. Thereby, clustering is implied. SOMs give each output neuron a score related to the data. The winning node can be returned for indexing.

3 MAIN APPROACH

This section presents a novel way of looking at the underlying task structure of reverse engineering a protocol. It is divided into multiple steps that are handled mostly sequentially. The approach is designed to fit the capabilities of various neural network architectures and provides modularity. This allows for the replacement of any model in the system by a better performing one, thereby enabling novel AI designs to be directly inserted into the process.

3.1 Data Gathering

To train a neural network, a representative dataset is required. We chose a set of text-based application layer protocol artifacts as a basis to allow for an easier result evaluation, as we do not have a direct comparison with other PRE approaches. The chosen protocols are HTTP v1.1 and FTP because they are commonly used, abundantly available, and lack encryption. We use several sources of datasets to cover a broader mix of implementations and message type distribution.
3.2 Feature Extraction

In this first part, we want to extract highly distinguishable features. Both in Prospex and Discoverer, the feature selection was an essential part of the work, but the features were chosen by the researchers. We intend to automate this process with neural networks. Of particular interest are keywords, punctuation, syntactic characters, and other patterns that distinguish between different messages. Pseudo-random strings such as tags and cookies are avoided, as they are generally unrelated to the protocol specifications.

3.3 Feature Reverse Engineering

In traditional protocol reverse engineering, the analysis infers rules, lists of variables or constants, and grammar from the communication artifacts. With a neural network, the learned knowledge is intrinsically non-representative, meaning it is challenging for humans to interpret. We use a generative evaluation approach to judge the quality of the features learned (and recreated) by the respective architectures. Such a method will create new samples from the training distribution and provide insights into what the neural networks learned.

3.4 Clustering

Protocol messages can usually be grouped into types, whether or not the protocol explicitly specifies these groups. We can cluster these messages using information like sequential order, functionality, and general format. The clustering would imply various types of messages, and we consider this task to be well-suited for neural networks (Bação et al., 2005).

3.5 State Recognition

A typical communication session usually involves multiple messages being sent or received. In stateful protocols, particular sequences of messages achieve a more complex state between the communication parties. These sequential patterns imply a representative state automaton for the inner state of the protocol. We think that a neural network with the capability of understanding sequential dependencies should be able to learn the order of different message types and their likelihood. This interpretation allows the usage of time series prediction to imply the next state of the protocol.

3.6 Sequence Generation

As a last step, we combine all trained models into one generative PREUNN AI. Context information such as cluster index and sequential dependencies can be included in the feature reverse engineering to achieve more accurate results. Ideally, this AI is capable of producing valid messages which are not part of the training set but have comparable statistical properties.

4 EXPERIMENTS

This section lists all implementations for the main approach and the experiments that were used to test various neural network architectures. The hyperparameters of all neural networks are set to well-performing values after some semi-extensive manual testing. The scope of the project did not include major optimizations, as the hardware was unable to handle large search spaces for automated hyperparameter tuning. Our code is available on GitHub³. All experiments were written in Python 3 using an object-oriented programming style to ensure easy modification and extension of the experiments to new protocols. We selected PyTorch⁴ as the deep learning framework, and all experiments were conducted on an NVIDIA GTX 970.

3 https://github.com/PREUNN/preunn
4 https://pytorch.org/
5 https://www.wireshark.org/

4.1 Data Preprocessing

We used two datasets in our tests. The first one consists of the combination of multiple HTTP sources (Garfinkel, 2008; Shiravi et al., 2012; Goo et al., 2019; Sharafaldin et al., 2019) that were selected to cover different implementations and scenarios. The second dataset consists of FTP messages (Pang and Paxson, 2003). Before any experiments could be conducted, we examined the data for outliers and irregularities. Data engineering usually takes up a significant amount of time in any machine learning project; however, with network traces in pcap files, we were able to shorten this process. The network analysis tool Wireshark⁵ provides a widely used parser for protocol package analysis. We filtered for valid packages of HTTP and FTP respectively and discarded the rest.

The length of application layer datagrams is limited by the underlying TCP protocol. An analysis of the new pcap file showed that lengthy messages only occur at image transfers (HTTP) or custom messages
(FTP) and can subsequently be cut without significant loss of information. We chose the length limit of 1024 bytes for all packages to make the neural network inputs uniform. Only 0.33% of HTTP and 0.18% of FTP data that we use exceed this length limit.

As a further step for the HTTP experiments, all content after the message header was removed to avoid XML and other non-HTTP data. We achieved this by splitting each statement at every "\r\n\r\n" occurrence and only using the first element of that split. This double line break is the HTTP sign that only payload follows, and thus represents a convenient way to clean the data. No filter was applied to the FTP data.
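This header-only filtering amounts to a single split per statement. A minimal sketch, assuming messages are available as Python strings (the function name is illustrative, not taken from the released code):

```python
def strip_http_body(message: str) -> str:
    # Keep only the header block: everything before the first empty line
    # ("\r\n\r\n"), which in HTTP separates the header from the payload.
    return message.split("\r\n\r\n")[0]

raw = "HTTP/1.1 200 OK\r\nContent-Length: 6\r\n\r\n<xml/>"
assert strip_http_body(raw) == "HTTP/1.1 200 OK\r\nContent-Length: 6"
```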
Dataset bias is a common pitfall when developing machine learning solutions. It describes an uneven distribution of classes within the data, which causes suboptimal feature learning in the neural networks. The protocols themselves do not specify classes of messages directly; however, the manually created classifications for both protocols are our baselines to orient the class balance on. It is desirable that the neural networks still learn which types are more common than others, but the imbalance in our dataset is overwhelmingly in favor of two or three common message types. To balance the number of messages per class, we came up with this improvised formula:

    N_samples_per_class = √(N_occurrences) · 100    (1)

and visualized the distribution to see the effect in figures 1 and 2.

Figure 1: HTTP dataset distributions in original (blue) and balanced (green). The overwhelming bias in favor of GET messages has been mitigated while preserving the notion of this type being the most common.

Figure 2: FTP dataset distributions in original (blue) and balanced (green). With three dominant types and several underrepresented types, the rebalancing smoothed the distribution significantly while preserving tendencies.

The packages for each class were selected randomly until the limit was reached. This includes multiplications of rare messages to get a significant sample size in every class. Tests without this dataset balancing have shown strong signs of overfitting in most experiments. We did not simply set all classes to the same limit, as the notion of common and uncommon should not be lost.

4.2 Feature Extraction

Neural networks are known to be highly flexible self-learning feature extractors across various tasks. When we interpret each message character as a pixel integer value, and thus the entire message as a one-dimensional image, we can apply solutions for image-based tasks. We selected two types of architectures for feature extraction: an autoencoder and a convolutional neural network. The feature mappings learned by the models in these experiments later serve as input to the clustering. Due to the black-box nature of neural networks, we cannot directly measure the quality of the feature mappings and will instead use the later results from clustering to evaluate both architectures. We only defined auxiliary metrics for each experiment.

Autoencoder
Autoencoders only work on data of fixed length and learn a compact representation of the data. The network messages we use as data vary in length, but generally do not exceed 1024 bytes. To unify them, we pad shorter messages with zeros to fill up the length of 1024. Small-scale tests indicate that padding/capping to this length does not impact the performance of any model significantly. As the expected output for the AE architecture is equal to the input, we used the Hamming distance as a guiding measure of success during the experimentation. We train an autoencoder with the following layer sizes: 1024 → 256 → 128 → 256 → 1024. We use softplus activations, the Adam optimizer is set to a learning
rate = 0.0005, the loss function is Mean Square Error (MSE), and the batch size = 128 for 10 epochs. We use the resulting 128-neuron-wide output of the encoder part as the feature encoding for a message. The data for this experiment was interpreted as pixels and subsequently has a continuous nature in the interval [0, 255]. The distance achieved for data samples of length 1024 bytes in training is 254.44 on average for HTTP and 41.28 for FTP. The very high Hamming distance results from the continuous nature of the byte interpretation; even the padding symbols (0) could not be reconstructed entirely. We observe that padding symbols often miss the correct ASCII symbol by 1 or 2 values if we interpret the ASCII table as a scale from 0 to 255. In an image, such small coloration mistakes would hardly be noticeable; for the alphabet as a continuous scale, the results often look wrong. However, an autoencoder's primary purpose is to reduce the feature dimensionality with minimal loss of information and to learn patterns. In this case, we managed to reduce the dimension from 1024 down to 128 while retaining satisfying reconstruction properties during the experimentation. The lower average Hamming distance for FTP can be explained by its much shorter average message length.
plained by their much shorter average message length.

Convolutional Neural Network
CNNs are commonly used for supervised image classification. Our training data does not contain any labels that can be used for supervised learning. We came up with our own unsupervised learning technique using data augmentation. Messages are replicated and modified into several known classes of augmentation types to extract information about the syntactic context. Our idea is that the ability of a CNN to differentiate between these augmentation classes will teach it to become fine-tuned to common patterns in the syntax. HTTP or FTP statements are again padded/capped to a fixed length of 1024, then divided into segments of various lengths (1024, 32, 16, 8, 4), which are then put into random order within the same statement. This approach results in 5 classes (unchanged and scrambled into blocks of length 32, 16, 8, and 4, respectively). A simplified example is illustrated in figure 3.

Figure 3: An HTTP statement scrambled into 3 classes (unchanged and blocks of length 8 and 4). The color indicates the block size and the order of the blocks is randomized. [In the example, "HTTP/1.1 200 OK\r\n" stays unchanged in class 1, is reordered as blocks of 8 characters ("200 OK\r\n", "HTTP/1.1") in class 2, and as blocks of 4 characters in class 3.]
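A minimal sketch of this scrambling augmentation, assuming statements are handled as 1024-character strings; function and constant names are illustrative:

```python
import random

BLOCK_LENGTHS = [1024, 32, 16, 8, 4]   # the 5 augmentation classes

def scramble(message: str, block_length: int) -> str:
    # Pad/cap the statement to 1024 characters, cut it into fixed-size
    # blocks and shuffle the block order. A block length of 1024 yields
    # a single block, i.e. the "unchanged" class.
    padded = message[:1024].ljust(1024, "\x00")
    blocks = [padded[i:i + block_length] for i in range(0, 1024, block_length)]
    random.shuffle(blocks)
    return "".join(blocks)

def augment(message: str):
    # Each message is replicated once per class; the class index is the
    # self-supervised label the CNN has to recover.
    return [(scramble(message, bl), label) for label, bl in enumerate(BLOCK_LENGTHS)]
```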
We chose an architecture that uses blocks of 1D convolution, 1D batch normalization, softplus activation, and 20%-dropout in a total of 5 layers with the following channel sizes: 1 → 128 → 64 → 32 → 16 → 8. The classification task was performed by two fully connected layers, which were removed after the training to get a feature map of 8 × 30 = 240 neurons, which is larger than that of the autoencoder. We used a visualization of the learned features as an additional analytical method to evaluate the performance. A variety of methods for model explanation have been proposed for image classification tasks. We use the visualization of pixel importance known as Grad-CAM (Selvaraju et al., 2017) as our quality measurement for the syntactical features found by the model. Our experiments showed mixed results. Regarding HTTP, we see that various parts of a protocol message are highlighted differently, see figure 4. The highlighted parts often match syntactically relevant pieces, while pseudo-random strings are ignored. This is precisely the kind of syntactical feature extraction that is desired for this experiment.

Figure 4: Example of HTTP statement classification score visualized by Grad-CAM. The padding has been cut off. The semantically significant and consistent parts are highlighted, while pseudo randomness in the ETag is ignored. The text highlighting in this figure is an approximation. The underlying statement is:

HTTP/1.1 200 OK
Date: Mon, 14 Jun 2010 11:33:55 GMT
Server: Apache/2.2.3 (Red Hat)
Last-Modified: Mon, 22 Jun 2009 21:03:26 GMT
ETag: "1ef8b2d-59-46cf634892380"
Accept-Ranges: bytes
Content-Length: 89
Vary: User-Agent
Content-Type: image/gif

For FTP, the results are less visible in Grad-CAM, as the average message length is much smaller. The overall convergence of the experiment was also much flatter than that of the HTTP version.

4.3 Feature Reverse Engineering

A common result of protocol reverse engineering is the representation of the target protocol in the form of rules or clusters. Neural networks, however, do not allow
us to directly visualize their internal representation of the features they extracted. To have a tangible result outside clustering and sequence recognition, we want to be able to generate new messages as proof of the correctness of the feature learning ability. We achieve this by using generative neural network models and examining their outputs. There are two possible choices of how a text message can be interpreted: an image-like byte interpretation (as in the feature extraction before) or a sequential interpretation as a sequence of ASCII alphabet symbols. We investigated both alternatives, using a default GAN architecture for the image-like byte interpretation, roughly following the suggestions given by the original authors (Goodfellow et al., 2014; Radford et al., 2015), and an LSTM model for the interpretation as a sequence of ASCII symbols, with a modified embedding to account for the randomness in some parts of protocol messages (cookies, addresses, etc.).

Generative Adversarial Net
The two networks a GAN consists of, the generator and the discriminator, are trained in parallel. The generator uses four 1D transposed convolution layers with (kernel size, stride) parameter tuples of (2, 2), (4, 4), (8, 8), and (16, 16) in ascending order. The first three layers are each followed by a 1D batch normalization and ReLU activation, while the last one ends with a sigmoid function. The number of channels is as follows: 1024 → 1024 → 128 → 32 → 1. The discriminator uses four 1D convolution layers with 1D batch normalization (except for the first layer), LeakyReLU with a 0.2 slope, and 20%-dropout. The network is capped with a 360-neuron fully connected layer. Channels are as follows: 1 → 10 → 20 → 60 → 90 → 1. We use a rule to keep either network from overtaking the other in training: if one network's error goes over a threshold OR the other network's error falls under a particular border, then the overperforming network is removed from training until the other catches up. Both use an Adam optimizer with a learning rate = 0.0005 and betas = (0.5, 0.99). As a loss function, we used Binary Cross-Entropy (BCE).
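The balancing rule can be sketched as follows; the threshold values here are placeholders rather than the ones used in the experiments:

```python
# Decide which of the two GAN networks gets updated in the current step.
UPPER, LOWER = 1.5, 0.3   # illustrative threshold and border values

def select_trainable(gen_loss: float, disc_loss: float):
    # Returns (train_generator, train_discriminator). If one side is
    # clearly winning (its loss is low while the other side's loss is
    # high), it sits out until the losses even out again.
    train_gen, train_disc = True, True
    if disc_loss > UPPER or gen_loss < LOWER:
        train_gen = False      # generator is overperforming
    if gen_loss > UPPER or disc_loss < LOWER:
        train_disc = False     # discriminator is overperforming
    return train_gen, train_disc
```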
The GAN model's training did not indicate a significant convergence towards a stable state. As the pixel interpretation's inaccuracy limits the GAN from the start, combined with this architecture's generally unstable nature, we are not surprised by this disappointing result. Creating text from a continuous data interpretation only sounded promising to us when we considered the static structure used by a protocol. However, not even the padding symbols have successfully been replicated with any significant accuracy (about ±3 ASCII values) by the GAN model. Further experiments using this architecture are omitted.

Long Short-Term Memory
LSTMs are advanced recurrent networks, trained on sequences of fixed lengths. To create such conditions, padding is out of the question, as it would insert undesirable sequential dependencies. We instead use a script which concatenates statements to achieve more than four times the required length, then chooses a substring of the correct length starting from a random position. This script is only used for the training, to teach the sequential dependencies from letter to letter. Before and after each statement, a unique symbol for start-of-package (SOP) and end-of-package (EOP) is inserted so that the network can learn to distinguish between different messages. The data is represented as a one-hot encoding of the ASCII alphabet. The architecture uses an embedding layer followed by a convolution layer (kernel size = 4, stride = 4) for embedding adjustment. This trick is introduced here as convolutional embedding, designed to balance between character-based and word-based embedding in data formats with a lot of random noise on the character level. The hidden size of the embedding is flipped with the feature-length dimension of the tensor; the convolution layer interprets the hidden width as channels and the feature-length dimension as image dimensions. Only the batch size remains the same. This dimension transposition is reversed after the convolution, resulting in an embedded tensor with a quarter of the length. This is given as input to a single-layered LSTM. The output goes through the reversed procedure of the convolutional embedding in a 1D transposed convolution and a fully connected layer with the same hyperparameters and dimension transpositions. Figure 5 shows an overview of how this embedding works. This kind of embedding can be interpreted as a learnable, weighted 4-gram of characters, where the whole architecture only has to learn the next character (from 0–1023 to 1–1024 by index). This local context inside the 4-gram and the long-term dependencies, which have been shortened by a factor of 4, are easier for the model to learn and more stable to sample. For training, an Adam optimizer with a learning rate = 0.005 and standard betas is used. Negative Log-Likelihood (NLL) is used as the loss function since the error is measured on the character level. This, of course, requires the input data mentioned above to consist of message strings whose lengths are multiples of 4.

Figure 5: Simplified illustration of the convolutional embedding. It allows for a weighted length reduction of text and increases the size of the alphabet. [The figure shows a batch of character sequences being embedded, the tensor dimensions flipped into a batch of 1D images, compressed 4:1 by a convolution, and flipped back into a batch of sequences a quarter as long.]
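A sketch of the compression half of the convolutional embedding in PyTorch, assuming an ASCII alphabet and illustrative widths (embedding 64, hidden 256); the reverse path uses a transposed convolution with the same parameters, as described above:

```python
import torch
import torch.nn as nn

class ConvEmbedding(nn.Module):
    # Character embedding treated as a 1D "image" (channels = embedding width)
    # and compressed by a factor of 4 with a strided convolution.
    def __init__(self, alphabet_size=128, embed_dim=64, hidden_dim=256):
        super().__init__()
        self.embed = nn.Embedding(alphabet_size, embed_dim)
        self.compress = nn.Conv1d(embed_dim, hidden_dim, kernel_size=4, stride=4)

    def forward(self, tokens):              # tokens: (batch, 1024)
        x = self.embed(tokens)               # (batch, 1024, embed_dim)
        x = x.transpose(1, 2)                 # flip: hidden width becomes channels
        x = self.compress(x)                  # (batch, hidden_dim, 256)
        return x.transpose(1, 2)              # flip back: (batch, 256, hidden_dim)

embedder = ConvEmbedding()
lstm = nn.LSTM(input_size=256, hidden_size=256, batch_first=True)

tokens = torch.randint(0, 128, (8, 1024))    # a batch of encoded message strings
features, _ = lstm(embedder(tokens))          # sequence is now 4x shorter: (8, 256, 256)
```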
This sequence-based attempt at recreating HTTP and FTP statements shows good results. The LSTM architecture converges towards a minimal loss after less than one epoch, indicating excellent structural learning and repeating patterns. We can explain this by the nature of a text-based protocol such as HTTP and FTP, which use keywords, a fixed grammar, and a consistent alphabet. When sampling the LSTM (letter by letter), a string with a valid message is produced and can be parsed using the SOP and EOP symbols. We wrap the resulting statements with valid but random headers of TCP and IP to become complete network packages in a pcap file. The network analysis tool Wireshark⁶ showed 67.6% of the HTTP messages as valid, with the remaining ones classified as TCP with a random payload. For FTP, a 100% quota was reached; however, FTP messages can be rather simple and still be valid. A few messages that are repeated often in the training data also appear in the output of this experiment. Some generated examples can be seen in figure 6. We see these results as a sound basis for further experimentation.

6 https://www.wireshark.org/

Figure 6: This figure shows two examples of HTTP statements generated by an LSTM. For comparison, figure 4 shows an actual HTTP message from the dataset. One can see that the general structure is similar, but the variable contents have been changed, and some optional information was added or altered. For fuzzing purposes, this is the desired behavior.

HTTP/1.1 200 OK
Date: Mon, 14 Jun 2010 13:20:25 GMT
Server: Apache
Last-Modified: Mon, 21 Jun 2010 14:18:09 GMT
ETag: "2de9573-2b-486717fb77ac0"
Accepted-Ranges: bytes
Content-Length: 43
Connection: close
Content-Type: text/html; charset=iso-8859-1

HTTP/1.1 200 OK
Date: Mon, 14 Jun 2010 13:20:25 GMT
Server: Apache
Content-Length: 43
Connection: close
Content-Type: text/html; charset=iso-8859-1

4.4 Clustering

For clustering, the feature extraction results are relevant to encode the data samples into a smaller format. We initialize three SOMs and compare their results: a baseline model with messages capped and padded to a fixed length of 1024, a second SOM model using the AE encoding, and a third model using the CNN feature map. These three models differ in their input size but have an identical output map dimension of 16 × 1 for HTTP and 64 × 1 for FTP. The different output dimensions for the two protocols are based on experimentation and roughly match the variety of different message types for each protocol. This can be adjusted by a parameter for any new protocol or for optimization purposes. Training is performed with a learning rate = 0.005 and sigma = 1.5 for HTTP and sigma = 3 for FTP. For details on the parameters, please see the "MiniSom" library documentation⁷.

7 https://github.com/JustGlowing/minisom

For this evaluation to work, we have to know the true classes (synonym for types) for both protocols in advance. This is not straightforward, as neither protocol specifies explicit types/groups of messages apart from requests and responses. For HTTP, the responses especially have a wide range of meanings, which we grouped by their respective code's first digit for this analysis, as it represents a basic meaning of the code without going into too much detail. This means that all messages from 200–299, all 300–399, and all 400–499 messages are considered to be of the same type, respectively. Along with all the valid keywords that an HTTP statement can start with (GET, POST, HEAD, DELETE, OPTIONS, PUT, TRACE, CONNECT), this gives us a total of 11 clusters with an additional miscellaneous one (MISC). For FTP, we manually defined groupings of keywords and keycodes with similar meaning or purpose, as shown in Table 2. This was a manual process, and we do not claim to have achieved perfection with this grouping.

Table 2: The FTP types have been manually grouped into clusters of keywords/codes with similar meaning or purpose

0 | misc
1 | ACCT, ADAT, AUTH, CONF, ENC, MIC, PASS, PBSZ, PROT, QUIT, USER
2 | 230, 331, 332, 530, 532
3 | PASV, EPSV, LPSV
4 | 227, 228, 229
5 | ABOR, EPRT, LPRT, MODE, PORT, REST, RETR, TYPE, XSEM, XSEN
6 | 125, 150, 221, 225, 226, 421, 425, 426
7 | ALLO, APPE, CDUP, CWD, DELE, LIST, MKD, MDTM, PWD, RMD, RNFR, RNTO, STOR, STRU, SYST, XCUP, XMKD, XPWD, XRMD
8 | 212, 213, 215, 250, 257, 350, 532
9 | 120, 200, 202, 211, 214, 220, 450, 451, 452, 500, 501, 502, 503, 504, 550, 551, 552, 553, 554, 555

For the experimentation on clustering, we use the first multi-model approach. We compare three different configurations of self-organizing maps in terms
of their performance on our clustering metrics against each other. Firstly, the 128-neuron-wide SOM will use the encoding from the autoencoder model. Secondly, the 240-neuron-wide SOM will use the feature maps of the CNN architecture. Lastly, as a baseline, we use a raw SOM with 1024 neuron inputs for whole statements.

We use two metrics to judge the effectiveness of the clustering for each setup. The first is accuracy: how many detected clusters match a message type of the protocol with more than 50% confidence, which will be referred to as a "dominant" cluster. Here, confidence is defined as the relative share of a type among all messages assigned to one cluster. If there are 120 messages of type A and 80 messages of type B, all put into one cluster, then the confidence of this cluster to represent type A will be 120 / (120 + 80) = 60%. The second metric is the average confidence among all clusters. Both metrics are also reported for dominant types only (> 50% for one type), to remove empty and small clusters from the average.
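A sketch of how these two metrics can be computed, assuming each message is recorded as a (true type, assigned cluster) pair; whether empty output neurons count towards the accuracy denominator is a detail left out here:

```python
from collections import Counter

def clustering_metrics(assignments):              # list of (true_type, cluster) pairs
    per_cluster = {}
    for true_type, cluster in assignments:
        per_cluster.setdefault(cluster, Counter())[true_type] += 1

    confidences = []
    dominant = 0
    for counts in per_cluster.values():
        # Confidence: relative share of the most frequent type in this cluster.
        share = counts.most_common(1)[0][1] / sum(counts.values())
        confidences.append(share)
        if share > 0.5:
            dominant += 1                          # a "dominant" cluster

    accuracy = dominant / len(per_cluster)
    avg_confidence = sum(confidences) / len(confidences)
    return accuracy, avg_confidence

# Example from the text: 120 type-A and 80 type-B messages in one cluster
# give a confidence of 120 / (120 + 80) = 60%.
acc, conf = clustering_metrics([("A", 0)] * 120 + [("B", 0)] * 80)
```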
Table 3: Results of the clustering experiments for comparison. One can see an improvement over the baseline model when using the autoencoder.

(a) HTTP clustering
Architecture | Accuracy (dominant) | Avg. Confidence (dominant)
Baseline SOM | 75% (75%) | 58.34% (58.34%)
CNN + SOM | 68.75% (68.75%) | 53.61% (53.61%)
AE + SOM | 87.5% (87.5%) | 69.24% (69.24%)

(b) FTP clustering
Architecture | Accuracy (dominant) | Avg. Confidence (dominant)
Baseline SOM | 60.94% (72.22%) | 51.8% (61.4%)
CNN + SOM | 29.69% (29.69%) | 18.19% (18.19%)
AE + SOM | 67.19% (86%) | 56.11% (71.82%)

Table 3 shows the results of the experiments. Some setups using dedicated feature extractors can outperform the baseline significantly. The autoencoder appears to be better suited for this task, perhaps because the CNN architecture required a workaround with data augmentation to even allow for training. For further experiments that require cluster indexes, we combine the AE and SOM architectures as a pipeline to replace messages with their cluster index.
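A sketch of this AE + SOM pipeline using the MiniSom library named above, with the HTTP parameters (16 × 1 map, learning rate 0.005, sigma 1.5); the encoder outputs are stand-ins and the variable names are illustrative:

```python
import numpy as np
from minisom import MiniSom

encodings = np.random.rand(1000, 128)   # stand-in for the 128-wide AE encodings

som = MiniSom(16, 1, input_len=128, sigma=1.5, learning_rate=0.005)
som.train_random(encodings, num_iteration=10000)

def cluster_index(encoding):
    # winner() returns the (x, y) coordinates of the best matching node;
    # with a 16x1 map the x coordinate alone serves as the cluster index.
    return som.winner(encoding)[0]

indexed_messages = [cluster_index(e) for e in encodings]
```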
4.5 State Recognition

To recognize deeper states in a protocol, we use the correlation of message sequences as they appear chronologically in the dataset. For this, we replaced all messages with the cluster index assigned by the clustering model. A simple LSTM with dimensions fitting the SOM output is sufficient to indicate highly correlated sequences by presenting the network's confidence for what the following message type could be. The central idea of SOM learning is arranging similar types of messages in proximity to one another, thereby allowing the use of the MSE loss function to approximate the correct message type in the LSTM output. The use of Cross-Entropy (CRE) loss has shown no convergence. For training, we use an Adam optimizer with a learning rate = 0.005 and betas = (0.3, 0.9).

The setup is shown in figure 7. A simple time-series prediction using a recurrent architecture shows promising results for FTP, where actual states are implied in the protocol. With the 64 indicated possible clusters, the LSTM matched 42% of the predicted message types to the following actual message. This number may seem low at first glance; however, for an in-depth state fuzzing approach using many attempts, we view this as a significant improvement over random choice.

Figure 7: This figure shows examples of HTTP statements as they are processed for state recognition. [HTTP statements of 1024 chars/bytes are compressed to 128 chars/bytes by the autoencoder, mapped to one of 16 classes by the SOM, and the resulting cluster index sequence, e.g. 1 (POST), 7 (Moved), 1, is fed to the LSTM, which predicts the next type, e.g. 4 (OK).]

This experiment was also conducted for the HTTP protocol, and the results are less impressive. Out of the 16 implied clusters, 72% were correctly predicted. This number may seem high at first glance but relies on predicting the average cluster number aligned with the dataset bias. In other words, the prediction simply states two or three alternating types for the GET message, which drives the prediction accuracy higher than any different prediction pattern. The dataset could not be balanced for classes in this experiment, as we wanted to avoid changing any sequential dependencies by randomly picking messages. As a result, the input data for HTTP has a significant bias towards GET message types, as the original data has, which ultimately explains this behavior of the LSTM.

The results in this section showcase the potential hazards of working with machine learning techniques, mixing their embeddings and approaches, as well as the interpretation of a predefined metric. Even though the results were a success in regard to the stateful protocol, which was our original aim for this experiment,
applying the same efforts to a stateless protocol shows the weakness and dangerous pitfalls of interpreting the metric. For both protocols, the metric is misleading at face value. Only an analytic look at actual prediction results corrected the error.

4.6 Sequence Generation

To fully mimic all aspects of the behavior of a protocol, we must be able to create syntactically correct messages with a real-world distribution. We use an LSTM model similar to the one for feature reverse engineering, with added context. For any message, instead of using a generic SOP and EOP symbol as markers for beginning and end, we introduce new special symbols, which are individualized by cluster type. This results in 2 × 16 extra symbols for HTTP and 2 × 64 additional symbols for FTP to be added to the ASCII alphabet in their respective setups, and subsequently to the one-hot encoding. The sequences put into the LSTM, which uses a hidden size of 100 neurons in two layers to account for the extra complexity, have the beginning and end of each message indicated by a cluster-specific SOP or EOP symbol. The rest of the model architecture is identical to the one used for feature reverse engineering, including the convolutional embedding. Again, for training, we use the Adam optimizer with a learning rate = 0.005 and default betas with an NLL loss function.
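A sketch of the extended alphabet with cluster-specific start and end symbols; the token layout is illustrative, the key point being that 2 × 16 (HTTP) or 2 × 64 (FTP) extra symbols are appended to the ASCII range before one-hot encoding:

```python
ASCII_SIZE = 128
NUM_CLUSTERS = 16                      # 64 for FTP

def sop(cluster):                      # start-of-package token for a cluster
    return ASCII_SIZE + cluster

def eop(cluster):                      # end-of-package token for a cluster
    return ASCII_SIZE + NUM_CLUSTERS + cluster

def encode(message: str, cluster: int):
    # Wrap the character codes of a message in its cluster-specific markers.
    return [sop(cluster)] + [ord(c) % ASCII_SIZE for c in message] + [eop(cluster)]

# Alphabet size seen by the one-hot encoding and the LSTM output layer:
VOCAB_SIZE = ASCII_SIZE + 2 * NUM_CLUSTERS     # 160 for HTTP, 256 for FTP
```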
We use the same metric as for the feature reverse engineering LSTM. We parsed the generated sequence into HTTP/FTP statements, wrapped them in valid low-level protocol headers, and collected them in a pcap file. 63% of all sequences were identified as valid for HTTP and 100% for FTP by Wireshark. The excellent result on FTP is explainable for the same reasons as in the feature reverse engineering, stemming from the simplicity of the messages and training data. Figure 9 shows the type distribution of both the feature reverse engineering and the sequence generation for FTP to visualize the impact of adding the other steps via cluster indexing. Only one class stands out, which is either due to its simplicity or a miscategorization of FTP types on our part. The curve indicates that a wider, more balanced distribution of types was inferred. For the comparatively lower HTTP result, as shown in figure 8, the messages displayed a greater correlation between requests and responses in the sequence generation. This implies that the added context of types allows the neural network to learn these connections more effectively. We expect that more capable and larger architectures will show an increase in the effectiveness of the observed behavior.

Figure 8: HTTP feature reverse engineering (blue) vs. sequence generation (orange) results in terms of type distribution. Sequence generation shows better request/response correlation.

Figure 9: FTP feature reverse engineering (blue) vs. sequence generation (orange) results in terms of type distribution. Sequence generation shows a better distribution except for 1 class.

This result lays the final foundation we needed to enable deep learning-based fuzzing of any protocol, as a PRE approach is designed to be generally applicable.

5 RELATED WORK

Learn&Fuzz: Machine Learning for Input Fuzzing
Based on the idea that fuzzing only requires an approximately correct input, the authors of (Godefroid et al., 2017) used a recurrent neural network to help in fuzzing a PDF parser in a browser. The syntax for objects, a central building block of PDFs, was inferred by an AI in this work. The authors used three different sampling strategies from their network to address various challenges in fuzzing: NoSample, Sample, and SampleSpace. The first one sees the RNN pick the letter with the highest
score among the outputs to ensure syntactically correct objects. The second strategy takes the network's softmax output as a probability distribution for picking the next letter. This increases the fuzzing coverage but may result in many ill-formed objects. Lastly, the SampleSpace strategy combines the two previous ones: NoSample is used until a whitespace is generated; the next character is then sampled by probability, only to switch back to the NoSample strategy again. This creates more syntactically correct objects while also creating a more comprehensive mix of different objects. These strategies were used in the LSTM sampling. Our sampling, however, has to cover message context as well.

Machine Learning for Black-Box Fuzzing of Network Protocols
The authors of (Fan and Chang, 2017) see their work as the first attempt to combine a deep learning approach with black-box fuzzing. They deploy a sequence-to-sequence model, a two-part LSTM model invented for machine translation (Sutskever et al., 2014), to learn the semantics of network protocols and create new output for fuzzing purposes. Their work differs from this paper in the number of steps taken and the models employed to learn details about the protocol. Here, a whole model is presented, which was derived from previous work on PRE. Additional information such as clustering or context-based generation is missing as well.

GANFuzz: A GAN-based Industrial Network Protocol Fuzzing Framework
The GANFuzz framework (Hu et al., 2018) represents an alternative approach to protocol fuzzing using deep learning but falls into the same category as the previous paper. Instead of a sequence-to-sequence model, GANFuzz uses a model known as SequenceGAN (SeqGAN) (Yu et al., 2017). This is a version of the common GAN model adjusted for text-based data using reinforcement learning. The authors of the paper train one SeqGAN per message type, which they deduce from a variety of clustering heuristics. This makes their approach more similar to PREUNN, but it is still missing several work steps, a concise work step model, and the generation of data by context symbols instead of entirely separate models.
Deep neural network-based automatic unknown protocol classification system using histogram feature
Jung and Jeong demonstrated a machine-learning-based PRE approach to classify unknown protocols (Jung and Jeong, 2020). They use statistical methods to extract features from ten different protocols and feed them into a deep belief network, classifying the message based on these features. Based on this approach, they report a classification accuracy of about 99% on unknown datasets. Their method differs from PREUNN in several ways. They semi-automatically extract features using statistical methods, while we fully automated this process using an autoencoder. Furthermore, their work's goal is to classify unknown protocol messages, whereas we aim to generate new valid messages based on a novel protocol. Classification in PREUNN happens implicitly.

Network Traffic Classification (NTC)
The topic of NTC is related to PRE regarding feature extraction. Several papers combine NTC with ML and NNs (Lopez-Martin et al., 2017; Michael et al., 2017; Li et al., 2018). However, the more complex steps of PRE, like generating new packages for an unknown protocol, are not addressed in those papers.

6 CONCLUSION

PREUNN represents a novel approach for the separation of traditional protocol reverse engineering tasks that can be implemented using only neural networks. In this paper, the widespread application layer protocols HTTP v1.1 and FTP were successfully reverse-engineered using our approach. This highlights the potential of several deep learning architectures such as AEs for feature extraction, LSTMs for feature reverse engineering, SOMs for clustering, LSTMs for state recognition, and finally a combination of the above for sequence generation. The results achieved include a decent, intuitively agreeable clustering and a context-capable message generation model. We omitted optimizations and testing the use of PREUNN for fuzzing entirely due to time restraints and leave them for future work. Our modular approach allows the use of newer architectures from the fields of deep learning, such as natural language processing (e.g. BERT), for improvements. Further future work could also include the use of reinforcement learning utilizing these pre-trained models to create an automated fuzzer that is capable of reverse engineering any similarly structured message-based language.

REFERENCES

Bação, F., Lobo, V., and Painho, M. (2005). Self-organizing maps as substitutes for k-means clustering. In International Conference on Computational Science, pages 476–483. Springer.
Besic, N. (2019). What is a fuzzer and what does fuzzing mean. https://www.neuralegion.com/fuzzing-what-is-fuzzer/.
Brook, C. (2018). What is deep packet inspection? How it works, use cases for DPI, and more. https://digitalguardian.com/blog/what-deep-packet-inspection-how-it-works-use-cases-dpi-and-more.
Caballero, J., Yin, H., Liang, Z., and Song, D. (2007). Polyglot: Automatic extraction of protocol message format using dynamic binary analysis. In Proceedings of the 14th ACM Conference on Computer and Communications Security, pages 317–329.
Comparetti, P. M., Wondracek, G., Kruegel, C., and Kirda, E. (2009). Prospex: Protocol specification extraction. In 2009 30th IEEE Symposium on Security and Privacy, pages 110–125. IEEE.
Cui, W., Kannan, J., and Wang, H. J. (2007). Discoverer: Automatic protocol reverse engineering from network traces. In USENIX Security Symposium, pages 1–14.
Duchene, J., Le Guernic, C., Alata, E., Nicomette, V., and Kaâniche, M. (2018). State of the art of network protocol reverse engineering tools. Journal of Computer Virology and Hacking Techniques, 14(1):53–68.
Fan, R. and Chang, Y. (2017). Machine learning for black-box fuzzing of network protocols. In International Conference on Information and Communications Security, pages 621–632. Springer.
Garfinkel, S. (2008). Nitroba university harassment scenario. Dataset: https://digitalcorpora.org/corpora/scenarios/nitroba-university-harassment-scenario.
Godefroid, P., Peleg, H., and Singh, R. (2017). Learn&Fuzz: Machine learning for input fuzzing. In 2017 32nd IEEE/ACM International Conference on Automated Software Engineering (ASE), pages 50–59. IEEE.
Goo, Y.-H., Shim, K.-S., Lee, M.-S., and Kim, M.-S. (2019). HTTP and DNS traffic traces for experimenting of protocol reverse engineering methods. http://dx.doi.org/10.21227/tpqf-fe98.
Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., and Bengio, Y. (2014). Generative adversarial nets. In Advances in Neural Information Processing Systems, pages 2672–2680.
Hinton, G. E. and Salakhutdinov, R. (2006). Reducing the dimensionality of data with neural networks. Science, 313:504–507.
Hochreiter, S. and Schmidhuber, J. (1997). Long short-term memory. Neural Computation, 9(8):1735–1780.
Hornik, K., Stinchcombe, M., White, H., et al. (1989). Multilayer feedforward networks are universal approximators. Neural Networks, 2(5):359–366.
Hu, Z., Shi, J., Huang, Y., Xiong, J., and Bu, X. (2018). GANFuzz: A GAN-based industrial network protocol fuzzing framework. In Proceedings of the 15th ACM International Conference on Computing Frontiers, pages 138–145.
Jung, Y. and Jeong, C.-M. (2020). Deep neural network-based automatic unknown protocol classification system using histogram feature. The Journal of Supercomputing, 76(7):5425–5441.
Kohonen, T. (1982). Self-organized formation of topologically correct feature maps. Biological Cybernetics, 43(1):59–69.
Krizhevsky, A., Sutskever, I., and Hinton, G. E. (2012). ImageNet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems 25, pages 1097–1105. Curran Associates, Inc.
Lang, K. J. (1999). Faster algorithms for finding minimal consistent DFAs. NEC Research Institute, Tech. Rep.
Li, R., Xiao, X., Ni, S., Zheng, H., and Xia, S. (2018). Byte segment neural network for network traffic classification. In 2018 IEEE/ACM 26th International Symposium on Quality of Service (IWQoS), pages 1–10.
Lopez-Martin, M., Carro, B., Sanchez-Esguevillas, A., and Lloret, J. (2017). Network traffic classifier with convolutional and recurrent neural networks for internet of things. IEEE Access, 5:18042–18050.
Michael, A., Valla, E., Neggatu, N. S., and Moore, A. (2017). Network traffic classification via neural networks. Technical Report UCAM-CL-TR-912, University of Cambridge, Computer Laboratory.
Narayan, J., Shukla, S. K., and Clancy, T. C. (2015). A survey of automatic protocol reverse engineering tools. ACM Computing Surveys (CSUR), 48(3):1–26.
Pang, R. and Paxson, V. (2003). Lawrence Berkeley National Laboratory - FTP - packet trace. Dataset: https://ee.lbl.gov/anonymized-traces.html.
Radford, A., Metz, L., and Chintala, S. (2015). Unsupervised representation learning with deep convolutional generative adversarial networks. arXiv preprint arXiv:1511.06434.
Selvaraju, R. R., Cogswell, M., Das, A., Vedantam, R., Parikh, D., and Batra, D. (2017). Grad-CAM: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE International Conference on Computer Vision, pages 618–626.
Sharafaldin, I., Lashkari, A. H., Hakak, S., and Ghorbani, A. A. (2019). Developing realistic distributed denial of service (DDoS) attack dataset and taxonomy. In 2019 International Carnahan Conference on Security Technology (ICCST), pages 1–8. IEEE.
Shiravi, A., Shiravi, H., Tavallaee, M., and Ghorbani, A. A. (2012). Toward developing a systematic approach to generate benchmark datasets for intrusion detection. Computers & Security, 31(3):357–374.
Sutskever, I., Vinyals, O., and Le, Q. V. (2014). Sequence to sequence learning with neural networks. In Advances in Neural Information Processing Systems, pages 3104–3112.
TC97, I. (1984). Basic reference model. International Standard, ISO/IS, 7498.
Wondracek, G., Comparetti, P. M., Kruegel, C., and Kirda, E. (2008). Automatic network protocol analysis. In NDSS, volume 8, pages 1–14.
Yu, L., Zhang, W., Wang, J., and Yu, Y. (2017). SeqGAN: Sequence generative adversarial nets with policy gradient. In Thirty-First AAAI Conference on Artificial Intelligence.