Encrypted Malware Traffic Detection via Graph-based Network Analysis

ABSTRACT
Malicious activities on the Internet continue to grow in volume and damage, posing a serious risk to society. Malware with remote control capabilities is considered one of the most threatening malicious activities, as it can enable arbitrary types of cyber-attacks. As a countermeasure, many malware detection methods have been proposed to identify malicious behaviours based on traffic characteristics. However, the emerging encryption and evasion techniques pose substantial barriers to the full exploitation of network information. This significantly impairs the effectiveness of existing malware detection methods that rely on a single type of characteristic. In this paper, we propose ST-Graph to resolve this issue. In addition to traditional stream attributes, ST-Graph explores spatial and temporal characteristics of network behaviours based on a graph representation learning algorithm and integrates all available information to boost the detection decision. To illustrate the effectiveness of ST-Graph, we evaluate it on two datasets. Experimental results demonstrate that ST-Graph outperforms state-of-the-art malware detection systems and also shows good performance in efficiency, generalizability, and robustness. Specifically, it achieves over 99% precision and recall, and its False Positive Rate is even two orders of magnitude lower than (nearly 0.02 times) that of baseline models. Meanwhile, the deployment of ST-Graph in two real network scenarios for around one year shows outstanding efficiency, with a time cost of only 160 seconds for 5 minutes of traffic at 1.7 Gbps bandwidth.

CCS CONCEPTS
• Security and privacy → Intrusion detection systems.

KEYWORDS
Traffic-based malware detection, encryption traffic, graph representation

ACM Reference Format:
Zhuoqun Fu, Mingxuan Liu, Yue Qin, Jia Zhang, Yuan Zou, Qilei Yin, Qi Li, Haixin Duan. 2022. Encrypted Malware Traffic Detection via Graph-based Network Analysis. In 25th International Symposium on Research in Attacks, Intrusions and Defenses (RAID 2022), October 26–28, 2022, Limassol, Cyprus. ACM, New York, NY, USA, 15 pages. https://doi.org/10.1145/3545948.3545983

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].
RAID 2022, October 26–28, 2022, Limassol, Cyprus
© 2022 Copyright held by the owner/author(s). Publication rights licensed to ACM.
ACM ISBN 978-1-4503-9704-9/22/10...$15.00
https://doi.org/10.1145/3545948.3545983

1 INTRODUCTION
Malware refers to intrusive software programs developed by cybercriminals with malicious intentions such as stealing data, corrupting computers, bringing down servers, and penetrating networks. The most damaging malware is that with remote control capabilities, which gives attackers administrative control of the victim's computer to enable arbitrary types of cyber-attacks. For example, NOPEN, used in a massive breach of top-secret data in 2017, has such remote control capabilities [7]. The infection pattern of this type of malware is shown in Figure 1, where the infected host communicates with the control server to perform further malicious acts. Due to its highly concealed and high-risk nature, the detection of such malware has received considerable attention from both academia [28, 69, 71] and industry [8, 9, 13].

Real-world malware traffic detection requires efficient detection over complex network traffic data with high accuracy. However, the emergence of encryption strategies and various evasion technologies of adversaries poses huge barriers to effective malware traffic detection. More specifically, adversaries employ encryption protocols (i.e. TLS protocol [52]) in the process of malware communications to hide suspicious information. According to [18], in 2021, more than 46% of malware encrypted its communications. The encryption of most traffic greatly curtails the accessible information (e.g., URLs and HTTP headers in the payload) that indicates malicious network behaviour. This impairs the accuracy and consistency of traditional network-based malware detection, such as Deep Packet Inspection (DPI) [47, 54].

Many previous works have made efforts to resolve the challenges of encryption. For example, to enlarge the set of indicative information, [3, 19] summarise the fine-grained features of TLS traffic and use machine learning methods to identify encrypted traffic generated by known malicious samples. Further, [37, 42] introduce deep learning methods to reduce the dependence on features and human experience. However, such detection methods based on features of single streams cannot be generalized to unknown samples and may induce a large number of false positives that cannot be interpreted. To overcome such issues, another line of methods [34, 35] attempts to reveal the essential characteristics of malware behaviour through the access relations between hosts and domains. However, adopting such methods in a real-world network scenario is not trivial due to efficiency concerns. More seriously, adversaries have developed various evasion techniques by perturbing stream features, which further reduces the amount of usable information for malware identification. So far, the issue of lacking representative information for encrypted malware detection has not been well resolved. Previous works use only a single type of characteristic, such as stream features or context information, while the information loss and compression still exist.

Figure 1: Infection pattern of remote control malware. [Diagram between Victim and Attacker. Attack phase: malicious intrusion; intrusion acknowledgement and malware delivery; malware installation. Control phase: request instructions; issue instructions.]

In this paper, we propose ST-Graph, a multi-stream analysis framework that explores multiple features from spatial and temporal perspectives and integrates all available information for comprehensive malware traffic detection under encryption scenarios. More specifically, we design an attribute heterogeneous graph oriented to spatio-temporal traffic behaviour, which effectively captures the characteristics of associations between nodes in a large-scale network. Equipped with a graph representation learning method, our method extends the scope of information that can be utilized for recognizing encrypted malicious traffic, which boosts the detection to achieve higher accuracy and robustness. In the meantime, we carefully design our graph representation learning algorithm on the basis of random walk [49], which learns spatial and temporal features for network traffic with high efficiency. The learned host representations aggregate information from behavioural sequences in the network, which effectively highlights the differences in traffic between compromised hosts and benign samples with little information loss.

Our main contributions are summarised as follows:
• We propose ST-Graph, a real-time malicious traffic detection framework under an encryption scenario. By exploring and integrating multiple features, ST-Graph effectively reveals malicious behaviours within an encrypted network, which enables detection with a low false alarm rate.
• We design a heterogeneous attribute graph for encrypted traffic and propose a novel embedding method, namely interval-inclined [...]

[...] knowledge about graph representation learning.

Information in TLS handshake. Transport Layer Security (TLS) [52] is utilized to assure confidentiality and integrity between two communication entities. For example, HTTPS is the plaintext HTTP protocol run over TLS. In general, a TLS connection negotiates keys via a TLS handshake and then uses secure symmetric encryption to protect communication payloads. TLS encryption prevents any third party from accessing plaintext communication payloads, which attracts an increasing number of applications to deploy TLS. On the other side of the coin, malicious adversaries also deploy TLS to encrypt their connections and evade detection by DPI systems [47, 54]. However, the TLS handshake remains in plaintext in order to exchange the information necessary for encryption. During a TLS handshake, the two communicating parties authenticate each other, negotiate encryption algorithms, exchange encryption keys and finally agree on the encryption process based on previously exchanged information. More specifically, the client initially sends a ClientHello message, providing a list of cipher suites and a set of TLS extensions. The cipher suites are sets of cryptographic algorithms required by TLS. The extensions are the features supported by the TLS client, where the Server Name Indication (SNI) extension indicates the hostname of the target server to which the client is trying to connect. The server then responds with a ServerHello message, containing the selected algorithm and identity information. As the information exchanged during the handshake process is critical to the subsequent encrypted communication, it also provides vital information to detect malicious encrypted traffic [2].

Graph Representation Learning. A graph is a data structure for representing complex networks and modelling abstract concepts such as relations between entities. However, traditional graph construction algorithms (e.g. adjacency matrices [53], adjacency lists [10]) and graph analysis algorithms (e.g. search algorithms [59]) have difficulty in handling increasingly complex graph topologies and can incur huge space and time overheads. In contrast, graph representation algorithms [24] allow for more flexible and efficient analysis on large-scale graphs, with the goal of optimizing a set of vectors that numerically represent node information as well as graph structure. Among such methods, graph embedding [21] is a representative approach, which transforms nodes on graphs into vectors. A graph can be approximated by several node lists, which are similar to the word sequences in a piece of text. Therefore, word2vec [39], a text embedding method, can be adopted for graph embedding effectively [49]. First, a list of nodes can be regarded as a "corpus" of a graph, which can be obtained by [...]

[...]tion our detection system, ST-Graph, at the gateway to monitor the whole traffic, as shown in Figure 2. Specifically, ST-Graph listens at the gateway to encrypted network traffic generated by internal hosts accessing external servers and detects infected hosts with suspicious communications in real time. In this work, we focus only on traffic encrypted with the standardized TLS protocol, since it is the mainstream encryption protocol and, owing to its ease of deployment, the one most used by malware. Note that our detection system only captures the traffic without manipulating it and thus does not affect benign forwarded traffic.

Figure 2: Threat model of ST-Graph. [Diagram: ST-Graph is attached to the gateway between internal hosts and external servers and performs detection there.]

However, such a detection demand poses huge challenges, mainly from two perspectives: 1) curtailed available information limits the effectiveness of the detection, and 2) the highly comprehensive network connections hinder the efficiency of the detection. More specifically, the first concern results from recent encryption and adversaries' evasion technologies, which greatly curtail the amount of information that can be utilized for detection. In encryption scenarios, the invisibility of communication payloads prevents traditional detection methods, e.g. DPI methods [47, 54], from making accurate predictions. More seriously, adversaries have evolved several techniques to evade existing single-stream-based detection methods [17, 61, 65]. In obfuscation-based evasion, adversaries change the frequency of accesses from a host to command and control (C&C) servers [26], which blurs the suspicious property of the host to evade detection based on packet length and time interval [3, 19, 45]. In disguising-based evasion, malware takes the form of benign software connections, disguises its traffic as generated by benign software [17], visits benign third-party websites initially as a front [68], or reduces the number of websites visited [38] to avoid context-based detection [50, 73]. The suspicious behaviours hide behind millions of legitimate requests, which poses substantial barriers to effective detection and introduces a high level of false positives [11, 17]. Therefore, encryption and evasion behaviours prompt us to explore more available information for effective detection.

As for the second concern, detection over network interactions confronts a large scale of network throughput, resulting from the increasing number of network users and the complex network connectivity. Meanwhile, the demand for exploring additional, multi-faceted features for effective detection also increases the computational complexity. Therefore, we need to design new detection technologies that can explore and process more network features over comprehensive network interactions with tolerable computational complexity.

4 SYSTEM OVERVIEW
In this paper, we propose ST-Graph, a multi-stream analysis framework equipped with a novel graph embedding algorithm. In this section, we illustrate the key observations of malware infections, explain the design of the system, and describe the workflow of ST-Graph.

4.1 Key Observations

Figure 3: Schematic representation of host connectivity in a network environment with malware infections. The colour of a circle represents the maliciousness of the destination server. For example, white circles represent normal destination servers, like www.google.com, grey circles represent middle-malicious servers, like phishing websites, and black circles represent malicious servers, like C&C servers. [Diagram: hosts H1, H2 and H3 connect to destination servers shaded by malicious level, in the attack and control phases.]

Since remote control malware has the advantages of low attack cost and the capability of spreading widely, it has been adopted by adversaries in more than half of malware attacks [30]. To detect remote control malware, we first perform an empirical study to condense the key features. Based on previous work that found that some malware has similarities in attack targets and attack methods [29, 46, 57], we perform a manual analysis of remote control malware from two aspects: code and generated traffic. First, we randomly sample 30 malware samples as our ground truth and run them on several operating systems under our control. Note that the entire experiment environment is enclosed and does not affect any third parties. From the code aspect, we reverse-engineer their code and analyse the code logic and hard-coded content. We find that malware in the same family tends to share a similar software framework, which is the root cause of the similar traffic behaviour. The reason for code sharing and reuse, we speculate, is likely to be saving costs. In addition, malware is generally not up to date and still accepts low-version TLS connections. Besides, developers of malware often use default fixed parameters, especially the time interval. For example, we find that some malware usually sets the time interval for connecting to the control servers to 60 seconds.

Further, to explore the traffic behaviour, we try to decrypt the traffic generated by ground truth malware. With the master key extracted from the controlled operating systems, we can analyse the plaintext communication between malware and servers. The similarity of code within the same family leads to a concentration of behavioural characteristics in their traffic behaviour. Due to a clear, joint attack purpose, i.e. remote control of the infected hosts, there exists a relatively fixed traffic pattern in remote control attacks. As shown in Figure 1, the communication process can be summarised into two main phases: the attack phase (solid lines) and the control phase (dashed lines). Initially, the attack phase delivers malware to the victim's machine by spoofing websites (grey circles) or exploiting vulnerabilities to install malware, which helps adversaries gain control of the host. For example, for H2 in Figure 3, the attacker commits a fraudulent activity by sending phishing emails in the attack phase and lures the victim to click the embedded link, which may cause the redirection from the phishing website (grey circle) to [...]
[Figure: workflow of ST-Graph. Encrypted traffic is processed by (1) the Traffic Preprocessor, which produces flow features; (2) the Spatio-Temporal Graph Representor, covering graph construction, edge embedding and host representation; and (3) the Detector, which takes host features and reports malware-infected hosts. Legend: host, server.]

[...] malware from code and traffic analysis, i.e., attack phase and control phase. Besides, we find an exclusive observation, i.e., the forged innocent network connection test, which gives us a unique perspective on detection. Our observations not only extend the existing understanding of malware behaviour but also deliver distinctive findings. Below, we summarise the common characteristics of malware infections from the following aspects:
• Spatial Feature refers to the concentration property between hosts and target servers, especially within families. As mentioned before, due to framework reuse, some malware within the same family exhibits high similarity, reflected by the destination servers it tries to connect with. For example, in Figure 3, H2 and H3 are connected to the same destination servers in the control phase.
• Temporal Feature of the connection. The connection order of a host over a period of time can help us rebuild the infection process of malware, which can clearly show the change between phases. Besides, due to the setting of fixed parameters, communications between infected hosts and control servers are regular, especially in packet length and packet interval time [26].

4.2 System Design
To achieve effectiveness and efficiency, the key point of our detection system, ST-Graph, is to retrieve as much information as possible with tolerable computational complexity. According to the observations of our empirical study (see §4.1), we summarise two categories of general and distinguishable features. First, the spatial feature reflects the property of network connection relationships for each host. In addition, the temporal feature shows the side-channel information of communications. These two kinds of features jointly depict the network communication behaviour of hosts in a comprehensive way.

Based on the observed features, we design ST-Graph, which incorporates the spatial (i.e., accesses between hosts and servers) and temporal (i.e., connection order and time-related information) characteristics of hosts' network behaviour into host representations to boost the effectiveness of detection decisions. To achieve higher efficiency, we improve the algorithm of graph representation by only optimising edge representations with iterative updates, while the optimal node representations are derived from closed-form solutions. This significantly reduces the computational complexity of graph representation learning.

[...] for each stream. More specifically, we extract the contents of the TLS handshake in each reserved TLS stream, including the TLS version and the supported cipher suites, and calculate some statistical information such as the number of packets and the bytes within a stream. Such statistical information extracted from the data stream remains an important element of malware traffic detection and is used later in the computation of the graph representation.

Graph Representor. We build a heterogeneous graph to represent the end-to-end communications among hosts and servers by their temporal and spatial features. Specifically, the temporal features are the traffic characteristics we extract from the streams, which are embedded in the stream representations; and the spatial features are reflected by the graph structure. Based on this, we design a node embedding algorithm to transfer the spatio-temporal features into host representations, which reflect the similarity of the hosts' access behaviour. We will elaborate on this module in §5.

Detector. Given the embedding vector of each host that numerically quantifies the host's behavioural characteristics, we apply a machine learning approach to calculate the likelihood of the host being infected. Considering its ability to resist overfitting [12] and its better performance in comparative experiments (see Appendix A), we finally employ the random forest (RF) regression algorithm [12] as our detector. It is an ensemble learning method that makes predictions by averaging the output of multiple decision trees. With the predicted infection value of each host, the detector outputs a list of suspicious hosts along with their access information.

5 ST-GRAPH: SPATIO-TEMPORAL GRAPH
In this section, we present the design details of ST-Graph, elaborate on the process of graph construction and the optimization of edge representation, and explain how we propagate spatial and temporal information in a lossless way to represent hosts for malware traffic detection. The key of the approach is to exploit the similarities in the network behaviour of different applications. To do this, we first build a heterogeneous graph to correlate all network connections [...] all its accesses generated in a sequential order, which integrates the traffic characteristics and network structure associated with the host. Compared with Graph Neural Network-based methods that train thousands of parameters [72], and Knowledge Graph Embedding-based methods that jointly optimize the representation of all nodes and edges [36, 66], ST-Graph learns edge embeddings based on random walk with a small number of iterations and optimizes host embeddings with a closed-form solution. This greatly reduces the computational complexity to meet the need for real-time detection.

[...] IP address is used as a substitute when the domain is not available. We use an edge set $E = \{e \mid e = (h_e, d_e),\ h_e \in [\ldots],\ d_e \in D\}$ to denote [...] extracted from external nodes, and 3) the side channel statistics features of the stream.

[...] C&C servers are not updated promptly, so they usually accept [...] encryption algorithms. The underlying reasons are that attackers are relatively less concerned about whether communications will be decrypted and that vulnerabilities in lower versions of operating [...] extract features from the TLS ClientHello message, including the TLS version, the list of offered cipher suites and extensions as TLS handshake features. Such features can provide information about the encryption algorithm supported by the client. For the second type of attributes, our feature extraction is mainly based [...] Generation Algorithms (DGA). Hence, we extract features such as the number, the character ratio, and the vowel or consonant letter ratio for identification. We also extract domain name length features to cope with lexicon-based DGA, which selects words from [...] network behaviour. For example, the large number of heartbeat packets used to maintain a connection can result in small packets per stream [3]. So we calculate some statistical information such as the number, the length, or the time interval of packets sent and received within a stream.

5.2 Edge Embedding
Edge embeddings are numerical representations of streams. In this paper, we integrate two categories of features into edge embeddings: 1) stream attributes, for preserving original traffic information, and [...] the traditional random walk method [49], which traverses the graph along the edges and maximizes the similarity of all edges within several traverses. The main reason for this design choice is that other graph-based models for edge representation learning, such as [...] which decides the edge to take for the next step based on previous wanderings.

[...] $(h_u, d_u) \mid u \in E,\ h_e = h_u \lor d_e = d_u\}$. Let $w = [w_0, w_1, \ldots, w_{P_L}]$ be [...] between two edges $u, v$ as $d(u, v) = |i_u - i_v|$, where $i_e$ is the order of edge $e$ being connected by its host $h_e$ (see §5.1). Assume the random walker starts from edge $o$; we define the probability of selecting a certain edge $x \in C_o$ from the neighbours of $o$ for the [...]

$$\Pr(w_1 = x \mid w_0 = o) = \frac{\dfrac{1}{d(o, x)}}{\sum_{y \in C_o} \dfrac{1}{d(y, o)}}.$$

Here, our random walker inclines to select the edge having the minimum connection order distance to the current walk as the next step, so that streams generated within a short time interval tend to be consecutive steps in the random walk.
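The first-step rule above can be illustrated with a small Python sketch that computes $\Pr(w_1 = x \mid w_0 = o)$ on a toy edge set. The `Edge` record, the helper names, the exclusion of an edge from its own neighbour set, and the floor of 1 on the order distance (to keep $1/d$ finite when two neighbouring edges share an order index) are all assumptions made for this example, not details taken from the paper.

```python
from collections import namedtuple

# Hypothetical edge record: host, destination, and the order index i_e of the
# connection within its host's access sequence.
Edge = namedtuple("Edge", ["host", "dest", "order"])


def neighbours(e, edges):
    """C_e: edges sharing the host or the destination with e (e itself excluded, by assumption)."""
    return [u for u in edges if u != e and (u.host == e.host or u.dest == e.dest)]


def order_distance(u, v):
    """d(u, v) = |i_u - i_v|, floored at 1 (assumption) so that 1/d stays finite."""
    return max(abs(u.order - v.order), 1)


def first_step_probs(o, edges):
    """Pr(w_1 = x | w_0 = o), proportional to 1 / d(o, x) over x in C_o."""
    candidates = neighbours(o, edges)
    weights = [1.0 / order_distance(o, x) for x in candidates]
    total = sum(weights)
    return {x: w / total for x, w in zip(candidates, weights)}


edges = [
    Edge("h1", "a.example", 1),
    Edge("h1", "b.example", 2),
    Edge("h1", "c.example", 5),
    Edge("h2", "a.example", 3),
]
probs = first_step_probs(edges[0], edges)
for x, pr in probs.items():
    print(x.host, x.dest, round(pr, 3))
```

In this toy graph the edge with the closest connection order receives the largest share of the probability mass, which is the "interval-inclined" behaviour the paragraph describes.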
For the following walks, each selection is based on the two previous walks. Assume the last step is at edge $u$ and the current step takes edge $v$; the walker selects the next step from $v$'s neighbours $x \in C_v$ with

$$\Pr(w_{i+1} = x \mid w_i = v, w_{i-1} = u) = \frac{\alpha_{u,x} \cdot \dfrac{1}{d(v, x)}}{\sum_{t \in C_v} \alpha_{u,t} \cdot \dfrac{1}{d(v, t)}},$$

where the value of $\alpha_{u,x}$ is determined by the relation between $x$ and $u$. When $x = u$, $\alpha_{u,x}$ takes the value of $\frac{1}{p}$; when $x$ is one of the [...] use a constant $p$ to control the probability of returning from $v$ to the starting point $u$. Otherwise, if there are one or more edges between edge $x$ and edge $u$, we apply a constant $q$ to control the probability [...] nodes (DFS); if $q > 1$, the walker tends to visit local nodes (BFS), which enhances the coverage of the surrounding neighbours.

According to this strategy, for each edge $e$, we generate a net[...] [...] temporal perspectives. Hence, to incorporate the spatial and temporal features of the graph into edge embeddings, we set up a vector $r_e$ for each edge, and optimize the vector by the proximity within its network neighbourhood $\mathcal{N}_e$. More specifically, we model the [...]

[...] which are not informative for characterizing the host's behaviour while taking a large proportion of traffic data. Hence, we rank the streams associated with the host according to the importance of the server visited through the stream before the information propagation. Such importance is with respect to 1) the frequency of the server being visited by all hosts and 2) the order ($i_e : e \in E,\ h_e = t \land d_e = u$) of the host $t$ visiting the server $u$. Intuitively, if a server $u$ is visited by most hosts, it is likely to be visited by the [...]

$$\rho(e) = \frac{|\{e' \mid e' \in E \land d_{e'} = u\}|}{i_e \cdot |E|},$$

[...] Further, we model the correlation score between host $t$ and edge $e$ as a weighted sum over the importance of $e$ to $t$ and the similarity between the host representation $g_t$ and edge embedding $f_e$:

$$[\ldots]\ \frac{\exp(g_t \cdot f_e)}{Z}\ [\ldots]$$

where $\lambda$ is a scalar hyperparameter and $Z = \sum_{i \in E} \exp(g_t \cdot f_i)$ is a normalization factor. Let $\mathcal{L}_t = \{e \mid e \in E \land h_e = t\}$ denote all edges proceeding from host $t$. The joint correlation score of all edges in $\mathcal{L}_t$ is defined as:

[...]
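The second-order walk described earlier in this section, where each step depends on the previous edge $u$ and the current edge $v$, can be sketched in Python as follows. The extracted text only preserves the case $\alpha_{u,x} = 1/p$ for $x = u$ and the role of $q$; the middle case ($\alpha_{u,x} = 1$ when $x$ is also a neighbour of $u$) is filled in here following the node2vec convention and is an assumption, as are the `Edge` record, the helper names, and the floor of 1 on the order distance.

```python
from collections import namedtuple

# Hypothetical edge record: host, destination, and connection order index i_e.
Edge = namedtuple("Edge", ["host", "dest", "order"])


def adjacent(a, b):
    """Two distinct edges are neighbours if they share a host or a destination."""
    return a != b and (a.host == b.host or a.dest == b.dest)


def alpha(u, x, p, q):
    """Bias alpha_{u,x}; the middle case follows node2vec and is an assumption."""
    if x == u:
        return 1.0 / p  # return to the previous edge
    if adjacent(x, u):
        return 1.0      # stay near the previous edge (assumed case)
    return 1.0 / q      # move away from the previous edge


def next_step_probs(u, v, edges, p=1.0, q=2.0):
    """Pr(w_{i+1} = x | w_i = v, w_{i-1} = u) over the neighbours x of v."""
    candidates = [x for x in edges if adjacent(v, x)]
    weights = [
        alpha(u, x, p, q) / max(abs(v.order - x.order), 1)  # alpha * 1/d(v, x)
        for x in candidates
    ]
    total = sum(weights)
    return {x: w / total for x, w in zip(candidates, weights)}


e0 = Edge("h1", "a.example", 1)
e1 = Edge("h1", "b.example", 2)
e2 = Edge("h2", "b.example", 1)
e3 = Edge("h2", "c.example", 2)
edges = [e0, e1, e2, e3]

local = next_step_probs(e0, e1, edges, p=1.0, q=2.0)
exploring = next_step_probs(e0, e1, edges, p=1.0, q=0.5)
print({x.dest: round(pr, 3) for x, pr in local.items()})
print({x.dest: round(pr, 3) for x, pr in exploring.items()})
```

With $p = 1$ and $q = 2$, the values reported in the experimental setup, the walker in this toy graph favours returning to the previous edge; lowering $q$ below 1 shifts the mass towards the edge that leads away from it.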
(version 0.23.2) [48] for both singular value decomposition in host Table 1: Malware Families
representation and the classi�cation algorithms. Family Name # Samples Family Name # Samples
Baselines. We compare our ST-Graph with two state-of-the-art Minerd 5717 Unwanted 1610
malicious tra�c detection methods (ETA and FS-Net), which both Cryxos 4652 Faceliker 1564
perform detection in stream-level detection. The computational PhishingSite 3080 Trojandownloader 1528
complexity of Graph Neural Network-based methods [36, 66] is Wacatac 2949 Brocoiner 1482
too heavy to meet e�cient detection needs at all. Therefore, we hidelink 2860 Sality 1046
Kryptik 2403 Zbot 1035
exclude these methods as our comparative methods. The details of
Redirector 2190 RelevantKnowledge 771
the baselines are as follows. Generickdz 1974 Scrinject 756
• ETA [2] adopts the random forest model to detect malware traf- Installcore 1807 Ramnit 719
�c. Speci�cally, the ETA utilizes the stream features, including Iframe 1731 Others 13237
TLS handshake metadata, DNS contextual streams linked to the en-
crypted stream, and the HTTP headers of HTTP contextual streams
from the same source IP address within a 5-minute window. each malware family. Each example runs for 5 minutes on average,
• FS-Net [37] is an end-to-end deep learning model, which takes the during which all generated tra�c is recorded and saved. Since most
multi-layer encoder-decoder structure to mine the potential sequen- of the behaviours of the samples in the sandbox can be observed in
tial characteristics of streams. In summary, it learns representative the �rst two minutes [31], we consider �ve minutes to be a su�-
features from raw streams and then classi�es them. cient time interval to observe almost all valid behaviours. In total,
We set hyperparameters of baselines to either the values adopted we end up capturing 239,007 streams from 53,111 samples.
by their authors or default values. Speci�cally, for FS-Net, we set the • Benign Tra�c: We construct benign tra�c from two data sources,
dimension of hidden state as 128, layer number as 2 and dimension one is real tra�c captured in a campus network and the other is the
of length embedding as 16. For the other parameters involved, such tra�c generated by benign samples running in the sandbox. Most
as the parameters in random forest, we apply the default settings samples in the benign dataset are collected from a large campus
in scikit-learn (version 0.23.2).
Environment and Parameters. We run all methods on a Supermicro server with two Intel Xeon E5-2690 CPUs (2 × 14 cores), CentOS 7.9.2009 and 345 GB of memory. For hyper-parameters, we set the dimension of the host vectors D_h to 256, the length of a random walk path P_L to 100 and the number of paths per edge P_N to 10. We set the parameter p, which controls the probability of returning, to 1 and the parameter q, which controls the probability of exploring new nodes, to 2 for better coverage of the surrounding neighbours.
Metrics. We use the following metrics to evaluate the detection performance: (i) precision, (ii) recall and (iii) false-positive rate (FPR). (See Appendix B)

6.2 Dataset
We conduct experiments on one public dataset (CICInvesAndMal2019, referred to as AndMal2019 [58]) and one dataset collected by ourselves (EncMal2021). The AndMal2019 dataset includes traffic and device logs generated by 5065 benign and 426 malicious Android apps on real smart devices. The malicious apps can be divided into 39 families. Our collected dataset EncMal2021 is constructed by capturing the traffic generated by the samples and the campus network traffic. It contains 108,847 hosts with 5,202,093 streams, 4.5% of which are marked as malicious, and the others are marked as benign. Below we elaborate on the data collection process of EncMal2021.
Data Collection. EncMal2021 consists of malicious and benign traffic data.
• Malicious Traffic: We use malware analysis sandboxes to collect the data. The sandboxes, which run Windows 7 and Windows 10 operating systems, allow users to submit malicious executable examples and control the runtime based on the execution of the executable files. Malicious samples come from the malware analysis website VirusTotal and large security companies we work with, with millions of sample updates per day. Table 1 shows the names of malware families and the number of examples we sample from

network with nearly 10,000 active hosts. We passively monitor all inbound and outbound encrypted traffic at different time points for 5 consecutive months from January 2021 to April 2021 and save the raw pcap packets. Due to the possibility of malware-infected hosts in the campus network, we cannot directly use all the traffic as benign traffic. We chose Alexa ranking as a filtering criterion, keeping domains with the top 1 million traffic rankings. The reason for this choice is that websites ranked below the top 1 million on Alexa can be considered as "Long Tail", thus this filtering guarantees the diversity of the sample and also ensures the traffic is benign to a certain extent. Since the crawled traffic may contain some private information, we also need to anonymize the IPs. The same host is always mapped to the same fake IP to preserve the integrity of the network relationships. On the other hand, in order to avoid large bias in the domain distribution between benign and malicious traffic due to the different collection methods, we also collect the traffic generated by benign samples running in the same sandbox environment. Such samples are partly collected from the top 100 of the Microsoft Store's top free list and partly collected from the programs flagged as benign by the malware analysis site. Among all the domains of the traffic generated by the benign samples, 24% are not in the Alexa Top 1 million rankings. We believe this mitigates the bias associated with filtering traffic while keeping the traffic benign. Finally, we capture 4,940,593 streams in the existing network by 53,281 hosts and 22,493 streams in the sandbox using 2,455 samples.
Dataset statement. In EncMal2021, the malicious samples are run in sandboxes, which makes the distribution of malicious traffic not exactly consistent with the actual situation. However, we compensate for this by enriching the malware types and the sandbox environment, so that the constructed dataset is as comprehensive as possible.
Train-test Split. For the first task, i.e., malware detection, we perform different train-test split strategies on the two datasets, because the capacities of these datasets are different.
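The consistent IP anonymization mentioned in the data collection description (the same host always maps to the same fake IP, so the connection graph is preserved) can be illustrated with a keyed hash. This is a minimal sketch under stated assumptions: the paper does not say how the mapping is implemented, and the function name `anonymize_ip`, the key and the sample flows are hypothetical.

```python
import hashlib
import hmac
import ipaddress


def anonymize_ip(ip: str, key: bytes) -> str:
    """Map a real IPv4 address to a stable fake one (hypothetical helper)."""
    packed = ipaddress.IPv4Address(ip).packed
    digest = hmac.new(key, packed, hashlib.sha256).digest()
    # Use the first 4 bytes of the keyed digest as the fake address.
    # Truncating to 32 bits can collide in principle; a real deployment
    # would need to check for that.
    return str(ipaddress.IPv4Address(digest[:4]))


key = b"hypothetical-secret-key"  # assumption: kept private by the collector
flows = [
    ("10.0.0.5", "93.184.216.34"),
    ("10.0.0.5", "8.8.8.8"),
    ("10.0.0.9", "8.8.8.8"),
]
anon = [(anonymize_ip(src, key), anonymize_ip(dst, key)) for src, dst in flows]
```

Because the mapping is deterministic for a fixed key, two streams from the same host still share a source node after anonymization, which is the property the graph construction relies on.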
RAID 2022, October 26–28, 2022, Limassol, Cyprus Fu and Liu, et al.
Table 2: Detection Performance of ST-Graph and Baselines. In EncMal2021, use joint testing set as the test set.

Malware Detection
Dataset     Method    Precision(%)  Recall(%)  FPR(%)   Train(s)   Test(s)
EncMal2021  ETA       99.1726       99.3013    0.1915   729.26     138.24
EncMal2021  FS-Net    99.3515       92.5990    0.2565   45324.10   17741.15
EncMal2021  ST-Graph  99.9805       99.9221    0.0045   5956.45    150.31
AndMal2019  ETA       77.8451       74.7129    19.0409  65.31      22.48
AndMal2019  FS-Net    72.8997       72.5344    21.4712  23275.86   5112.93
AndMal2019  ST-Graph  99.2973       99.6444    0.0170   2113.13    27.19

Malware Family
Dataset     Method    Precision(%)  Recall(%)  FPR(%)   Train(s)   Test(s)
EncMal2021  ETA       79.9473       76.5882    1.2410   306.49     76.66
EncMal2021  FS-Net    73.0697       72.6062    1.5218   41048.30   1890.69
EncMal2021  ST-Graph  93.3669       91.2105    0.4016   1489.41    127.14
AndMal2019  ETA       28.8382       27.5170    1.8134   139.05     39.19
AndMal2019  FS-Net    19.4906       16.7309    2.0863   23055.60   774.21
AndMal2019  ST-Graph  53.7568       53.4014    1.0180   520.36     66.85
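The Precision, Recall and FPR columns of Table 2 can be read with the usual confusion-matrix definitions. The sketch below assumes these standard definitions match the ones given in Appendix B, which is not reproduced in this section. The function name and the example counts are hypothetical.

```python
def detection_metrics(tp: int, fp: int, tn: int, fn: int):
    """Return (precision, recall, false-positive rate) from raw counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0  # flagged streams that are malicious
    recall = tp / (tp + fn) if tp + fn else 0.0     # malicious streams that are flagged
    fpr = fp / (fp + tn) if fp + tn else 0.0        # benign streams wrongly flagged
    return precision, recall, fpr


# Hypothetical counts, not taken from the paper.
precision, recall, fpr = detection_metrics(tp=90, fp=10, tn=990, fn=10)
```

Since benign traffic outnumbers malicious traffic (10:1 in EncMal2021), a small FPR can still correspond to many false alarms, which is why FPR is reported alongside precision.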
(a) Precision comparison (higher is better) (b) Recall comparison (higher is better) (c) FPR comparison (lower is better)
Figure 5: Comparison between joint testing set and disjoint testing set on EncMal2021.
• For EncMal2021: We divide the collected dataset into three parts: 1) training set, 2) joint testing set and 3) disjoint testing set, with a ratio of 6:2:2. Here, the joint testing set shares the same distribution of malware families with the training set, while the malware families in the disjoint testing set do not intersect with those in the training set. We allocate benign traffic randomly and keep the ratio of benign traffic to malicious traffic at 10:1.
• For AndMal2019: We use 60% of the samples for training and 40% for testing. Here, we do not further divide the testing set into a joint or a disjoint one since AndMal2019 only includes a limited number of malicious families.
As for the second task, i.e., malware family classification, we use 8:2 as the train-test split ratio for both datasets. The labels for this task are the names of the families in Table 1.
Ethical Concerns. We also consider ethical concerns when collecting our dataset. As aforementioned, a portion of our benign traffic comes from border traffic on the real campus network. To avoid burdening normal network access, we limited our collection to passive listening. To protect privacy, we only saved TLS traffic when capturing traffic and anonymized the addresses of the communication. In the detection stages, we focus on the destination of the traffic (domains) and do not look into the payload of any TLS traffic. Also, the traffic is stored on physical servers to which only privileged administrators have access.

6.3 Detection Performance
We compare ST-Graph with other baselines on EncMal2021 and AndMal2019, respectively.
6.3.1 Evaluation on EncMal2021. We use EncMal2021 to verify that the model has good malware detection abilities on a large-scale dataset, i.e., it can achieve good binary and multi-classification results.
Malware Detection. The purpose of this experiment is to evaluate the detection effectiveness of ST-Graph and to test whether it can identify unknown malware families, i.e., generalization. Generalization describes a model's ability to react to new data. In malicious traffic detection, it is important to know how well a trained model will generalize to unseen data. We train all models using the training set and test them with the joint testing set and the disjoint testing set, respectively.
The results on the joint testing set are shown in Table 2 (EncMal2021 rows). From the results, we observe that ST-Graph achieves precision and recall above 99.9%, and its FPR is nearly two orders of magnitude lower than that of the other two models. In terms of computation cost, compared with ETA, a simple feature engineering-based model, ST-Graph takes more time for training and testing, but much less than the deep learning-based model FS-Net.
Figures 5a, 5b and 5c show the detection results of the three models on the disjoint testing set, compared with the results on the joint testing set. In terms of precision, only our model does not degrade on unseen samples. This is because, although its recall for malicious communication traffic drops a little, it avoids more false positives. As for the recall score, each model decreases to some extent, while our model is still the most effective one. FS-Net can detect most malware traffic, but it also introduces a high level of false positives even when the distribution of benign traffic data changes slightly. This may be due to overfitting, where small changes in the input can change the results of the model, making it unrealistic to apply to security analysis in practice.
Malware Family Classification. We use only malicious data for multi-classification to understand ST-Graph's effectiveness for malicious family classification. This task is valuable because when malicious communications are detected, we need to analyse their binaries to confirm the alarm, and family labels for the alarms offered by the detection model itself greatly reduce analysis time.
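The EncMal2021 split described in this section (training, joint testing and disjoint testing sets at roughly 6:2:2, with the disjoint set holding malware families unseen in training) could be sketched as below. This is an illustration under assumptions, not the authors' procedure: the function name, the way whole families are held out, and the toy samples are hypothetical, and the 10:1 benign-to-malicious allocation is not covered.

```python
import random
from collections import defaultdict


def split_by_family(samples, disjoint_frac=0.2, joint_frac=0.2, seed=0):
    """samples: list of (sample_id, family) pairs for malicious traffic."""
    rng = random.Random(seed)
    by_family = defaultdict(list)
    for sample_id, family in samples:
        by_family[family].append(sample_id)

    # Hold out whole families until about disjoint_frac of the samples is covered.
    families = sorted(by_family)
    rng.shuffle(families)
    target = disjoint_frac * len(samples)
    disjoint, held_out = [], set()
    for family in families:
        if len(disjoint) >= target:
            break
        disjoint.extend(by_family[family])
        held_out.add(family)

    # Split every remaining family so training and joint testing share families.
    joint_share = joint_frac / (1.0 - disjoint_frac)
    train, joint = [], []
    for family in families:
        if family in held_out:
            continue
        ids = list(by_family[family])
        rng.shuffle(ids)
        cut = round(len(ids) * joint_share)
        joint.extend(ids[:cut])
        train.extend(ids[cut:])
    return train, joint, disjoint


# Toy data: 10 hypothetical families with 12 samples each.
samples = [(f"s{f}_{i}", f"fam{f}") for f in range(10) for i in range(12)]
train, joint, disjoint = split_by_family(samples)
```

Holding out whole families, rather than random samples, is what makes the disjoint testing set a test of generalization to unknown malware.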
has good detection and generalisation capabilities and that our graph embedding algorithm does perform better in our scenario compared to other graph embedding algorithms.

Table 3: Ablation study's results of ST-Graph.
Settings                        Precision(%)  Recall(%)  FPR(%)
Original model                  99.9805       99.9221    0.0045
Features   No domain feat.      99.9784       99.8983    0.005
Features   No handshake feat.   99.8522       99.5578    0.0161
Features   No statistical feat. 99.5848       99.0915    0.0955
Embedding  DeepWalk             99.7376       99.4148    0.0184
Embedding  node2vec             99.8891       99.5947    0.0121

Table 4: Precision(%) of Real-world evaluation results.
Stage                  ETA  FS-Net  ST-Graph
Campus-04/15/2021      8.0  4.0     86.0
Campus-02/25/2022      6.0  2.0     64.0
Enterprise-04/15/2021  1.0  0.5     80.0
Enterprise-02/25/2022  0.8  0.5     68.0

performance of the model. The system is able to consistently maintain a high detection accuracy and a low false alarm rate. For the parameter P_N, the detection performance tends to get better and then worse as the number of random walk paths per edge increases and the number of contexts per edge increases.
[Figure: domains contacted in the "Attack Phase" and "Control Phase": www.s**ta.com, s0.2**n.net, lupic.cdn.b**bos.com, ssum-sec.ca**ia.com, image6.p**ic.com, pixel.ru**ct.com, cms.qu**ve.com, odr.mo**1.com]
[Figure panels: (a) Average length per stream (b) Time interval between streams]
spatio-temporal features in our approach play a decisive role in enhancing the model's generalizability. On this basis, we next discuss the impact of each attack on ST-Graph.
Disguising Attack. The goal of this attack is to disguise malicious traffic as that of benign applications. Frolov et al. proposed a tool to modify TLS information to mimic other popular TLS implementations by changing the fingerprints extracted from ClientHello and ServerHello [17]. However, this attack only modifies the TLS-stream information without changing the network connection relationships. Therefore, ST-Graph can still detect infected hosts even under this kind of attack.
Obfuscation Attack. This attack is more discreet and tends to confuse detection systems based on side-channel information. Wang et al. [65] change the timestamp, direction and size of packets to confuse detection based on packet length and time interval. Similarly, even when the side-channel information is changed, ST-Graph retains the connection relationships, which allows it to still detect effectively. Moreover, obfuscation attacks cause a highly dispersed distribution of packet lengths and time intervals, which is reflected in higher entropy. As such, this attack is simple to defend against by calculating the entropy of the side-channel information.
To conclude, existing attacks have a limited impact on ST-Graph.

8.2 Limitations
Scale of network. We acknowledge that the scale of the internal network has an impact on ST-Graph. An increase in internal hosts adds more nodes and edges to our graph structure. A larger number of edges in the graph brings a heavier time cost for detection, and therefore an unbounded number of hosts cannot be handled at the gateway. However, our real-world evaluation is conducted on two large representative networks, with bandwidth up to 3.6 Gbps and covering over 10,000 hosts. This large-scale experiment supports the efficiency of ST-Graph in practice.
In future work, we would explore more efficient traffic feature representation methods and ease the practical deployment limitations by deploying clusters or separate inspection by network segment for large enterprise network environments with high bandwidth. In addition, the features we used rely on the TLS protocol. Although the model can still achieve 99.85% precision (Table 3) after removing protocol-related features, it is still necessary to explore detection methods for generic encrypted traffic in the future.

9 RELATED WORK
Traditional network-based malware detection methods, such as DPI [47, 54] and HTTP-based methods [41, 45], perform keyword matching on the plain-text payload of each packet. However, traffic encryption makes these methods no longer effective. Besides, the limited information available under encryption makes network-based detection more challenging. We divide existing encrypted-network-based detection methods into single-stream-based detection and context-based detection.
Single-stream-based detection. Single-stream-based detection strives to dig out any available plain-text information for detection in encrypted traffic. Two main kinds of features are utilized by single-stream-based detection: features of TLS streams and features of side channels.
• Features of TLS streams utilize the information in the TLS handshake process, which is the only plain-text process of TLS interactions. Anderson et al. extract information as representative features for detection, including the TLS version, the cipher suites provided by the client, the TLS extensions used, the server certificate, and the result of negotiation between the two parties [2, 3]. Althouse et al. first stitched information from the TLS handshake to compute JA3/JA3S fingerprints and detect malware by matching the client/server fingerprints [1]. However, methods with only these features will be ineffective under TLS 1.3, which exposes less plain-text information for detection.
• Features of side channels exploit information such as packet length, packet interval time and packet length frequency. For example, several works utilized packet length statistics to detect malware at the TCP/IP layer [19, 60, 67]. Furthermore, Liao et al. utilized deep learning techniques to improve the performance of detection models based on packet length sequences.
The above methods usually consider a single feature, which is not robust and can be easily evaded.
Context-based detection. Because encryption limits the plain-text information available for TCP connections, context-based detection methods try to use information from multiple protocols [3, 20] or multi-stream connections to extend the perspective. In fact, information from DNS could help to construct the relation graph of multiple malware servers [35, 44, 50]. Notably, Oprea et al. detect malware and APT infections within an organization [46], but their method only works when seeds of known malware are available.
Note that relations between multiple malware and hosts may change over time, so capturing the dynamic changes is essential for detection. However, previous graph-based methods are mainly based on static relationship graphs and usually ignore the temporal characteristics, which limits the effectiveness of detection with less information. In our work, we propose ST-Graph to detect malicious traffic based on a spatial-temporal graph.

10 CONCLUSION
In this paper, we propose ST-Graph, an encrypted malicious traffic detection system equipped with a well-designed, novel graph representation learning algorithm. By exploring additional, informative network attributes and effectively integrating multiple features, ST-Graph achieves high detection accuracy with a significantly lower false alarm rate and tolerable computational complexity. Experimental results on both the self-collected dataset and the benchmark dataset demonstrate the effectiveness and the efficiency of ST-Graph, compared with state-of-the-art malware traffic detection systems. In addition, ST-Graph shows good performance in both generalization and robustness and reveals outstanding efficiency in real-world deployment.

ACKNOWLEDGMENTS
We are grateful to anonymous reviewers for their constructive comments on this work. This work is supported by National Natural Science Foundation of China (Grant No. U1836213, U19B2034), the Huawei Technologies Co., Ltd under Grant No. TC20200917004. Jia Zhang is the corresponding author ([email protected]).
[62] Tobias Urban, Dennis Tatang, Thorsten Holz, and Norbert Pohlmann. 2018. Towards understanding privacy implications of adware and potentially unwanted programs. In ESORICS. Springer, 449–469.
[63] Petar Veličković, Guillem Cucurull, Arantxa Casanova, Adriana Romero, Pietro Lio, and Yoshua Bengio. 2017. Graph attention networks. arXiv preprint arXiv:1710.10903 (2017).
[64] Binghui Wang and Neil Zhenqiang Gong. 2019. Attacking graph-based classification via manipulating the graph structure. In CCS. 2023–2040.
[65] Junnan Wang, Liu Qixu, Wu Di, Ying Dong, and Xiang Cui. 2021. Crafting Adversarial Example to Bypass Flow-&ML-based Botnet Detector via RL. In 24th International Symposium on Research in Attacks, Intrusions and Defenses. 193–204.
[66] Zhen Wang, Jianwen Zhang, Jianlin Feng, and Zheng Chen. 2014. Knowledge graph embedding by translating on hyperplanes. In AAAI, Vol. 28.
[67] Nigel Williams, Sebastian Zander, and Grenville Armitage. 2006. A preliminary performance comparison of five machine learning algorithms for practical IP traffic flow classification. SIGCOMM 36, 5 (2006), 5–16.
[68] James Wyke. 2016. The ZeroAccess rootkit. https://nakedsecurity.sophos.com/zeroaccess4/ Accessed March 20, 2022.
[69] Zhixing Xu, Sayak Ray, Pramod Subramanyan, and Sharad Malik. 2017. Malware detection using machine learning based analysis of virtual memory access patterns. In DATE, 2017. IEEE, 169–174.
[70] Simon Conant Yaron Samuel. 2017. Ewind – Adware in Applications' Clothing. https://unit42.paloaltonetworks.com/unit42-ewind-adware-applications-clothing/ Accessed January 22, 2022.
[71] Yanfang Ye, Tao Li, Donald A. Adjeroh, and S. Sitharama Iyengar. 2017. A Survey on Malware Detection Using Data Mining Techniques. ACM Comput. Surv. 50, 3 (2017), 41:1–41:40.
[72] Si Zhang, Hanghang Tong, Jiejun Xu, and Ross Maciejewski. 2019. Graph convolutional networks: a comprehensive review. Computational Social Networks 6, 1 (2019), 1–23.
[73] Futai Zou, Siyu Zhang, Weixiong Rao, and Ping Yi. 2015. Detecting malware based on DNS graph mining. International Journal of Distributed Sensor Networks 11, 10 (2015), 102687.

A MACHINE LEARNING ALGORITHMS
We compared seven common algorithms: Logistic Regression, Support Vector Machine (SVM), Naive Bayes, Artificial Neural Network (ANN), k-nearest neighbours (k-NN), Decision Tree and Random Forest ensemble. We used an implementation of Scikit-learn for all algorithms except ANN, which uses Keras. All also used the default hyperparameters. The experimental results are shown in Table 5.
Table 5: Classification Result of 7 algorithms

C SAMPLES FOR MANUAL ANALYSIS
Table 6 presents information on the 30 samples analysed manually.

Table 6: Families and Md5 of Malware Samples.
Family        Related md5
Adware        ab775c62c8d03b33f9d9b60e013d54f5
              3eeace60ad9f357dc8b77981465381c3
              3185657ca1707f2364f34fea46afc455
              75e3d279cc20419f2af57975f755614c
              04436941856e5a8c3296b55292c47636
Wacatac       fc92be6f976b598cf65a241c8520cfd6
              f18ac8264e592a4949c5fe09979234be
              e8766491e8c8a9fc79e86197b1a55a76
              d5bea7ed4ffc27366e218a99d5709987
              fe3bea4366ecc34ddbf90291762cba88
              d63d7bceed0da682db6170c24663b3b0
              df5ed0925bfb4e141a134bfdf4b2e0ce
              d32e7100988f924a0070418051de053f
Minerd        d31ca0d08a4bc600c51ecd6e891551eb
              331dfc88f7d056b2667875199eb2d504
CobaltStrike  332265a774e2ad113cbf4d05189d2ee0
              363eddcc28509a08c039833a9b6e2a04
              79a4c854d00928024f9ce3020a041451
Flystudio     67b4843d49d60e16372160cc10f80cf8
gamhak        c6752ffeb1ebc6bba468a71c36e69c85
krypyik       01372c76417280c2c7a524edd268d5a0
              ceb684549a97dae140df9e4cef10c308
hoax          464852e25233ece2e3ed769e727f3ef3
Rbot          e90fb38bd6c50f517d6d1fc00b445f91
avaddoncrypt  33874816e3eb31b874e1301fcf73bb72
lockscreen    4cf1bf8ebf10d596f7ecbb1c24258eef
graftor       71fb3ed4bf17e328c045a062fbf0895e
csdi          19435957f4a3d1380bfcd4c087e40a93
              cc07157af9a75f492748baed0d22a9fe
NetWorm       97acceb1b93ace58c250901ebd55aadd
(a) Dimensionality of host vectors D_h (b) Random walk path length P_L (c) Random walk path nums per edge P_N
Figure 12: Parameter sensitivity results.
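To make the roles of the walk parameters in Figure 12 concrete, the sketch below generates biased random walks with a path length, a number of paths per edge, a return parameter p and an exploration parameter q. It uses the standard node2vec-style second-order bias as a stand-in. ST-Graph's own embedding algorithm is not reproduced here and differs from node2vec, which Table 3 lists as a compared alternative. The function names and the toy graph are hypothetical.

```python
import random


def biased_walk(adj, start_edge, length, p, q, rng):
    """One node2vec-style second-order walk that starts by crossing start_edge."""
    walk = list(start_edge)
    while len(walk) < length:
        prev, cur = walk[-2], walk[-1]
        neighbours = sorted(adj[cur])
        if not neighbours:
            break
        weights = []
        for nxt in neighbours:
            if nxt == prev:           # return to the previous node
                weights.append(1.0 / p)
            elif nxt in adj[prev]:    # stay near the previous node
                weights.append(1.0)
            else:                     # explore a node farther away
                weights.append(1.0 / q)
        walk.append(rng.choices(neighbours, weights=weights, k=1)[0])
    return walk


def generate_walks(adj, path_len=100, paths_per_edge=10, p=1.0, q=2.0, seed=0):
    """Start paths_per_edge walks of path_len nodes from every directed edge."""
    rng = random.Random(seed)
    walks = []
    for u in sorted(adj):
        for v in sorted(adj[u]):
            for _ in range(paths_per_edge):
                walks.append(biased_walk(adj, (u, v), path_len, p, q, rng))
    return walks


# Toy host-to-server graph stored as undirected adjacency sets (hypothetical).
adj = {
    "host1": {"srvA", "srvB"},
    "host2": {"srvB"},
    "srvA": {"host1"},
    "srvB": {"host1", "host2"},
}
walks = generate_walks(adj, path_len=5, paths_per_edge=2)
```

The defaults mirror the values reported in the experimental setup (path length 100, 10 paths per edge, p = 1, q = 2). With q greater than 1, steps to farther nodes are down-weighted, so walks tend to stay around the starting neighbourhood.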