Encrypted Malware Traffic Detection via Graph-based Network Analysis

Zhuoqun Fu1 , Mingxuan Liu1 , Yue Qin2 , Jia Zhang1 , Yuan Zou1,3 , Qilei Yin1 , Qi Li1 , Haixin Duan1,4
1 Institute for Network Sciences and Cyberspace, Tsinghua University, Beijing, China
2 Indiana University Bloomington, Bloomington, USA
3 GeekSec Security Group, Beijing, China 4 Qi An Xin Group Corp., Beijing, China
1 {fzq20, liumx18}@mails.tsinghua.edu.cn, {yinqilei, qli01, duanhx}@tsinghua.edu.cn, [email protected]
2 [email protected], 3 [email protected]

ABSTRACT
Malicious activities on the Internet continue to grow in volume and damage, posing a serious risk to society. Malware with remote control capabilities is considered one of the most threatening malicious activities, as it can enable arbitrary types of cyber-attacks. As a countermeasure, many malware detection methods have been proposed to identify malicious behaviours based on traffic characteristics. However, emerging encryption and evasion techniques pose substantial barriers to the full exploitation of network information. This significantly impairs the effectiveness of existing malware detection methods that rely on a single type of characteristic. In this paper, we propose ST-Graph to resolve this issue. In addition to traditional stream attributes, ST-Graph explores spatial and temporal characteristics of network behaviours based on a graph representation learning algorithm and integrates all available information to boost the detection decision. To illustrate the effectiveness of ST-Graph, we evaluate it on two datasets. Experimental results demonstrate that ST-Graph outperforms state-of-the-art malware detection systems and also shows good performance in efficiency, generalizability, and robustness. Specifically, it achieves over 99% precision and recall, and its False Positive Rate is even two orders of magnitude lower than (nearly 0.02 times) that of baseline models. Meanwhile, the deployment of ST-Graph in two real network scenarios for around one year shows an outstanding efficiency, with only 160 seconds of time cost for 5 minutes of traffic at 1.7 Gbps bandwidth.

CCS CONCEPTS
• Security and privacy → Intrusion detection systems.

KEYWORDS
Traffic-based malware detection, encryption traffic, graph representation

ACM Reference Format:
Zhuoqun Fu, Mingxuan Liu, Yue Qin, Jia Zhang, Yuan Zou, Qilei Yin, Qi Li, Haixin Duan. 2022. Encrypted Malware Traffic Detection via Graph-based Network Analysis. In 25th International Symposium on Research in Attacks, Intrusions and Defenses (RAID 2022), October 26–28, 2022, Limassol, Cyprus. ACM, New York, NY, USA, 15 pages. https://doi.org/10.1145/3545948.3545983

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].
RAID 2022, October 26–28, 2022, Limassol, Cyprus
© 2022 Copyright held by the owner/author(s). Publication rights licensed to ACM.
ACM ISBN 978-1-4503-9704-9/22/10. . . $15.00
https://doi.org/10.1145/3545948.3545983

1 INTRODUCTION
Malware refers to intrusive software programs developed by cybercriminals with malicious intentions such as stealing data, corrupting computers, bringing down servers, and penetrating networks. The most damaging malware are those with remote control capabilities, which give attackers administrative control of the victim’s computer to enable arbitrary types of cyber-attacks. For example, NOPEN, used in a massive breach of top-secret data in 2017, has such remote control capabilities [7]. The infection pattern of this type of malware is shown in Figure 1, where the infected host communicates with the control server to perform further malicious acts. Due to its highly concealed and high-risk nature, the detection of malware has received considerable critical attention from both academia [28, 69, 71] and industry [8, 9, 13].

Real-world malware traffic detection requires efficient detection over complex network traffic data with high accuracy. However, the emergence of encryption strategies and various evasion technologies of adversaries poses huge barriers to effective malware traffic detection. More specifically, adversaries employ encryption protocols (i.e. TLS protocol [52]) in the process of malware communications to hide suspicious information. According to [18], in 2021, more than 46% of malware encrypted their communications. The encryption of most traffic greatly curtails the accessible information (e.g., URLs and HTTP headers in the payload) that indicates malicious network behaviour. This impairs the accuracy and consistency of traditional network-based malware detection, such as Deep Packet Inspection (DPI) [47, 54].

Many previous works have made efforts to resolve the challenges of encryption. For example, to enlarge the set of indicative information, [3, 19] summarise the fine-grained features of TLS traffic and use machine learning methods to identify encrypted traffic generated by known malicious samples. Further, [37, 42] introduce deep learning methods to reduce the dependence on features and human experience. However, such detection methods based on features of singular streams cannot be generalized to unknown samples and may induce a large number of false positives that cannot be interpreted. To overcome such issues, another trend of methods [34, 35] attempts to reveal the essential characteristics of malware behaviour through the accessing relations between hosts and domains. However, adopting such methods in the real-world network scenario is not trivial due to efficiency concerns. More seriously, adversaries have developed various evasion techniques by perturbing stream features, which further reduces the amount of usable information
for malware identification. So far, the issue of lacking representative information for encrypted malware detection has not been well resolved. Previous works only use a single type of characteristics, such as stream features or context information, while information loss and compression still exist.

[Figure 1: Infection pattern of remote control malware. Messages between Victim and Attacker. Attack Phase: malicious intrusion; intrusion acknowledge & delivery malware; install malware. Control Phase: request instructions; issue instructions.]

In this paper, we propose ST-Graph, a multi-stream analysis framework that explores multiple features from spatial and temporal perspectives and integrates all available information for comprehensive malware traffic detection under encryption scenarios. More specifically, we design an attribute heterogeneous graph oriented to spatio-temporal traffic behaviour, which effectively captures the characteristics of associations between nodes in a large-scale network. Equipped with a graph representation learning method, our method extends the scope of information that can be utilized for recognizing encrypted malicious traffic, which boosts the detection to achieve higher accuracy and robustness. In the meantime, we carefully design our graph representation learning algorithm on the basis of random walk [49], which learns spatial and temporal features for network traffic with high efficiency. The learned host representations aggregate information from behavioural sequences in the network, which effectively highlights the differences in traffic between compromised hosts and benign samples with little information loss.

Our main contributions are summarised as follows:
• We propose ST-Graph, a real-time malicious traffic detection framework under an encryption scenario. By exploring and integrating multiple features, ST-Graph effectively reveals malicious behaviours within an encrypted network, which enables low false alarm rate detection.
• We design a heterogeneous attribute graph for encrypted traffic and propose a novel embedding method, namely interval-inclined random walk, for exploring and incorporating spatial and temporal characteristics of the traffic data.
• We evaluate our detection system in several real network scenarios for up to a year and observe good results. Compared to other works, our detection model results in higher accuracy (nearly 10 times that of baselines), and significantly lower false positives with a tolerable time cost.
• By real-world deployment, our detection system finds some malicious cases that cannot be discovered by other systems and reveals some emerging malicious traffic types.

2 BACKGROUND
In this section, we briefly overview the available information for malicious traffic detection under encryption and summarise the knowledge about graph representation learning.

Information in TLS handshake. Transport Layer Security (TLS) [52] is utilized to assure confidentiality and integrity between two communication entities. For example, HTTPS is the plain text HTTP protocol over TLS. In general, a TLS connection negotiates keys via a TLS handshake and then uses symmetric encryption to protect communication payloads. TLS encryption prevents any third parties from accessing plain communication payloads, which attracts an increasing number of applications to deploy TLS. However, every coin has two sides: malicious adversaries also deploy TLS to encrypt their connections and evade detection by DPI systems [47, 54]. Nevertheless, the TLS handshake remains in plain text in order to exchange the information necessary for encryption. During a TLS handshake, two communicating parties authenticate with each other, negotiate encryption algorithms, exchange encryption keys and finally agree on the encryption process based on previously exchanged information. More specifically, the client initially sends a ClientHello message, providing a list of cipher suites and a set of TLS extensions. The cipher suites are a set of cryptographic algorithms required by TLS. The extensions are the features supported by the TLS client, where the Server Name Indication (SNI) extension indicates the hostname of the target server to which the client is trying to connect. The server then responds with a ServerHello message, containing the selected algorithm and identity information. As the information exchanged during the handshake process is critical to the subsequent encrypted communication, it also provides vital information to detect malicious encrypted traffic [2].

Graph Representation Learning. A graph is a data structure for representing complex networks and modelling abstract concepts such as relations between entities. However, traditional graph construction algorithms (e.g. adjacency matrices [53], adjacency list [10]) and graph analysis algorithms (e.g. search algorithms [59]) have difficulty in handling increasingly complex graph topology and can incur huge space and time overheads. In contrast, graph representation algorithms [24] allow for more flexible and efficient analysis on large-scale graphs, with the goal of optimizing a set of vectors that numerically represent node information as well as graph structure. Among such methods, graph embedding [21] is a representative approach, which transforms nodes on graphs into vectors. A graph could be approximated by several node lists, which is similar to the word sequences in a piece of text. Therefore, word2vec [39], a text embedding method, could be adopted for graph embedding effectively [49]. First, a list of nodes can be regarded as a “corpus” of a graph, which can be obtained by random walks [16]. Based on the node list, the vectorized representations of nodes can be optimized using a skip-gram network [40], by maximizing the likelihood probability of each node’s context. In this paper, we extend this method to learn edge representations from the prioritized graph topology constructed from the complex, real-world network traffic data.

3 PROBLEM STATEMENT
Our goal is to detect infected hosts within an organization by observing host traffic behaviours. Generally, gateway refers to the network boundary that distinguishes between internal and external networks, such as campus networks and enterprise networks. Due to the gateway’s global perspective of the traffic through it, we position our detection system, ST-Graph, at the gateway to monitor the
whole traffic, as shown in Figure 2. Specifically, ST-Graph listens at the gateway to encrypted network traffic generated by internal hosts accessing external servers and detects infected hosts with suspicious communications in a real-time manner. In this work, we focus only on traffic encrypted with the standardized TLS protocol, since it is the mainstream encryption protocol and the one most used by malware, owing to its ease of deployment. Note that our detection system only captures the traffic without manipulating it and thus will not affect the benign forwarding traffic.

[Figure 2: Threat model of ST-Graph. ST-Graph sits at the gateway between the internal and external networks and performs detection there.]

However, such detection demand poses huge challenges mainly from two perspectives: 1) curtailed available information limits the effectiveness of the detection, and 2) the highly comprehensive network connections hinder the efficiency of the detection. More specifically, the first concern results from recent encryption and adversaries’ evasion technologies, which greatly curtail the amount of information that can be utilized for detection. In encryption scenarios, the invisibility of communication payloads prevents traditional detection methods, e.g. DPI methods [47, 54], from making accurate predictions. More seriously, adversaries have evolved several techniques to evade existing single-stream-based detection methods [17, 61, 65]. In obfuscation-based evasion, adversaries change the frequency of accesses from a host to command and control (C&C) servers [26], which blurs the suspicious property of the host to evade detection based on packet length and time interval [3, 19, 45]. In disguising-based evasion, malware takes the form of benign software connections, disguises its traffic as generated by benign software [17], visits benign third-party websites initially as a front [68], or reduces the number of websites visited [38] to avoid context-based detection [50, 73]. The suspicious behaviours hide behind millions of legitimate requests, which poses substantial barriers to effective detection and introduces a high level of false positives [11, 17]. Therefore, encryption and evasion behaviours prompt us to explore more available information for effective detection.

As for the second concern, the detection over network interactions confronts a large scale of network throughput, resulting from the increasing number of network users and the complex network connectivity. Meanwhile, the demand of exploring additional, multi-faceted features for effective detection also intensifies the computational complexity. Therefore, we need to design new detection technologies that can explore and process more network features over comprehensive network interactions with tolerable computational complexity.

4 SYSTEM OVERVIEW
In this paper, we propose ST-Graph, a multi-stream analysis framework equipped with a novel graph embedding algorithm. In this section, we illustrate the key observations of malware infections, explain the design of the system, and describe the workflow of ST-Graph.

4.1 Key Observations

[Figure 3: Schematic representation of host connectivity in a network environment with malware infections, showing hosts H1, H2 and H3, attack and control connections, and destination servers by malicious level. Colour of circle represents the maliciousness of the destination server. For example, white circles represent normal destination servers, like www.google.com, grey circles represent middle-malicious servers, like phishing websites, and black circles represent malicious servers, like C&C servers.]

Since remote control malware has the advantages of low attack cost and the capability of spreading widely, it has been adopted by adversaries in more than half of malware attacks [30]. To detect remote control malware, we first perform an empirical study to condense the key features. Based on previous work that found that some malware has similarities in attack targets and attack methods [29, 46, 57], we perform a manual analysis of remote control malware from two aspects: code and generated traffic. First, we randomly sample 30 malware samples as our ground truth and run them on several operating systems under our control. Note that the entire experiment environment is enclosed and will not affect any third parties. From the code aspect, we reverse their code and analyse the code logic and hard-coded content. We find that malware in the same family prefers to share a similar software framework, which is the root cause of the similar traffic behaviour. The reason for code sharing and reuse, as we speculate, is likely to be saving costs. In addition, malware is generally not up to date and still accepts low-version TLS connections. Besides, developers of malware often use default fixed parameters, especially the time interval. For example, we find that some malware usually sets the time interval for connecting to the control servers to 60 seconds.

Further, to explore the traffic behaviour, we try to decrypt the traffic generated by ground truth malware. With the master key extracted from controlled operating systems, we could analyse the plain communication between malware and servers. The similarity of code within the same family leads to a concentration of behavioural characteristics in their traffic behaviour. Due to a clear, joint attack purpose, i.e. remote control of the infected hosts, there exists a relatively fixed traffic pattern in remote control attacks. As shown in Figure 1, the communication process can be summarised into two main phases: the attack phase (solid lines) and the control phase (dashed lines). Initially, the attack phase is to deliver malware to the victim’s machine by spoofing websites (grey circles) or exploiting vulnerabilities to install malware, which can help adversaries gain control of the host. For example, for H2 in Figure 3, the attacker commits a fraudulent activity by sending phishing emails in the attack phase and lures the victim to click the embedded link, which may cause the redirection from phishing website (grey circle) to
malware download site (black circle). Afterwards, in the control phase, infected hosts would connect to the control server of adversaries to receive and then execute instructions [46]. We find that most malware prefers to connect to different destination servers in different phases, which constructs a connection order. We speculate the reason may be to evade detection. Interestingly, the forged innocent network connectivity test of malware is common, which is preparation before connecting to control servers, e.g., querying the host’s public IPs or downloading servers’ certificates. For example, in Figure 3, dashed lines connected to white circles represent the network connectivity test behaviour. Specifically, the infected hosts may actively connect to external servers, such as third-party servers for network inspection.

[Figure 4: Overview of ST-Graph. Encrypted traffic passes through (1) the Traffic Preprocessor, which produces flow features, (2) the Spatio-Temporal Graph Representor (graph construction, edge embedding, host representation), which produces host features, and (3) the Detector, which reports malware infections.]

Above, we summarise the two main phases of remote control malware from code and traffic analysis, i.e., attack phase and control phase. Besides, we find an exclusive observation, i.e., the forged innocent network connection test, which gives us a unique perspective on detection. Our observations not only extend existing understanding of malware behaviour but also deliver distinctive findings. Below, we summarise the common characteristics of malware infections from the following aspects:
• Spatial Feature refers to the concentration property between hosts and target servers, especially within families. As mentioned before, due to framework reuse, some malware within the same family exhibits high similarity, reflected by the destination servers they try to connect with. For example in Figure 3, H2 and H3 are connected to the same destination servers in the control phase.
• Temporal Feature of the connection. The connection order of a host over a period of time can help us rebuild the infection process of malware, which can clearly show the change with phase. Besides, due to the setting of fixed parameters, communications between infected hosts and control servers are regular, especially the packet length and packet interval time [26].

4.2 System Design
To achieve effectiveness and efficiency, the key point of our detection system, ST-Graph, is to retrieve as much information as possible with tolerable computation complexity. According to the observations of our empirical study (See §4.1), we summarise two categories of general and distinguishable features. First, the spatial feature reflects the property of network connection relationships for each host. In addition, the temporal feature shows the side-channel information of communications. These two kinds of features jointly depict the network communication behaviour of hosts in a comprehensive way.

Based on the observed features, we design ST-Graph, which incorporates the spatial (i.e., accesses between hosts and servers) and temporal (i.e., connection order and time-related information) characteristics of hosts’ network behaviour into host representations to boost the effectiveness of detection decisions. To achieve higher efficiency, we improve the algorithm of graph representation by only optimising edge representation with iterative updates, while the optimal node representations are derived from closed-form solutions. This significantly reduces the computational complexity of graph representation learning.

4.3 Workflow
As shown in Figure 4, our detection system consists of three components: traffic preprocessor, graph representor, and detector. Below we elaborate on the functionality of each component.

Traffic Preprocessor. Raw traffic is usually captured as packets in the Pcap format. In this original data format, many communication details are presented in a fragmented manner, making them difficult to utilize for detection. Therefore, we construct a Traffic Preprocessor to recover the complete communication while removing the extra meta-information from the traffic data. To begin with, we apply stream reassembly to recover the entire end-to-end communication. To do this, we extract a 5-tuple (i.e., <source IP, source port, destination IP, destination port, protocol>) from each original packet, and integrate the packets associated with the same 5-tuple as a stream. Afterwards, we reserve the traffic of the TLS protocol with a complete TLS Handshake and extract the key information for each stream. More specifically, we extract the contents of the TLS handshake in each reserved TLS stream, including the TLS version and the supported cipher suites, and calculate some statistical information such as the number of packets and the bytes within a stream. Such statistical information extracted from the data stream remains an important element of malware traffic detection and is used later in the computation of the graph representation.

Graph Representor. We build a heterogeneous graph to represent the end-to-end communications among hosts and servers by their temporal and spatial features. Specifically, the temporal features are the traffic characteristics we extract from the streams, which are embedded in the stream representations; and the spatial features are reflected by the graph structure. Based on this, we design a node embedding algorithm to transfer the spatio-temporal features into host representations, which reflect the similarity of the host’s access behaviour. We will elaborate on this module in §5.

Detector. Given the embedding vector of each host that numerically quantifies the host’s behavioural characteristics, we apply a machine learning approach to calculate the likelihood of the host being infected. Considering the ability to resist overfitting [12] and the better performance in comparative experiments (see Appendix A), we finally employ the random forest (RF) regression algorithm [12] as our detector. It is an ensemble learning method that makes predictions by averaging the output of multiple decision trees. With the predicted infection value of each host, the detector outputs a list of suspicious hosts along with their access information.

5 ST-GRAPH: SPATIO-TEMPORAL GRAPH
In this section, we present the design details of ST-Graph, elaborate on the process of graph construction and the optimization of edge representation, and explain how we propagate spatial and temporal information in a lossless way to represent hosts for malware traffic
detection. The key of the approach is to exploit the similarities in the network behaviour of different applications. To do this, we first build a heterogeneous graph to correlate all network connections between hosts and servers. Based on the graph structure, we apply Random Walk [16] to generate a list of corresponding connections for each stream and use a probabilistic model to optimize the edge representations. Each stream is represented by its edge representation and traffic characteristics. Finally, we represent each host by all of its accesses generated in a sequential order, which integrates the traffic characteristics and network structure associated with the host. Compared with Graph Neural Network-based methods that train thousands of parameters [72], and Knowledge Graph Embedding-based methods that jointly optimize the representation of all nodes and edges [36, 66], ST-Graph learns edge embeddings based on random walk with a small number of iterations and optimizes host embeddings with a closed-form solution. This highly reduces the computational complexity to meet the need for real-time detection.

5.1 Graph Construction
Graph Topology. We design a host-server bipartite graph $G = (H, D, E, S, I)$ to capture the interactions between internal hosts and external nodes. We denote the internal hosts as a vertex set $H$, each represented by its IP as the unique identifier. Similarly, the external server destinations are represented by a vertex set $D$, each initialized by the server’s domain name or its IP address. The domain can be obtained from SNI in ClientHello, and the server’s IP address is used as a substitute when the domain is not available. We use an edge set $E = \{e \mid e = (h_e, d_e), h_e \in H, d_e \in D\}$ to denote all connections (i.e., streams) between hosts and servers. If a host $h$ has a TLS handshake with a server $d$, then an edge $e = (h, d)$ connecting these two nodes will be added to $E$. $I = \{i_e \mid e \in E\}$ denotes the temporal feature of each edge, where $i_e$ represents the order of $e$ connected by its host $h_e$. $S = \{s_e \mid e \in E\}$ denotes the attributes extracted from the streams as described below.

Stream Attributes. The stream attributes $S$ characterize the features of every single stream. To explore more detailed features, we extend previous works [3, 55] with TLS handshake features such as orders of cipher suites. Specifically, we exploit stream attributes (details in Appendix E) from three aspects: 1) the TLS handshake features extracted from internal hosts, 2) the domain name features extracted from external nodes, and 3) the side channel statistics features of the stream.

For the first type of attributes, we observe that malware and C&C servers are not updated promptly, so they usually accept lower TLS versions and maintain support capabilities for weak encryption algorithms. The underlying reasons are that attackers are relatively less concerned about whether communications will be decrypted and that vulnerabilities in lower versions of operating systems are easier to exploit. According to this observation, we extract features from the TLS ClientHello message, including the TLS version, the list of offered cipher suites and extensions as TLS handshake features. Such features can provide information about the encryption algorithm supported by the client. For the second type of attributes, our feature extraction is mainly based on the fact that many malicious domains are computed by Domain Generation Algorithms (DGA). Hence, we extract features such as the number, the character ratio, and the vowel or consonant letter ratio for identification. We also extract domain name length features to cope with lexicon-based DGA, which selects words from proprietary dictionaries for combination to reduce randomness.

In addition, we include the side channel information as our third type of attributes, for they provide indirect signals to our detection. Although the packet length and the arrival time do not provide insight into the content of the connection, they can help to infer network behaviour. For example, the large number of heartbeat packets used to maintain a connection can result in small packets per stream [3]. So we calculate some statistical information such as the number, the length, or the time interval of packets sent and received within a stream.

5.2 Edge Embedding
Edge embeddings are numerical representations of streams. In this paper, we integrate two categories of features into edge embeddings: 1) stream attributes for preserving original traffic information and 2) spatio-temporal embeddings for reflecting comprehensive and dynamic network behaviours. We consider the network interactions as the spatial features generated from graph topology, while the temporal features refer to the sequential order of the servers being visited by the host through the stream. To integrate spatial and temporal features, we propose an Interval-inclined Random Walk $R : e \to r_e^t$ that takes into account both network interactions and access orders to generate spatio-temporal embeddings. Our model extends the traditional random walk method [49], which traverses the graph along the edges and maximizes the similarity of all edges within several traverses. The main reason for this design choice is that other graph-based models for edge representation learning, such as Knowledge Graph Embedding methods [36, 66] and Graph Neural Networks [25, 63], suffer from high computational complexity and cannot serve our demand for efficient, real-time detection. In this paper, we incorporate the temporal feature, i.e., the access order, into the context generating process proposed by node2vec [22], where the random walk strategy balances the breadth-first sampling (BFS) and depth-first sampling (DFS). In general, from a starting edge $o$, we perform random walks with at most $PL$ steps for $PN$ times to collect a set of edges $\mathcal{N}_o$. Further, we optimize the spatio-temporal embedding of $o$ by maximizing the similarity of edges in $\mathcal{N}_o$. Below we elaborate on our augmented random-walk strategy, which decides the edge to take for the next step based on previous wanderings.

For an edge $e$, we define its neighbours as a collection of edges that share the same host or the server with $e$, formally $C_e = \{u : (h_u, d_u) \mid u \in E, h_e = h_u \lor d_e = d_u\}$. Let $w = [w_0, w_1, \ldots, w_{PL}]$ be a certain random walk. We define the connection order distance $d$ between two edges $u, v$ as $d(u, v) = |i_u - i_v|$, where $i_e$ is the order of edge $e$ being connected by its host $h_e$ (See §5.1). Assuming the random walker starts from edge $o$, we define the probability of selecting a certain edge $x \in C_o$ from the neighbours of $o$ for the next walk as

$$\Pr(w_1 = x \mid w_0 = o) = \frac{1 / d(o, x)}{\sum_{y \in C_o} 1 / d(y, o)}.$$

Here, our random walker inclines to select the edge having the minimum connection order distance with the current walk as the next step. By doing this, streams that are generated in a small time interval tend to be consecutive steps in the random walk.
在这⾥,我们的随机游⾛者更倾向于选择与当前步骤连接次序最接近的边作为下⼀步。这样做的⽬的是为了让⽣成在短时间内的流量在
3
-

随机游⾛中成为连续的步骤。

--elu - ⑳

S -
du h U
, it

S
Xu X
,
=
x = u

Xn , x = 1 ,
if X Cu
Qux =

g , otherwise
.
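The first-step selection rule above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the exclusion of the starting edge from its own neighbourhood, and the smoothing constant `eps` (added so that a connection order distance of zero does not cause a division by zero) are all assumptions made for illustration.

```python
from typing import Dict, Hashable, List, Tuple

Edge = Tuple[Hashable, Hashable]  # (host, server)


def neighbours(e: Edge, edges: List[Edge]) -> List[Edge]:
    """Edges sharing the host or the server with e.

    Assumption: e itself is left out, since d(e, e) = 0.
    """
    host, server = e
    return [u for u in edges if u != e and (u[0] == host or u[1] == server)]


def first_step_probs(
    o: Edge, edges: List[Edge], order: Dict[Edge, int], eps: float = 1.0
) -> Dict[Edge, float]:
    """Probability of moving from the starting edge o to each neighbour x.

    The weight is 1 / (|i_o - i_x| + eps); eps is a hypothetical smoothing
    term, not part of the formula in the text.
    """
    candidates = neighbours(o, edges)
    weights = [1.0 / (abs(order[o] - order[x]) + eps) for x in candidates]
    total = sum(weights)
    return {x: w / total for x, w in zip(candidates, weights)}


# Toy graph: host h1 contacts servers a, b, c in that order; h2 contacts a.
edges = [("h1", "a"), ("h1", "b"), ("h1", "c"), ("h2", "a")]
order = {("h1", "a"): 1, ("h1", "b"): 2, ("h1", "c"): 3, ("h2", "a"): 1}
probs = first_step_probs(("h1", "a"), edges, order)
```

In this toy graph, the edge `("h1", "b")` is closer in connection order to the start edge than `("h1", "c")`, so it receives the larger probability, which matches the stated preference for temporally adjacent streams.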


For the following walks, each selection is based on the two previous walks. Assume the last step is at edge $u$ and the current step takes edge $v$; the walker selects the next step from $v$'s neighbours $x \in C_v$ with

$$\Pr(w_{i+1} = x \mid w_i = v, w_{i-1} = u) = \frac{\alpha_{u,x} \cdot \frac{1}{d(v,x)}}{\sum_{t \in C_v} \alpha_{u,t} \cdot \frac{1}{d(v,t)}},$$

where the value of $\alpha_{u,x}$ is determined by the relation between $x$ and $u$. When $x = u$, $\alpha_{u,x}$ takes the value of $\frac{1}{p}$; when $x$ is one of the neighbours of $u$, $\alpha_{u,x}$ is set as 1; otherwise $\alpha_{u,x} = \frac{1}{q}$. Here, $p$ and $q$ are hyper-parameters. When edge $x$ overlaps with edge $u$, we use a constant $p$ to control the probability of returning from $v$ to the starting point $u$. Otherwise, if there are one or more edges between edge $x$ and edge $u$, we apply a constant $q$ to control the probability of $v$ going to a new node. If $q < 1$, the walker tends to visit global nodes (DFS); if $q > 1$, the walker tends to visit local nodes (BFS), which enhances the coverage of the surrounding neighbours.

According to this strategy, for each edge $e$, we generate a network neighbourhood $N_e$ by $PN$ random walks. The edges in $N_e$ are highly correlated with edge $e$ from both spatial and temporal perspectives. Hence, to incorporate the spatial and temporal features of the graph into edge embeddings, we set up a vector $r_e$ for each edge, and optimize the vector by the proximity within its network neighbourhood $N_e$. More specifically, we model the plausibility that an edge $n$ is correlated with edge $u$ as

$$\Pr(n \mid u) = \frac{\exp(r_u \cdot r_n)}{\sum_{v \in E} \exp(r_u \cdot r_v)}. \quad (1)$$

We assume the neighbourhood relations between different edge pairs are independent, and further define the neighbourhood likelihood of edge $u$ as

$$\Pr(N_u \mid u) = \prod_{n \in N_u} \Pr(n \mid u). \quad (2)$$

We optimize the neighbourhood likelihood of all edges to pursue the global proximity:

$$\max_{r_u} \prod_{u \in E} \Pr(N_u \mid u) = \max_{r_u} \sum_{u \in E} \sum_{n \in N_u} \log \frac{\exp(r_u \cdot r_n)}{\sum_{v \in E} \exp(r_v \cdot r_u)} = \max_{r_u} \sum_{u \in E} \left[ -|N_u| \cdot \log Z_u + \sum_{n \in N_u} r_u \cdot r_n \right] \quad (3)$$

where $Z_u = \sum_{v \in E} \exp(r_v \cdot r_u)$ is a normalization factor.

We use negative sampling to obtain an approximation of $Z_u$ and optimize the objective function (3) using a stochastic gradient ascent algorithm to update the spatio-temporal embeddings of edges. Finally, we represent edge $e$ by concatenating its stream feature $s_e$ and its spatio-temporal embedding $r_e$: $f_e = \phi([s_e \| r_e])$, where $\phi(x) = \frac{x}{|x|}$ is a normalization function.

5.3 Host Representation
To detect infected hosts through traffic behaviours, we propagate the information of the streams associated with each host to numerically represent the hosts. Formally, for host $h$, we have its representation $g_h$ as $g_h = \textsc{Propagate}(\{r_e \mid e \in E \land h_e = h\})$ to represent the host's sequential access behaviour in a short period. Below we explain the details of the information propagation from streams to hosts.

For a target host $t$, assume $u$ is a server being visited by $t$ through the stream $e = (t, u)$. In network traffic, each host may access many public services (e.g., the Windows update service), some of which are not informative for characterizing the host's behaviour while taking up a large proportion of the traffic data. Hence, we rank the streams associated with the host according to the importance of the server visited through the stream before the information propagation. Such importance is with respect to 1) the frequency of the server being visited by all hosts and 2) the order ($i_e : e \in E,\ h_e = t \land d_e = u$) of the host $t$ visiting the server $u$. Intuitively, if a server $u$ is visited by most hosts, it is likely to be visited by the target host $t$, too. Also, if $t$ visits $u$ at a very early stage, then $u$ is an important access destination of $t$, compared with the other servers. Hence, we define the importance of edge $e$ to its host $t$ as

$$\rho(e) = \frac{|\{e' \mid e' \in E \land d_{e'} = u\}|}{i_e \cdot |E|},$$

where $|\{e'\}|$ is the number of edges ending at $u$, $|E|$ is the total number of edges, and $i_e$ is the order of edge $e$ being connected by $t$. Further, we model the correlation score between host $t$ and edge $e$ as a weighted sum over the importance of $e$ to $t$ and the similarity between the host representation $g_t$ and edge embedding $f_e$:

$$\mathrm{corr}(t, e) = \lambda \rho(e) + (1 - \lambda) \frac{\exp(g_t \cdot f_e)}{Z},$$

where $\lambda$ is a scalar hyperparameter and $Z = \sum_{i \in E} \exp(g_t \cdot f_i)$ is a normalization factor. Let $L_t = \{e \mid e \in E \land h_e = t\}$ denote all edges proceeding from host $t$. The joint correlation score of all edges in $L_t$ is defined as:

$$\mathrm{Corr}(L_t) = \prod_{e \in L_t} \mathrm{corr}(t, e) = \prod_{e \in L_t} \left[ \lambda \rho(e) + (1 - \lambda) \frac{\exp(g_t \cdot f_e)}{Z} \right],$$

and its logarithmic form can be approximated by the first-order Taylor expansion as

$$\log[\mathrm{Corr}(L_t)] = \sum_{e \in L_t} \log \left[ \lambda \rho(e) + (1 - \lambda) \frac{\exp(g_t \cdot f_e)}{Z} \right] \approx \left[ \sum_{e \in L_t} \left( \log(\lambda[\rho(e) + \xi]) + \frac{\xi}{\rho(e) + \xi} \right) \cdot f_e \right] \cdot g_t,$$

where $\xi = \frac{1 - \lambda}{Z \lambda}$ is a constant. We restrict the host representation $g_t$ to be a unit vector. Let $M_t = \sum_{e \in L_t} f_e \cdot \left( \log(\lambda[\rho(e) + \xi]) + \frac{\xi}{\rho(e) + \xi} \right)$. To maximize $\log[\mathrm{Corr}(L_t)]$, $g_t$ should be a vector of length 1 having the same direction as $M_t$. Therefore, we have the optimized host representation as $g_t^* = \frac{M_t}{|M_t|}$.

6 EXPERIMENTAL EVALUATION
In this section, we present experimental evaluations of our detection system. We evaluate ST-Graph on two tasks: malware detection and malware family classification. The first task distinguishes hosts with malicious network behaviour from benign ones, while the second task further specifies the family of malware. A malware family is a group of associated programs with similar attack techniques, some of which have "code overlap" [15] to a large extent. Grouping them as a family broadens the scope of a single piece of malware as it alters over time while preserving distinct family traits.

6.1 Experimental Setup
Implementation. We present the tools used in each component of our detection system (See §4.3). For the traffic preprocessor, we employ TShark (version 3.2.5) [32] to parse the fragmented packets and then recover the complete communication. To implement the graph representor, we use NetworkX (version 2.5.1) [23] to build the heterogeneous graph and initialize nodes with text representation by Gensim (version 4.1.2) [51]. Finally, we use scikit-learn

(version 0.23.2) [48] for both singular value decomposition in host representation and the classification algorithms.

Baselines. We compare our ST-Graph with two state-of-the-art malicious traffic detection methods (ETA and FS-Net), both of which perform detection at the stream level. The computational complexity of Graph Neural Network-based methods [36, 66] is too high to meet the need for efficient detection. Therefore, we exclude these methods from the comparison. The details of the baselines are as follows.
• ETA [2] adopts the random forest model to detect malware traffic. Specifically, ETA utilizes stream features, including TLS handshake metadata, DNS contextual streams linked to the encrypted stream, and the HTTP headers of HTTP contextual streams from the same source IP address within a 5-minute window.
• FS-Net [37] is an end-to-end deep learning model, which uses a multi-layer encoder-decoder structure to mine the potential sequential characteristics of streams. In summary, it learns representative features from raw streams and then classifies them.
We set the hyperparameters of the baselines to either the values adopted by their authors or default values. Specifically, for FS-Net, we set the dimension of the hidden state to 128, the number of layers to 2 and the dimension of the length embedding to 16. For the other parameters involved, such as the parameters of the random forest, we apply the default settings in scikit-learn (version 0.23.2).

Environment and Parameters. We run the methods on a Supermicro server with two Intel Xeon E5-2690 CPUs (2 × 14 cores), CentOS 7.9.2009 and 345 GB of memory. For hyper-parameters, we set the dimension of the host vectors $D_h$ to 256, the length of the random walk path $PL$ to 100 and the number of paths per edge $PN$ to 10. We set the parameter $p$ controlling the probability of returning to 1 and the parameter $q$ controlling the probability of exploring new nodes to 2 for better coverage of the surrounding neighbours.

Metrics. We use the following metrics to evaluate the detection performance: (i) precision, (ii) recall and (iii) false-positive rate (FPR). (See Appendix B)

6.2 Dataset
We conduct experiments on one public dataset (CICInvesAndMal2019, referred to as AndMal2019 [58]) and one dataset collected by ourselves (EncMal2021). The AndMal2019 dataset includes traffic and device logs generated by 5065 benign and 426 malicious Android apps on real smart devices. The malicious apps can be divided into 39 families. Our collected dataset EncMal2021 is constructed by capturing the traffic generated by malware samples and the campus network traffic. It contains 108,847 hosts with 5,202,093 streams, 4.5% of which are marked as malicious, and the others are marked as benign. Below we elaborate on the data collection process of EncMal2021.

Data Collection. EncMal2021 consists of malicious and benign traffic data.
• Malicious Traffic: We use malware analysis sandboxes to collect the data. The sandboxes, including Windows 7 and Windows 10 operating systems, allow users to submit malicious executable examples and control the runtime based on the execution of the executable files. Malicious samples come from the malware analysis website VirusTotal and large security companies we work with, with millions of sample updates per day. Table 1 shows the names of malware families and the number of examples we sample from each malware family. Each example runs for 5 minutes on average, during which all generated traffic is recorded and saved. Since most of the behaviours of the samples in the sandbox can be observed in the first two minutes [31], we consider five minutes to be a sufficient time interval to observe almost all valid behaviours. In total, we end up capturing 239,007 streams from 53,111 samples.

Table 1: Malware Families
Family Name     # Samples    Family Name          # Samples
Minerd          5717         Unwanted             1610
Cryxos          4652         Faceliker            1564
PhishingSite    3080         Trojandownloader     1528
Wacatac         2949         Brocoiner            1482
hidelink        2860         Sality               1046
Kryptik         2403         Zbot                 1035
Redirector      2190         RelevantKnowledge    771
Generickdz      1974         Scrinject            756
Installcore     1807         Ramnit               719
Iframe          1731         Others               13237

• Benign Traffic: We construct benign traffic from two data sources: real traffic captured in a campus network and traffic generated by benign samples running in the sandbox. Most samples in the benign dataset are collected from a large campus network with nearly 10,000 active hosts. We passively monitor all inbound and outbound encrypted traffic at different time points for 5 consecutive months from January 2021 to April 2021 and save the raw pcap packets. Due to the possibility of malware-infected hosts in the campus network, we cannot directly use all the traffic as benign traffic. We chose the Alexa ranking as a filtering criterion, keeping domains within the top 1 million traffic rankings. The reason for this choice is that websites ranked beyond the top 1 million on Alexa can be considered the "Long Tail"; thus this filtering guarantees the diversity of the samples and also ensures that the traffic is benign to a certain extent. Since the captured traffic may contain private information, we also anonymize the IPs. The same host is always mapped to the same fake IP to preserve the integrity of the network relationships. On the other hand, in order to avoid a large bias in the domain distribution between benign and malicious traffic due to the different collection methods, we also collect the traffic generated by benign samples running in the same sandbox environment. Such samples are partly collected from the top 100 of the Microsoft Store's top free list and partly collected from the programs flagged as benign by the malware analysis site. Among all the domains in the traffic generated by the benign samples, 24% are not in the Alexa Top 1 million rankings. We believe this mitigates the bias associated with filtering traffic while keeping the traffic benign. Finally, we capture 4,940,593 streams in the existing network from 53,281 hosts and 22,493 streams in the sandbox using 2,455 samples.

Dataset statement. In EncMal2021, the malicious samples are run in sandboxes, which makes the distribution of malicious traffic not exactly consistent with the actual situation. However, we compensate for this by enriching the malware types and the sandbox environment, so that the constructed dataset is as comprehensive as possible.

Train-test Split. For the first task, i.e., malware detection, we apply different train-test split strategies to the two datasets, because the sizes of these datasets are different.

Table 2: Detection Performance of ST-Graph and Baselines. For EncMal2021, the joint testing set is used as the test set.

Malware Detection
Dataset       Method      Precision(%)   Recall(%)   FPR(%)    Train(s)    Test(s)
EncMal2021    ETA         99.1726        99.3013     0.1915    729.26      138.24
EncMal2021    FS-Net      99.3515        92.5990     0.2565    45324.10    17741.15
EncMal2021    ST-Graph    99.9805        99.9221     0.0045    5956.45     150.31
AndMal2019    ETA         77.8451        74.7129     19.0409   65.31       22.48
AndMal2019    FS-Net      72.8997        72.5344     21.4712   23275.86    5112.93
AndMal2019    ST-Graph    99.2973        99.6444     0.0170    2113.13     27.19

Malware Family
Dataset       Method      Precision(%)   Recall(%)   FPR(%)    Train(s)    Test(s)
EncMal2021    ETA         79.9473        76.5882     1.2410    306.49      76.66
EncMal2021    FS-Net      73.0697        72.6062     1.5218    41048.30    1890.69
EncMal2021    ST-Graph    93.3669        91.2105     0.4016    1489.41     127.14
AndMal2019    ETA         28.8382        27.5170     1.8134    139.05      39.19
AndMal2019    FS-Net      19.4906        16.7309     2.0863    23055.60    774.21
AndMal2019    ST-Graph    53.7568        53.4014     1.0180    520.36      66.85
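As a reading aid for the three performance columns of Table 2, the sketch below computes precision, recall and FPR from confusion counts. It uses the standard textbook definitions, which are assumed (not confirmed here) to match the definitions given in Appendix B; the function name and the example counts are hypothetical and are not taken from the paper's data.

```python
from typing import Dict


def detection_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Return precision, recall and false-positive rate in percent.

    tp/fp/tn/fn are counts of true positives, false positives,
    true negatives and false negatives, with "malicious" as the positive class.
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    return {
        "precision": 100.0 * precision,
        "recall": 100.0 * recall,
        "fpr": 100.0 * fpr,
    }


# Hypothetical counts chosen only to illustrate the arithmetic.
metrics = detection_metrics(tp=90, fp=10, tn=990, fn=10)
```

Because benign traffic heavily outnumbers malicious traffic, the FPR denominator (all benign items) is large, which is why a small FPR can still correspond to many false alarms in absolute terms.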

(a) Precision comparison (higher is better) (b) Recall comparison (higher is better) (c) FPR comparison (lower is better)

Figure 5: Comparison between joint testing set and disjoint testing set on EncMal2021.

• For EncMal2021: We divide the collected dataset into three parts: 1) training set, 2) joint testing set and 3) disjoint testing set, with a ratio of 6:2:2. Here, the joint testing set shares the same distribution of malware families with the training set, while the malware families in the disjoint testing set do not intersect with those in the training set. We allocate benign traffic randomly and keep the ratio of benign traffic to malicious traffic at 10:1.
• For AndMal2019: We use 60% of the samples for training and 40% for testing. Here, we do not further divide the testing set into a joint or a disjoint one since AndMal2019 only includes a limited number of malicious families.
As for the second task, i.e., malware family classification, we use 8:2 as the ratio of the train-test split for both datasets. The labels for this task are the names of the families in Table 1.

Ethical Concerns. We also consider ethical concerns when collecting our dataset. As aforementioned, a portion of our benign traffic comes from border traffic on the real campus network. To avoid burdening normal network access, we limited our collection to passive listening. To protect privacy, we only saved TLS traffic when capturing traffic and anonymized the addresses of the communication. In the detection stages, we focus on the destination of the traffic (domains) and do not look into the payload of any TLS traffic. Also, the traffic is stored on physical servers to which only privileged administrators have access.

6.3 Detection Performance
We compare ST-Graph with other baselines on EncMal2021 and AndMal2019, respectively.

6.3.1 Evaluation on EncMal2021. We use EncMal2021 to verify that the model has good malware detection abilities on a large-scale dataset, i.e., that it can achieve good binary and multi-class classification results.

Malware Detection. The purpose of this experiment is to evaluate the detection effectiveness of ST-Graph and to test whether it can identify unknown malware families, i.e., generalization. Generalization describes a model's ability to react to new data. In malicious traffic detection, it is important to know how well a trained model will generalize to unseen data. We train all models using the training set and test them with the joint testing set and the disjoint testing set, respectively.

The results on the joint testing set are shown in Table 2 (1st row). From the results, we observe that ST-Graph achieves precision and recall above 99.9%, and its FPR (0.0045%) is more than 40 times lower than that of the other two models. In terms of computation cost, compared with ETA, a simple feature engineering-based model, ST-Graph takes more time for training and testing, but much less than the deep learning-based model FS-Net.

Figures 5a, 5b and 5c show the detection results of the three models on the disjoint testing set, compared with the results on the joint testing set. In terms of precision, only our model does not degrade on unseen samples. This is because, although its recall for malicious communication traffic drops a little, it avoids more false positives. As for the recall score, each model decreases to some extent, while our model is still the most effective one. FS-Net can detect most malware traffic, but it also introduces a high level of false positives even when the distribution of benign traffic data changes slightly. This may be due to overfitting, where small changes in the input can lead to changes in the results of the model, making it unrealistic to apply to security analysis in practice.

Malware Family Classification. We use only malicious data for multi-class classification to understand ST-Graph's effectiveness for malicious family classification. This task is valuable because when malicious communications are detected, we need to analyse their binaries to confirm the alarm, while the family labels of the alarms offered by the detection model itself greatly reduce analysis time.
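The joint/disjoint evaluation protocol used for EncMal2021 in this subsection can be sketched as follows. This is an illustrative assumption about how such a split might be built, not the authors' procedure: the text does not say how the held-out families were chosen, so the `held_out` set, the function name and the per-family 75/25 split (which yields 6:2:2 overall when the held-out families account for 20% of the samples) are hypothetical.

```python
import random
from collections import defaultdict
from typing import Hashable, List, Set, Tuple

Sample = Tuple[Hashable, str]  # (sample id, malware family)


def split_joint_disjoint(
    samples: List[Sample], held_out: Set[str], train_frac: float = 0.75, seed: int = 0
) -> Tuple[List[Sample], List[Sample], List[Sample]]:
    """Split samples into training, joint testing and disjoint testing sets.

    Families in held_out go entirely to the disjoint testing set; every other
    family is shuffled and divided between training and joint testing.
    """
    rng = random.Random(seed)
    by_family = defaultdict(list)
    for sample_id, family in samples:
        by_family[family].append(sample_id)

    train, joint, disjoint = [], [], []
    for family, ids in by_family.items():
        if family in held_out:
            disjoint.extend((i, family) for i in ids)
            continue
        ids = list(ids)
        rng.shuffle(ids)
        cut = int(round(train_frac * len(ids)))
        train.extend((i, family) for i in ids[:cut])
        joint.extend((i, family) for i in ids[cut:])
    return train, joint, disjoint


# Toy data: families A and B are seen in training, family C is held out.
toy = [(i, "A") for i in range(8)] + [(i, "B") for i in range(8, 16)]
toy += [(i, "C") for i in range(16, 20)]
train, joint, disjoint = split_joint_disjoint(toy, held_out={"C"})
```

Splitting by family rather than by sample is what makes the disjoint testing set a measure of generalization to unseen malware, since no family in it contributes any training example.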

Figure 6 shows the normalized confusion matrix of multi-class classification for ST-Graph. The results of the other two models are shown in Appendix D. We observe that the ETA model and the FS-Net model can only identify a small part of the malware families while performing poorly on the majority of them. For example, ETA classifies RelevantKnowledge well. After analyzing the raw traffic data generated by all samples in the RelevantKnowledge family, we found that all samples have the same cipher suites, signature algorithms and other characteristics in the TLS handshake phase. These features also differ from the traffic generated by the samples in other families, forming a unique fingerprint. Beyond that, the distinction between malware may not be obvious, and the features extracted by humans are designed to detect anomalies rather than to distinguish between malware families. On the other hand, FS-Net prefers to group malware into large malware families because large malware families have a greater variety of packet length sequences. Hence, it is more difficult for FS-Net to distinguish between malware with similar packet length sequences.

Figure 6: Malware family classification result of ST-Graph.

Our model can achieve 93% precision in classifying malware families. Figure 11 (in Appendix D) visualizes the results of embedding the traffic graphs into six of these families. It shows that malware families can be distinguished based on the clustering results, and each family can form a large cluster. In addition to some intuitive TLS handshake features, our model can capture these same-family network relationships and so achieve better classification results. However, our model still cannot classify malware families completely correctly. On the one hand, this is because malware families are classified by security analysis tools based on the software binary, and the labels themselves may not be completely accurate; on the other hand, there is still a portion of malware that may not present complete communication in the sandbox. These two reasons can cause our model to misclassify malware that is not well differentiated. In general, our model can perform fine-grained classification.

6.3.2 Evaluation on AndMal2019. We also test our model on a publicly available dataset to verify that it is applicable to a wide range of data with good detection results.

Malware Detection. We first evaluate the detection effectiveness on this dataset, i.e., the binary classification performance. Table 2 (2nd row) illustrates the results and Figure 7 shows the feature vector visualization of ETA and ST-Graph. The figure clearly shows that our approach enables a greater aggregation of malware compared to ETA, which enables a better classification result.

Figure 7: Visual distinction of features where red dots are malware. (a) ETA; (b) ST-Graph.

Malware Family Classification. The results in Table 2 show that it is possible to detect malware using only network traffic, but single-stream methods are not sufficient to characterize the malware families. Our approach makes it easier to find correlations between malware. By combining the spatial-temporal characteristics of hosts accessing the network, our detection results can be improved considerably.

6.4 Generalization of ST-Graph
In addition to examining the model's generalization to new infections (§6.3.1), we also conduct ablation studies to explore the role of stream attributes and spatio-temporal embeddings.

6.4.1 Features. We reduce the variety of edge features to confirm that ST-Graph can detect malware in less informative and more critical situations. TLS 1.3 [52] was standardized in 2018, and its adoption rate reached 48% on major websites by January 2021 [27, 33]. The main change in this version is that it enhances security by also encrypting the handshake process, which results in less information being available in ClientHello. In addition, the proposed Encrypted-SNI (ESNI) extension will prevent others from fetching the server name [6, 14]. This directly affects the acquisition of TLS handshake features and domain features for the spatio-temporal graph. Even so, our method still works under the stringent conditions of TLS 1.3 and remains efficient and effective. We run ST-Graph on EncMal2021 without the stream attributes that are blocked by TLS 1.3, and the results are shown in Table 3 (2nd row), where the stream attributes are removed one by one from top to bottom.

According to the results, we conclude that as the feature types are reduced, there is a slight weakening in the precision and a slight increase in the FPR of our method. Without using any stream features, ST-Graph is still able to achieve satisfactory results and even yields a lower FPR compared to baseline models using all stream attributes (See Table 2). This suggests that the spatio-temporal features extracted by our method play a decisive role. Therefore, our approach is still competitive in detecting malicious traffic after the full deployment of TLS 1.3.

6.4.2 Embedding. We replace the graph embedding algorithm in the Graph Representor with standard techniques (such as DeepWalk [49] and node2vec [22]), leaving the other modules unchanged for testing. This experiment is also run on EncMal2021 and the results are recorded in Table 3 (3rd row). The results show that our system

has good detection and generalisation capabilities and that our graph embedding algorithm does perform better in our scenario compared to other graph embedding algorithms.

Table 3: Ablation study's results of ST-Graph.
Settings                          Precision(%)   Recall(%)   FPR(%)
Original model                    99.9805        99.9221     0.0045
Features    No domain feat.       99.9784        99.8983     0.005
Features    No handshake feat.    99.8522        99.5578     0.0161
Features    No statistical feat.  99.5848        99.0915     0.0955
Embedding   DeepWalk              99.7376        99.4148     0.0184
Embedding   node2vec              99.8891        99.5947     0.0121

6.5 Robustness of ST-Graph
We then design experiments to analyse the robustness of ST-Graph to noisy labels and its sensitivity to parameters.

6.5.1 Noise Labels. We evaluate how noisy data affect the proposed model in this experiment. This is a concern because samples running in a sandbox may not produce truly malicious behaviour in a short period of time, while such behaviours are still labelled as malicious in our dataset as long as they are generated by malware. To explore the impact of this problem, we randomly select a portion of benign hosts, mark them as malicious and investigate whether this changes the decision of the model.

Figure 8: Detection performance with noise labels.

In the experiment, the ratio of benign to malicious hosts is 10:1. Figure 8 shows the relationship between the proportion of benign hosts mislabeled as malicious ones and the detection effectiveness. The solid line represents the precision of the detection, while the dotted line is the FPR. From the results, we observe that ST-Graph and ETA show better robustness than FS-Net. More specifically, with 15% mislabelling, ST-Graph still achieves 97% precision and an FPR lower than 1%. We observe that labelling benign hosts as malicious does not change the detection rate of truly malicious traffic. Even if the samples do not generate malicious traffic when running in the sandbox and are consequently mislabeled, the impact on the model prediction is not serious, as it only introduces a small number of false positives.

6.5.2 Parameter Sensitivity. We evaluate ST-Graph under different parameter settings. Specifically, we keep all parameters at their default values and adjust the values of three parameters, $D_h$ (i.e., dimension of host representation), $PL$ (i.e., length of random walk) and $PN$ (i.e., times of random walk starting from each edge), in turn. The experimental results are shown in Figure 12 in Appendix D. The results show that changing the parameters within a certain range does not have a significant impact on the detection performance of the model. The system is able to consistently maintain a high detection accuracy and a low false alarm rate. For the parameter $PN$, the detection performance tends to get better and then worse as the number of random walk paths per edge increases and the number of contexts per edge increases.

7 REAL-WORLD EVALUATION
To assess the performance of ST-Graph more comprehensively, we deploy it in two real-world scenarios, i.e., a campus and an enterprise gateway, for almost a year-long (from April 2021 to April 2022) operation and evaluation. These two network scenarios both have thousands of active users inside. Due to the significant number of users, the network throughput is large enough to prove the efficiency of ST-Graph, reaching a traffic bandwidth of 3.6 Gbps on the campus and 1.7 Gbps in the enterprise. Due to its best performance, we deploy the model trained in §6.3.1 in this real-world evaluation and compare its performance to existing detection systems (ETA [2] and FS-Net [37]).

Table 4: Precision(%) of Real-world evaluation results.
Stage                    ETA    FS-Net   ST-Graph
Campus-04/15/2021        8.0    4.0      86.0
Campus-02/25/2022        6.0    2.0      64.0
Enterprise-04/15/2021    1.0    0.5      80.0
Enterprise-02/25/2022    0.8    0.5      68.0

7.1 Real-world Result
Manual Analysis of Precision. To evaluate the precision of alarms, we perform a manual analysis. It is impossible to manually analyse all alarms, with nearly tens of thousands of alerts per day from these 3 models. As such, we randomly select 50 alarms for each model and have an expert researcher analyse them manually, with the help of information from threat intelligence [56]. First, we judge an alert to be correct when an identifier (domain or IP address) of the destination server is flagged as malicious by threat intelligence. Then, for alarms which threat intelligence cannot cover, we manually visit the destination server and judge by the response contents. If the response content is abnormal compared to the contextual traffic information, e.g., the response is a phishing website, we consider the alarm correct. Note that if there is no correct alarm in these 50 samples, we resample 50 alarms from the unsampled data space until we find at least one correct alarm, and the precision rate is calculated on all sampled alarms. Due to the random nature of sampling, we believe that our results on the sampled dataset can be representative of the global results.

Results of Campus. To illustrate the variation in detection performance over time, we choose the results for the start date and end date to analyse and compare, which are shown in Table 4. At the start (April 2021), ST-Graph had the lowest average daily alarm rate of 0.237% compared to the other two detection systems (3.034% for ETA and 3.665% for FS-Net). From the precision of alarms obtained by manual analysis of 50 randomly sampled alarms in Table 4, we find that the precision of ST-Graph (86%) significantly surpasses the other two detection systems. As a result, ST-Graph is accurate while producing few false positives, which can significantly reduce the cost of manual auditing. In addition, ST-Graph even can detect a large number of false negatives that cannot be detected by other

Attack Phase

www.s**ta.com

Control Phase
s0.2**n.net
lupic.cdn.b**bos.com

ssum-sec.ca**ia.com

image6.p**ic.com

pixel.ru**ct.com

cms.qu**ve.com
odr.mo**1.com
(a) Average length per stream (b) Time interval between streams

Figure 10: Statistical information on mining tra�c.


ag.in**id.com

(a) Ewind examples (b) Host’ access sequence


Figure 9: Activities of Ewind.

two detection models. In order to evaluate the ageing issue over time, we compare the results of these 3 models after nearly one year (February 2022). Although there is a slight degradation in the performance of ST-Graph over time, it is still far superior to the other two models.

Results of Enterprise. As the training set for the detection model comes from the campus, its distribution differs from that of the enterprise network. Affected by this inconsistent distribution of the training set, the average daily alarm rate for all models is higher than on campus. However, ST-Graph still has the lowest alarm rate at 0.249%, compared to 22.313% for ETA and 26.482% for FS-Net. Similar to the results of campus, ST-Graph achieves the highest precision and its performance is less affected by the time ageing issue.

It's worth noting that our detection is efficient in both scenarios. In enterprise, ST-Graph only costs 160 seconds for 5-minute traffic with 1.7 Gbps bandwidth. Even in the more complex network of campus, ST-Graph still costs only 200 seconds for 5-minute traffic with 3.6 Gbps bandwidth. Overall, our model is effective and efficient in detecting malware-infected hosts, subject only to slight time ageing issues.

7.2 Case Studies
As mentioned before, ST-Graph could detect false negatives from other detection models, i.e., malicious behaviours that are not detected by other models. To understand the differences in performance of different detection models for these "unknown" malware, we analyse two representative cases: Ewind and Miner.

Ewind. As shown in Figure 9a, Ewind is a sophisticated, long-standing class of adware [62], profiting through displaying advertisements on the victim's device. More seriously, Ewind also includes functionality such as collecting device data and forwarding SMS messages to attackers, which poses a serious security risk.

From the results of enterprise, we identify a host infected with malware belonging to a variant of the Ewind [70] family, without being detected by the other two detection models. As we cannot access the source code of Ewind, we only analyse the traffic behaviour of these infected hosts. By extracting the connection list of this infected host, we depict the infection procedure of Ewind in Figure 9b. First, it visited a free software download site, which is speculated to be the attack phase. Afterwards, we observe a series of access requests to multiple websites at very short intervals (almost simultaneously). By manually checking these websites, we find that most of these websites are advertising websites (blue circles), while two websites are benign websites with promotional content (white circles). Therefore, we think that the connection sequence is the root cause for the detection of ST-Graph, which reflects the whole procedure of this malware. Apart from the anomalous observation of the connection relationship, this malware didn't exhibit any other malicious features. Specifically, Ewind runs on the Android platform and uses a system interface to encrypt traffic [4]. Therefore, the encryption-related information in TLS is not different from benign applications, which leads to the failure of ETA, the TLS-information-based detection system. Besides, in the control phases, Ewind has no subsequent interaction after the TLS handshake. As such, FS-Net, based on packet-based side-channel information, has no capability to detect this malware.

Miner. Malicious coin miners have been an emerging attack since 2018, whose activity is similar to ransomware [43]. Through unauthorized use of a victim's device for cryptomining, they consume valuable computing resources, which poses new threats to the Internet.

From the results of campus, we detect a host infected by a malicious miner. Without SNI or any other domain information, this host connects to only one target server, identified only by its IP address, yet makes hundreds of connections. Based on the response obtained by manually connecting to this IP address, we verify that this IP address hosts an online mining pool. By analysing the communication traffic between this host and the server, we find highly regular activities, especially in side-channel temporal features. As shown in Figure 10, the average length of streams between this host and the server and the time interval between each connection are regular and within a fixed range. From this regular behaviour, we speculate that this infected host regularly sends "heartbeat" messages, indicating its alive status, and mining results to the mining pool (control server). Although the connection sequence is not very informative, ST-Graph is still able to detect this kind of malware based on this regular activity.

From the analysis of these "unknown" malware, we have further proven the effectiveness of ST-Graph.

8 DISCUSSION
In this section, we discuss the defence ability of ST-Graph against existing attacks and the limitations of ST-Graph.

8.1 Defence Ability against Attacks
Several existing works have proposed methodologies to mislead traffic detection systems, including disguising attacks and obfuscation attacks. (In this work, we do not compare against attacks that target learning-based models [5, 64].) We have already discussed in §6.4 that the extracted
spatio-temporal features in our approach play a decisive role in enhancing the model's generalizability. On this basis, we next discuss the impact of each attack on ST-Graph.

Disguising Attack. The goal of this attack is to disguise malicious traffic as benign applications. Frolov et al. proposed a tool to modify TLS information to mimic other popular TLS implementations by changing the fingerprints extracted from ClientHello and ServerHello [17]. However, this attack only modifies the TLS-stream information without changing the network connection relationships. Therefore, ST-Graph can still detect infected hosts even with this kind of attack.

Obfuscation Attack. This attack is more discreet and aims to confuse detection systems based on side-channel information. Wang et al. [65] change the timestamp, direction and packet size of packets to confuse detection based on packet length and time interval. Similarly, even when the side-channel information is changed, the connection relationships are retained, which allows ST-Graph to still detect effectively. Moreover, obfuscation attacks cause a highly dispersed distribution of packet lengths and time intervals, which is reflected in higher entropy. As such, it is simple to defend against this attack, i.e., by calculating the entropy of side-channel information.

To conclude, existing attacks have a limited impact on ST-Graph.

8.2 Limitations
Scale of network. We acknowledge that the scale of the internal network has an impact on ST-Graph. An increase in internal hosts will cause more nodes and edges in our graph structure. A larger number of edges in the graph brings a heavier time cost for the detection, and therefore an unbounded number of hosts cannot be handled at the gateway. However, our real-world evaluation is conducted on two large representative networks, with bandwidth up to 3.6 Gbps and covering over 10,000 hosts. This large-scale experiment demonstrates the efficiency of ST-Graph in practice.

In future work, we would explore more efficient traffic feature representation methods and ease the practical deployment limitations by deploying clusters or separate inspection by network segment for large enterprise network environments with high bandwidth. In addition, the features we used rely on the TLS protocol. Although the model can still achieve 99.85% precision (Table 3) after removing protocol-related features, it is still necessary to explore the detection methods for generic encrypted traffic in the future.

9 RELATED WORK
Traditional network-based malware detection methods, such as DPI [47, 54] and HTTP-based methods [41, 45], perform keyword matching on the plain-text payload of each packet. However, traffic encryption makes these methods no longer effective. Besides, limited available information under encryption makes network-based detection more challenging. We divide existing encrypted-network-based detection methods into single-stream-based detection and context-based detection.

Single-stream-based detection. Single-stream-based detection strives to dig out any available plain-text information for detection in encrypted traffic. Two main kinds of features are utilized by single-stream-based detection: features of TLS stream and features of side-channel.
• Features of TLS streams utilize the information in the TLS Handshake process, which is the only plain-text process of TLS interactions. Anderson et al. extract information as representative features for detection, including TLS version, the cipher suites provided by the client, the TLS extensions used, the server certificate, and the result of negotiation between the two parties [2, 3]. Althouse et al. first stitched information from the TLS handshake to compute JA3/JA3S fingerprints and detect malware by matching the client/server fingerprints [1]. However, methods with only these features will be ineffective under TLS 1.3, with less plain-text information available for detection.
• Features of side-channel exploit information such as packet length, packet interval time and packet length frequency. For example, several works utilized packet length statistics information to detect malware at the TCP/IP layer [19, 60, 67]. Furthermore, Liao et al. utilized deep learning techniques to improve the performance of the detection models based on packet length sequences.
The above methods usually consider a single feature, which is not robust and can be easily escaped by using evasion strategies.

Context-based detection. Because encryption leaves little plain-text information available for TCP connections, context-based detection methods try to use information from multiple protocols [3, 20] or multi-stream connections to extend the perspective. In fact, information from DNS could help to construct the relation graph of multiple malware servers [35, 44, 50]. Notably, Oprea et al. detect malware and APT infections within an organization [46], while their method only works on available seeds of known malware.

Note that relations between multiple malware and hosts may change over time, so capturing the dynamic changes is essential in the detection. However, previous graph-based methods are mainly based on static relationship graphs and usually ignore the temporal characteristics, which limits the effectiveness of detection with less information. In our work, we propose ST-Graph to detect malicious traffic based on a spatial-temporal graph.

10 CONCLUSION
In this paper, we propose ST-Graph, an encrypted malicious traffic detection system equipped with a well-designed, novel graph representation learning algorithm. By exploring additional, informative network attributes and effectively integrating multiple features, ST-Graph achieves high detection accuracy with a significantly lower false alarm rate and tolerable computational complexity. Experimental results on both a self-collected dataset and a benchmark dataset demonstrate the effectiveness and the efficiency of ST-Graph, compared with state-of-the-art malware traffic detection systems. In addition, ST-Graph shows good performance from both generalization and robustness perspectives and reveals outstanding efficiency in real-world deployment.

ACKNOWLEDGMENTS
We are grateful to anonymous reviewers for their constructive comments on this work. This work is supported by the National Natural Science Foundation of China (Grant No. U1836213, U19B2034) and by Huawei Technologies Co., Ltd under Grant No. TC20200917004. Jia Zhang is the corresponding author ([email protected]).

REFERENCES

[1] John Althouse. 2019. TLS Fingerprinting with JA3 and JA3S. https://engineering.salesforce.com/tls-fingerprinting-with-ja3-and-ja3s-247362855967.
[2] Blake Anderson and David McGrew. 2016. Identifying Encrypted Malware Traffic with Contextual Flow Data. In AISec@CCS. 35–46.
[3] Blake Anderson and David McGrew. 2017. Machine Learning for Encrypted Malware Traffic Classification: Accounting for Noisy Labels and Non-Stationarity. In SIGKDD. 1723–1732.
[4] AndroidDev. 2021. Developer Guides-SSLSocket. https://developer.android.com/reference/javax/net/ssl/SSLSocket Accessed January 22, 2022.
[5] Aleksandar Bojchevski and Stephan Günnemann. 2019. Adversarial Attacks on Node Embeddings via Graph Poisoning. In ICML, Vol. 97. PMLR, 695–704.
[6] Zimo Chai, Amirhossein Ghafari, and Amir Houmansadr. 2019. On the Importance of Encrypted-SNI (ESNI) to Censorship Circumvention. In FOCI@USENIX Security Symposium. USENIX Association.
[7] Catalin Cimpanu. 2016. NOPEN Is the Equation Group's Backdoor for Unix Systems. https://news.softpedia.com/news/nopen-is-the-equation-group-s-backdoor-for-unix-systems-508257.shtml Accessed March 10, 2022.
[8] Cisco. 2018. Cisco Advanced Malware Protection Solution Overview. https://www.cisco.com/c/en/us/solutions/collateral/enterprise-networks/advanced-malware-protection/solution-overview-c22-734228.html.
[9] Cloudflare. 2021. Cloudflare Gateway. https://www.cloudflare.com/products/zero-trust/gateway/ Accessed March 10, 2022.
[10] Thomas H Cormen, Charles E Leiserson, Ronald L Rivest, and Clifford Stein. 2022. Introduction to Algorithms. MIT press.
[11] Neustar International Security Council. 2020. NISC Survey Results. https://www.nisc.neustar/nisc-survey-results/ Accessed March 20, 2022.
[12] Antonio Criminisi, Jamie Shotton, Ender Konukoglu, et al. 2012. Decision forests: A unified framework for classification, regression, density estimation, manifold learning and semi-supervised learning. Foundations and trends® in computer graphics and vision 7, 2–3 (2012), 81–227.
[13] Azure documentation. 2022. Microsoft Antimalware for Azure Cloud Services and Virtual Machines. https://docs.microsoft.com/en-us/azure/defender-for-iot/organizations/how-to-control-what-traffic-is-monitored.
[14] David Fifield. 2018. Anticipating a world of encrypted SNI: risks, opportunities, how to win big. https://www.bamsoftware.com/sec/esni.html.
[15] FireEye. 2020. Definition of Malware Family. https://vision.fireeye.com/editions/06/06-m-trends-fireeye-mandiant.html.
[16] Francois Fouss, Alain Pirotte, Jean-Michel Renders, et al. 2007. Random-Walk Computation of Similarities between Nodes of a Graph with Application to Collaborative Recommendation. IEEE Trans. Knowl. Data Eng. 19, 3 (2007), 355–369.
[17] Sergey Frolov and Eric Wustrow. 2019. The Use of TLS in Censorship Circumvention. In NDSS. The Internet Society.
[18] Sean Gallagher. 2021. Nearly half of malware now use TLS to conceal communications. https://news.sophos.com/en-us/2021/04/21/nearly-half-of-malware-now-use-tls-to-conceal-communications/ Accessed November 20, 2021.
[19] Ali Gezer, Gary Warner, Clifford Wilson, et al. 2019. A Flow-Based Approach For Trickbot Banking Trojan Detection. Comput. Secur. 84 (2019), 179–192.
[20] Paul Giura and Wei Wang. 2012. A Context-Based Detection Framework for Advanced Persistent Threats. In CyberSecurity. IEEE Computer Society, 69–74.
[21] Palash Goyal and Emilio Ferrara. 2018. Graph embedding techniques, applications, and performance: A survey. Knowl. Based Syst. 151 (2018), 78–94.
[22] Aditya Grover and Jure Leskovec. 2016. node2vec: Scalable feature learning for networks. In SIGKDD. ACM, 855–864.
[23] Aric Hagberg and Drew Conway. 2020. NetworkX: Network Analysis with Python. URL: https://networkx.github.io (2020).
[24] William L Hamilton. 2020. Graph representation learning. Synthesis Lectures on Artificial Intelligence and Machine Learning 14, 3 (2020), 1–159.
[25] William L Hamilton, Rex Ying, and Jure Leskovec. 2017. Inductive Representation Learning on Large Graphs. In NeurIPS. 1025–1035.
[26] Cobaltstrike Helpsystem. 2021. Beacon Covert C2 Payload. https://www.cobaltstrike.com/help-beacon Accessed November 20, 2021.
[27] Ralph Holz, Jens Hiller, Johanna Amann, et al. 2020. Tracking the deployment of TLS 1.3 on the Web: A story of experimentation and centralization. SIGCOMM 50, 3 (2020), 3–15.
[28] Dan Jiang and Kazumasa Omote. 2015. An Approach to Detect Remote Access Trojan in the Early Stage of Communication. In AINA. IEEE Computer Society, 706–713.
[29] İlker Kara and Murat Aydos. 2019. The ghost in the system: technical analysis of remote access trojan. International Journal on Information Technologies & Security 11, 1 (2019), 73–84.
[30] Catherine Knowles. 2021. End of 2021 marks drop in cyber attacks, and increase in remote access malware. https://securitybrief.asia/story/end-of-2021-marks-drop-in-cyber-attacks-and-increase-in-remote-access-malware.
[31] Alexander Küchler, Alessandro Mantovani, Yufei Han, Leyla Bilge, and Davide Balzarotti. 2021. Does Every Second Count? Time-based Evolution of Malware Behavior in Sandboxes. In NDSS.
[32] Ulf Lamping and Ed Warnicke. 2004. Wireshark user's guide. Interface 4, 6 (2004), 1.
[33] Hyunwoo Lee, Doowon Kim, and Yonghwi Kwon. 2021. TLS 1.3 in Practice: How TLS 1.3 Contributes to the Internet. In Proceedings of the Web Conference 2021. 70–79.
[34] Jehyun Lee and Heejo Lee. 2014. GMAD: Graph-based Malware Activity Detection by DNS traffic analysis. Computer Communications 49 (2014), 33–47.
[35] Kai Lei, Qiuai Fu, Jiake Ni, et al. 2019. Detecting malicious domains with behavioral modeling and graph embedding. In ICDCS. IEEE, 601–611.
[36] Yankai Lin, Zhiyuan Liu, Maosong Sun, et al. 2015. Learning entity and relation embeddings for knowledge graph completion. In AAAI.
[37] Chang Liu, Longtao He, Gang Xiong, et al. 2019. Fs-net: A flow sequence network for encrypted traffic classification. In INFOCOM. IEEE, 1171–1179.
[38] Dan Mcwhrter. 2014. APT1: Exposing One of China's Cyber Espionage Units. https://www.mandiant.com/resources/apt1-exposing-one-of-chinas-cyber-espionage-units.
[39] Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean. 2013. Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781 (2013).
[40] Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg S Corrado, and Jeff Dean. 2013. Distributed representations of words and phrases and their compositionality. Advances in neural information processing systems 26 (2013).
[41] Mamoru Mimura, Yuhei Otsubo, Hidehiko Tanaka, and Hidema Tanaka. 2017. A practical experiment of the HTTP-based RAT detection method in proxy server logs. In AsiaJCIS. IEEE, 31–37.
[42] Yisroel Mirsky, Tomer Doitshman, Yuval Elovici, and Asaf Shabtai. 2018. Kitsune: an ensemble of autoencoders for online network intrusion detection. arXiv preprint arXiv:1802.09089 (2018).
[43] Michael Nadeau. 2021. Cryptojacking explained: How to prevent, detect, and recover from it. https://www.csoonline.com/article/3253572/what-is-cryptojacking-how-to-prevent-detect-and-recover-from-it.html.
[44] Pejman Najafi, Andrey Sapegin, Feng Cheng, and Christoph Meinel. 2017. Guilt-by-association: detecting malicious entities via graph mining. In International Conference on Security and Privacy in Communication Systems. Springer, 88–107.
[45] Terry Nelms, Roberto Perdisci, and Mustaque Ahamad. 2013. Execscent: Mining for new c&c domains in live networks with adaptive control protocol templates. In USENIX Security Symposium. 589–604.
[46] Alina Oprea, Zhou Li, Ting-Fang Yen, et al. 2015. Detection of early-stage enterprise infection by mining large-scale log data. In DSN. IEEE, 45–56.
[47] Vern Paxson. 1999. Bro: a system for detecting network intruders in real-time. Computer networks 31, 23-24 (1999), 2435–2463.
[48] Fabian Pedregosa, Gaël Varoquaux, Alexandre Gramfort, Vincent Michel, Bertrand Thirion, Olivier Grisel, Mathieu Blondel, Peter Prettenhofer, Ron Weiss, Vincent Dubourg, et al. 2011. Scikit-learn: Machine learning in Python. the Journal of machine Learning research 12 (2011), 2825–2830.
[49] Bryan Perozzi, Rami Al-Rfou, and Steven Skiena. 2014. Deepwalk: Online learning of social representations. In Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining. 701–710.
[50] Babak Rahbarinia, Roberto Perdisci, and Manos Antonakakis. 2015. Segugio: Efficient behavior-based tracking of malware-control domains in large ISP networks. In DSN. IEEE, 403–414.
[51] Radim Rehuurek, Petr Sojka, et al. 2011. Gensim - statistical semantics in python. Retrieved from genism. org (2011).
[52] Eric Rescorla. 2018. The Transport Layer Security (TLS) Protocol Version 1.3. RFC 8446. https://doi.org/10.17487/RFC8446
[53] Ronald L Rivest and Jean Vuillemin. 1976. On recognizing graph properties from adjacency matrices. Theoretical Computer Science 3, 3 (1976), 371–384.
[54] Martin Roesch et al. 1999. Snort: Lightweight intrusion detection for networks. In Lisa, Vol. 99. 229–238.
[55] Samuel Schüppen, Dominik Teubert, Patrick Herrmann, and Ulrike Meyer. 2018. FANCI: Feature-based Automated NXDomain Classification and Intelligence. In USENIX Security. 1165–1181.
[56] Gaurav Sood. 2021. virustotal: R Client for the virustotal API. R package version 0.2.2.
[57] Gianluca Stringhini, Christopher Kruegel, and Giovanni Vigna. 2013. Shady paths: Leveraging surfing crowds to detect malicious web pages. In CCS. 133–144.
[58] Laya Taheri, Andi Fitriah Abdul Kadir, and Arash Habibi Lashkari. 2019. Extensible android malware detection and family classification using network-flows and API-calls. In ICCST. IEEE, 1–8.
[59] Robert Tarjan. 1972. Depth-first search and linear graph algorithms. SIAM journal on computing 1, 2 (1972), 146–160.
[60] Florian Tegeler, Xiaoming Fu, Giovanni Vigna, and Christopher Kruegel. 2012. Botfinder: Finding bots in network traffic without deep packet inspection. In CoNEXT. 349–360.
[61] Michael Carl Tschantz, Sadia Afroz, Vern Paxson, et al. 2016. Sok: Towards grounding censorship circumvention in empiricism. In 2016 IEEE Symposium on Security and Privacy (SP). IEEE, 914–933.

[62] Tobias Urban, Dennis Tatang, Thorsten Holz, and Norbert Pohlmann. 2018. Towards understanding privacy implications of adware and potentially unwanted programs. In ESORICS. Springer, 449–469.
[63] Petar Veličković, Guillem Cucurull, Arantxa Casanova, Adriana Romero, Pietro Lio, and Yoshua Bengio. 2017. Graph attention networks. arXiv preprint arXiv:1710.10903 (2017).
[64] Binghui Wang and Neil Zhenqiang Gong. 2019. Attacking graph-based classification via manipulating the graph structure. In CCS. 2023–2040.
[65] Junnan Wang, Liu Qixu, Wu Di, Ying Dong, and Xiang Cui. 2021. Crafting Adversarial Example to Bypass Flow-&ML-based Botnet Detector via RL. In 24th International Symposium on Research in Attacks, Intrusions and Defenses. 193–204.
[66] Zhen Wang, Jianwen Zhang, Jianlin Feng, and Zheng Chen. 2014. Knowledge graph embedding by translating on hyperplanes. In AAAI, Vol. 28.
[67] Nigel Williams, Sebastian Zander, and Grenville Armitage. 2006. A preliminary performance comparison of five machine learning algorithms for practical IP traffic flow classification. SIGCOMM 36, 5 (2006), 5–16.
[68] James Wyke. 2016. The ZeroAccess rootkit. https://nakedsecurity.sophos.com/zeroaccess4/ Accessed March 20, 2022.
[69] Zhixing Xu, Sayak Ray, Pramod Subramanyan, and Sharad Malik. 2017. Malware detection using machine learning based analysis of virtual memory access patterns. In DATE, 2017. IEEE, 169–174.
[70] Simon Conant Yaron Samuel. 2017. Ewind – Adware in Applications' Clothing. https://unit42.paloaltonetworks.com/unit42-ewind-adware-applications-clothing/ Accessed January 22, 2022.
[71] Yanfang Ye, Tao Li, Donald A. Adjeroh, and S. Sitharama Iyengar. 2017. A Survey on Malware Detection Using Data Mining Techniques. ACM Comput. Surv. 50, 3 (2017), 41:1–41:40.
[72] Si Zhang, Hanghang Tong, Jiejun Xu, and Ross Maciejewski. 2019. Graph convolutional networks: a comprehensive review. Computational Social Networks 6, 1 (2019), 1–23.
[73] Futai Zou, Siyu Zhang, Weixiong Rao, and Ping Yi. 2015. Detecting malware based on DNS graph mining. International Journal of Distributed Sensor Networks 11, 10 (2015), 102687.

A MACHINE LEARNING ALGORITHMS
We compared seven common algorithms: Logistic Regression, Support Vector Machine (SVM), Naive Bayes, Artificial Neural Network (ANN), k-nearest neighbours (k-NN), Decision Tree and Random Forest ensemble. We used the Scikit-learn implementations for all algorithms except ANN, which uses Keras. All algorithms used the default hyperparameters. The experimental results are shown in Table 5.

Table 5: Classification Result of 7 algorithms

Algorithm             Precision(%)   Recall(%)   FPR(%)
Logistic Regression   97.8118        98.0980     0.2417
SVM                   97.4046        93.3431     0.2740
Naive Bayes           92.7736        98.6101     0.8461
ANN                   95.9901        98.9759     0.1653
k-NN                  99.8524        98.9759     0.0161
Decision Tree         99.3426        99.4879     0.0725
Random Forest         99.9805        99.9221     0.0045

B PERFORMANCE METRICS
We define True Positive (TP) as malicious connections predicted as malicious and False Positive (FP) as benign connections predicted as malicious. True Negative (TN) represents benign connections classified correctly and False Negative (FN) represents malicious connections classified incorrectly. We use the following metrics to evaluate the detection performance: (i) precision, (ii) recall and (iii) false-positive rate (FPR).

\[ \mathrm{Precision} = \frac{TP}{TP + FP} \]

\[ \mathrm{Recall} = TPR = \frac{TP}{TP + FN} \]

\[ FPR = \frac{FP}{FP + TN} \]

C SAMPLES FOR MANUAL ANALYSIS
Table 6 presents information on the 30 samples analysed manually.

Table 6: Families and MD5 of Malware Samples.

Family         Related md5
Adware         ab775c62c8d03b33f9d9b60e013d54f5
               3eeace60ad9f357dc8b77981465381c3
               3185657ca1707f2364f34fea46afc455
               75e3d279cc20419f2af57975f755614c
               04436941856e5a8c3296b55292c47636
Wacatac        fc92be6f976b598cf65a241c8520cfd6
               f18ac8264e592a4949c5fe09979234be
               e8766491e8c8a9fc79e86197b1a55a76
               d5bea7ed4ffc27366e218a99d5709987
               fe3bea4366ecc34ddbf90291762cba88
               d63d7bceed0da682db6170c24663b3b0
               df5ed0925bfb4e141a134bfdf4b2e0ce
               d32e7100988f924a0070418051de053f
Minerd         d31ca0d08a4bc600c51ecd6e891551eb
               331dfc88f7d056b2667875199eb2d504
CobaltStrike   332265a774e2ad113cbf4d05189d2ee0
               363eddcc28509a08c039833a9b6e2a04
               79a4c854d00928024f9ce3020a041451
Flystudio      67b4843d49d60e16372160cc10f80cf8
gamhak         c6752ffeb1ebc6bba468a71c36e69c85
krypyik        01372c76417280c2c7a524edd268d5a0
               ceb684549a97dae140df9e4cef10c308
hoax           464852e25233ece2e3ed769e727f3ef3
Rbot           e90fb38bd6c50f517d6d1fc00b445f91
avaddoncrypt   33874816e3eb31b874e1301fcf73bb72
lockscreen     4cf1bf8ebf10d596f7ecbb1c24258eef
graftor        71fb3ed4bf17e328c045a062fbf0895e
csdi           19435957f4a3d1380bfcd4c087e40a93
               cc07157af9a75f492748baed0d22a9fe
NetWorm        97acceb1b93ace58c250901ebd55aadd

D EVALUATION RESULTS
Figure 11 visualizes the results of embedding the traffic graphs into six of these families, Figure 12 shows the experiment results of parameter sensitivity, and Figure 13 shows malware family classification results of ETA and FS-Net in EncMal2021.

Figure 11: Visualization of the 6 malware families. Legend: Generickdz, Minerd, RelevantKnowledge, Sality, Brocoiner, Installcore.
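The three metrics defined in Appendix B can be computed directly from the four confusion counts. The following is a minimal sketch, not the authors' evaluation code; the function name `detection_metrics` and the choice to return 0.0 for an empty denominator are assumptions made for illustration.

```python
def detection_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Return precision, recall (TPR) and FPR as fractions in [0, 1].

    tp, fp, tn, fn are the connection counts defined in Appendix B.
    """
    def ratio(numerator: int, denominator: int) -> float:
        # Assumption: report 0.0 instead of raising when a denominator is empty.
        return numerator / denominator if denominator else 0.0

    return {
        "precision": ratio(tp, tp + fp),  # TP / (TP + FP)
        "recall": ratio(tp, tp + fn),     # TP / (TP + FN)
        "fpr": ratio(fp, fp + tn),        # FP / (FP + TN)
    }


if __name__ == "__main__":
    # Hypothetical counts, chosen only to exercise the function.
    for name, value in detection_metrics(tp=90, fp=10, tn=880, fn=20).items():
        print(f"{name}: {100 * value:.4f}%")
```

Multiplying each value by 100 gives the percentage form used in Table 5.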

Figure 12: Parameter sensitivity results. Panels: (a) Dimensionality of host vectors $D_h$; (b) Random walk path length $P_L$; (c) Random walk path nums per edge $P_N$.

Figure 13: Malware family classification result of ETA and FS-Net. Panels: (a) Results of ETA; (b) Results of FS-Net.

E THE STREAM ATTRIBUTES

Table 7 shows the single-stream attributes used in our model (see §5) and their descriptions.

Table 7: The Stream Attributes.

Attribute Type           Attribute                                          Description
Domain features          Domain Length                                      Total number of characters in the domain.
                         Domain Level                                       Level of domain. Sub, second-level or top-level.
                         Punctuation/Upperletter/Lowerletter/Digit Ratio    The proportion of different character types to all characters.
                         Vowel/Consonant Ratio                              The proportion of vowel/consonant letters to all letters.
TLS handshake features   Client/Server TLS version                          The TLS version selected by the client and server.
                         Number of Client Ciphersuites                      The number of the offered cipher suites by the client.
                         Client Ciphersuite x                               The xth in the list of cipher suites provided (x = 1, ..., 10).
                         Server Chosen Ciphersuite                          The cipher suite chosen by the server during the connection.
                         Client/Server Compression Method                   The compression method selected by the client and server.
                         Number of Client Extensions                        The number of the client extensions.
                         Number of Signature Algorithms                     The number of the signature algorithms offered by client.
                         Signature Algorithm x                              The xth in the list of signature algorithms provided (x = 1, ..., 4).
                         Number of ec Point Formats                         The number of point formats offered by the client.
                         Number of Elliptic Curve                           The number of elliptic curves offered by the client.
Statistical features     TLS Packet Length x                                Length of the xth packet in the stream (x = 1, ..., 25).
                         TLS Time Interval x                                The xth time interval in the stream (x = 1, ..., 25).
                         Number of Packets                                  The number of the total packets in the stream.
                         Max/Min/Average Packet Length                      The max/min/average length of total packets in the stream.
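
The domain features in Table 7 are simple character statistics, so a short sketch may help make them concrete. The code below is an illustration under stated assumptions rather than the authors' extraction code: the function name `domain_features` is hypothetical, "Domain Level" is approximated as the number of dot-separated labels, and vowels are taken to be a, e, i, o, u.

```python
import string

VOWELS = set("aeiou")  # assumption: English vowels only


def domain_features(domain: str) -> dict:
    """Compute the domain-level attributes listed in Table 7 for one domain name."""
    total = len(domain)
    letters = [c for c in domain if c.isalpha()]

    def share(count: int, denominator: int) -> float:
        # Assumption: an empty denominator yields 0.0.
        return count / denominator if denominator else 0.0

    vowel_count = sum(c.lower() in VOWELS for c in letters)
    return {
        "length": total,
        # Assumption: level = number of labels (e.g. "a.example.com" -> 3).
        "level": len(domain.strip(".").split(".")),
        "punctuation_ratio": share(sum(c in string.punctuation for c in domain), total),
        "upper_ratio": share(sum(c.isupper() for c in domain), total),
        "lower_ratio": share(sum(c.islower() for c in domain), total),
        "digit_ratio": share(sum(c.isdigit() for c in domain), total),
        "vowel_ratio": share(vowel_count, len(letters)),
        "consonant_ratio": share(len(letters) - vowel_count, len(letters)),
    }


if __name__ == "__main__":
    print(domain_features("Ab1.example.com"))
```

The character-type ratios are taken over all characters of the domain, while the vowel and consonant ratios are taken over letters only, following the two descriptions in the table.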
