
MalClassifier Malware Family Classification Using Network Flow Sequence

MalClassifier is a system that classifies malware families using network flow sequences without requiring access to the infected host or malware binary. It abstracts malware families' network behaviors into network flow profiles and uses these as features to build machine learning classifiers. Evaluating on ransomware and botnet datasets, MalClassifier achieves 96% accuracy in classifying malware families based only on their network flows. This approach provides insights into recurring malware network patterns while avoiding privacy and access issues of analyzing malware binaries directly.

MalClassifier: Malware Family Classification Using Network Flow Sequence Behaviour


Bushra A. AlAhmadi and Ivan Martinovic
Department of Computer Science
University of Oxford, Oxford, United Kingdom
{bushra.alahmadi, ivan.martinovic}@cs.ox.ac.uk

978-1-5386-4922-0/18/$31.00 © 2018 IEEE
Authorized licensed use limited to: TU Delft Library. Downloaded on October 14,2021 at 12:15:15 UTC from IEEE Xplore. Restrictions apply.

Abstract: Anti-malware vendors receive thousands of potentially malicious binaries daily to analyse and categorise before deploying the appropriate defence measure. Considering the limitations of existing malware analysis and classification methods, we present MalClassifier, a novel privacy-preserving system for the automatic analysis and classification of malware using network flow sequence mining. MalClassifier allows identifying the malware family behind detected malicious network activity without requiring access to the infected host or malicious executable, reducing overall response time. MalClassifier abstracts the malware families' network flow sequence order and semantics behaviour as an n-flow. By mining and extracting the distinctive n-flows for each malware family, it automatically generates network flow sequence behaviour profiles. These profiles are used as features to build supervised machine learning classifiers (K-Nearest Neighbour and Random Forest) for malware family classification. We compute the degree of similarity between a flow sequence and the extracted profiles using a novel fuzzy similarity measure that computes the similarity between flow attributes and the similarity between the order of the flow sequences. For classifier performance evaluation, we use network traffic datasets of ransomware and botnets, obtaining 96% F-measure for family classification. MalClassifier is resilient to malware evasion through flow sequence manipulation, maintaining the classifier's high accuracy. Our results demonstrate that this type of network flow-level sequence analysis is highly effective in malware family classification, providing insights on reoccurring malware network flow patterns.

I. INTRODUCTION

Most cyber-attacks leverage malware to perpetrate attacks that may lead to financial, privacy, and even human life loss. More than 317 million new variants of malware were observed in 2014, an estimated 1 million unique malware samples each day, increasing the total number of malware to approximately 1.7 billion [35]. Perhaps the main challenge for anti-malware vendors is the growing stream of incoming malware samples due to the use of polymorphic and metamorphic techniques, allowing the same malware to be modified to avoid detection. Although the binary classification of an unknown executable as either malicious or benign (malware detection) is important, accurately determining which family the malicious executable belongs to is also a much sought after application [33], [18].

Malware classification determines whether a malicious binary is a member of a family seen previously, or whether it is a novel, previously unseen sample requiring further analysis to understand its behaviour. The number of unique profiles for malware variants of a family is huge; however, their malicious behaviour is similar and consistent within the malware family [34]. Grouping samples into families and extracting their distinctive behaviour patterns can help security analysts in understanding the evolution of the various families. Analysts can then assess the potential damage of a newly discovered sample and prioritise encountered threats, in turn enabling the development of effective mitigation mechanisms [2]. The incident response procedure for ransomware is different from that for a botnet, and being able to determine the family can speed up the remediation process.

Existing dynamic analysis tools for malware classification, such as [26], [29], use a sand-box approach that requires the existence of the malicious binary and running it in a controlled environment for classification. Today, more companies are outsourcing the security management and network monitoring to companies that offer Security Operations Centre (SOC) as a service to escape the significant investments required and the lack of cyber-security expertise [19], [1]. Although these companies monitor the client's systems and networks for signs of malicious activity, they may not have direct access to the client's hosts due to privacy or policy reasons. When malware-infected hosts are the source of the detected malicious traffic, the analysts need to negotiate access to the infected host with the client. Even if access is possible, the time required to negotiate this and to run the executable in a sand-box for classification is not efficient, particularly in situations where rapid detection and response is critical. In addition, malware may produce different behaviours depending on the environment it is run in, or may not act at all [18]. Therefore, analysts need on-the-wire malware classification systems capable of matching detected malicious network traffic to a malware family, thus not requiring access to the infected host.

Previous approaches such as [22], [23], [18] proposed systems for malware classification using network traffic. Perhaps the most related work to ours is [18], which observes the high-level network features of malware and applies n-gram document analysis for family classification. Our work improves on [18] by (1) using non-payload features, making our system privacy aware and robust against encryption; (2) adapting to malware behaviour changes; (3) not requiring the execution of the malware in a sand-box, thus performing classification on-the-wire. Similarly, previous approaches are either not content
agnostic, relying on features from payload (e.g. [23], [22]) or protocol dependent (e.g. cover only HTTP traffic) such as [22]. We compare the related work to ours in detail in Section II.

In this paper, we propose MalClassifier, a novel system for the automatic extraction of network behavioural characteristics for malware family classification. As malware use some form of network communication to propagate or contact their command and control (C&C) servers [24], MalClassifier derives the network behavioural characteristics for a malware family, abstracting the malware family's network behaviour to a flow sequence profile. MalClassifier is effective in classifying malicious network flow sequences to a malware family on-the-wire, thus negating the need for access and sand-box execution of the malware binary. With the increasing use of encryption by malware for obfuscation and by hosts for privacy (e.g. HTTPS), malware classification based on systems that are not content-agnostic becomes challenging. Therefore, MalClassifier uses non-identifiable high-level network traffic for training the classifiers, making it resilient to encryption. This also makes the data required to train the classifiers accessible and, given privacy concerns, easier to share than full PCAP traces. In designing MalClassifier, we make our approach IP-agnostic, as sophisticated malware, such as exploit kits, apply dynamic DNS and Domain Generation Algorithms (DGA) to change their communications destination [13].

Similar to [17], we apply n-grams to sort network flows into groups of n consecutive flows (i.e. n-flows). Such a representation provides granularity to the extracted malware behaviour. For example, a single failed SMTP flow might be benign, but multiple consecutive ones exhibit the behaviour of a bot sending spam. MalClassifier mines n-flows that are distinctive for each malware family. Such distinctive n-flows are used as features to train a supervised model capable of classifying unseen n-flows to the malware family. The models learn re-occurring network flow patterns that capture the characteristics of a malware family's network behaviour.

The contributions of this paper are three-fold:
• We propose a novel fuzzy flow sequence similarity measure that calculates the value similarity of two flow sequences.
• We propose an order sequence similarity measure robust against malware evasion through flow sequence manipulation.
• We develop a prototype of MalClassifier, and evaluate its performance using a dataset of malicious network traffic, achieving more than 95% F-measure for malware family classification.

II. RELATED WORK

Malware is a constant cyber challenge for organisations, and the research literature is rich with contributions on its analysis and classification [9], [7]. We discuss the most relevant contributions that apply a sequencing analysis approach or malware network flows as features for malware family classification.

Host-based Malware Analysis and Family Classification. Early malware family classification contributions focused on classifying malware based on the malware binary content or sequence of bytes in binary files. For example, the authors in [14] performed malware family classification using n-grams of bytecode. However, such content-based approaches require disassembling the binary, which is a time-consuming process and vulnerable to malware obfuscation. Dynamic analysis was applied for malware classification in [20] to construct system call behaviour graphs and control flow graph signatures [6]. Bayer et al. in [3] used dynamic analysis to generate malware behavioural profiles used as features in a clustering algorithm to group malware based on behaviour. Rieck et al. in [26] applied dynamic analysis to identify the unique behavioural features of malware samples and used these features to build Support Vector Machine (SVM) classifiers that map unknown samples to known malware families. Similarly, Rieck et al. in [28] proposed a framework for malware behavioural analysis, using clustering (to identify novel malware with similar behaviour) and classification (to assign unknown malware to these clusters). Although these approaches are effective in classifying malware by observing its host-level behaviour, they require access to the binary executable, which might not be possible in certain scenarios. In contrast, MalClassifier classifies malware based on the network behaviour, requiring only the high-level network traffic.

Network-based Malware Family Classification. In this section, we focus on recent contributions in malware family classification using network traffic. Malware behavioural clustering approaches aim to group malware behaviour to reveal similarities between malware samples that may not be captured using system-level malware analysis. Perdisci et al. in [22] proposed a network-level clustering system to derive HTTP network behaviour similarity of HTTP-based malware for detection. FIRMA [23] used a similar approach to cluster and generate network signatures for malware network traffic.

The most directly related work to ours is CHATTER [18]. This system considers the order of high-level network events as features and applies n-gram document analysis to produce the classifier, achieving 90% accuracy. The malware network behaviour profile is represented as fine-grained events, thus each attribute of a single packet is considered an event. For example, an inbound packet using TCP on port 80 is represented as A1 A3 A6, where A1 refers to the inbound connection event, A3 refers to the TCP protocol usage event, and A6 refers to the usage of port number 80 event. Instead, MalClassifier uses a similar approach to [17], using coarse-grained groups to represent a network behaviour, i.e. A1A3A6 is considered a single event. Therefore, our system is able to capture behaviours such as an SMTP flow followed by another SMTP flow, which might indicate email spam.

Moreover, the numeric features are mapped into one of the four quartiles that are pre-determined based on the training data. However, there is a risk of new malware variants falling outside the quartile ranges, leading to inaccurate family classifications. In addition, this abstraction may result in loss of underlying distinctive behaviour that could have resulted in more accurate classifications. Instead, MalClassifier uses
Cosine Similarity to compare the similarity of the numeric features, thus not requiring pre-determined ranges or thresholds.

The main limitation of previous work is their use of payload features, meaning they are vulnerable to malware obfuscation through encryption and that they are not privacy-preserving. In Table I, we compare the aforementioned related work to MalClassifier. Specifically, we identify the data used to train the classifier, whether the approach is IP and content agnostic, uses n-grams, and if the approach requires running the sample in a sandboxed environment. We also identify whether related work addressed malware evasion through noise injection in their system design.

TABLE I: Comparison of related work on malware family classification and MalClassifier against our design goals.

Approach              Data                  Content-agnostic  IP-agnostic  Evasion Resiliency  Use n-grams  Use sand-box  Accuracy
Rieck et al. [26]     Behavioural reports   N                 N            N                   N            Y             80.7%
Rieck et al. [28]     Behavioural reports   N                 N            N                   N            Y             95%
Perdisci et al. [22]  HTTP Traffic          N                 Y            N                   N            N             N/A
FIRMA [23]            Network traffic       N                 N            N                   N            N             98.8%
CHATTER [18]          Network traffic       N                 Y            N                   Y            Y             90%
MalClassifier         Bro conn logs         Y                 Y            Y                   Y            N             96%

Malware Network Flow Detection. MalClassifier is a malware family classification system, meaning it classifies malware to a family based on its network behaviour. For completeness, we discuss malware detection systems that have applied a similar methodology or have applied network flows as features for detection. Mekky et al. in [17] used a similar approach to [18] by first isolating malware traffic from the benign traffic using Independent Component Analysis (ICA). However, their approach differs from [18] as they use coarse-grained groups of network events so that inbound, DNS, and port number are considered one network event. Similar to [18], they use a count-based approach for network flow identification.

Botzilla [27] monitored malware network traffic, specifically a bot's communication to the C&C server, in a sandboxed environment to find patterns and generate network signatures. Similarly, BotSniffer [12] is a network-based anomaly detection system, focused on detecting C&C network communication. BotFinder [36] monitors bots' network traffic in a controlled environment and generates malware detection models that can be deployed at network egress points. These models then detect individual, bot-infected hosts by monitoring their network traffic. BotMiner [10] detects botnet C&C communications by applying a clustering approach to detect correlated C&C communications and malicious activities, and it performs cross-cluster correlation to detect hosts that have similar malicious activity. Disclosure [5] extracts flow size-based features, client access patterns, and temporal behaviour features from netflow communication to detect botnet C&C communications. The main purpose of these systems is to detect botnet C&C communications. In contrast, MalClassifier extracts malware network flow sequence behaviour during its various infection stages, and classifies unknown traffic to a malware family based on that sequence behaviour.

Sequential Pattern Mining. Research on extracting relevant patterns between malware samples where the values are in a sequence format for malware classification is well established. For example, Santos et al. in [30] used n-grams, which are sub-strings of a larger string with length n, to generate file signatures for malware detection. Wressnegger et al. in [38] studied the applicability of n-grams for anomaly detection and classification. Mekky et al. in [17] used n-grams to encode the order of sub-sequences of malware network communication events to build malware classification models. MalClassifier uses a similar approach in representing malware network communications as a sequence of flows of length n.

III. SYSTEM OVERVIEW

Malware exhibits diverse and complex network traffic behaviour. Yet, malware variants of the same family have been known to have common behavioural patterns reflecting their origin and purpose [11], [26]. In our system, we aim to exploit these shared patterns, particularly their network behaviour, for malware family classification.

MalClassifier can be deployed by security analysts to understand and classify the behaviour of a malicious executable by observing its network flows when access to that executable itself is not possible. It maps malicious network flow sequences (n-flows) to a malware family by comparing these sequences to previously observed malware activity. In general, MalClassifier operates in three phases as illustrated in Figure 1.
1) Pre-processing and Sub-Sequence Extraction: Network traffic of malware variants of malware family y is input into Bro for network flow reassembly. The assembled flows are then encoded to a textual sequence. The sequence is then divided into sub-sequences (n-flows) of length n.
2) Malware Family Profile Extraction: The Value Similarity of each individual flow in the encoded sequence to all other flows is computed, from which the sub-sequence similarity is determined. The most distinctive flow sub-sequences (n-flows) are selected as profiles for malware family y.
3) Training and Building Models: Using the profiles, the supervised machine learning model is trained for classifying unseen n-flows to a malware family.

A. Design Goals and Requirements

The main limitation of existing malware family classification approaches is the need to obtain the malware executable and run it in a sandbox to classify it to a malware family. Unfortunately, this time-consuming process hinders adoption in real-world applications where access to the infected host is not possible or where a rapid analysis and response is crucial.

Fig. 1: MalClassifier System.
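
The sub-sequence extraction step depicted in Fig. 1 can be illustrated with a minimal sketch. It assumes that flows have already been encoded as strings and that n-flows are taken with a sliding window of step 1; whether the authors' prototype uses overlapping windows is not stated here. The function name `n_flows` and the sample flows are hypothetical and are not taken from the MalClassifier implementation.

```python
from typing import List, Sequence, Tuple


def n_flows(flows: Sequence[str], n: int) -> List[Tuple[str, ...]]:
    """Return every window of n consecutive encoded flows (assumed step of 1)."""
    if n < 1:
        raise ValueError("n must be at least 1")
    # A sequence shorter than n yields no n-flows.
    return [tuple(flows[i:i + n]) for i in range(len(flows) - n + 1)]


# Hypothetical encoded flows in the form resp_port|proto|service.
encoded = ["80|tcp|http", "80|tcp|http", "25|tcp|smtp"]
bi_flows = n_flows(encoded, 2)
```

With n = 2 the three sample flows produce two bi-flows, so a pair of consecutive flows becomes a single feature rather than each flow being considered in isolation.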

To foster its real-world use, we consider this limitation and the following requirements when designing MalClassifier to ensure that our system is resilient to malware evasion and classifies n-flows with a high accuracy. We compare the most related contributions to MalClassifier, and summarise their deployment of the identified design goals in Table I.
• IP-agnostic. The system must not use destination IP as a feature. Malware rapidly changes its C&C servers and deploys sophisticated domain generation algorithms and domain shadowing of legitimate domains to evade reputation filtering.
• Non-privacy invasive. SOC-as-a-service clients require privacy preserving monitoring, and systems that observe the payload are privacy invasive. Therefore, the system must not rely on network traffic payload to extract features. In addition to reducing the storage space, this makes the system resilient to encryption, which sophisticated malware and benign hosts (e.g. HTTPS) use. Moreover, the non-identifiable network data required for training the models are accessible, which is critical for supervised classifier training and for the potential adoption and acceptance of the system at scale.
• Automatically identify distinctive malware network behaviour. The system must be able to automatically identify and extract distinctive network behaviour (i.e., profiles) of each malware family.
• On-the-wire classification. Obtaining the malicious executable and running it in a sandbox should not be required. Instead, the system should be able to classify malicious network traffic on-the-wire to a malware family.
• Resilient to malware evasion attacks and adaptability to malware behaviour changes. The system must be adaptive to malware evolution and behavioural changes. That is, if the malware network flow behaviour changes slightly (e.g. a change of protocol from UDP to TCP or an increase/decrease in size), then the system model should still be able to classify to the correct malware family. In addition, the system should be robust against malware evasion through flow field manipulation by using tamper-resistant features [4]. Malware may attempt to change the order sequence of the flows to avoid detection. Thus, the system should consider flow order deception, and still be able to classify manipulated sequences with high accuracy.
• High classification accuracy. The classifier must aim to provide acceptable classification accuracy using only sub-sequences of network traffic, thus not requiring a malware's full packet captures.

IV. MalClassifier DESIGN

Considering the design goals in Section III-A, we discuss the design of each module of the MalClassifier system in detail in the following sections.
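
The IP-agnostic and non-privacy-invasive goals from Section III-A can be illustrated with a minimal sketch that reads a Bro conn.log in its tab-separated form and keeps only non-identifying header fields. The helper names, the sample record, and the mapping of the paper's resp_port to Bro's `id.resp_p` column are assumptions made for illustration; this is not the authors' code.

```python
from typing import Dict, Iterable, List

# Non-identifying fields kept per flow (assumed mapping: resp_port -> id.resp_p).
KEPT_FIELDS = [
    "id.resp_p", "proto", "service", "orig_bytes", "resp_bytes", "conn_state",
    "history", "orig_pkts", "orig_ip_bytes", "resp_pkts", "resp_ip_bytes",
]


def parse_conn_log(lines: Iterable[str]) -> List[Dict[str, str]]:
    """Parse tab-separated conn.log lines, taking column names from '#fields'."""
    fields: List[str] = []
    records: List[Dict[str, str]] = []
    for raw in lines:
        line = raw.rstrip("\n")
        if line.startswith("#fields"):
            fields = line.split("\t")[1:]
        elif line and not line.startswith("#") and fields:
            records.append(dict(zip(fields, line.split("\t"))))
    return records


def strip_identifiers(record: Dict[str, str]) -> Dict[str, str]:
    """Drop addresses, timestamps and identifiers; keep only the selected fields."""
    return {name: record[name] for name in KEPT_FIELDS if name in record}


# Made-up sample record for illustration only.
header = ["ts", "uid", "id.orig_h", "id.orig_p", "id.resp_h", "id.resp_p",
          "proto", "service", "duration", "orig_bytes", "resp_bytes",
          "conn_state", "missed_bytes", "history", "orig_pkts",
          "orig_ip_bytes", "resp_pkts", "resp_ip_bytes"]
values = ["1500000000.0", "C1", "10.0.0.5", "49152", "203.0.113.7", "80",
          "tcp", "http", "1.2", "367", "3547", "SF", "0", "ShADadFf",
          "6", "615", "7", "3835"]
sample = ["#fields\t" + "\t".join(header), "\t".join(values)]
flow = strip_identifiers(parse_conn_log(sample)[0])
```

Reading the column names from the `#fields` header line, rather than hard-coding positions, keeps the sketch independent of the exact set of columns a given Bro version writes.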

TABLE II: Description of the fields in conn.log generated by the Bro network monitoring framework and used in MalClassifier.

Field          Description
resp_port      Destination port
proto          Transport layer protocol (TCP, UDP)
service        Application protocol being sent over the connection
orig_bytes     Number of payload bytes the originator sent
resp_bytes     Number of payload bytes the responder sent
conn_state     State of the connection; 13 different states (e.g. connection attempt rejected)
history        State history of connections as a string of letters
orig_pkts      Number of packets that the originator sent
orig_ip_bytes  Number of IP level bytes that the originator sent
resp_pkts      Number of packets that the responder sent
resp_ip_bytes  Number of IP level bytes that the responder sent

A. Pre-processing

In order to convert the network flows into a format that can be applied to sequence mining methods, we first need to pre-process the data by reassembling and encoding the network flows.

1) Network Flow Reassembly: Organisations deploy network monitoring systems such as Bro (https://www.bro.org), which generates statistical and behavioural logs about the network communications, the application level protocols, and exchanged payload of each network flow. Maintaining full network traces (PCAPs) requires a huge amount of storage, making it challenging for organisations to keep full network traces for more than a couple of days. On the other hand, to investigate security breaches, whose effects might show up much later, logs may be stored for a longer period at reduced storage cost. We envision MalClassifier to be deployed with network monitoring systems for on-the-wire malware classification.

We note that the Network Flow Reassembly module is only needed when converting network PCAP traces to Bro conn logs, and therefore is not needed if the Bro logs are available or when Bro is applied in the network for network monitoring.

To extract behavioural features from the malware PCAP traces, we use Bro to reassemble the network flows. A flow is a sequence of packets from a source host and port to a destination host/port that is part of a unique TCP/UDP session. Packets in a flow are either going to (or coming from) the same destination IP address and port. As input, Bro takes the captured malware PCAP network traces and generates a number of logs for each malware sample. These logs include information that is useful in understanding malware behaviour, such as C&C communication statistics, DNS queries and fast fluxing, unusual communications (e.g. unknown protocols) and port-host scanning. In MalClassifier, we leverage the Bro conn.log file that shows non-identifiable network flow header information of TCP/UDP/ICMP connections. Each row in the log represents an individual flow f and is described by 20 attributes representing the column fields. We use 11 of the attributes derived from the conn.log and described in Table II.

2) Flow Encoding: For each log $x \in X$, where $X$ is the set of all Bro conn.log logs for samples of a malware family y, we encode the log x to a long sequence of network flows, $E(x) = f_1 \rightarrow f_2 \rightarrow \dots \rightarrow f_i$, where $f_i$ represents a single flow in the log x. Formally, we define a flow $f_i$ as:

f_i := <resp_port, proto, service, orig_bytes, resp_bytes, conn_state, history, orig_pkts, orig_ip_bytes, resp_pkts, resp_ip_bytes>

As we show in Figure 1, the result of the Flow Encoding module is a textual sequence representation of the flows in the conn.log of malware samples of a malware family y.

3) Flow Sub-Sequence Extraction: We represent the behaviour of malware as a sub-sequence of flows. This provides higher granularity of the captured malware network behaviour. For example, a single icmp flow does not map to a particular behaviour, but a sequence of multiple icmp flows represents a malware performing an icmp scan. To capture the malware flow-level behaviour, we consider sub-sequences of the malware network flows of length n, called n-flows, as shown in Figure 1. Therefore, when n = 1, the 1-flow represents a single network flow, and when n = 2, the bi-flow represents two consecutive network flows, and so on. This results in a group of consecutive network flows of length n that reflect flow-level behavioural patterns. Such an approach allows us to capture the network sequence behaviour and extract the unique sub-sequence (i.e. n-flow) profiles.

B. Malware Family Profile Extraction

MalClassifier extracts the malware family's distinctive network flow sequence behaviour, abstracting that behaviour as a sequence profile. We discuss below how these profiles are generated for each malware family. To clarify the different steps involved in generating the behavioural profiles, we use a running example. Thus, we show in Figure 1 an example of the extracted sub-sequences or n-flows (when n = 2) of two malware samples (Malware A and Malware B).

1) Flow Sequence Similarity: Similarity measures are defined as 'functions that quantify the extent to which objects resemble each other, taking as an argument object pair and return numerical values that are higher as the objects are alike' [15]. Choosing the similarity measures relies on the nature of the data itself, whether binary, numerical or structured data (e.g. sequences, trees, graphs). Similarity measures for binary data are used to determine the presence or absence of characteristics in the object pair (i.e. pair of flows), thus taking a value of 1 if the flow possesses the characteristic, and 0 otherwise. For example, the authors in [18] applied a binary similarity approach, by counting the number of times the exact n-gram occurs. We argue that network flows are structured data, and thus binary similarity measures are not suitable, as they do not capture the underlying semantics of the malware's network flows.

Instead, MalClassifier applies a fuzzy Value Similarity measure. Thus, instead of determining whether a sub-sequence for malware A and a sub-sequence for malware B are the same, fuzzy similarity determines how similar a sub-sequence in malware A is to a sub-sequence in malware B by computing the degree of similarity. Value Similarity computes the similarity
of each flow f_i^A in a sub-sequence for Malware A to its corresponding flow f_i^B in the sub-sequence for Malware B.
A single flow f is composed of a number of attributes, as defined in Table II. Each attribute is semantically different in data type and value distribution, so the underlying differences between the attributes need to be considered when choosing a similarity measure. For example, history is represented as a string of characters, where each character has a meaning and the order of the characters also has a meaning that should be considered. In contrast, numeric attributes such as (orig_bytes, resp_bytes, orig_pkts, orig_ip_bytes, resp_pkts, resp_ip_bytes) are represented as numeric vectors, thus numeric similarity measures such as Cosine Similarity should be applied.
Although resp_port is numeric, its underlying meaning differs from other numeric fields. For example, although the values orig_bytes = 100 and orig_bytes = 99 should be considered similar, a small difference in resp_port does not indicate a similarity (port 80 for HTTP could be considered related to port 443 for HTTPS, whilst port 23, although numerically closer to port 80, is semantically different). Therefore, applying one similarity measure to all attributes is not sufficient.
We propose a hybrid value similarity measure to determine the similarity of two flow sequences. Our hybrid approach takes into account the semantic differences of the flow attributes and applies a similarity measure suitable to each attribute. In general, we apply four similarity measures, depending on the flow attributes.
We list each similarity measure below, with an example of how the similarity of Seq1 of Malware A and Seq2 of Malware B shown in Figure 1 is calculated.
• Binary similarity (resp_port, protocol, service): The similarity is 1 if the attribute values are the same, otherwise 0. Thus, the Binary Similarity in our example of (f_0^A ∈ Seq1 : 80|tcp|http, f_0^B ∈ Seq2 : 80|tcp|http) and (f_1^A ∈ Seq1 : 80|tcp|http, f_1^B ∈ Seq2 : 25|tcp|ssl) is (1, 0.33) respectively, resulting in an average Binary Similarity of 0.665.
• Levenshtein Distance [16] (history, conn_state): Levenshtein Distance is a fuzzy string similarity measure that measures the minimum number of modifications required (insertions, deletions, and substitutions) to change one string into the other, divided by the maximum length of the same two strings. It also takes into consideration the order of the characters in the string. Assuming the cost of insertion, deletion, and modification is the same (= 1), the Levenshtein Distance of making ShADdFa into ShADadR is 3. The trivial implementation has a runtime and space complexity of O(nm). The distance value ranges from [0,100], where 0 indicates a low distance and thus a higher similarity, and 100 indicates a low similarity. We scale the distance value to be in the range [0,1] instead of [0,100].
Using our example, the Levenshtein Distance of (f_0^A ∈ Seq1 : SF|ShADdFa, f_0^B ∈ Seq2 : SF|ShADadfF) and (f_1^A ∈ Seq1 : SF|ShADdFa, f_1^B ∈ Seq2 : S0|ShAdDafF) is (2, 4) respectively, resulting in an average Levenshtein Distance of 3, scaled to 0.03.
• Cosine Similarity: The numeric attributes of the two flows are represented as two vectors. Cosine Similarity measures the similarity of two non-zero vectors by calculating the cosine of the angle between them. Given the vectors x and y of length n = 6 (number of numeric attributes), the cosine similarity is represented as:

$$\cos(x, y) = \frac{x \cdot y}{\|x\| \, \|y\|} = \frac{\sum_{i=1}^{n} x_i y_i}{\sqrt{\sum_{i=1}^{n} (x_i)^2} \, \sqrt{\sum_{i=1}^{n} (y_i)^2}} \tag{1}$$

The Cosine Similarity of x = [367, 3547, 6, 615, 7, 3835] ∈ f_0^A, y = [1801, 15606, 14, 2369, 18, 16334] ∈ f_0^B is 0.00025. Similarly, the Cosine Similarity of x = [322, 464, 5, 530, 4, 632] ∈ f_1^A, y = [5274, 1370, 18, 5975, 28, 456] ∈ f_1^B is 0.284. Thus, the average Cosine Similarity for Seq1 and Seq2 is 0.142.
• Inter-flow Distance (resp_port): Inter-flow distance calculates the distance between the resp_port in every two consecutive flows of a sub-sequence. This helps identify malware network behavioural attributes such as performing a port scan (e.g. when the difference of resp_port of two consecutive flows is 1). To calculate the inter-flow similarity, we first calculate the distance of resp_port in each pair of consecutive flows in a sub-sequence. For example, the resp_port distance between f_0^A, f_1^A ∈ Seq1 is 0, as the resp_port in both flows is the same. However, the resp_port distance between f_0^B, f_1^B ∈ Seq2 is 55, which is the difference between port 80 and port 25. The inter-flow distance of Seq1 and Seq2 is the distance between (0, 55), thus 55.
The Inter-flow distance is normalised to obtain a value in [0, 1] to define a dissimilarity. The simplest and most common normalisation method uses a linear transformation, known as feature scaling, where η(d) = (d − d_min) / (d_max − d_min). However, the drawback of applying this method is that it is sensitive to outliers [15]. Therefore, to normalise the distance value, we apply another normalisation method that overcomes this drawback by defining a parameter Z = (m, M), where m and M are user-defined values interpreted as tolerance thresholds, and η_Z(d) = min(max((d − m) / (M − m), 0), 1) [15]. Any distance value d less than m means that there is zero dissimilarity, thus distinct points (that have a non-zero distance) could be determined identical. Conversely, distance values higher than M are considered totally dissimilar [15]. In our example, we set Z = (10, 100), resulting in a dissimilarity of 0.5.
We define the hybrid similarity approach in Algorithm 1: Value Similarity. The Value Similarity function takes as input two flow sub-sequences, X and Y. For each flow f in sub-sequence X and Y, we extract the port_protocol_service, history_state, and numeric attributes, and calculate the Binary similarity, Levenshtein Distance, Cosine Similarity, and Inter-flow Distance respectively. The similarity of the pair
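The four per-attribute measures described above can be sketched in Python as follows. This is a minimal illustration and not the authors' implementation: the function names are hypothetical, the Binary Similarity is assumed to be the fraction of matching fields (which reproduces the 1 and 0.33 of the worked example), and the Levenshtein routine returns the raw edit distance before any scaling.

```python
import math

def binary_similarity(a, b):
    """Fraction of (resp_port, protocol, service) fields that match exactly."""
    return sum(x == y for x, y in zip(a, b)) / len(a)

def levenshtein(s, t):
    """Edit distance with unit cost for insertion, deletion and substitution."""
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, start=1):
        cur = [i]
        for j, ct in enumerate(t, start=1):
            cur.append(min(prev[j] + 1,                 # delete cs
                           cur[j - 1] + 1,              # insert ct
                           prev[j - 1] + (cs != ct)))   # substitute
        prev = cur
    return prev[-1]

def cosine_distance(x, y):
    """1 - cos(x, y) for two non-zero numeric vectors (Equation 1)."""
    dot = sum(a * b for a, b in zip(x, y))
    norm = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y))
    return 1.0 - dot / norm

def tolerance_normalise(d, m, M):
    """eta_Z(d) = min(max((d - m) / (M - m), 0), 1) with Z = (m, M)."""
    return min(max((d - m) / (M - m), 0.0), 1.0)
```

With the numeric vectors of the worked example, `cosine_distance` gives roughly 0.00025 and 0.285, which suggests that the Cosine figures quoted in the text are distances (1 − cos) rather than similarities. `tolerance_normalise(55, 10, 100)` reproduces the inter-flow dissimilarity of 0.5.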

Algorithm 1 Value Similarity
procedure VALUESIMILARITY(X, Y)
    X ← [x1, x2, ..., xi], where xi = [f0, f2, ..., f10]
    Y ← [y1, y2, ..., yi], where yi = [f0, f2, ..., f10]
    attributes ← [], attributes of flow in conn.log
    Sim ← []
    for i in range(0, n) do
        xi ← X[i]
        yi ← Y[i]
        xi+1 ← X[i + 1]
        yi+1 ← Y[i + 1]
        port_prot_ser ← (xi[0 : 2], yi[0 : 2])
        history_state ← (xi[3 : 4], yi[3 : 4])
        numeric ← (xi[5 :], yi[5 :])
        if port_prot_ser[0] == port_prot_ser[1] then
            binary = 1
        else
            binary = 0
        ld = 1 − LEVENSHTEINDISTANCE(history_state)
        cosine = 1 − COSINEDISTANCE(numeric)
        interflow = abs((xi[0] − xi+1[0]) − (yi[0] − yi+1[0]))
        s = w0 binary + w1 ld + w2 cosine + w3 interflow
        Sim.append(s)
    return Average(Sim)

Algorithm 2 Order Similarity
procedure ORDERSIMILARITY(X, Y, n)
    X ← [x1, x2, ..., xi], where xi = [f0, f2, ..., f10]
    Y ← [y1, y2, ..., yi], where yi = [f0, f2, ..., f10]
    n ← sequence length
    if length(X) == 1 then
        return VALUESIMILARITY(X, Y) ∗ 1/n
    else
        s0 = VALUESIMILARITY(X, Y) ∗ length(X)/n
        s1 = ORDERSIMILARITY(X[1 :], Y[1 :], n)
        s2 = ORDERSIMILARITY(X[: −1], Y[: −1], n)
        return Max(s0, s1, s2)

of flows is the sum of the four similarities, each multiplied by a weight w. The weight w is defined as the number of attributes that were given to the similarity measure, e.g. flow_sim = 3 binary + 2 ld + 6 cosine_sim + 1 inter_flow. The highest possible similarity score using this approach is 12. The weights w could also be used to give importance to one similarity measure over another.
Using our example, X would be Seq1 and Y is Seq2. The similarity of Seq1 and Seq2 is calculated as follows:
sim = 3(0.665) + 2(1 − 0.03) + 6(1 − 0.142) + 1(1 − 0.5) ≈ 9.58
As Levenshtein Distance, Cosine Similarity, and Inter-flow Distance are distance measures, we derive the similarity measure through decreasing functions as S(x, y) = 1 − LevenshteinDistance(x, y) and S(x, y) = 1 − CosineDistance(x, y). Thus the similarity measure is the complement to 1 of dissimilarity [15].
2) Malware Family n-flow Mining: To extract the network flow sequence behaviour for a malware family y, we derive the set S of n-flows of all the network flows in the log x that belong to the malware family y. We calculate the similarity sim(i, s) of each n-flow i in a malware sample mj to each n-flow s in the set S. To calculate the value sim(i, s), we apply the value similarity measure defined in Section IV-B1. To reduce memory and processing complexity, we pre-compute the similarity of each pair of flows in S and store it in a key-value store. Consequently, when determining the similarity of two n-flows, the similarity of each flow is pre-computed and is fetched from the key-value store. Therefore, only unique flow pairs are stored and only the similarity of new flow pairs is computed.
We are interested in mining the malware family's network traffic for the highly frequent n-flows, i.e., flows that occur in all or the majority of the samples of a malware family. Therefore, we represent the n-flow i as a vector in a high-dimensional space, where the elements (features) in the n-flow vector correspond to all n-flows of a malware family y.
3) Profile Selection: In order to extract the malware family profiles, we select the most relevant n-flows (profiles) that represent that malware family's network behaviour. For each malware family, we mine the network flows for all samples of that malware family, then select n-flows that are significant. We apply a modified version of the class-wise feature selection approach proposed in [25]. The class-wise document frequency is the number of network flows in a malware family y that contain an n-flow s. We assign each n-flow a class-dependent weight according to its coverage in the malware family network traffic (tendency). As we apply a fuzzy similarity approach, we multiply the average similarity of an n-flow s in a malware family y by its tendency. We then select the top k n-flows as profiles for that malware family.
C. Building Malware Family Classification Models
In situations where an analyst detects a malicious network behaviour and needs to determine its origin, the malware's full network traffic may not have been captured. Therefore, to avoid misclassifying malware binaries as a result of not having access to their full packet capture, we classify the network n-flows. This ensures that the malicious binary can be classified based on a sub-set of its network flow communication, thus not requiring the malware's whole network packet capture.
Training. We show in Figure 1 how the classifier is trained to produce the model used for classification of future unseen n-flows. To train the classifier, a collection of malicious n-flows of malware samples M = m1, m2, ..., ml and their corresponding malware families Y = y1, y2, ..., yk are required. Hence, we train a multi-class supervised classifier with the aim of determining if an n-flow i of a malware sample m belongs to a malware family y, where the selected profiles S represent the features in our classifier.
Classification. The trained model is then deployed to classify unseen n-flows to a family. One of the main design goals for MalClassifier is resiliency to malware evasion. Malware authors may try to evade the sequence detection by changing the order of the flows or injecting noise flows. Therefore, when deploying the trained model, in addition to calculating the Value Similarity we also introduce the Order Similarity defined as Algorithm 2. The Order Similarity algorithm computes the
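The weighted sum of Algorithm 1 and the recursion of Algorithm 2 can be sketched in Python as below. This is an illustrative reading rather than the authors' code: `hybrid_score` and `order_similarity` are hypothetical names, the weights default to the attribute counts (3, 2, 6, 1) given in the text, the inter-flow term is complemented as in the worked example, and the Value Similarity is passed in as a callable so that the sketch stays self-contained.

```python
WEIGHTS = {"binary": 3, "levenshtein": 2, "cosine": 6, "interflow": 1}

def hybrid_score(binary_sim, lev_dist, cos_dist, interflow_dissim, weights=WEIGHTS):
    """Weighted sum of the four measures; distances are turned into
    similarities through the complement to 1."""
    return (weights["binary"] * binary_sim
            + weights["levenshtein"] * (1 - lev_dist)
            + weights["cosine"] * (1 - cos_dist)
            + weights["interflow"] * (1 - interflow_dissim))

def order_similarity(X, Y, n, value_similarity):
    """Algorithm 2: best length-weighted Value Similarity over the
    sub-sequences obtained by trimming X and Y from either end."""
    s0 = value_similarity(X, Y) * len(X) / n
    if len(X) == 1:
        return s0
    s1 = order_similarity(X[1:], Y[1:], n, value_similarity)
    s2 = order_similarity(X[:-1], Y[:-1], n, value_similarity)
    return max(s0, s1, s2)

# Values taken from the worked example for Seq1 and Seq2.
example = hybrid_score(0.665, 0.03, 0.142, 0.5)
```

The recursion revisits overlapping sub-sequences, which is acceptable for the short sequences considered here (n up to 7); for longer sequences the intermediate results could be memoised.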

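The profile selection step (tendency multiplied by average similarity, then top k) could look roughly like the following Python sketch. It assumes that tendency is the share of a family's observations in which a candidate n-flow occurs, which is only one possible reading of "coverage"; the function name `select_profiles` and the input format are hypothetical.

```python
from collections import defaultdict

def select_profiles(observations, k=20):
    """observations: iterable of (candidate_nflow, similarity) pairs for one family.

    Each candidate is scored as tendency * mean similarity, where tendency is
    assumed to be the fraction of observations that matched the candidate.
    Returns the k highest-scoring candidates.
    """
    counts = defaultdict(int)
    totals = defaultdict(float)
    n_obs = 0
    for nflow, sim in observations:
        counts[nflow] += 1
        totals[nflow] += sim
        n_obs += 1
    scores = {}
    for nflow, count in counts.items():
        tendency = count / n_obs
        mean_sim = totals[nflow] / count
        scores[nflow] = tendency * mean_sim
    return sorted(scores, key=scores.get, reverse=True)[:k]
```

A frequent candidate with low similarity and a rare candidate with high similarity can both be outranked by one that is moderately frequent and consistently similar, which is the trade-off the product is meant to capture.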
TABLE III: Description of datasets.
Dataset                   Family      # Samples   # Flows       # Unique Flows
CTU-13                    Murlo       1           37,019        422
CTU-13                    Rbot        4           46,184,716    1,697
CTU-13                    Virut       2           358,378       4,088
CTU-13                    Neris       3           839,077       24,895
CTU-13                    Menti       1           291,677       160
CTU-13                    NSIS.ay     1           30,063        2,591
Stratosphere IPS Project  Miuref      7           1,867,273     17,257
Stratosphere IPS Project  Sality      1           6,073,775     307,355
Stratosphere IPS Project  WannaCry    7           291,677       330
Stratosphere IPS Project  Conficker   2           323,238       6,300
Stratosphere IPS Project  Notpetya    4           5,424         174
Total                                 33          56,302,317    365,289

highest Value Similarity score of all possible sub-sequences in X and Y.
Using our example, the possible sub-sequences for X are (f_0^A), (f_0^A, f_1^A), (f_1^A) and for Y are (f_0^B), (f_0^B, f_1^B), (f_1^B), where length(subsequence) <= n. The Order Similarity computes the Value Similarity of all possible sub-sequence orders, thus S((f_0^A), (f_0^B)), S((f_0^A, f_1^A), (f_0^B, f_1^B)), S((f_1^A), (f_1^B)). Then, the Value Similarity is multiplied by length(X)/n. Thus, the similarity of the sub-sequence is at its highest when the length of the sub-sequence X = n, and at its lowest when the length of the sub-sequence X = 1, i.e. a single flow.
V. EVALUATION
To evaluate our approach, we implement the system as a multi-threaded Python application. To accelerate the analysis, the application uses Python Multiprocessing with separate threads to calculate the similarity scores for each malware sample. The experiments were conducted on a 40-core processor with 126GB RAM, running CentOS.
A. Dataset
We experiment with a dataset of popular ransomware and botnet network traces, with a total of 11 malware family classes. We provide an overview of the malware families in our datasets in Table III.
Botnets are known to follow a certain infection life-cycle, sharing similarities in network flow characteristics in each infection phase [11]. Thus, we evaluate the effectiveness of our system in determining the unique network flow behaviour of each botnet family in our dataset, and its accuracy in classifying botnet network traffic to its family despite the botnets' network behaviour similarities. To train our classifier we used (1) the CTU-13 botnet traffic dataset [8] provided by the Malware Capture Facility project and (2) current botnets and ransomware (Miuref, WannaCry, Conficker, Sality, Notpetya) provided by Stratosphere IPS Project².
2 https://www.stratosphereips.org/
The CTU-13 dataset contains 13 scenarios of botnet network traffic. The botnet families represented in the dataset employed various protocols (e.g. IRC, P2P, HTTP) and various techniques (e.g. sending SPAM, DDoS attacks, Fast-Fluxing). Our main motivation for using this dataset is that network traces of real botnet attacks were captured, providing reliable datasets for model building. In addition to running all samples on the same network environment, precautions were taken to ensure that the full bot network flows were captured and outgoing traffic was not filtered or rate-limited. This ensures that the data is clear from artefacts, due to how the collection environment was set up, that can affect the classifier performance. We applied network flows of scenarios 1-13 to train and evaluate the classification models. We excluded scenario 7 as it did not contain a sufficient number of flows. We applied only the malicious flows (C&C and botnet flows) to train the classifier for malware family classification. For more information about the nature of the dataset scenarios and how it was generated see [8].
B. Experiments
The aim of the experiments is to (1) determine how accurately we can classify an n-flow to its malware family using the extracted profiles as features; and (2) determine the classifiers' robustness to malware evasion. In particular, we plan to explore the following:
Impact of Flow Sequence Similarity. We measure the effect of applying each similarity approach (Levenshtein Distance, Cosine Similarity, Binary Similarity, and Inter-flow Distance) on the classifier performance. This will help determine which similarity measure has the highest positive influence on the classification and thus will be assigned a higher weight w, as discussed in Section IV-B1.
Impact of n-flows. We evaluate the impact of using an n-flow approach for malware family classification. We separate the network flows in our dataset into n-flows of length 1 (single flow) to 7 consecutive flows. The aim is to determine if the malware family classification accuracy improves when a sequence of flows approach such as n-flows is applied as opposed to a single flow.
Robustness to Malware Evasion. Malware authors could attempt to evade detection by changing the malware network flow behaviour, either by injecting noise packets or by changing the flow sequence order, affecting the numeric attributes in a flow (e.g. sent/received packets). To account for this, we explore the robustness of the classifiers to evasion by evaluating the classifier's performance in classifying n-flows when we randomly shuffle the order of the flows in a sequence.
C. Classifier Design
We build multi-class classifiers that map an unknown network n-flow input as belonging to one malware family.
Classifiers. We apply two supervised machine learning classifiers, K-Nearest Neighbour (KNN) and Random Forest (RF). These classifiers were chosen due to their popularity in text classification using a vectorial representation of features. KNN is a non-parametric lazy learning algorithm. It simply assumes that the classification of a sample is similar to other samples that are nearby in the vector space. Random Forest is an ensemble classifier that leverages multiple decision trees,
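Splitting a sample's flows into n-flows for n = 1 to 7 can be sketched as follows. The text does not state whether consecutive n-flows overlap, so the sliding window with a step of one flow used here is an assumption, and `n_flows` is a hypothetical helper name.

```python
def n_flows(flows, n):
    """Return every run of n consecutive flows (sliding window, step 1)."""
    return [tuple(flows[i:i + n]) for i in range(len(flows) - n + 1)]

def all_n_flows(flows, max_n=7):
    """Map each sequence length n in 1..max_n to its list of n-flows."""
    return {n: n_flows(flows, n) for n in range(1, max_n + 1)}
```

A sample with fewer than n flows yields no n-flows for that n, so very short captures only contribute to the smaller values of n.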

TABLE IV: Example of the selected 2-flows for each malware family in our dataset.
Family      Example 2-flow
Murlo       135.0-tcp-0-0-0-S0-S-2.0-96.0-0.0-0.0
            135.0-tcp-0-0-0-S0-S-2.0-96.0-0.0-0.0
Rbot        14.0-icmp-0-0-0-OTH-0-0.0-0.0-0.0-0.0
            227.0-icmp-0-0-0-OTH-0-0.0-0.0-0.0-0.0
Virut       80.0-tcp-0-0-0-RSTO-^hR-2.0-80.0-1.0-44.0
            443.0-tcp-0-0-0-REJ-Sr-2.0-96.0-1.0-40.0
Neris       80.0-tcp-http-318-368-RSTO-ShADadR-10.0-1052.0-3.0-496.0
            25.0-tcp-0-0-0-S0-S-2.0-96.0-0.0-0.0
Menti       25.0-tcp-0-0-0-S0-S-4.0-192.0-0.0-0.0
            25.0-tcp-0-0-0-S0-S-2.0-96.0-0.0-0.0
NSIS.ay     32234.0-udp-0-103-596-SF-Dd-1.0-131.0-2.0-652.0
            31037.0-udp-0-103-596-SF-Dd-1.0-131.0-2.0-652.0
Miuref      5353.0-udp-dns-1599-0-S0-D-36.0-2607.0-0.0-0.0
            136.0-icmp-0-0-0-OTH-0-1.0-72.0-0.0-0.0
Sality      80.0-tcp-0-0-0-S0-S-2.0-96.0-0.0-0.0
            53.0-udp-dns-30-46-SF-Dd-1.0-58.0-1.0-74.0
WannaCry    445.0-tcp-0-0-0-REJ-Sr-1.0-52.0-1.0-40.0
            445.0-tcp-0-0-0-S0-S-1.0-52.0-0.0-0.0
Conficker   3824.0-tcp-0-0-0-RSTOS0-R-2.0-80.0-0.0-0.0
            3821.0-tcp-0-0-0-RSTOS0-R-2.0-80.0-0.0-0.0
Notpetya    130.0-icmp-0-0-0-OTH-0-1.0-72.0-0.0-0.0
            130.0-icmp-0-0-0-OTH-0-1.0-72.0-0.0-0.0

that are trained using different subsets of the training set, thus overcoming over-fitting issues of an individual decision tree.
Classifier Features. The extracted flow sequence profiles for each malware family represent the features in our classifiers. We only use 20% of the n-flows for each malware family to generate and select (k = 20) behaviour profiles. This is to ensure that the profile selection does not bias the classifier estimates, known as feature subset selection bias or selection bias [31].
Dimensionality Reduction. Due to the sparsity of the dataset (20 profiles * 11 (number of classes) = 220 features), we apply Principal Component Analysis (PCA) [37]. PCA reduces the dimensionality of the dataset while retaining the variation present in the dataset, up to the maximum extent. Thus, the 220 features are transformed to a new set of features, known as the principal components. We set PCA = 10, thus reducing the feature space to 10 principal components. We evaluated different values for PCA, and the classifier with PCA = 10 performs almost as well as when we use all features. The advantage of using PCA over using all features is the low memory complexity required to run the machine learning algorithms due to reducing the dimensions of the feature space used by the classifier.
Classifier Performance Measures. To assess the performance of the classifiers, we apply 10-fold cross-validation. Cross-validation reduces bias in the performance estimate while maximizing the number of score computations from a given dataset. As the CTU-13 datasets consist of only a few PCAPs (executions) per family, when performing cross-validation we compare the different executions of the same family.
We employ evaluation measures such as Precision, Recall and F-measure to evaluate the classifiers' performance. Although these metrics are defined for a binary classifier, they can be extended for multi-class problems. Each class is represented by a binary classifier (e.g. One-vs-rest approach), and we average the binary metric across the set of classes. We also apply a macro-averaged Precision-Recall curve as an evaluation metric, which gives equal weight to the classification of each label compared to micro-averaging which gives equal weight to each per-family classification decision.
VI. RESULTS
We discuss in the following our results from the experiments outlined in Section V-B and the impact of the various system configurations and approaches on the classification performance.
A. Malware Family Profile Extraction
Our results show that malware of a single family exhibit network flow sequence regularities that can be used for malware family classification. We select 20 n-flows for each malware family using the method described in Section IV-B3. We illustrate in Table IV an example of the selected 2-flows for each malware family. Murlo traffic contained sub-sequences of TCP flows to destination port 135 (Messenger Services). The flows have a connection state S0, meaning there was a connection attempt, but no reply. Therefore, the duration of the flow is 0 and there was no payload. Similarly, Menti performed multiple flow connections to destination port 25 (SMTP) and port 21 (FTP). The connection state was set to either RSTO, meaning the connection attempt was rejected by the destination, or S0. However, there were some successful TCP flows with exchanged payload. The high number of outgoing flows to an SMTP port means that the botnet is sending email spam, a behaviour linked to Menti, which is known to send pharmaceutical and stock-based email spam. Neris 2-flows show HTTP flows with connection state set to RSTO. This means that the connection was established but the originator aborted. However, we do see other successful HTTP connections with connection state SF, meaning normal establishment and termination. WannaCry is a ransomware known for sending SMB flows to other victim hosts on the network. This is captured by the selected 2-flows of TCP flows with a destination port of 445. Miuref's 2-flow sequences were mostly HTTPS flows (port = 443) followed by DNS requests, or DNS requests followed by an ICMP flow.
B. Classifier Performance
Impact of the Value of n in n-flows. We illustrate in Figure 2 the models' performance for each value of n. Overall, the accuracy of the classifiers is at its best using 2-grams with the KNN classifier and 6-grams with the RF classifier. The classifier accuracy was better when flows were represented as a sequence (n > 1) rather than individual flows (n = 1). This shows that malware behaviour is best captured when we look at sequences of flows. For example, an individual SMTP flow might not indicate malicious behaviour, or capture the behaviour of a particular malware class (e.g. Menti), whilst a sequence of rejected SMTP messages gives confidence in the maliciousness of the flow sequence.
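The difference between macro- and micro-averaging of the F-measure in a one-vs-rest setting can be made concrete with a small pure-Python sketch. It is illustrative only: the helper name `f_measures` is hypothetical, and a library routine would normally be used in practice.

```python
from collections import Counter

def f_measures(y_true, y_pred):
    """Return (macro_f1, micro_f1) for a multi-class prediction."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1   # predicted class gains a false positive
            fn[t] += 1   # true class gains a false negative

    def f1(tp_, fp_, fn_):
        denom = 2 * tp_ + fp_ + fn_
        return 2 * tp_ / denom if denom else 0.0

    # Macro: average the per-class F1, so each family counts equally.
    macro = sum(f1(tp[c], fp[c], fn[c]) for c in labels) / len(labels)
    # Micro: pool the counts, so each classified n-flow counts equally.
    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    return macro, micro
```

With imbalanced families the two values diverge: a small family that is classified poorly lowers the macro average much more than the micro average, which is why macro-averaging is informative for a dataset such as the one in Table III.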

Fig. 2: Average F-measure for Random Forest and KNN classifiers, n = 1-7 for n-flows. [Plot: x-axis Number of Flows (n), 1 to 7; y-axis Classifier Performance (%), 80 to 100; series K-nearest Neighbour (KNN) and Random Forest (RF).]
Fig. 3: Macro-averaged Precision Recall Curves for each malware family (Random Forest classifier, n = 5).

When choosing the value of n, the speed of classification needs to be considered. Although a longer sub-sequence (higher n) results in a higher classification accuracy, it delays the classification of the network behaviour to a malware family. For example, choosing 7-grams requires waiting for 7 consecutive malware network communications to classify and understand its behaviour, increasing the duration of the malware execution damage and delaying active remediation. In situations where the attack implications are severe, such as in ransomware attacks, determining the malware class as soon as possible is critical for a quick reaction before it reaches the encryption stage.
Impact of Classification Algorithm. To determine the most accurate classifier, we measured the performance based on the machine learning algorithm used. KNN (n = 2) performed best (F-measure = 95.74%) with shorter flow sequences, while the RF classifier's accuracy increased as the number of flows in the sequence (n-flow) increased, reaching 95.83% when n = 6. To compare the precision and recall trade-off of the Random Forest classifier, we present the macro-average Precision-Recall (PR) curve when n = 5 in Figure 3. We employ macro-average that measures the overall precision and recall of all the classes to produce the macro-averaged ROC curves. We build multi-class models, therefore, averaging the evaluation measures can provide us with a view of the general classifier performance. The model has an AUC of 94%, indicating high recall and high precision. Overall, the classification of all malware families performs well. However, Notpetya had an AUC of 39%, with a high number of misclassifications (42%). We will discuss the reasons for these misclassifications and how to improve the classification accuracy in Section VII.
Impact of Flow Sequence Similarity. We measure the effect of the four similarity measures (Binary Similarity, Cosine Similarity, Levenshtein Distance, and Inter-flow Distance) and their associated flow attributes on the classifier accuracy. Specifically, we train four Random Forest (RF) classifiers, each using a similarity measure and a subset of the flow attributes. For example, we train a classifier using only resp_port, protocol, service, applying only the Binary Similarity measure. We show the F-measure of each of the four classifiers (n = 5) per malware class in Figure 4. In the figure, a malware family with a classifier performance of 400% indicates that the Binary, Cosine, Inter-flow, and Levenshtein classifiers each had an F-measure of 100%.
The highest performance resulted from the Cosine Similarity classifier, with a 90% F-measure over most malware classes. The Binary Similarity classifier was best for representing Murlo behaviour with 97.46% F-measure, Miuref with 95.86%, and Rbot with 99.26%. However, the low performance (11.78%) of the Binary Similarity for WannaCry shows that the flows' attributes (resp_port, protocol, service) that are measured using this similarity approach might not be a unique representative of its behaviour. In contrast, WannaCry's Inter-flow Distance classifier had the highest performance (93.59%). Thus, although the exact matching (i.e. Binary Similarity) of the resp_port did not perform well, the average difference of the resp_port between flows in the sequence (i.e. Inter-flow Distance) was able to capture that malware family's flow sequence behaviour. Levenshtein Distance had the lowest performance overall, achieving high accuracy only with Rbot (97.52%).
C. Robustness to Evasion
Malware authors can attempt to evade sequence detection by changing the order of the communication flows or even injecting noise flows. We evaluate the use of the Order Similarity, introduced in Section IV-B1. We randomly shuffle the order of the flows in the 5-flows of our malware families and test the model's classification performance on the shuffled n-flows. Change in the order of flows did not affect the classifier accuracy when applying Order Similarity, retaining an F-measure of 95.36% for the Random Forest Classifier (n = 5).
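The shuffling step of this robustness experiment can be sketched as below. It only illustrates how the order of flows inside each n-flow might be randomised before re-scoring; the function name `shuffle_n_flows` and the fixed seed are assumptions made for reproducibility, not details taken from the evaluation.

```python
import random

def shuffle_n_flows(n_flows, seed=0):
    """Return a copy of the n-flows with the flows inside each one
    randomly reordered; the set of flows per n-flow is unchanged."""
    rng = random.Random(seed)
    shuffled = []
    for seq in n_flows:
        seq = list(seq)
        rng.shuffle(seq)
        shuffled.append(tuple(seq))
    return shuffled
```

The shuffled sequences would then be passed to the trained model in place of the originals, and the resulting F-measure compared against the unshuffled baseline.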

Fig. 4: The malware families' classification F-measure of the four Random Forest Classifiers (n = 5), each using one of the four similarity measures. [Plot: x-axis Malware Family (Menti, Conficker, Virut, NSIS.ay, Rbot, Miuref, Sality, Murlo, Neris, Notpetya, WannaCry); y-axis Classifier Performance (%), 0 to 400; series Binary Similarity, Levenshtein Distance, Cosine Similarity, Inter-flow Distance.]

VII. DISCUSSION AND FUTURE WORK
We discuss how MalClassifier meets our design goals and identify potential limitations, suggesting approaches to address them.
Meeting Design Goals. MalClassifier utilizes non-privacy invasive features to train and build its models, relying on packet header information and not requiring deep packet inspection (i.e. content-agnostic). In addition, all identifiable header fields such as IP addresses are removed in the network flow encoding module (i.e. IP-agnostic). As malware is known to change its behaviour in order to evade detection, MalClassifier applies a fuzzy approach to flow sequence similarity to ensure that slight deviations in flow attribute values are detected. The main challenge in malware classification is obtaining the required model training datasets. Therefore, MalClassifier uses only non-identifiable packet header features, making the datasets needed for training and building the models accessible.
MalClassifier achieved a high accuracy for malware family classification (F-measure ≈ 95.5%), demonstrating the effectiveness of the system in identifying distinctive network n-flows for each malware family. It is worth noting that although the accuracy of MalClassifier does not improve significantly over the state-of-the-art, it still provides a high accuracy while preserving communications privacy and being robust against encryption. In addition, the classifier performance can be improved by modifying the profile selection module, as we will discuss in the next section. MalClassifier classifies n-flows to a malware family, meaning it only requires a subset of flows instead of obtaining the full packet trace of the malicious binary for classification.
Understanding Classifier Errors. We represent the confusion matrix for the Random Forest Classifier (n = 5) in Figure 5. Each row in the confusion matrix represents the instances of the actual class (i.e. True Label) while each column represents the instances of the predicted class (i.e. Predicted Label). The main observation is that 26% of Notpetya 5-flows were incorrectly classified as Miuref. We traced the misclassified Notpetya flows to a sequence of 5 flows of 445-tcp-0-0-0-S0-S-4-192-0-0. However, such a sequence was not selected as a profile for Notpetya, whilst a similar 5-flow 443-tcp-0-0-0-S0-S-1-48-0-0 was selected as a profile for Miuref. Therefore, the classifier was trained to assign such an n-flow to Miuref. To improve the classification, n-flows that are shared by more than one malware family should be identified and not selected beforehand. This could be done using clustering approaches, which we consider for future work.
Lessons Learned. The performance of the classifier relies on the quality of the profiles selected for each malware family. We introduced a novel method for profile selection that selects n-flows for each malware family using two metrics: (1) average similarity score for that sequence; (2) tendency, the number of times a sequence occurred. In our initial experiments, we noticed that selected flows for a malware family were all similar, as they all have a high score and similar flow attribute values except the destination port. Thus, the profile selection was not capturing the various distinctive behaviours. Accordingly, we amended the selection process to not include the destination port field, selecting a distinctive set of profiles for each malware family. This increased the accuracy of the classifier by 10%, as the profiles selected represented various stages of a malware family's network behaviour.
We note that for extracting the sequence profiles we only looked at 20% of the traffic of each malware family. Using these profiles, we evaluated the classifier performance in classifying the other 80% of traffic. However, in application, the profile selection process should consider the various malware network behaviour stages, to ensure that the selected profiles capture the malware behaviour in each infection stage.
We measured the classifier performance using each similarity measure used in the Value Similarity. The effect of the similarity measure on each malware family differs. Although Cosine Similarity had a positive effect on the classification accuracy of each malware family, some families were also highly influenced by the Binary Similarity (e.g. Murlo) and Inter-flow Distance (e.g. WannaCry). This provides insight on the similarity measures that have the most positive influence on the classifier performance, and thus can be assigned a higher weight w as defined in Section IV-B1. Based on the results, we can assign Cosine Similarity the highest weight, followed by Inter-flow Distance, Binary Similarity, and finally Levenshtein Distance.
Malware Evasion. The main challenge in most behavioural-based malware analysis approaches is malware behaviour obfuscation and manipulation, known as noise-injection attacks [21]. Although malware evasion by altering the binary itself might be feasible due to available obfuscation tools, we believe that the network behaviour is more troublesome to tamper with. We determine the feasibility of such an evasion in respect of two associated costs: implementation complexity and effect on malware utility. The evasion complexity is based on the ease with which the malware author can modify the code to include the evasion tactic, which may result in affecting its

utility [32]. Malware classification systems that apply a supervised machine learning approach require continuous training on new malware variants to adapt to behavioural changes. However, applying a fuzzy similarity measure allows a degree of flexibility in malware behavioural change, so the classifier only needs training on samples of a new malware family. In addition, MalClassifier adapts to flow sequence order manipulation by applying the Order Similarity approach.
Fig. 5: Normalized confusion matrix showing actual classes vs. predicted classes for the Random Forest Classifier (n = 5).
Limitations and Future Work. We identified that the classifier misclassifications were due to the feature selection not considering similarities of n-flows between families. Although we discussed how we ensure the selection of distinctive flows for a malware family, these flows should also not be similar to selected flows for other families. For example, Neris, Virut and Sality are all bots that send email spam, and identifying network flow sequences distinctive for each family can avoid n-flow misclassifications. Thus, to improve the profile selection process, we plan to use K-means clustering to identify flows that are similar in more than one family, to avoid using these flows as profiles. Moreover, we plan on identifying the frequent n-flows in benign traffic, to reduce the false positives.
As future work, we also aim to measure the evolution of malware network flow behaviour sequence and determine to what extent changes in the sequence behaviour affect the classification accuracy. In particular, we will measure how the network behaviour of malware samples of a family has changed. Identifying behavioural changes of malware samples will assist in measuring how the MalClassifier classifier performs against these changes. The capability of classifiers to adapt to network behavioural changes ensures classifier accuracy for long periods of time without the need for modifications or costly re-training.
VIII. CONCLUSION
We present a novel approach for analysing and classifying network traffic of malware variants based on their network flow sequence behaviour. Considering the limitations of existing approaches, we proposed a system that is privacy-preserving, time efficient, and resilient to malware evasion. We showed MalClassifier’s effectiveness in identifying frequent malware network n-flows and its robustness against malware evasion by flow order alteration. MalClassifier eliminates the need to have access to the malicious binary. This allows SOC analysts to classify malicious network flow sequences on-the-wire, reducing the time and effort required in other dynamic analysis approaches while maintaining a high classification accuracy.
IX. ACKNOWLEDGMENTS
Bushra A. AlAhmadi is supported by the Ministry of Higher Education in the Kingdom of Saudi Arabia, the Saudi Arabian Cultural Bureau in London, and King Saud University.
REFERENCES
[1] L. Axon, B. Alahmadi, J. Nurse, M. Goldsmith, and S. Creese, “Sonification in security operations centres: what do security practitioners think?” Internet Society, 2018.
[2] M. Bailey, J. Oberheide, J. Andersen, Z. M. Mao, F. Jahanian, and J. Nazario, “Automated classification and analysis of internet malware,” in International Workshop on Recent Advances in Intrusion Detection. Springer, 2007, pp. 178–197.
[3] U. Bayer, P. M. Comparetti, C. Hlauschek, C. Kruegel, and E. Kirda, “Scalable, behavior-based malware clustering.” in NDSS, vol. 9. Citeseer, 2009, pp. 8–11.
[4] Z. Berkay Celik, R. J. Walls, P. McDaniel, and A. Swami, “Malware traffic detection using tamper resistant features,” in Military Communications Conference, MILCOM. IEEE, 2015, pp. 330–335.
[5] L. Bilge, D. Balzarotti, W. Robertson, E. Kirda, and C. Kruegel, “Disclosure: detecting botnet command and control servers through large-scale netflow analysis,” in Proceedings of the 28th Annual Computer Security Applications Conference. ACM, 2012, pp. 129–138.
[6] S. Cesare and Y. Xiang, “Classification of malware using structured control flow,” in Proceedings of the Eighth Australasian Symposium on Parallel and Distributed Computing - Volume 107, ser. AusPDC ’10. Darlinghurst, Australia, Australia: Australian Computer Society, Inc., 2010, pp. 61–70.
[7] E. Gandotra, D. Bansal, and S. Sofat, “Malware analysis and classification: A survey,” Journal of Information Security, vol. 2014, 2014.
[8] S. Garcia, M. Grill, J. Stiborek, and A. Zunino, “An empirical comparison of botnet detection methods,” computers & security, vol. 45, pp. 100–123, 2014.
[9] S. García, A. Zunino, and M. Campo, “Survey on network-based botnet detection methods,” Security and Communication Networks, vol. 7, no. 5, pp. 878–903, 2014.
[10] G. Gu, R. Perdisci, J. Zhang, W. Lee et al., “Botminer: Clustering analysis of network traffic for protocol-and structure-independent botnet detection.” in USENIX Security Symposium, vol. 5, no. 2, 2008, pp. 139–154.
[11] G. Gu, P. A. Porras, V. Yegneswaran, M. W. Fong, and W. Lee, “Bothunter: Detecting malware infection through ids-driven dialog correlation.” in Usenix Security, vol. 7, 2007, pp. 1–16.
[12] G. Gu, J. Zhang, and W. Lee, “Botsniffer: Detecting botnet command and control channels in network traffic,” 2008.
[13] F. Howard. (2015, jul) A closer look at the angler exploit kit.
[14] J. Z. Kolter and M. A. Maloof, “Learning to detect malicious executables in the wild,” in Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, 2004, pp. 470–478.

[15] M. Lesot, M. Rifqi, and H. Benhadda, “Similarity measures for binary and numerical data: a survey,” Int. J. Knowl. Eng. Soft Data Paradigm., vol. 1, no. 1, pp. 63–84, Dec. 2009.
[16] V. I. Levenshtein, “Binary codes capable of correcting deletions, insertions and reversals,” in Soviet physics doklady, vol. 10, 1966, p. 707.
[17] H. Mekky, A. Mohaisen, and Z.-L. Zhang, “Separation of benign and malicious network events for accurate malware family classification,” in Communications and Network Security (CNS), 2015 IEEE Conference on. IEEE, 2015, pp. 125–133.
[18] A. Mohaisen, A. G. West, A. Mankin, and O. Alrawi, “Chatter: Classifying malware families using system event ordering,” in Communications and Network Security (CNS). IEEE, 2014, pp. 283–291.
[19] J. Oltsik, “Soc-as-a-service for midmarket and small enterprise organizations,” The Enterprise Strategy Group, Tech. Rep., mar 2015.
[20] Y. Park, D. Reeves, V. Mulukutla, and B. Sundaravel, “Fast malware classification by automated behavioral graph matching,” in Proceedings of the Sixth Annual Workshop on Cyber Security and Information Intelligence Research, ser. CSIIRW ’10. New York, NY, USA: ACM, 2010, pp. 45:1–45:4.
[21] R. Perdisci, D. Dagon, W. Lee, P. Fogla, and M. Sharif, “Misleading worm signature generators using deliberate noise injection,” in 2006 IEEE Symposium on Security and Privacy (S&P’06). IEEE, 2006, pp. 15–pp.
[22] R. Perdisci, W. Lee, and N. Feamster, “Behavioral clustering of http-based malware and signature generation using malicious network traces.” in NSDI, 2010, pp. 391–404.
[23] M. Z. Rafique and J. Caballero, “Firma: Malware clustering and network signature generation with mixed network behaviors,” in International Workshop on Recent Advances in Intrusion Detection. Springer, 2013, pp. 144–163.
[24] M. Z. Rafique, P. Chen, C. Huygens, and W. Joosen, “Evolutionary algorithms for classification of malware families through different network behaviors,” in Proceedings of the 2014 conference on Genetic and evolutionary computation. ACM, 2014, pp. 1167–1174.
[25] D. K. S. Reddy and A. K. Pujari, “N-gram analysis for computer virus detection,” Journal in Computer Virology, vol. 2, no. 3, pp. 231–239, 2006.
[26] K. Rieck, T. Holz, C. Willems, P. Düssel, and P. Laskov, Detection of Intrusions and Malware, and Vulnerability Assessment: 5th International Conference, DIMVA 2008, Paris, France, July 10-11, 2008. Proceedings. Berlin, Heidelberg: Springer Berlin Heidelberg, 2008, ch. Learning and Classification of Malware Behavior, pp. 108–125.
[27] K. Rieck, G. Schwenk, T. Limmer, T. Holz, and P. Laskov, “Botzilla: Detecting the phoning home of malicious software,” in proceedings of the 2010 ACM Symposium on Applied Computing. ACM, 2010, pp. 1978–1984.
[28] K. Rieck, P. Trinius, C. Willems, and T. Holz, “Automatic analysis of malware behavior using machine learning,” Journal of Computer Security, vol. 19, no. 4, pp. 639–668, 2011.
[29] C. Rossow, C. J. Dietrich, H. Bos, L. Cavallaro, M. Van Steen, F. C. Freiling, and N. Pohlmann, “Sandnet: Network traffic analysis of malicious software,” in Proceedings of the First Workshop on Building Analysis Datasets and Gathering Experience Returns for Security. ACM, 2011, pp. 78–88.
[30] I. Santos, Y. K. Penya, J. Devesa, and P. G. Bringas, “N-grams-based file signatures for malware detection.” ICEIS (2), vol. 9, pp. 317–320, 2009.
[31] S. K. Singhi and H. Liu, “Feature subset selection bias for classification learning,” in Proceedings of the 23rd international conference on Machine learning. ACM, 2006, pp. 849–856.
[32] E. Stinson and J. C. Mitchell, “Towards systematic evaluation of the evadability of bot/botnet detection methods.” WOOT, vol. 8, pp. 1–9, 2008.
[33] G. Suarez-Tangil, J. E. Tapiador, P. Peris-Lopez, and J. Blasco, “Dendroid: A text mining approach to analyzing and classifying code structures in android malware families,” Expert Systems with Applications, vol. 41, no. 4, pp. 1104–1117, 2014.
[34] Symantec, “Adaptive Behavior-Based Malware Protection,” Tech. Rep.
[35] Symantec. (2016) Internet security threat report.
[36] F. Tegeler, X. Fu, G. Vigna, and C. Kruegel, “Botfinder: Finding bots in network traffic without deep packet inspection,” in Proceedings of the 8th International Conference on Emerging Networking Experiments and Technologies, ser. CoNEXT ’12. New York, NY, USA: ACM, 2012, pp. 349–360.
[37] S. Wold, K. Esbensen, and P. Geladi, “Principal component analysis,” Chemometrics and intelligent laboratory systems, vol. 2, no. 1-3, pp. 37–52, 1987.
[38] C. Wressnegger, G. Schwenk, D. Arp, and K. Rieck, “A close look on n-grams in intrusion detection: anomaly detection vs. classification,” in Proceedings of the 2013 ACM workshop on Artificial intelligence and security. ACM, 2013, pp. 67–76.
