Using Deep Learning For Information Security
In the next few years, deep learning will transform and expand as a decision engine across
every enterprise business layer from product development to operations to finance to sales.
While internet giants such as Google, Facebook, Microsoft, and Salesforce have already
embedded deep learning into their products, the cybersecurity industry is also catching up
and leveraging it for various use cases.
In this Part 1 of the technical white paper series, we briefly introduce Deep Learning (DL)
along with a few existing InfoSec applications it enables. We then take a deep dive into the
interesting problem of anonymous TOR traffic detection and present a DL-based solution to
detect TOR traffic.
Deep learning is not a silver bullet that can solve all the InfoSec problems because it
needs extensive labeled datasets and no such labeled datasets are readily available.
However, there are several InfoSec use cases where the deep learning networks are making
significant improvements to the existing solutions. Malware detection and network intrusion
detection are two such areas where deep learning has shown significant improvements over
the rule-based and classic machine learning-based solutions. Network intrusion detection
systems are typically rule-based and signature-based controls that are deployed at
the perimeter to detect known threats. Adversaries change the malware signatures and
easily evade the traditional network intrusion detection systems. Niyaz et al. [1], in their
IEEE Transactions paper, showed that deep learning (DL)-based systems using self-taught
learning are promising in detecting unknown network intrusions. Traditional security use
cases such as malware detection and spyware detection have also been tackled with deep
neural net-based systems [2].
Figure 1: The greatest inspiration we can draw is from nature. The figure depicts a
biological neuron and an artificial neuron.
A set of neuron layers can be used to create a neural network. The network architecture
differs based on the objective it needs to achieve. A common network architecture is a
Feed Forward Neural Network (FFN). Neurons are arranged in layers without any cycles to
form an FFN. It is called feed forward because information travels in the forward direction
inside the network, first through the input neuron layer, then through the hidden neuron
layers, and finally through the output neuron layer (Figure 2). Like any supervised machine
learning model, the
FFN needs to be trained using the labeled data. The training is in the form of optimizing the
parameters by reducing the error between the output value and the true value. One such
important parameter to optimize is the weight each neuron gives to each of its input signals.
For a single neuron, the weight can be easily computed using the error. However, when a
set of neurons are collated in multiple layers, it is challenging to optimize the neuron weights
in multiple layers based on the error computed at the output layer. The backpropagation
algorithm helps to address this issue [6]. Backpropagation is an old technique that falls
under automatic differentiation, a branch of computer algebra, and is used to calculate
the gradient that is needed to compute the weights to be used in the network. In an
FFN, the output is obtained from the activation of each linked neuron, computed layer by
layer. The error is calculated by comparing the output with the expected outcome. This
error is then propagated backward, layer by layer, to correct the internal neurons.
For each data instance, the parameters are optimized by going through multiple
iterations.
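The forward pass and the backward propagation of the error described above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the network used later in this paper: the layer sizes, the starting weights, the learning rate, the choice of sigmoid activations with a binary cross-entropy error, and the toy OR dataset are all hypothetical and chosen only to keep the example small.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(x, w_hidden, b_hidden, w_out, b_out):
    """Forward pass: input layer -> one hidden layer -> single output neuron."""
    hidden = [sigmoid(sum(w * xi for w, xi in zip(ws, x)) + b)
              for ws, b in zip(w_hidden, b_hidden)]
    output = sigmoid(sum(w * h for w, h in zip(w_out, hidden)) + b_out)
    return hidden, output

def mean_loss(data, w_hidden, b_hidden, w_out, b_out):
    """Mean binary cross-entropy between the outputs and the true labels."""
    total = 0.0
    for x, y in data:
        _, p = forward(x, w_hidden, b_hidden, w_out, b_out)
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(data)

def train(data, w_hidden, b_hidden, w_out, b_out, lr=0.5, epochs=500):
    """Per-sample gradient descent; the output error is propagated back one layer."""
    for _ in range(epochs):
        for x, y in data:
            hidden, p = forward(x, w_hidden, b_hidden, w_out, b_out)
            # Error signal at the output neuron (sigmoid + cross-entropy).
            delta_out = p - y
            # Error signal at each hidden neuron, computed from the output error
            # and the weight linking that hidden neuron to the output.
            delta_hidden = [delta_out * w_out[j] * hidden[j] * (1 - hidden[j])
                            for j in range(len(hidden))]
            # Weight and bias updates in the direction that reduces the error.
            for j in range(len(hidden)):
                w_out[j] -= lr * delta_out * hidden[j]
                for i in range(len(x)):
                    w_hidden[j][i] -= lr * delta_hidden[j] * x[i]
                b_hidden[j] -= lr * delta_hidden[j]
            b_out -= lr * delta_out
    return w_hidden, b_hidden, w_out, b_out

# Hypothetical toy data: the logical OR of two inputs.
data = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 1)]
# Arbitrary small starting weights (assumed values, two hidden neurons).
params = ([[0.1, -0.2], [0.3, 0.4]], [0.0, 0.0], [0.2, -0.1], 0.0)

initial_loss = mean_loss(data, *params)
params = train(data, *params)
final_loss = mean_loss(data, *params)
print("loss before training:", initial_loss)
print("loss after training:", final_loss)
```

The two `delta` lines are the core of the idea: the error is first measured at the output and then attributed to each hidden neuron in proportion to the weight that connects it to the output.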
Figure 3: An illustration of TOR communication between Alice and the destination server.
The communication starts with Alice requesting a path to the server. The TOR network
provides a path that is AES encrypted. The randomization of the path happens inside the
TOR network. The encrypted path of the packet is shown in red. Upon reaching the exit
node, which is the periphery node of the TOR network, the plain packet is transferred
to the server.
Anonymous network traffic can be achieved through various means [8]. These can be
broadly classified into two categories: network-based (TOR, I2P, Freenet) or custom
OS-based (Subgraph OS, Freepto). Among them, TOR is one of the most popular choices. TOR
is free software that enables anonymous communication over the internet through a
specialized routing protocol known as the onion routing protocol [9]. The protocol depends
on redirecting internet traffic over various freely hosted relays across the world. During the
relay, like the layers of an onion, each HTTP packet is encrypted using the public key of
the receiver. At each receiver point, the packet can be decrypted using the private key. Upon
decryption, the next destination relay address is revealed. This carries on until the exit node
of the TOR network is reached, where the decryption of the packet ends and the plain HTTP
packet is forwarded to the original destination server. An example routing scheme between
Alice and the server is depicted in Figure 3.
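The layer-by-layer wrapping and peeling described above can be illustrated with a toy sketch. It is not the TOR protocol and it is not secure: a hash-based XOR stream stands in for the public/private key operations, and the relay names, keys, and destination are hypothetical. It shows only the structural idea that each relay removes one layer and learns nothing but the next hop.

```python
import hashlib
import json

def toy_cipher(key: bytes, data: bytes) -> bytes:
    """Toy XOR stream cipher (NOT secure). Applying it twice with the same key
    restores the input, so it serves as both 'encrypt' and 'decrypt' here."""
    stream = b""
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + counter.to_bytes(4, "big")).digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))

def wrap(payload: bytes, destination: str, relays):
    """Build the onion from the inside out.
    relays is a list of (name, key) pairs in path order (entry first, exit last)."""
    next_hop = destination
    for name, key in reversed(relays):
        layer = json.dumps({"next": next_hop, "data": payload.hex()}).encode()
        payload = toy_cipher(key, layer)
        next_hop = name
    return next_hop, payload  # first relay to contact, fully wrapped packet

def peel(key: bytes, packet: bytes):
    """Remove one layer: reveals only the next hop and the inner packet."""
    layer = json.loads(toy_cipher(key, packet).decode())
    return layer["next"], bytes.fromhex(layer["data"])

# Hypothetical three-relay path and keys.
relays = [("guard", b"key-guard"), ("middle", b"key-middle"), ("exit", b"key-exit")]
keys = dict(relays)

hop, packet = wrap(b"GET / HTTP/1.1", "server.example", relays)
route = []
while hop in keys:
    route.append(hop)
    hop, packet = peel(keys[hop], packet)

print("relays traversed:", route)
print("delivered to:", hop)
print("plain payload:", packet)
```

After the last layer is removed at the exit relay, the payload is in the clear, which matches the description of the plain packet being forwarded to the destination server.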
The original intent of launching TOR was to safeguard the privacy of users. However,
adversaries have hijacked this well-intentioned objective and use it for various nefarious
purposes instead. According to a 2016 report, around 20% of TOR traffic is accounted for by
illegal activities [9]. In an enterprise network, TOR traffic is curtailed by not allowing the
installation of the TOR client or by blocking the Guard or Entry node IP addresses. However,
there are numerous means through which adversaries and malware can get access to the TOR
network to transfer data or information. IP blocking is not a sound strategy, because
adversaries can spawn different IPs to carry out the communication. A bad bot landscape
report by Distil Networks [5] shows that 70% of automated attacks in 2015 used multiple
IPs, and 20% of automated attacks used over 100 IPs.
Another way to detect TOR traffic is through traffic analysis. This typically requires big
data technologies to process very large volumes of traffic. However, using Acalvio’s
patented Shadowplex deception solution, TOR traffic can be detected without any of these
challenges. To enable this detection, we leverage Deep Learning-based classification models.
TOR traffic can be detected by analyzing the traffic packets. This analysis can be done on
the TOR node, or in between the client and the entry node. The analysis is done on a single
flow of packets. Each flow is identified by a tuple of source address, source port,
destination address, and destination port. Network flows for different time intervals are
extracted, and the analysis is carried out on them. G. He et al. [10], in their paper
“Inferring Application Type Information from Tor Encrypted Traffic,” extracted burst volumes
and directions to create an HMM model to detect the TOR applications that might be
generating that traffic. Most of the popular works in this area leverage time-based features
along with other features like size and port information [11,13,14] to detect TOR traffic.
We take inspiration from the paper by Habibi et al., “Characterization of Tor Traffic using
Time based Features” [11], and follow a time-based approach over extracted network flows to
detect TOR traffic. However, our architecture also uses a plethora of other meta-information
that can be obtained to classify the traffic. This is made possible by the Deep Learning
architecture that has been chosen to solve this problem.
More about this will follow later in the article.
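As a rough illustration of the flow-based, time-based approach described above, the sketch below groups packets into flows by the four-field tuple and computes inter-arrival time (IAT) statistics per flow. The packet record layout, the field names, the microsecond timestamps, and the sample packets are assumptions made for this example; the feature extraction actually used for the dataset may differ, for instance in how it treats the two directions of a flow.

```python
from collections import defaultdict
from statistics import mean, pstdev

def group_into_flows(packets):
    """Group packet timestamps by (source IP, source port, destination IP, destination port)."""
    flows = defaultdict(list)
    for pkt in packets:
        key = (pkt["src_ip"], pkt["src_port"], pkt["dst_ip"], pkt["dst_port"])
        flows[key].append(pkt["ts"])
    return flows

def iat_features(timestamps):
    """Mean, std, max and min of the gaps between consecutive packets of one flow."""
    ts = sorted(timestamps)
    gaps = [later - earlier for earlier, later in zip(ts, ts[1:])]
    if not gaps:
        # A single-packet flow has no inter-arrival time.
        return {"iat_mean": 0.0, "iat_std": 0.0, "iat_max": 0.0, "iat_min": 0.0}
    return {
        "iat_mean": mean(gaps),
        "iat_std": pstdev(gaps),
        "iat_max": max(gaps),
        "iat_min": min(gaps),
    }

# Hypothetical packets; "ts" is an arrival time in microseconds.
packets = [
    {"src_ip": "10.0.2.15", "src_port": 53913, "dst_ip": "216.58.208.46", "dst_port": 80, "ts": 0},
    {"src_ip": "10.0.2.15", "src_port": 53913, "dst_ip": "216.58.208.46", "dst_port": 80, "ts": 100},
    {"src_ip": "10.0.2.15", "src_port": 40000, "dst_ip": "192.0.2.7", "dst_port": 443, "ts": 150},
    {"src_ip": "10.0.2.15", "src_port": 53913, "dst_ip": "216.58.208.46", "dst_port": 80, "ts": 400},
]

flows = group_into_flows(packets)
features = {key: iat_features(ts) for key, ts in flows.items()}
for key, feats in features.items():
    print(key, feats)
```

Forward and backward IAT features could be derived the same way by splitting each flow's packets by direction before computing the gaps.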
ACTIVE: The amount of time a flow was active before going idle (mean, min, max, std).
Apart from these parameters, other flow-based parameters are also included. A sample
instance from the dataset is shown in Figure 4.
Source IP, Source Port, Destination IP, Destination Port, Protocol, Flow Duration, Flow
Bytes/s, Flow Packets/s, Flow IAT Mean, Flow IAT Std, Flow IAT Max, Flow IAT Min, Fwd
IAT Mean, Fwd IAT Std, Fwd IAT Max, Fwd IAT Min, Bwd IAT Mean, Bwd IAT Std, Bwd IAT
Max, Bwd IAT Min, Active Mean, Active Std, Active Max, Active Min, Idle Mean, Idle Std, Idle
Max, Idle Min, label
10.0.2.15,53913,216.58.208.46,80,6,435,0,4597.7011494253,435,0,435,435,0,0,0,0,0,0,0,0,
0,0,0,0,0,0,0,0,nonTOR
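A sample instance like the one above can be turned into model input with a small amount of parsing. The sketch below is one possible way to do it, not the preprocessing used by the authors: the function name `parse_instance`, the decision to leave out the two IP address fields from the numeric features, and the mapping of the label to 1 for "TOR" and 0 otherwise are all assumptions.

```python
import csv
import io

HEADER = (
    "Source IP, Source Port, Destination IP, Destination Port, Protocol, Flow Duration, "
    "Flow Bytes/s, Flow Packets/s, Flow IAT Mean, Flow IAT Std, Flow IAT Max, Flow IAT Min, "
    "Fwd IAT Mean, Fwd IAT Std, Fwd IAT Max, Fwd IAT Min, "
    "Bwd IAT Mean, Bwd IAT Std, Bwd IAT Max, Bwd IAT Min, "
    "Active Mean, Active Std, Active Max, Active Min, "
    "Idle Mean, Idle Std, Idle Max, Idle Min, label"
)
# The sample row from the text, joined back into a single line.
ROW = (
    "10.0.2.15,53913,216.58.208.46,80,6,435,0,4597.7011494253,435,0,435,435,"
    "0,0,0,0,0,0,0,0,"
    "0,0,0,0,0,0,0,0,nonTOR"
)

def parse_instance(header_line, row_line):
    """Return (record, numeric_features, label) for one CSV instance."""
    names = [n.strip() for n in next(csv.reader(io.StringIO(header_line)))]
    values = [v.strip() for v in next(csv.reader(io.StringIO(row_line)))]
    if len(names) != len(values):
        raise ValueError("header and row have different numbers of fields")
    record = dict(zip(names, values))
    # Assumed label encoding: 1 for TOR traffic, 0 for everything else.
    label = 1 if record.pop("label").lower() == "tor" else 0
    # Assumed choice: IP addresses are kept as strings and left out of the numeric vector.
    numeric = {name: float(value) for name, value in record.items()
               if name not in ("Source IP", "Destination IP")}
    return record, numeric, label

record, numeric, label = parse_instance(HEADER, ROW)
print("label:", label)
print("numeric feature count:", len(numeric))
print("flow duration:", numeric["Flow Duration"])
```

Checking that the header and the row have the same number of fields catches rows that were broken across lines, which is easy to miss when the data is copied from a document.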
Figure 5: Deep learning network representation used for TOR traffic detection.
The number of hidden layers was varied between 2 and 10. We found N=5 to be optimal. ReLU
is used as the activation for all the hidden layers. Each hidden layer is a dense layer of
dimension 100.
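from keras.models import Sequential
from keras.layers import Dense

# feature_dim, hidden_layers and neurons_num must be defined before this snippet runs
model = Sequential()
# First hidden layer, sized to the input feature vector
model.add(Dense(feature_dim, input_dim=feature_dim, kernel_initializer='normal',
                activation='relu'))
# Remaining hidden layers, each a dense ReLU layer
for _ in range(0, hidden_layers - 1):
    model.add(Dense(neurons_num, kernel_initializer='normal', activation='relu'))
# Single sigmoid output neuron for the binary Tor / Non-Tor decision
model.add(Dense(1, kernel_initializer='normal', activation='sigmoid'))
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=["accuracy"])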
Figure 6: A Python Code Snippet of the FFN in Keras.
The output node is activated by a sigmoid function, because the output is a binary
classification: Tor or Non-Tor.
We use Keras with TensorFlow as the backend to train the DL module. Binary cross-entropy
loss was used for optimizing the FFN. The model was trained for different numbers of epochs.
Figure 7 below shows a training run, depicting the increasing performance and decreasing
loss value as the number of epochs increases.
Figure 7: TensorBoard-generated statistics depicting the network training process.
The results of the deep learning system are compared with various other estimators.
The standard classification metrics of Recall, Precision and F-Score are used to measure the
efficacy of the estimators. The DL-based system is able to detect the TOR class well.
However, it is the Non-Tor class that needs more attention. The Deep Learning-based system
can reduce the false positive cases for Non-Tor category samples.
The results are shown in the table below:
Table 2: The output of ML and DL Models for the Tor Traffic Detection
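For reference, the Precision, Recall and F-Score metrics named above can be computed per class as in the sketch below. The label strings and the small set of true and predicted labels are hypothetical and serve only to show the calculation; they are not results from this paper.

```python
def precision_recall_f1(y_true, y_pred, positive):
    """Precision, recall and F-score for the class given by `positive`."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)  # true positives
    fp = sum(1 for t, p in pairs if t != positive and p == positive)  # false positives
    fn = sum(1 for t, p in pairs if t == positive and p != positive)  # false negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return precision, recall, f1

# Hypothetical labels for six flows.
y_true = ["TOR", "TOR", "nonTOR", "nonTOR", "nonTOR", "TOR"]
y_pred = ["TOR", "nonTOR", "nonTOR", "TOR", "nonTOR", "TOR"]

tor_scores = precision_recall_f1(y_true, y_pred, positive="TOR")
non_tor_scores = precision_recall_f1(y_true, y_pred, positive="nonTOR")
print("TOR    (precision, recall, f1):", tor_scores)
print("nonTOR (precision, recall, f1):", non_tor_scores)
```

Reporting the scores for each class separately makes it visible when a classifier does well on one class while producing many false positives on the other.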
Among the various classifiers, the Random Forest and Deep Learning-based approaches
perform better than the rest. The results shown are based on 55K training instances. The
dataset used in this experiment is comparatively small for DL-based systems. As the
training data increases, performance would increase further for both the DL-based and
Random Forest classifiers. However, for large datasets, DL-based classifiers typically
outperform other classifiers, and they can be generalised to similar types of applications.
For example, if one needs to train a classifier to detect the application used by TOR, then
only the output layer needs retraining and the other layers can be kept the same, whereas
other ML classifiers need to be retrained on the entire dataset. Retraining the model may
take significant computing resources for large datasets.
Conclusion:
Anonymized traffic detection is a challenge that every enterprise faces. Adversaries use
TOR channels to exfiltrate data anonymously. Current approaches by TOR traffic
detection vendors depend on blocking known entry nodes of the TOR network. This is not a
scalable approach and can be easily bypassed. A more generic method is to use deep
learning-based techniques. In this paper, we presented a deep learning-based system to
detect TOR traffic with high recall and precision.
Acalvio’s Shadowplex deception solution can detect lateral movement, privilege
escalation and data exfiltration by adversaries that have already crossed the perimeter
and are hiding within the enterprise network. Shadowplex has the capabilities to engage with the
threats using different types of high interaction deceptions, e.g., hosts, databases, and
shares. When the adversary tries to exfiltrate the content from the high interaction
deceptions, Shadowplex detects them using a combination of the host intrusion detection
systems and deep learning-based techniques. In the next paper, we will share more such
capabilities of deep learning-based models in detecting hidden threats to improve the
security defense.
References:
[1]: Quamar Niyaz, Weiqing Sun, Ahmad Y Javaid, and Mansoor Alam, “A Deep Learning Approach
for Network Intrusion Detection System,” IEEE Transactions on Emerging Topics in Computational
Intelligence, 2018.
[2]: Daniel Gibert, “Convolutional Neural Networks for Malware Classification,” Thesis 2016.
[3]: Wookhyun Jung, Sangwon Kim, Sangyong Choi, “Deep Learning for Zero-day Flash Malware
Detection,” IEEE security, 2017.
[4]: Paweł Kobojek and Khalid Saeed, “Application of Recurrent Neural Networks for User
Verification based on Keystroke Dynamics,” Journal of telecommunications and
information technology, 2016.
[5]: Deep Learning Security Papers,
http://www.covert.io/the-definitive-security-datascience-and-machinelearning-guide/#deep-learning-and-security-papers,
accessed on May 2018.
[6]: “Deep Learning,” Ian Goodfellow, Yoshua Bengio, Aaron Courville; pp 196, MIT
Press, 2016.
[7]: “The Onion Ransomware,”
https://www.kaspersky.co.in/resource-center/threats/onion-ransomware-virus-threat, Retrieved on
November 29, 2017.
[8]: “5 best alternative to TOR.,”
https://fossbytes.com/best-alternatives-to-tor-browser-to-browse-anonymously/, Retrieved on
November 29, 2017.
[9]: Tor. Wikipedia., https://en.wikipedia.org/wiki/Tor_(anonymity_network), Retrieved on November
24, 2017.
[10]: He, G., Yang, M., Luo, J. and Gu, X., “Inferring Application Type Information from Tor
Encrypted Traffic,” Advanced Cloud and Big Data (CBD), 2014 Second International Conference
on (pp. 220-227), Nov. 2014.
[11]: Habibi Lashkari A., Draper Gil G., Mamun M. and Ghorbani A., “Characterization of Tor Traffic
using Time based Features,” Proceedings of the 3rd International
Conference on Information Systems Security and Privacy - Volume 1, pages
253-262, 2017.
[13]: Juarez, M., Afroz, S., Acar, G., Diaz, C. and Greenstadt, R., “A critical evaluation of website
fingerprinting attacks,” Proceedings of the 2014 ACM SIGSAC Conference
on Computer and Communications Security (pp. 263-274), November 2014
[14]: Bai, X., Zhang, Y. and Niu, X., “Traffic identification of tor and web-mix,” Intelligent Systems
Design and Applications, ISDA'08. Eighth International Conference on (Vol. 1, pp.
548-551). IEEE, November 2008