
Poster Abstract: Are CNN based Malware Detection Models Robust? Developing Superior Models using Adversarial Attack and Defense

Hemant Rathore, Taeeb Bandwala, Sanjay K. Sahay, Mohit Sewak
Dept. of CS & IS, Goa Campus, BITS Pilani, India
Security & Compliance Research, Microsoft R & D, India

ABSTRACT
The tremendous increase of malicious applications in the android ecosystem has prompted researchers to explore deep learning based malware detection models. However, research in other domains suggests that deep learning models are adversarially vulnerable, and thus we aim to investigate the robustness of deep learning based malware detection models. We first developed two image-based E-CNN malware detection models based on android permission and intent features. We then acted as an adversary and designed the ECO-FGSM evasion attack against the above models, which achieved more than a 50% fooling rate with limited perturbations. The evasion attack converts the maximum number of malware samples into adversarial samples while minimizing the perturbations and maintaining each sample's syntactical, functional, and behavioral integrity. Later, we used adversarial retraining to counter the evasion attack and develop adversarially superior malware detection models, which should be an essential step before any real-world deployment.
CCS CONCEPTS
· Security and privacy → Malware and its mitigation.

KEYWORDS
Adversarial Learning, Malware Analysis and Detection, Machine Learning, Smartphones

ACM Reference Format:
Hemant Rathore, Taeeb Bandwala, Sanjay K. Sahay, Mohit Sewak. 2021. Poster Abstract: Are CNN based Malware Detection Models Robust? Developing Superior Models using Adversarial Attack and Defense. In The 19th ACM Conference on Embedded Networked Sensor Systems (SenSys '21), November 15–17, 2021, Coimbra, Portugal. ACM, New York, NY, USA, 2 pages. https://doi.org/10.1145/3485730.3492867

1 INTRODUCTION
The last decade witnessed an enormous increase in the number of active android smartphone users, coupled with a limitless growth in the android application space. These devices store a vast amount of general and personal user data that can be exploited by android malware, which have also grown exponentially in the last few years. The current malware detection models are based on signatures, heuristics, etc., and cannot handle the ever-increasing volume and veracity of android malware. Thus the anti-malware community investigated malware detection models based on deep learning, which have shown encouraging results [3]. However, research in other domains like image recognition, object classification, etc., suggests that deep learning models might be adversarially vulnerable [1]. Hence, malware detection models based on deep learning should be investigated for adversarial robustness before integrating them into real-world solutions. Adversarial threat modeling can be used to find vulnerabilities in classification models, and it is performed based on the adversary's goal, knowledge, and capability.

2 PROPOSED FRAMEWORK
[Figure 1: Framework for Robust Malware Detection Models]

We propose a modularized framework (Fig. 1) to improve the robustness of any malware detection model. In the first step, android samples (malware and benign) are collected from various verified sources. The dataset used in this work is discussed in detail in our past paper [2]. We then perform static analysis to extract permission and intent features from each android sample in the dataset. These are fed into the designed classification pipeline, which processes the feature vectors for training and testing the proposed E-CNN malware detection models. The E-CNN model implicitly transforms each processed feature vector into an image that is fed into CNN layers for malware detection. Next, we design the ECO-FGSM adversarial agent to perform evasion attacks against the above detection models. The ECO-FGSM agent takes malware samples and transforms them into adversarial samples that are forcibly misclassified as benign by the E-CNN models. The agent ensures that the perturbations do not break the syntactical, functional, and behavioral integrity of the malware sample. We execute the ECO-FGSM evasion attack against the above E-CNN models and measure their performance. In the final step, we execute the adversarial defense strategy, namely adversarial retraining, to counter the evasion attack and develop adversarially superior malware detection models. We evaluate the performance of the models before and after the attack using accuracy, fooling rate, etc.
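
As a concrete illustration of this pipeline, the sketch below shows one way such an E-CNN-style detector could be structured in PyTorch: a binary permission/intent vector is embedded into a 2D "image" and passed through convolutional layers. The class name EcnnDetector, the feature count, and all layer sizes are our own illustrative assumptions, not the authors' published architecture.

```python
# Minimal sketch of an E-CNN-style detector (assumed architecture).
import torch
import torch.nn as nn

N_FEATURES = 197   # number of distinct permissions/intents tracked (assumed)
EMB_DIM = 8        # embedding width per feature (assumed)

class EcnnDetector(nn.Module):
    def __init__(self):
        super().__init__()
        # Each binary feature value (0 or 1) is mapped to a learned vector;
        # stacking the vectors yields an (N_FEATURES x EMB_DIM) "image".
        self.embedding = nn.Embedding(num_embeddings=2, embedding_dim=EMB_DIM)
        self.conv = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.classifier = nn.Linear(32, 2)  # 0 = benign, 1 = malware

    def forward(self, bits: torch.Tensor) -> torch.Tensor:
        # bits: (batch, N_FEATURES) tensor of 0/1 integers
        img = self.embedding(bits)           # (batch, N_FEATURES, EMB_DIM)
        img = img.unsqueeze(1)               # add channel dim for Conv2d
        feats = self.conv(img).flatten(1)    # (batch, 32)
        return self.classifier(feats)        # class logits

model = EcnnDetector()
sample = torch.randint(0, 2, (1, N_FEATURES))   # one app's feature bits
logits = model(sample)
```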

3 ATTACK AND DEFENSE STRATEGY
We propose a novel, targeted, white-box evasion attack, namely the Embedding COsine similarity based Fast Gradient Sign Method (ECO-FGSM), against E-CNN malware detection models. The attack first computes a signed gradient image of the loss function w.r.t. the input image entering the CNN layers of the detection model. The gradient image is an array of adjacent embedding gradient vectors, each corresponding to a specific feature, i.e., a permission/intent used by the application. For each feature, the attack agent then computes the cosine similarity between the embedding gradient vector and the difference between the 1-bit and 0-bit pretrained embedding vectors. This gives a similarity score, which is a measure of the vulnerability of adding that permission or intent to the original sample. The agent then selects the most vulnerable permission/intent and adds it to the malware sample to convert it into an adversarial sample. The agent continues to add perturbations step by step until the potential adversarial sample successfully fools the detection model.
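
The following is a hedged sketch of the greedy selection loop described above, reusing the hypothetical EcnnDetector from Section 2. The function name eco_fgsm_attack and the budget parameter are illustrative; the authors' exact implementation is not reproduced here. Note that the agent only adds features, never removes one, which is how the perturbations preserve the sample's functional integrity.

```python
import torch
import torch.nn.functional as F

def eco_fgsm_attack(model, bits, true_label=1, budget=30):
    """Greedily add the most vulnerable absent permission/intent until the
    malware sample is misclassified as benign or the budget is exhausted."""
    bits = bits.clone()
    # Difference between the pretrained 1-bit and 0-bit embedding vectors.
    delta = (model.embedding.weight[1] - model.embedding.weight[0]).detach()
    for _ in range(budget):
        # Embed the current sample; the embedded "image" is the input whose
        # gradient we need (white-box access to the model is assumed).
        img = model.embedding(bits).detach().requires_grad_(True)
        logits = model.classifier(model.conv(img.unsqueeze(1)).flatten(1))
        if logits.argmax(dim=1).item() != true_label:
            break                                   # sample already evades
        loss = F.cross_entropy(logits, torch.tensor([true_label]))
        # Signed gradient image of the loss w.r.t. the embedded input (FGSM).
        grad = torch.autograd.grad(loss, img)[0][0].sign()
        # Cosine similarity between each feature's gradient vector and the
        # (1-bit minus 0-bit) embedding difference scores its vulnerability.
        scores = F.cosine_similarity(grad, delta.unsqueeze(0), dim=1)
        scores[bits[0] == 1] = -float("inf")        # only add, never remove
        best = int(scores.argmax())
        bits[0, best] = 1       # add the most vulnerable permission/intent
    return bits
```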
The literature contains many studies claiming adversarial retraining as an effective strategy for developing adversarially superior deep learning models [1]. In light of existing works, we also use adversarial retraining as the defense strategy against the proposed ECO-FGSM attack. We retrain the E-CNN malware detection models on an augmented dataset containing all the samples from the original dataset and the adversarial samples generated during the evasion attack. It is expected that adversarial retraining will influence parameters like model weights, thereby improving the robustness and generalization ability of the E-CNN malware detection models.
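
A minimal sketch of this defense, under the same assumptions as the earlier snippets: the original training set is augmented with the attack-generated adversarial samples (which remain malware, so they keep label 1) and the model is retrained so that its weights adjust to the perturbed inputs. The batch size, epoch count, and optimizer are illustrative choices.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

def adversarial_retrain(model, x_orig, y_orig, x_adv, epochs=5):
    # Augmented dataset = original samples + adversarial samples; the
    # adversarial samples are still malware, hence label 1.
    x = torch.cat([x_orig, x_adv])
    y = torch.cat([y_orig, torch.ones(len(x_adv), dtype=torch.long)])
    loader = DataLoader(TensorDataset(x, y), batch_size=64, shuffle=True)
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
    loss_fn = torch.nn.CrossEntropyLoss()
    for _ in range(epochs):
        for xb, yb in loader:
            optimizer.zero_grad()
            loss_fn(model(xb), yb).backward()   # updates model weights
            optimizer.step()
    return model
```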
4 EXPERIMENTAL RESULTS
[Figure 2: Fooling rate during ECO-FGSM attack against permission/intent based E-CNN malware detection models]

Figure 2 shows the fooling rate achieved by the proposed ECO-FGSM attack against the permission and intent based E-CNN malware detection models. The ECO-FGSM attack achieved fooling rates (blue line) of 23%, 42%, and 56% with a maximum of 10, 20, and 30 perturbations, respectively, against the permission-based E-CNN malware detection model. On the other hand, the ECO-FGSM attack achieved even higher fooling rates (red line) of 54%, 99%, and almost 100% with a maximum of 10, 20, and 30 perturbations, respectively, against the intent-based E-CNN malware detection model. The results suggest that the proposed ECO-FGSM attack is successful, as it achieved more than a 50% fooling rate against both detection models. They also show that the permission-based detection model is more robust than the intent-based detection model.
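
Here the fooling rate is the percentage of malware samples that the attack converts into adversarial samples misclassified as benign within a given perturbation budget. A small sketch of how such a curve could be measured, assuming the eco_fgsm_attack helper above:

```python
import torch

def fooling_rate(model, malware_samples, budget):
    """Percentage of malware samples that evade detection after the attack,
    given a maximum number of allowed perturbations."""
    fooled = 0
    for bits in malware_samples:                    # each: (1, N_FEATURES)
        adv = eco_fgsm_attack(model, bits, budget=budget)
        if model(adv).argmax(dim=1).item() == 0:    # 0 = classified benign
            fooled += 1
    return 100.0 * fooled / len(malware_samples)

# Sweep the budgets reported in Figure 2, e.g. 10, 20, and 30 perturbations:
# rates = [fooling_rate(model, malware_samples, b) for b in (10, 20, 30)]
```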
Figure 3 depicts the performance of the permission/intent based E-CNN malware detection models at baseline, after the ECO-FGSM attack on the detection model, after adversarial retraining of the detection model, and after the ECO-FGSM reattack on the adversarially retrained model. Both the permission and intent based detection models showed a similar pattern during these experiments.

[Figure 3: Accuracy variations during the proposed workflow]

The baseline malware detection models achieved high accuracy (blue bar). There was then a sudden drop in accuracy due to the ECO-FGSM attack on the detection models (red bar). The adversarially retrained detection models achieved higher accuracy (yellow bar) and showed superior robustness after the reattack on the retrained models (green bar). Therefore, we can infer that adversarial retraining of both E-CNN malware detection models improved their robustness against evasion attacks compared to their original counterparts.

5 CONCLUSION
Deep learning models have shown superior performance in many domains, but the literature suggests they might be vulnerable to adversarial attacks. Therefore, we first constructed two custom E-CNN malware detection models using android permission and intent features. The ECO-FGSM evasion attack was then performed on both of the above models, which exploited their vulnerabilities and significantly decreased their performance. Finally, adversarial retraining was used to counter the attack and produce adversarially superior malware detection models. We also concluded that such investigations are crucial for malware detection models and should be performed before their real-world deployment.

REFERENCES
[1] Ian J Goodfellow, Jonathon Shlens, and Christian Szegedy. 2015. Explaining and harnessing adversarial examples. International Conference on Learning Representations (ICLR) (2015).
[2] Hemant Rathore, Sanjay K Sahay, Piyush Nikam, and Mohit Sewak. 2020. Robust Android Malware Detection System Against Adversarial Attacks Using Q-Learning. Information Systems Frontiers (2020), 1–16.
[3] Yanfang Ye, Tao Li, Donald Adjeroh, and S Sitharama Iyengar. 2017. A survey on malware detection using data mining techniques. ACM Computing Surveys (CSUR) 50, 3 (2017), 1–40.
