Final Presentation

The document discusses the development of advanced techniques for detecting Android malware, highlighting the vulnerabilities of the Android operating system due to third-party app installations. It outlines the objectives, methodology, and results of using machine learning classifiers on the TUANDROMD dataset to improve malware detection accuracy. The findings indicate that the KNN-RandomOverSampler model achieved the highest accuracy, emphasizing the importance of preprocessing and explainable AI in enhancing model interpretability and performance.

Android Malware Detection

G Sai Soumith
190911168
Information Technology

Under the guidance of

Dr. Diana Olivia
Associate Professor
Department of I & CT
Manipal Institute of Technology
Manipal, India

May 13, 2024

Department of Information and Communication Technology 1/ 24


Presentation Outline

1 Introduction
2 Literature Survey
3 Problem Definition
4 Objectives
5 Scope
6 Methodology
7 Results
8 Conclusion


Introduction

Android is currently one of the most widely used operating
systems.
The Android OS allows third-party app installation, making
the devices vulnerable to malware.
Android malware contributes to more than 46% of all mobile
malware categories.
Devices infected with malware pose a severe risk to not just
people but also organizations.


Literature Survey

Studies have been conducted to create models that enhance the
Android operating system’s ability to detect malware.


Problem Definition

There is a need for advanced techniques that can efficiently detect
and classify Android malware. Despite advancements in malware
detection techniques, there remain challenges in developing precise
and efficient models for identifying malicious events on Android
devices.
Performance Optimization
Feature Relevance and Selection
Interpretability


Objectives

1 Using an updated dataset, create a precise model for
identifying malware events on Android devices.
2 Investigate the effect of dimensionality reduction to reduce
the time complexity and to improve the model accuracy.
3 Examine how well different classifiers detect malware on
Android smartphones.
4 Use XAI methods to identify the features that have the biggest
impact on malware classification.


Scope

Android smartphone users stand to benefit significantly from
the development of robust malware detection models. Users
can safeguard their personal data, privacy, and overall device
security.
Organizations and enterprises relying on Android devices for
business operations will find value in this research. Businesses
can protect sensitive information.
Ultimately all stakeholders involved in the Android ecosystem
greatly benefit from this research.


Methodology

The figure outlines the proposed methodology, which uses the TUANDROMD
dataset. Ten machine learning classifiers are trained and assessed on
five metrics: accuracy, AUC, precision, recall, and F1-score.
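
The five metrics named above can be sketched in plain Python as follows. This is only an illustration: the labels, predictions, and scores are made-up toy values, not results from the study, and the function names are hypothetical. Malware is assumed to be the positive class (label 1).

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, fn, tn) for a binary classification problem."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    return tp, fp, fn, tn


def classification_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall, and F1-score from hard predictions."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred, positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


def roc_auc(y_true, scores, positive=1):
    """AUC as the chance that a random positive outscores a random negative."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy values (assumed for illustration): 1 = malware, 0 = benign.
y_true = [1, 1, 1, 1, 0, 0, 1, 0]
y_pred = [1, 1, 1, 0, 0, 1, 1, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.2, 0.6, 0.95, 0.1]

metrics = classification_metrics(y_true, y_pred)
auc = roc_auc(y_true, scores)
print(metrics, auc)
```

On imbalanced data such as this dataset, precision, recall, F1-score, and AUC are worth reading alongside accuracy, because a model that always predicts the majority class can still score a high accuracy.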


Dataset Description
In this study, the TUANDROMD dataset is used, consisting of
241 Android permission and API features crucial for
distinguishing between benign Android apps and malware.
The dataset encompasses features extracted from 4,465 app
samples, of which 899 are benign and 3,565 are Android
malware, resulting in highly imbalanced data.


Data Balancing Techniques
Data balancing is crucial because of the significant class
imbalance present in the TUANDROMD dataset.
Techniques such as RandomOverSampler, SMOTETomek, and
RandomUnderSampler rectify this imbalance, so that the model
is trained on a more balanced dataset.
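
The idea behind random oversampling can be sketched in plain Python as below. This is a minimal illustration of the technique, not the library implementation used in the study: the function name `random_oversample`, the seed, and the toy data are assumptions made for the example.

```python
import random
from collections import Counter


def random_oversample(X, y, seed=0):
    """Duplicate randomly chosen minority-class rows until all classes
    have as many samples as the largest class."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, count in counts.items():
        rows = [x for x, lbl in zip(X, y) if lbl == label]
        for _ in range(target - count):
            X_out.append(rng.choice(rows))  # sample with replacement
            y_out.append(label)
    return X_out, y_out


# Toy data (assumed): 2 benign (0) and 8 malware (1) samples,
# loosely mirroring the roughly 1:4 imbalance described above.
X = [[i % 2, (i // 2) % 2] for i in range(10)]
y = [0, 0] + [1] * 8

X_bal, y_bal = random_oversample(X, y)
balanced_counts = Counter(y_bal)
print(balanced_counts)
```

Resampling is normally applied to the training split only, so that duplicated samples do not leak into the data used for evaluation.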

Dimensionality Reduction
Principal Component Analysis simplifies the complexity of
datasets by transforming high-dimensional data into a
lower-dimensional representation while retaining most of the
relevant information.
This allows for the removal of redundant and less informative
features.
With the application of PCA, the number of features was
reduced from 241 to 25.
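
A reduction from 241 features to 25 components can be sketched with NumPy as below. This is an illustration under stated assumptions: the data is random binary noise standing in for the real permission and API features, `pca_reduce` is a hypothetical helper, and the study may well have used a library PCA implementation instead.

```python
import numpy as np


def pca_reduce(X, n_components):
    """Project X onto its top principal components using an SVD.

    Returns the projected data and the fraction of total variance
    explained by each kept component.
    """
    X_centered = X - X.mean(axis=0)
    # Rows of Vt are the principal directions, ordered by singular value.
    _, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
    components = Vt[:n_components]
    explained_ratio = (S ** 2) / np.sum(S ** 2)
    return X_centered @ components.T, explained_ratio[:n_components]


# Synthetic stand-in (assumed): 200 samples with 241 binary features.
rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(200, 241)).astype(float)

X_reduced, explained_ratio = pca_reduce(X, 25)
print(X_reduced.shape, explained_ratio.sum())
```

On random noise the 25 components explain only a modest share of the variance; on real data with correlated features the retained share is typically much higher, which is what makes the reduction worthwhile.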


Classifiers
In this study, a set of ten different classifiers is used. Nine
are individual classifiers: RandomForest Classifier, K-Nearest
Neighbors Classifier, DecisionTreeClassifier,
LogisticRegression, AdaBoost Classifier, CatBoost Classifier,
LightGBM Classifier, XGBoost Classifier, and Support Vector
Classifier.
The tenth is a stacked ensemble model, which combines
the predictions of multiple base classifiers to improve overall
performance.


Explainable Artificial Intelligence
Explainable Artificial Intelligence (XAI) makes the predictions
of machine learning models interpretable to humans.
The XAI tools SHAP, ELI5, and DALEX are used in this study.


Results

Using the original data, three sampling techniques, and ten
different ML models, a total of forty combinations are
analysed.
LightGBM-RandomOverSampler and
KNN-RandomOverSampler have the best performance with
accuracy of 99.63%, outperforming all previously reported
models on the TUANDROMD dataset.
Because speed is crucial when malware detection is used to
address security concerns, the KNN-RandomOverSampler model
is preferred: its training time of 0.0037s is far lower than
the 0.4052s of the LightGBM-RandomOverSampler model.
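
One likely reason for the very short KNN training time is that k-nearest neighbours is a lazy learner: fitting mostly stores the training data, and the real work happens at prediction time. The sketch below illustrates this in plain Python. It is not the classifier used in the study: the class name `TinyKNN`, the Hamming distance, k = 3, and the toy feature vectors are all assumptions made for the example.

```python
from collections import Counter


class TinyKNN:
    """Minimal k-nearest-neighbours classifier for binary feature vectors."""

    def __init__(self, k=3):
        self.k = k

    def fit(self, X, y):
        # "Training" only stores the data, which is why it is so fast.
        self.X, self.y = list(X), list(y)
        return self

    def predict_one(self, x):
        # Hamming distance: number of positions where the features differ.
        dists = sorted(
            (sum(a != b for a, b in zip(x, row)), label)
            for row, label in zip(self.X, self.y)
        )
        votes = Counter(label for _, label in dists[: self.k])
        return votes.most_common(1)[0][0]


# Toy binary feature vectors (assumed): 1 = malware, 0 = benign.
X_train = [[1, 1, 0, 0], [1, 1, 1, 0], [1, 0, 1, 0],
           [0, 0, 0, 1], [0, 0, 1, 1]]
y_train = [1, 1, 1, 0, 0]

knn = TinyKNN(k=3).fit(X_train, y_train)
pred_a = knn.predict_one([1, 1, 0, 1])
pred_b = knn.predict_one([0, 0, 0, 0])
print(pred_a, pred_b)
```

Because each prediction compares the query against the stored samples, prediction time is worth measuring alongside the training time when comparing KNN with a model such as LightGBM.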


The stacked ensemble model achieved an accuracy of 99.53%.
It offers robustness and generalization capabilities crucial for
real-world applications.
Side 1: RF, KNN, DT, and Logistic Regression
Side 2: AdaBoost, CatBoost, LightGBM, and XGBoost
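
The general idea of stacking can be sketched with scikit-learn's `StackingClassifier` as below. The slides do not say how the two sides are combined, so this is not a reconstruction of the study's ensemble: it uses only three of the Side 1 models, and the logistic regression meta-learner, the 5-fold setting, and the synthetic data from `make_classification` are all assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic, imbalanced stand-in data (assumed), not TUANDROMD.
X, y = make_classification(
    n_samples=400, n_features=20, n_informative=8,
    weights=[0.2, 0.8], random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0,
)

# Base classifiers produce predictions; the final estimator learns
# how to combine them from cross-validated base predictions.
stack = StackingClassifier(
    estimators=[
        ("rf", RandomForestClassifier(n_estimators=50, random_state=0)),
        ("knn", KNeighborsClassifier(n_neighbors=5)),
        ("dt", DecisionTreeClassifier(random_state=0)),
    ],
    final_estimator=LogisticRegression(max_iter=1000),
    cv=5,
)
stack.fit(X_train, y_train)
stack_accuracy = stack.score(X_test, y_test)
print(stack_accuracy)
```

Boosting models such as those on Side 2 could be added to `estimators` in the same way, provided their libraries expose scikit-learn compatible classifiers.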


Preprocessing techniques played a significant role in this
study.
In addressing the data imbalance issue, RandomOverSampler
proved to be the most effective method, improving accuracy
for nine of the ten models.


Accuracies of top two models using different sampling techniques


Applying PCA in the dimensionality reduction phase reduced
the number of features to just 25.
This smaller feature space significantly decreases data
complexity, which facilitates better model performance.


Results table


KNN-RandomOverSampler model interpretation


LGBM-RandomOverSampler model interpretation


Conclusion

With the rising popularity of Android-based mobile devices,
there’s a corresponding increase in the risk of malicious
attacks. Therefore, it is crucial to build a fast and efficient
malware detection model.
Understanding an ML model is as important as building it,
which can be achieved by the use of XAI techniques.


Thank You
