GROUP 17
Kuayi Raphael (10970285)
Doe Kelvin (10970187)
INTRODUCTION
The proliferation of Android smartphones, which hold over 70% of the mobile OS market
share, has made them a prime target for malware attacks. Traditional malware detection
methods, such as signature-based detection, have proven ineffective against sophisticated and
rapidly evolving threats. This project aims to address the challenge of detecting malware in
Android applications by leveraging machine learning techniques to analyse app permission
usage patterns.
The problem of malware detection in Android apps is complex and pressing. As the popularity
of Android devices continues to grow, so does the number of malicious apps seeking to exploit
them. These apps often request excessive or suspicious permissions to gain access to sensitive
data or functionalities, making permission-based analysis a promising approach for detection.
OBJECTIVES
The primary goal of this project was to design and develop a machine learning model that
accurately predicts whether an Android app is benign or malicious based on its permission
requests. To achieve this, the following key objectives were pursued:
• Identify key permissions that are indicative of malicious behaviour, providing insights
into the specific permission patterns and correlations that are strongly associated with
malware.
The dataset used for this project was sourced from multiple repositories containing Android
apps released between 2010 and 2019. It includes permissions extracted from over 29,000 apps,
classified into benign and malware categories.
3.2 Data Description
The dataset consists of 86 features representing the various permissions an app may request. Each feature is encoded in a binary format (1 for granted, 0 for not granted).
Each feature in the dataset therefore has exactly two unique values. The dataset has 29,332 rows.
Fig(i)
The target variable, Result, represents the classification of an app as either benign or malware.
The dataset is well balanced, with nearly equal numbers of instances for each class.
The data was thoroughly checked for outliers and null values, ensuring high integrity and quality. All columns have the int64 datatype. No outliers were present since the data is binary, and all fields were complete, contributing to the robustness of the data.
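The integrity checks described above can be sketched as follows. Since the actual dataset file is not specified here, a synthetic binary DataFrame with the same shape conventions (86 permission columns plus a Result label) stands in for it; the column names are hypothetical.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the permission dataset: 86 binary permission
# columns plus a binary "Result" label (the real data has 29,332 rows).
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.integers(0, 2, size=(1000, 86)),
                  columns=[f"perm_{i}" for i in range(86)])
df["Result"] = rng.integers(0, 2, size=1000)

# Integrity checks described above: no nulls, int64 dtypes,
# every column holds exactly two unique values (0/1).
assert df.isnull().sum().sum() == 0
assert (df.dtypes == "int64").all()
assert all(df[c].nunique() == 2 for c in df.columns)

# Class balance of the target variable.
print(df["Result"].value_counts())
```

Because every feature is binary, the outlier check reduces to confirming that each column contains only the values 0 and 1.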
The dataset was split into training and test sets to evaluate model performance on unseen data. An 80-20 split was used, with 80% of the data for training and 20% for testing. The training set was used to fit the models; hyperparameter tuning was performed with a validation split drawn from the training data, and the test set was reserved for final evaluation.
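An 80-20 split of this kind is typically done with scikit-learn's train_test_split; a minimal sketch on synthetic data shaped like the permission matrix:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic binary feature matrix and labels standing in for the dataset.
rng = np.random.default_rng(42)
X = rng.integers(0, 2, size=(1000, 86))
y = rng.integers(0, 2, size=1000)

# stratify=y keeps the benign/malware ratio equal in both splits;
# random_state makes the split reproducible.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)

print(X_train.shape, X_test.shape)  # (800, 86) (200, 86)
```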
METHODOLOGY
Each model was chosen for its unique characteristics and ability to handle binary classification
tasks.
• Support Vector Machine (SVM): SVM is a supervised learning algorithm used for
classification tasks. It finds the optimal hyperplane that separates classes in a high-
dimensional space. For this project, two types of SVM kernels were evaluated:
1. Linear: Assumes a linear relationship between features and class labels.
2. Polynomial: Captures non-linear relationships between features and class labels.
The 86 features of the dataset were reduced to two principal components to visualise how the SVM algorithm performs on the data with a polynomial kernel. Two principal components were selected because they captured the majority of the variance in the data and so reflected the overall structure of the 86 features.
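The PCA-plus-polynomial-SVM step above can be sketched as follows; the data here is synthetic, so the explained-variance figures will not match the real dataset's:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.svm import SVC

# Synthetic stand-in for the 86 binary permission features.
rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(500, 86)).astype(float)
y = rng.integers(0, 2, size=500)

# Project the 86 features onto 2 principal components.
pca = PCA(n_components=2, random_state=0)
X_2d = pca.fit_transform(X)
print("explained variance ratio:", pca.explained_variance_ratio_)

# Fit a polynomial-kernel SVM on the reduced 2-D data.
clf = SVC(kernel="poly", degree=3, random_state=0)
clf.fit(X_2d, y)
print("train accuracy:", clf.score(X_2d, y))
```

Plotting X_2d coloured by class then shows how well a polynomial decision boundary could separate the two classes.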
• K-Nearest neighbours (KNN): KNN is an instance-based learning algorithm that
classifies instances based on the majority vote of their nearest neighbours.
The performance of KNN was evaluated with different numbers of neighbours:
1. K=3: Evaluates the model with 3 nearest neighbours.
2. K=5: Evaluates the model with 5 nearest neighbours.
3. K=7: Evaluates the model with 7 nearest neighbours.
4. K=9: Evaluates the model with 9 nearest neighbours.
The distance metric used was Euclidean distance.
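The KNN evaluation loop over k = 3, 5, 7, 9 with Euclidean distance can be sketched as follows (synthetic data again stands in for the dataset):

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic binary features and labels.
rng = np.random.default_rng(1)
X = rng.integers(0, 2, size=(600, 86))
y = rng.integers(0, 2, size=600)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=1)

# Evaluate KNN at each neighbour count with Euclidean distance.
for k in (3, 5, 7, 9):
    knn = KNeighborsClassifier(n_neighbors=k, metric="euclidean")
    knn.fit(X_tr, y_tr)
    print(f"k={k}: test accuracy = {knn.score(X_te, y_te):.3f}")
```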
Hyperparameter tuning was performed to optimize the performance of each model. This
involved adjusting parameters such as the kernel type for SVM, the number of neighbours for
KNN, and the number of trees for RF.
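One common way to carry out this tuning is a cross-validated grid search; the sketch below tunes the number of trees for Random Forest, and the same pattern applies to the SVM kernel and the KNN neighbour count. The parameter grid here is illustrative, not the project's exact search space.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic binary data standing in for the permission dataset.
rng = np.random.default_rng(2)
X = rng.integers(0, 2, size=(300, 86))
y = rng.integers(0, 2, size=300)

# Grid-search the ensemble size with 3-fold cross-validation,
# scoring by F1 as in the report.
grid = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid={"n_estimators": [100, 200, 300]},
    scoring="f1",
    cv=3,
)
grid.fit(X, y)
print("best n_estimators:", grid.best_params_["n_estimators"])
```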
The following tools, libraries, and frameworks were used for this project:
The models were trained using the training set with the optimal hyperparameters identified
during the tuning process. The training process included configuring model parameters, fitting
the models to the training data, and ensuring reproducibility by setting random seeds.
Model validation was performed using the validation set to fine-tune the models and prevent
overfitting. Cross-validation techniques were used to ensure the models generalize well to
unseen data.
The final evaluation of the models was conducted on the test set. Performance metrics,
including accuracy, F1 score, and AUC-ROC score, were computed to assess the effectiveness
of each model.
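Computing the three test-set metrics named above (accuracy, F1, AUC-ROC) looks like this in scikit-learn; note that AUC-ROC is computed from predicted class probabilities rather than hard labels. Synthetic data stands in for the real dataset.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, f1_score, roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic binary features and labels.
rng = np.random.default_rng(3)
X = rng.integers(0, 2, size=(500, 86))
y = rng.integers(0, 2, size=500)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=3)

clf = RandomForestClassifier(n_estimators=100, random_state=3).fit(X_tr, y_tr)
y_pred = clf.predict(X_te)
y_prob = clf.predict_proba(X_te)[:, 1]  # probability of the positive class

print("accuracy:", accuracy_score(y_te, y_pred))
print("F1      :", f1_score(y_te, y_pred))
print("AUC-ROC :", roc_auc_score(y_te, y_prob))
```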
RESULTS
K-Nearest neighbours (KNN):
• K=3:
o Training F1 Score: 0.958
o Testing F1 Score: 0.961
o Training AUC-ROC Score: 0.985
o Testing AUC-ROC Score: 0.986
• K=5:
o Training F1 Score: 0.960
o Testing F1 Score: 0.962
o Training AUC-ROC Score: 0.986
o Testing AUC-ROC Score: 0.987
• K=7:
o Training F1 Score: 0.961
o Testing F1 Score: 0.963
o Training AUC-ROC Score: 0.987
o Testing AUC-ROC Score: 0.988
• K=9:
o Training F1 Score: 0.964
o Testing F1 Score: 0.964
o Training AUC-ROC Score: 0.987
o Testing AUC-ROC Score: 0.987
Random Forest (RF):
• 100 Trees:
o Training F1 Score: 0.968
o Testing F1 Score: 0.970
o Training AUC-ROC Score: 0.992
o Testing AUC-ROC Score: 0.993
• 200 Trees:
o Training F1 Score: 0.969
o Testing F1 Score: 0.971
o Training AUC-ROC Score: 0.993
o Testing AUC-ROC Score: 0.993
• 300 Trees:
o Training F1 Score: 0.970
o Testing F1 Score: 0.970
o Training AUC-ROC Score: 0.993
o Testing AUC-ROC Score: 0.993
6.2 Comparison of Models
The results presented in the table are achieved through careful hyperparameter tuning and
the strategic use of the Area Under the Receiver Operating Characteristic Curve
(AUROC) as the primary evaluation metric.
Model          Accuracy  Precision  Recall  F1 Score  AUROC  Train Time (s)  Predict Time (s)
Random Forest  0.970     0.980      0.980   0.970     0.993  7.10            0.70
KNN            0.960     0.970      0.965   0.964     0.987  0.02            7.48
SVM            0.960     0.960      0.960   0.960     0.989  47.78           2.52
• Random Forest shows the best overall performance in terms of accuracy, precision,
recall, F1 Score, and AUROC. It also has a reasonably quick training time and the
fastest prediction time, making it an excellent choice for both training and inference.
• KNN has a very fast training time but suffers from the longest prediction time, making
it less suitable for real-time predictions despite having good precision and recall.
• SVM performs well across most metrics but has a significantly longer training time,
which could be a drawback.
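Train and predict times of the kind reported in the table are typically measured by wrapping fit() and predict() in a monotonic timer; a sketch with Random Forest on synthetic data (the absolute times will differ from the table's):

```python
import time
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Synthetic binary data standing in for the permission dataset.
rng = np.random.default_rng(4)
X = rng.integers(0, 2, size=(1000, 86))
y = rng.integers(0, 2, size=1000)

clf = RandomForestClassifier(n_estimators=100, random_state=4)

# Time the training step.
t0 = time.perf_counter()
clf.fit(X, y)
train_time = time.perf_counter() - t0

# Time the prediction step.
t0 = time.perf_counter()
clf.predict(X)
predict_time = time.perf_counter() - t0

print(f"train: {train_time:.2f}s  predict: {predict_time:.2f}s")
```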
DISCUSSION
The results indicate that the Random Forest model outperformed the other models in terms of
F1 Score and AUC-ROC Score. This can be attributed to its ensemble nature, which reduces
overfitting and enhances generalization.
7.2 Model Insights
• SVM Kernels: The Polynomial kernel provided a better fit for the data than the Linear
kernel, capturing non-linear relationships more effectively.
• KNN neighbours: Increasing the number of neighbours generally improved the
model's performance by reducing variance.
• RF Trees: More trees improved model robustness and performance, highlighting the
importance of ensemble size.
7.3 Statistical Analysis
ANOVA
F1 Scores:
ANOVA Results: F=21.075, p=0.0019. The results indicate significant differences in
F1 Scores among the classification algorithms, with Random Forest showing superior
performance compared to SVM and KNN.
AUC-ROC Scores:
ANOVA Results: F=33.813, p=0.0005. The results demonstrate significant
differences in AUC-ROC Scores, confirming Random Forest's higher discriminatory
power.
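A one-way ANOVA of this kind compares the per-fold scores of the three models with scipy. The score lists below are illustrative placeholders, not the project's actual fold-level results:

```python
from scipy.stats import f_oneway

# Hypothetical per-fold F1 scores for the three models
# (placeholders, not the report's measured values).
svm_f1 = [0.958, 0.960, 0.961]
knn_f1 = [0.961, 0.963, 0.962]
rf_f1  = [0.970, 0.971, 0.969]

# One-way ANOVA: tests whether the group means differ.
stat, p = f_oneway(svm_f1, knn_f1, rf_f1)
print(f"F={stat:.3f}, p={p:.4f}")
```

A p-value below 0.05 would indicate a significant difference among the three models' mean scores, as reported above.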
T-Tests
F1 Scores:
• SVM vs. KNN: t=0.696, p=0.525. No significant difference was observed between SVM and KNN, since p was greater than 0.05. This may be because both SVM and KNN rely on distance-based computations in their evaluation.
• SVM vs. RF: t=-4.346, p=0.012. A significant difference was found, with Random Forest performing significantly better than SVM (the negative t value indicates that RF outperforms SVM).
• KNN vs. RF: t=-13.840, p=0.0002. A highly significant difference was found, with Random Forest outperforming KNN (the negative t value indicates that RF outperforms KNN).
AUC-ROC Scores:
• SVM vs. KNN: t=3.701, p=0.021. Significant difference observed, with SVM
slightly outperforming KNN.
• SVM vs. RF: t=-4.210, p=0.014. Significant difference found, with Random
Forest showing superior performance.
• KNN vs. RF: t=-9.538, p=0.0007. A highly significant difference was noted, with
Random Forest outperforming KNN.
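The pairwise comparisons above follow the two-sample t-test pattern shown below, where a negative t statistic means the second group's mean score is higher. The score lists are illustrative placeholders, not the project's measured fold-level values:

```python
from scipy.stats import ttest_ind

# Hypothetical per-fold F1 scores (placeholders, not measured values).
svm_f1 = [0.958, 0.960, 0.961]
rf_f1  = [0.970, 0.971, 0.969]

# Two-sample t-test: negative t here means RF's mean exceeds SVM's.
t, p = ttest_ind(svm_f1, rf_f1)
print(f"t={t:.3f}, p={p:.4f}")
```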
CONCLUSION
This study demonstrated the effectiveness of using machine learning for permission-based
malware detection in Android apps. The project comprehensively evaluated and compared
three classification algorithms—Support Vector Machine (SVM), K-Nearest neighbours
(KNN), and Random Forest (RF)—using various performance metrics and statistical analysis.
Random Forest emerged as the top performer, demonstrating the highest F1 Score and AUC-
ROC Score.
REFERENCES
[1] Hareram Kumar (2022). Android Malware Prediction using Machine Learning Techniques: A Review.
[2] Neamat Al Sarah (2021). An Efficient Android Malware Prediction Using Ensemble Machine Learning Algorithms.
[3] Machine Learning for Android Malware Detection Using Permission and API Calls.
[4] Android Permission Dataset.