Group 17

Machine Learning

Uploaded by

Raphael Kuayi

GROUP 17

ID: 10970285 (Kuayi Rapheal) and 10970187 (Kelvin Doe)

Machine Learning Model Comparison Report

ABSTRACT

This study investigates the performance of three machine learning models (Random Forest, SVM,
and KNN) in predicting whether an Android app is benign or malware based on its permissions. The
dataset is preprocessed to handle missing values and outliers. Hyperparameter tuning is employed to
optimize the models' configurations. The models are evaluated based on accuracy, precision, recall,
and F1 score on a test set. Additionally, the Area Under the Receiver Operating Characteristic
(AUROC) curve is computed. Results indicate that all three models perform well on the test set in
terms of predictive accuracy and AUROC. The study contributes insights into the selection and tuning
of machine learning models for malware detection, offering a foundation for further research in this area.

Introduction

The objective of this report is to compare the performance of three machine learning models: Random
Forest, SVM, and KNN, for the task of malware prediction. This comparison aims to identify the most
effective model for predicting whether an app is benign or malware based on its permissions.

Pre-processing Steps

1. Data Cleaning
The dataset was examined for missing values, outliers, and incorrect data types. None were
found.

2. Train-Test Split
The dataset was split into training and testing sets to evaluate each model's performance on
unseen data. An 80-20 split was used, with 80% of the data used for training and 20% for
testing.
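
The 80-20 split described above can be sketched as follows. This is a minimal illustration using only the Python standard library; the function name, the seed, and the toy permission data are assumptions made for the example and are not taken from the report, which does not state which tool performed the split.

```python
import random


def train_test_split_80_20(rows, labels, test_fraction=0.2, seed=42):
    """Shuffle the indices once, then hold out the first `test_fraction` of them."""
    if len(rows) != len(labels):
        raise ValueError("rows and labels must have the same length")
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)  # fixed seed keeps the split reproducible
    n_test = round(len(rows) * test_fraction)
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    X_train = [rows[i] for i in train_idx]
    X_test = [rows[i] for i in test_idx]
    y_train = [labels[i] for i in train_idx]
    y_test = [labels[i] for i in test_idx]
    return X_train, X_test, y_train, y_test


# Hypothetical toy data: 10 apps, 3 binary permission flags each; label 1 = malware.
apps = [[i % 2, (i // 2) % 2, (i // 4) % 2] for i in range(10)]
labels = [i % 2 for i in range(10)]

X_train, X_test, y_train, y_test = train_test_split_80_20(apps, labels)
print(len(X_train), len(X_test))  # 8 2
```

Shuffling before splitting avoids holding out a block of apps that happen to be stored together, and fixing the seed makes the evaluation repeatable.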

Model Selection
Possible models:
Based on research, the team selected three models to train on the data:

• Random Forest Classifier

• K-Nearest Neighbor
• Support Vector Machine

REASONS FOR SELECTION

1. Random Forest
Ensemble Method:
i) Random Forest is an ensemble method that combines multiple decision trees to form a robust
and accurate model.
ii) It is less prone to overfitting compared to individual decision trees.
Handles Non-linearity: Random Forest can capture complex non-linear relationships between
features, which matters for data such as this, where the association between permissions and
app behavior might be non-linear.
2. Support Vector Machine (SVM)
Interpretability: A linear SVM provides a simple and interpretable model. The support vectors and
the decision boundary can be interpreted in terms of their impact on classifying apps as benign
or malware.
Efficiency: A linear SVM is computationally efficient to train, even on large datasets. It is
often a good choice when interpretability and speed are essential.
3. k-Nearest Neighbors (KNN)
Interpretability: KNN is interpretable, and the decision-making process is easy to understand.
Each prediction is based on the similarity of the nearest neighbors, making it clear how decisions
are made.
Handling Non-linear Relationships: KNN can naturally handle non-linear relationships between
features and the target variable. It is flexible in capturing complex decision boundaries, which is
important for distinguishing between benign and malware apps based on permissions.
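
To make the nearest-neighbor reasoning above concrete, here is a minimal KNN sketch for binary permission vectors. It is an illustration only: the Hamming distance, the function names, and the toy data are assumptions for the example, and the report does not say which distance metric or implementation was used.

```python
from collections import Counter


def hamming(a, b):
    """Number of positions where two equal-length permission vectors differ."""
    return sum(x != y for x, y in zip(a, b))


def knn_predict(train_X, train_y, query, k=5):
    """Majority vote among the k training samples closest to `query`."""
    # sorted() is stable, so equally distant samples keep their original order.
    order = sorted(range(len(train_X)), key=lambda i: hamming(train_X[i], query))
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]


# Hypothetical training set: 4 permission flags per app; label 1 = malware, 0 = benign.
train_X = [
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 0, 0, 1],
    [0, 0, 1, 1],
    [0, 1, 0, 1],
]
train_y = [1, 1, 1, 0, 0, 0]

# k=3 here only because the toy set is tiny; the report used 5 neighbors.
print(knn_predict(train_X, train_y, [1, 1, 1, 1], k=3))  # 1 (malware)
print(knn_predict(train_X, train_y, [0, 0, 0, 1], k=3))  # 0 (benign)
```

Each prediction can be explained by listing the neighbors that voted for it, which is the sense in which KNN is easy to interpret.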

Model Training
Each model was trained on the training set with the specified hyperparameters.

Hyperparameters:

Random Forest
We used Grid Search with cross-validation to explore different values for the n_estimators
hyperparameter. The values tested were [50, 100, 200, 300, 400, 500].
Grid Search systematically works through multiple combinations of parameter values, cross-
validating each combination to determine the best parameter value.
• Number of Trees (n_estimators): 300
The value n_estimators = 300 was therefore used, because it gave the best performance among the
values tested.
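
The grid search with cross-validation described above follows a simple loop: for each candidate value, average the score over several train/validation folds, then keep the best candidate. The sketch below shows that loop using only the standard library. To stay dependency-free it tunes the number of neighbors of a tiny KNN classifier as a stand-in for Random Forest's n_estimators; the classifier, the function names, and the synthetic data are all assumptions for illustration and do not reproduce the report's actual search.

```python
import random
from collections import Counter


def knn_predict(train_X, train_y, query, k):
    """Stand-in model: majority vote among the k nearest samples (Hamming distance)."""
    order = sorted(range(len(train_X)),
                   key=lambda i: sum(a != b for a, b in zip(train_X[i], query)))
    return Counter(train_y[i] for i in order[:k]).most_common(1)[0][0]


def cross_val_accuracy(X, y, k, n_folds=5):
    """Mean accuracy over n_folds folds; fold f holds out every sample with index % n_folds == f."""
    fold_scores = []
    for fold in range(n_folds):
        test_idx = [i for i in range(len(X)) if i % n_folds == fold]
        train_idx = [i for i in range(len(X)) if i % n_folds != fold]
        train_X = [X[i] for i in train_idx]
        train_y = [y[i] for i in train_idx]
        correct = sum(knn_predict(train_X, train_y, X[i], k) == y[i] for i in test_idx)
        fold_scores.append(correct / len(test_idx))
    return sum(fold_scores) / n_folds


def grid_search(X, y, candidates, n_folds=5):
    """Score every candidate with cross-validation and return the best one plus all scores."""
    scores = {c: cross_val_accuracy(X, y, c, n_folds) for c in candidates}
    best = max(candidates, key=lambda c: scores[c])  # first candidate wins on ties
    return best, scores


# Synthetic data (hypothetical): malware (label 1) tends to request more permissions.
rng = random.Random(0)
X, y = [], []
for _ in range(40):
    label = rng.randint(0, 1)
    p = 0.7 if label else 0.3
    X.append([1 if rng.random() < p else 0 for _ in range(6)])
    y.append(label)

best_k, scores = grid_search(X, y, candidates=[1, 3, 5, 7])
print(best_k, scores)
```

Because every candidate is scored on held-out folds rather than on the training data, the selected value is less likely to be an artifact of overfitting.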

SVM
Type of hyperplane: Linear (Default)
This was used because the data was 2D, so a linear hyperplane was considered appropriate.
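
A linear SVM classifies a sample by which side of the hyperplane w·x + b = 0 it falls on. The short sketch below shows only that prediction step. The weights and bias are hypothetical values chosen for illustration; they were not learned from the report's data, and the training step that finds them is not shown.

```python
def linear_decision(weights, bias, x):
    """Signed score w.x + b; its sign tells which side of the hyperplane x lies on."""
    return sum(w * xi for w, xi in zip(weights, x)) + bias


def predict(weights, bias, x):
    """Return 1 (malware) for a non-negative score, otherwise 0 (benign)."""
    return 1 if linear_decision(weights, bias, x) >= 0 else 0


# Hypothetical parameters for three permission flags (not fitted to any real data).
weights = [1.5, -2.0, 0.5]
bias = -0.5

print(predict(weights, bias, [1, 0, 1]))  # score 1.5 + 0.5 - 0.5 = 1.5, so 1
print(predict(weights, bias, [0, 1, 0]))  # score -2.0 - 0.5 = -2.5, so 0
```

With a linear boundary, each weight shows how strongly one permission pushes the prediction toward malware or benign.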

KNN

Number of neighbors: 5
This value was chosen so that the model does not overfit and generalizes better.
Of all the values tested, 5 neighbors gave the best performance.

Model Evaluation
The models were evaluated on the testing set using the following metrics:

• Accuracy: Proportion of correctly predicted instances.

• Precision: Proportion of true positive predictions among all positive predictions.
• Recall: Proportion of true positive predictions among all actual positive instances.
• F1 Score: Harmonic mean of precision and recall.
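
The four metrics listed above can be computed directly from the counts of true and false positives and negatives. The sketch below is a minimal standard-library illustration; the function name and the toy labels are assumptions, and malware is treated as the positive class (label 1).

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall and F1 from paired true/predicted labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)

    accuracy = (tp + tn) / len(pairs)
    # Guard against division by zero when a class is never predicted or never present.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical test labels: 4 malware apps (1) and 6 benign apps (0).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]

metrics = classification_metrics(y_true, y_pred)
print(metrics)  # accuracy 0.8, precision 0.75, recall 0.75, f1 0.75
```

In this toy case one malware app is missed (lowering recall) and one benign app is flagged (lowering precision), which shows why the two metrics are reported separately.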

Results

Model           Accuracy   Precision   Recall   F1 Score

Random Forest   0.97       0.97        0.98     0.97
KNN             0.96       0.96        0.97     0.97
SVM             0.97       0.96        0.96     0.96

Random Forest Evaluation Outcome

A DIAGRAM OF CONFUSION MATRIX FROM RANDOM FOREST CLASSIFIER

A DIAGRAM OF AUROC FOR RANDOM FOREST CLASSIFIER

K-Nearest Neighbors Evaluation Outcome

A DIAGRAM OF THE CONFUSION MATRIX FROM KNN CLASSIFIER

A DIAGRAM OF AREA UNDER ROC FROM KNN CLASSIFIER

Support Vector Machine Evaluation Outcome

A GRAPH OF AREA UNDER ROC FOR SVM CLASSIFIER

CONFUSION MATRIX FROM SVM CLASSIFIER
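
The AUROC shown in the ROC diagrams above can be read as the probability that a randomly chosen malware app receives a higher score than a randomly chosen benign app. The sketch below computes it that way by comparing every positive/negative pair. It is an illustration under assumptions: the function name and the example scores are hypothetical, and the report does not state how its AUROC values were computed.

```python
def auroc(y_true, scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count as half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative sample")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical model scores (higher = more likely malware) for 3 malware and 3 benign apps.
y_true = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.1]

print(auroc(y_true, scores))  # 8 of 9 pairs are ranked correctly
```

The pairwise form is quadratic in the number of samples, so it suits small examples; library implementations typically sort the scores instead.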

Discussions and Conclusion
The performance comparison of the three models (Random Forest, SVM, and KNN) revealed several key
insights:
1. Accuracy: Both Random Forest and SVM achieved accuracy scores of 0.97, slightly
outperforming KNN, which had an accuracy of 0.96. The 0.01 gap is small, so it suggests only a
marginal advantage for Random Forest and SVM in correctly classifying apps as benign or malware.
2. Precision and Recall: Random Forest showed the highest recall (0.98), making it particularly
effective at identifying actual malware apps. SVM and KNN had the same precision (0.96) and
similar recall (0.96 and 0.97 respectively). Random Forest's higher recall suggests it is less
likely to miss malware instances.
3. F1 Score: All models demonstrated strong F1 scores, with Random Forest and KNN both achieving
0.97. SVM had a slightly lower F1 score of 0.96. The F1 score balances precision and recall, and
the high scores across all models indicate well-rounded performance.

Conclusion
• Random Forest: The ensemble nature of Random Forest, which combines multiple decision trees,
allows it to capture complex, non-linear relationships between permissions and app behavior,
leading to its superior performance in recall and AUROC. Its robustness and high recall make it the
best choice for scenarios where the cost of missing malware is high.
• SVM: The SVM model's performance was commendable, especially considering its efficiency and
interpretability. Its accuracy matched Random Forest's (0.97) and its precision (0.96) was within
0.01 of Random Forest's, making it a viable option for large datasets where speed and simplicity
are essential.
• KNN: While KNN performed slightly lower in accuracy compared to the other models, its
interpretability and ability to handle non-linear relationships make it a useful model in scenarios
where the decision process needs to be easily understood.
Overall, the Random Forest model is recommended for its superior performance in most metrics,
particularly in identifying malware apps with high recall and AUROC. However, both SVM and KNN
have their merits and could be preferred depending on specific use-case requirements such as
interpretability, computational efficiency, and the nature of the dataset. Future research could further
explore hyperparameter tuning and the integration of additional features to enhance model performance.
