PythonMalware FirstReview

Uploaded by

Meenachi Sundaram

SIGNIFICANT PERMISSION IDENTIFICATION FOR ANDROID MALWARE DETECTION
SYNOPSIS
• The project introduces Significant Permission IDentification (SigPID), a malware detection system that uses a neural network based on permission usage analysis to cope with the rapid increase in the number of Android malware.

• It develops three levels of pruning by mining the permission data to identify the most significant permissions that can be effective in distinguishing between benign and malicious apps.

• It applies Linear Regression, SVM, KNN and CNN classification on the new data set.
OBJECTIVES
• To identify the lists of dangerous, benign, and shutdown-enabled permissions.

• To carry out feature reduction.

• To apply SVM/KNN classification so that the probability of benign or suspicious apps in new test data can be estimated.

• To reduce features (based on unique values in the permission list) before malware identification using CNN is carried out.
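
One plausible reading of "reduce features based on unique values" is to drop columns that hold a single value across all records, since such columns cannot help separate benign from malicious apps. The following is a minimal sketch under that assumption; the helper name `drop_constant_columns` and the toy permission columns are hypothetical and are not taken from the project.

```python
import pandas as pd


def drop_constant_columns(df: pd.DataFrame, label_col: str) -> pd.DataFrame:
    """Drop feature columns that hold only one distinct value (assumed reduction rule)."""
    features = df.drop(columns=[label_col])
    # nunique() counts the distinct non-null values in each column
    keep = [c for c in features.columns if features[c].nunique() > 1]
    return df[keep + [label_col]]


# Toy permission matrix (hypothetical): 1 = permission requested, 0 = not requested
toy = pd.DataFrame({
    "SEND_SMS": [1, 0, 1, 0],
    "INTERNET": [1, 1, 1, 1],       # constant, so it carries no signal
    "READ_CONTACTS": [0, 0, 1, 1],
    "Malware": [1, 0, 1, 0],
})
reduced = drop_constant_columns(toy, label_col="Malware")
print(list(reduced.columns))
```

A stricter threshold (for example, dropping columns whose dominant value covers almost all records) would follow the same pattern, but the slides do not state which rule was used.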
EXISTING SYSTEM
• The existing system focuses on Linear Regression and SVM classification algorithms to detect malware apps.

• The dataset is taken from Kaggle. Preprocessing such as zero-value, N/A-value, and Unicode character elimination is not carried out.

• Important features are not extracted for better classification.

• A confusion matrix with accuracy score calculation is not prepared.
DISADVANTAGES OF EXISTING SYSTEM
• SVM classification is not considered, so the probability of benign or suspicious apps in new test data cannot be estimated.

• Feature reduction before malware identification is not carried out.

• Only data columns with numeric values are taken for SVM classification.
PROPOSED SYSTEM
• The proposed system focuses on KNN classification algorithms as well as a neural network to detect malware apps.

• The dataset is taken from Kaggle and preprocessed, for example by removing Unicode characters.

• Important features are extracted for better classification.

• A confusion matrix is prepared with accuracy score calculation.
PROPOSED SYSTEM (CONTD)
• Accuracy prediction is also carried out.

• A Convolutional Neural Network based prediction model is built to evaluate algorithm efficiency.

• 15600 training records and 4000 test records are taken for convolutional neural network training.
ADVANTAGE OF PROPOSED SYSTEM
• KNN classification is considered, so the probability of benign or suspicious apps in new test data can be estimated.

• Feature reduction before malware identification is carried out.

• KNN works well even if the dataset is large.

• A Convolutional Neural Network based prediction model is built to evaluate algorithm efficiency.
HARDWARE SPECIFICATION
Processor : Intel Core 2 Duo

Hard Disk Capacity : 500 GB

RAM : 4 GB DDR RAM

Monitor : 17-inch Color

Keyboard : 102 keys

Mouse : Optical Scroll

SOFTWARE SPECIFICATION

Operating System : Windows 10 Pro

Environment : IDLE/Colab

Language : Python 3.7

MODULES
1. DATA SET COLLECTION

2. DATA SET SUBSETTING BASED ON MALWARE TYPES

3. Linear Regression, Support Vector Machine/K-Nearest Neighbor CLASSIFICATION

4. Convolutional Neural Network based CLASSIFICATION
1. DATA SET COLLECTION

The dataset, which contains 79 columns (e_magic, e_cblp, e_cp, e_crlc, e_cparhdr, e_minalloc, e_maxalloc, e_ss, e_sp, e_csum, e_ip, e_cs, e_lfarlc, e_ovno, etc.), is saved in a single Excel workbook as records. This is the input for the project.
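
Loading and cleaning the records might look like the sketch below. It uses a tiny inline CSV as a stand-in for the workbook (for a real Excel file, `pandas.read_excel` would replace `read_csv`). The `Name` and `Malware` columns and all values are assumptions for illustration; only the `e_*` column names come from the slides. The cleaning steps mirror the N/A removal and Unicode removal mentioned elsewhere in the deck.

```python
import io
import pandas as pd

# Tiny stand-in for the real workbook (hypothetical values; the real data has 79 columns)
raw = io.StringIO(
    "Name,e_magic,e_cblp,e_cp,Malware\n"
    "app\u00e9_one.exe,23117,144,3,1\n"
    "app_two.exe,23117,,3,0\n"
    "app_three.exe,23117,80,2,0\n"
)
df = pd.read_csv(raw)

# Remove records that contain N/A values
df = df.dropna().reset_index(drop=True)

# Strip non-ASCII (Unicode) characters from the text column
text_cols = ["Name"]
for col in text_cols:
    df[col] = df[col].str.encode("ascii", errors="ignore").str.decode("ascii")

print(df.shape)
print(df["Name"].tolist())
```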
2. DATA SET SUBSETTING BASED ON MALWARE TYPES
The same 79-column dataset (e_magic, e_cblp, e_cp, e_crlc, e_cparhdr, e_minalloc, e_maxalloc, e_ss, e_sp, e_csum, e_ip, e_cs, e_lfarlc, e_ovno, etc.), saved in a single Excel workbook as records, is the input for this module.

From it, 15600 records (Malware 1 and 0 together) are split off for training and 4000 records (Malware 1 and 0 together) for testing, and both sets are given to the neural network.
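
A split into fixed record counts can be sketched with scikit-learn's `train_test_split`, which accepts absolute sizes as well as fractions. The data here is synthetic, and the layout of 78 feature columns plus one 0/1 label is an assumption about how the 79 columns divide; stratification is also an added choice, not something the slides state.

```python
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic stand-in: 19600 records, 78 feature columns and a 0/1 Malware label
X = rng.random((19600, 78))
y = rng.integers(0, 2, size=19600)

# Absolute counts instead of fractions; stratify keeps the 1/0 mix similar in both sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=15600, test_size=4000, stratify=y, random_state=0
)
print(X_train.shape, X_test.shape)
```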
3. LR/SVM/KNN CLASSIFICATION
In this module, 80% of the given data set is taken as training data and 20% as test data.

The text (categorical) columns are converted into numerical values.

The model is then trained on the training data and used to predict the test data.

In the predictions, most of the apps are classified as Benign and fewer apps are classified as Suspicious.
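
The steps of this module can be sketched with scikit-learn as follows. The data is synthetic and the column names (`feat_a`, `feat_b`, `category`, `label`) are hypothetical; only KNN and SVM are shown, and the hyperparameters are illustrative defaults rather than the project's settings.

```python
import numpy as np
import pandas as pd
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import LabelEncoder
from sklearn.svm import SVC

rng = np.random.default_rng(42)
n = 400

# Synthetic stand-in data; the label depends on feat_a so that it is learnable
df = pd.DataFrame({
    "feat_a": rng.normal(size=n),
    "feat_b": rng.normal(size=n),
    "category": rng.choice(["tool", "game", "social"], size=n),
})
df["label"] = np.where(df["feat_a"] > 0.5, "Suspicious", "Benign")

# Convert text (categorical) columns into numerical values
df["category"] = LabelEncoder().fit_transform(df["category"])
y = LabelEncoder().fit_transform(df["label"])   # Benign -> 0, Suspicious -> 1
X = df[["feat_a", "feat_b", "category"]].to_numpy()

# 80% training data, 20% test data
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

results = {}
classifiers = {
    "KNN": KNeighborsClassifier(n_neighbors=5),
    "SVM": SVC(kernel="rbf", probability=True, random_state=42),
}
for name, clf in classifiers.items():
    clf.fit(X_train, y_train)
    pred = clf.predict(X_test)
    proba = clf.predict_proba(X_test)   # per-app probability of Benign / Suspicious
    results[name] = {
        "accuracy": accuracy_score(y_test, pred),
        "confusion": confusion_matrix(y_test, pred),
        "proba": proba,
    }
    print(name, round(results[name]["accuracy"], 3))
    print(results[name]["confusion"])
```

`predict_proba` is what provides the benign/suspicious probability for new records; the confusion matrix and accuracy score correspond to the evaluation described for the proposed system.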
4. CNN CLASSIFICATION
Here the dataset is loaded first. The data is stored as CSV (Comma Separated Values).
Each record contains 79 values for one virus definition.

Data Encoding: It converts the categorical column (the label in our case) into numerical values.

These are some of the variables required for model training. Once the model is created, it can be imported and then compiled using ‘model.compile’.

The model is trained for just five epochs, but the number of epochs can be increased.

After the training process is completed, predictions can be made on the test set. The accuracy value is displayed during the iterations.
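
The data encoding step, together with the reshaping a 2D convolution needs, can be sketched as below. The label strings and feature values are hypothetical; the 7 x 5 grid mirrors `rows, cols = 7, 5` in the sample coding, which implies 35 feature values per record, an assumption about the reduced feature set.

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder

# Hypothetical labels and a reduced set of 35 numeric features per record
labels = np.array(["malware", "benign", "benign", "malware"])
features = np.arange(4 * 35, dtype="float32").reshape(4, 35)

# Data encoding: map each class name to an integer (classes are sorted alphabetically)
encoder = LabelEncoder()
y = encoder.fit_transform(labels)

# A Conv2D layer expects (samples, rows, cols, channels); 35 values fit a 7 x 5 grid
rows, cols = 7, 5
x = features.reshape(features.shape[0], rows, cols, 1)

print(encoder.classes_.tolist(), y.tolist(), x.shape)
```

`encoder.inverse_transform` can map predicted integers back to the original class names when reporting results.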
SYSTEM FLOW DIAGRAM
[Flow diagram: SIGNIFICANT PERMISSION IDENTIFICATION FOR ANDROID MALWARE DETECTION]

DATA SET PREPROCESS: Select Excel File; Remove N/A values; Subset train and test records for CNN

CLASSIFICATION: LR/SVM/KNN Classification; Feature Reduction for CNN; CNN Classification in reduced feature set
LITERATURE SURVEY PAPERS
[1] M. Grace, Y. Zhou, Q. Zhang, S. Zou, and X. Jiang, “RiskRanker: Scalable and accurate zero-day android malware detection,” in Proc. 10th Int. Conf. Mobile Syst., Appl., Services, 2012, pp. 281–294.

[2] A. P. Felt, E. Chin, S. Hanna, D. Song, and D. Wagner, “Android permissions demystified,” in Proc. 18th ACM Conf. Comput. Commun. Security, 2011, pp. 627–638.

[3] W. Enck et al., “TaintDroid: An information-flow tracking system for realtime privacy monitoring on smartphones,” ACM Trans. Comput. Syst., vol. 32, no. 2, 2014, Art. no. 5.

[4] D. Arp, M. Spreitzenbarth, M. Hübner, H. Gascon, K. Rieck, and C. Siemens, “DREBIN: Effective and explainable detection of android malware in your pocket,” presented at Annu. Symp. Netw. Distrib. Syst. Security, 2014.

[5] C. Yang, Z. Xu, G. Gu, V. Yegneswaran, and P. Porras, “DroidMiner: Automated mining and characterization of fine-grained malicious behaviors in android applications,” in Proc. Eur. Symp. Res. Comput. Security, 2014, pp. 163–182.
CONCLUSION
The project focuses on SVM classification algorithms to detect malware apps.

The dataset is taken from Kaggle.

Preprocessing such as zero-value, N/A-value, and Unicode character elimination is not carried out here.

Important features are extracted for better classification.

CONCLUSION (Contd)
In addition, K-NN and CNN based classification algorithms are also carried out to detect malware apps.

A confusion matrix is prepared with accuracy score calculation.

Accuracy prediction is also carried out.

SAMPLE CODING
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

from sklearn.model_selection import train_test_split

# Use the Keras API bundled with TensorFlow throughout
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D
from tensorflow.keras.layers import Dense, Dropout
from tensorflow.keras.layers import Flatten, BatchNormalization

train_df = pd.read_csv('./dataset_malwares_train.csv')
test_df = pd.read_csv('./dataset_malwares_test.csv')
train_df.head()

# Column 0 holds the label; the remaining columns are the features
train_data = np.array(train_df.iloc[:, 1:])
test_data = np.array(test_df.iloc[:, 1:])
train_labels = train_df.iloc[:, 0]
test_labels = test_df.iloc[:, 0]

# Each record is reshaped into a rows x cols grid, so this requires
# exactly rows * cols = 35 feature columns per record
rows, cols = 7, 5
train_data = train_data.reshape(train_data.shape[0], rows, cols, 1)
test_data = test_data.reshape(test_data.shape[0], rows, cols, 1)
train_data = train_data.astype('float32')
test_data = test_data.astype('float32')
train_data /= 255.0
test_data /= 255.0

# Hold out 20% of the training records for validation
train_x, val_x, train_y, val_y = train_test_split(train_data, train_labels, test_size=0.2)
batch_size = 32
epochs = 10
input_shape = (rows, cols, 1)


def baseline_model():
    model = Sequential()
    model.add(BatchNormalization(input_shape=input_shape))
    model.add(Conv2D(32, (3, 3), padding='same', activation='relu'))
    model.add(MaxPooling2D(pool_size=(2, 2), strides=(2, 2)))
    model.add(Dropout(0.25))
    model.add(BatchNormalization())
    model.add(Conv2D(32, (3, 3), padding='same', activation='relu'))
    model.add(MaxPooling2D(pool_size=(2, 2)))
    model.add(Dropout(0.25))
    model.add(Flatten())
    model.add(Dense(128, activation='relu'))
    model.add(Dropout(0.5))
    # A single sigmoid unit gives the probability of the positive class;
    # softmax over one unit would always output 1.0
    model.add(Dense(1, activation='sigmoid'))
    return model
SCREEN SHOTS
[Screenshot: LINEAR REGRESSION]
[Screenshot: KNN ACCURACY SCORE]
RECALL VALUE : 0.86
THANK YOU
