API Frequency Detection Model

This paper proposes a combination method for Android malware detection based on control flow graphs, machine learning algorithms, and three detection models. Its main components are: 1) An API frequency detection model that constructs training data from control flow graphs and trains a deep neural network classifier. 2) An API sequence detection model that digitizes API sequences, constructs data sets, and trains an LSTM model. 3) A combination method that leverages the three models to improve accuracy over the individual models while balancing resource consumption.

Uploaded by

Dalia mohammed
Copyright
© All Rights Reserved
A Combination Method for Android Malware Detection Based on Control Flow Graphs and Machine Learning Algorithms
Paper summary
By: Dalia Mohammad
Sondos Alzeer
Aya Alsharif
Supervisor: Emma Qumsiyeh
What is Android malware?
2- API FREQUENCY DETECTION MODEL
DATA SET CONSTRUCTION
This step describes how the training data for the machine learning model are generated.
• The process:
• 1- Construct data sets of system APIs by traversing the nodes of a control flow graph (CFG) with a breadth-first search (BFS) algorithm.
• 2- Store the results as key-value pairs, where the keys are the nodes (represented as APIs) and the values are the frequencies with which those nodes appear.
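The two steps above can be sketched in a few lines of Python. This is a minimal illustration, not the paper's implementation: the CFG is assumed to be a dict from node id to successor ids, several nodes may stand for the same API (e.g. two call sites of one method), and the graph, node ids and API names below are made-up examples. BFS visits each reachable node once and the counter records how many nodes map to each API.

```python
from collections import Counter, deque

def api_frequencies(cfg, api_of, entry):
    """BFS over a CFG from `entry`; count how many reachable nodes map to each API."""
    counts = Counter()
    visited = {entry}
    queue = deque([entry])
    while queue:
        node = queue.popleft()
        counts[api_of[node]] += 1          # key: API, value: how often it appears as a node
        for succ in cfg.get(node, []):
            if succ not in visited:        # each CFG node is counted only once
                visited.add(succ)
                queue.append(succ)
    return dict(counts)

# Hypothetical CFG: node id -> successor ids, plus the API each node stands for.
cfg = {0: [1, 2], 1: [3], 2: [3, 4], 3: [], 4: []}
api_of = {0: "dummyMainMethod", 1: "getDeviceId", 2: "sendTextMessage",
          3: "openConnection", 4: "getDeviceId"}
print(api_frequencies(cfg, api_of, 0))
```

Here `getDeviceId` gets a frequency of 2 because two distinct CFG nodes refer to it.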
TRAINING STAGE
• The process:
1- Use a deep neural network (DNN).
2- Build a two-class (malicious/benign) classification model and train it with the DNN algorithm.

The input to the model is a vector V of API frequencies, together with a label indicating whether the application is malicious or benign.
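A minimal sketch of the training stage and of the testing-stage forward pass, assuming numpy is available. The API vocabulary `VOCAB`, the helper names (`to_vector`, `train_dnn`, `predict`), the network shape (one tanh hidden layer of 8 units with a sigmoid output), the learning rate and the toy frequencies are all hypothetical choices for illustration; the slides do not give the paper's DNN architecture or data.

```python
import numpy as np

VOCAB = ["getDeviceId", "sendTextMessage", "openConnection"]  # hypothetical API vocabulary

def to_vector(freqs, vocab=VOCAB):
    """Map an {API: frequency} dict to a fixed-order vector V (missing APIs -> 0)."""
    return np.array([freqs.get(api, 0) for api in vocab], dtype=float)

def train_dnn(X, y, hidden=8, epochs=3000, lr=0.05, seed=0):
    """Two-class MLP: tanh hidden layer, sigmoid output, full-batch gradient descent on BCE."""
    rng = np.random.default_rng(seed)
    W1 = rng.normal(0.0, 0.1, (X.shape[1], hidden))
    b1 = np.zeros(hidden)
    W2 = rng.normal(0.0, 0.1, (hidden, 1))
    b2 = np.zeros(1)
    y = y.reshape(-1, 1)
    for _ in range(epochs):
        h = np.tanh(X @ W1 + b1)                   # hidden activations
        p = 1.0 / (1.0 + np.exp(-(h @ W2 + b2)))   # predicted P(malicious)
        g = (p - y) / len(X)                       # gradient of mean BCE w.r.t. the output logit
        gh = (g @ W2.T) * (1.0 - h ** 2)           # backprop through tanh (uses W2 before update)
        W2 -= lr * (h.T @ g)
        b2 -= lr * g.sum(axis=0)
        W1 -= lr * (X.T @ gh)
        b1 -= lr * gh.sum(axis=0)
    return W1, b1, W2, b2

def predict(params, X):
    """Forward pass with the trained coefficients; 1 = malicious, 0 = benign."""
    W1, b1, W2, b2 = params
    p = 1.0 / (1.0 + np.exp(-(np.tanh(X @ W1 + b1) @ W2 + b2)))
    return (p.ravel() >= 0.5).astype(int)

# Made-up toy frequencies (not data from the paper): two malicious apps, two benign apps.
apps = [{"getDeviceId": 3, "sendTextMessage": 4}, {"sendTextMessage": 5, "openConnection": 1},
        {"openConnection": 4}, {"getDeviceId": 1, "openConnection": 3}]
X = np.stack([to_vector(a) for a in apps])
y = np.array([1, 1, 0, 0])
params = train_dnn(X, y)
print(predict(params, X), "vs labels", y)
```

In the testing stage, `predict` is applied to held-out vectors and its output is compared with their labels.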
TESTING STAGE
For each test vector V, we compute the classification result from V and the coefficients learned in the training stage, and compare the result with the vector's label.

• We use standard classification metrics, namely Precision, Recall and F-measure, to evaluate the performance of the model:
- Precision measures how accurate the classifier is when it predicts the positive (malicious) class: the fraction of apps predicted as malicious that really are malicious.

- Recall measures the classifier's ability to detect the positive class: the fraction of malicious apps that are predicted as malicious.

- F-measure combines precision and recall as their harmonic mean.
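The three metrics follow directly from counts of true positives (TP), false positives (FP) and false negatives (FN): Precision = TP / (TP + FP), Recall = TP / (TP + FN), F-measure = 2 · Precision · Recall / (Precision + Recall). A small sketch, assuming labels are 0/1 with 1 = malicious as in these slides; the function name and the sample labels are illustrative only.

```python
def precision_recall_f(y_true, y_pred, positive=1):
    """Precision, recall and F-measure (F1) for the positive (malicious) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0   # of apps flagged malicious, share truly malicious
    recall = tp / (tp + fn) if tp + fn else 0.0      # of truly malicious apps, share flagged
    f_measure = (2 * precision * recall / (precision + recall)
                 if precision + recall else 0.0)     # harmonic mean of the two
    return precision, recall, f_measure

# Illustrative labels: 2 true positives, 1 false positive, 1 false negative.
print(precision_recall_f([1, 1, 1, 0, 0], [1, 1, 0, 1, 0]))
```

Returning 0.0 when a denominator is empty is a convention chosen here to avoid division by zero.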


Static detection methods for Android malware can be divided into several categories based on the approach used:

1- Permission-based methods
2- API-based methods
3- Structural feature-based methods
4- Machine learning-based methods

In summary, each of these approaches has its own strengths and limitations, and a combination of these
approaches may be necessary to effectively detect Android malware.
3- API SEQUENCE DETECTION MODEL

dummyMainMethod() > a() > d()
dummyMainMethod() > a() > b() > e()
dummyMainMethod() > a() > b() > f()
dummyMainMethod() > b() > e()
dummyMainMethod() > b() > f()
dummyMainMethod() > c() > f()
dummyMainMethod() > c() > g()
The construction of the model:
• Route construction:
1_ The last API in each route is a system API.
2_ The edges in each route are in chronological order, and the routes are built with a depth-first search (DFS) algorithm.
• Route digitizing:
S = {a, b, c, d, e, f}
R = {a, e, c, d}
R in digital = {1, 5, 3, 4} (each API in R is replaced by its position in S)
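Route construction and digitizing can be sketched as follows. This is an illustration under stated assumptions: the call graph `GRAPH` is hypothetical, chosen so that a DFS from `dummyMainMethod` reproduces the seven example routes above, and the function names are ours, not the paper's. A route ends at a node with no unvisited successor, and successors already on the current path are skipped so cycles cannot recurse forever.

```python
def enumerate_routes(graph, root):
    """Depth-first search: return every route from root to a node with no unvisited successor."""
    routes = []

    def dfs(node, path):
        # Skip successors already on the current path so cycles cannot recurse forever.
        successors = [s for s in graph.get(node, []) if s not in path]
        if not successors:
            routes.append(path)
            return
        for s in successors:
            dfs(s, path + [s])

    dfs(root, [root])
    return routes

def digitize(route, S):
    """Replace each API in a route by its 1-based position in the API set S."""
    index = {api: i + 1 for i, api in enumerate(S)}
    return [index[api] for api in route]

# Hypothetical call graph chosen so that DFS reproduces the seven example routes.
GRAPH = {
    "dummyMainMethod": ["a", "b", "c"],
    "a": ["d", "b"],
    "b": ["e", "f"],
    "c": ["f", "g"],
}

for route in enumerate_routes(GRAPH, "dummyMainMethod"):
    print(" > ".join(f"{api}()" for api in route))

print(digitize(["a", "e", "c", "d"], ["a", "b", "c", "d", "e", "f"]))  # [1, 5, 3, 4]
```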
• DATA SET CONSTRUCTION
S = {a, b, c, d, e, f}
R1 = {a, b, c, f}, the vector of R1 = {1, 2, 3, 6, 1}
R2 = {e, e, a, b}, the vector of R2 = {5, 5, 1, 2, 0}
Labels: 1 is a malicious app, 0 is a benign app; each vector is the digitized route followed by its label.
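A one-function sketch of the data set construction shown above: each API is replaced by its 1-based index in S and the app label is appended as the last element. The function name `build_sample` is hypothetical.

```python
def build_sample(route, label, S):
    """Digitize a route (1-based index of each API in S) and append the app label."""
    index = {api: i + 1 for i, api in enumerate(S)}
    return [index[api] for api in route] + [label]

S = ["a", "b", "c", "d", "e", "f"]
print(build_sample(["a", "b", "c", "f"], 1, S))  # R1, malicious -> [1, 2, 3, 6, 1]
print(build_sample(["e", "e", "a", "b"], 0, S))  # R2, benign    -> [5, 5, 1, 2, 0]
```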
• Training stage
• Testing stage
• LSTM approach

API sequence detection model with LSTM.
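To show how an LSTM consumes a digitized route, here is an untrained, forward-pass-only sketch in numpy. Several choices are assumptions for illustration and are not taken from the slides: routes are padded or truncated to a fixed length with index 0 (unused by the 1-based indices of S), APIs are one-hot encoded, there is a single LSTM layer with 4 hidden units, and the last hidden state feeds a sigmoid that scores P(malicious). Training (backpropagation through time) is omitted.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def pad(seq, length, pad_value=0):
    """Right-pad with pad_value (0 is unused by the 1-based API indices) or truncate to `length`."""
    return (list(seq) + [pad_value] * length)[:length]

class TinyLSTMClassifier:
    """Untrained single-layer LSTM over one-hot API indices, ending in a sigmoid P(malicious)."""

    def __init__(self, vocab_size, hidden, seed=0):
        rng = np.random.default_rng(seed)
        self.vocab_size, self.hidden = vocab_size, hidden
        # Weights for the four gates stacked as [input, forget, candidate, output].
        self.W = rng.normal(0.0, 0.1, (4 * hidden, vocab_size + hidden))
        self.b = np.zeros(4 * hidden)
        self.w_out = rng.normal(0.0, 0.1, hidden)
        self.b_out = 0.0

    def forward(self, seq):
        h = np.zeros(self.hidden)   # hidden state
        c = np.zeros(self.hidden)   # cell state
        for idx in seq:
            x = np.zeros(self.vocab_size)
            x[idx] = 1.0                                   # one-hot encoding of the API index
            z = self.W @ np.concatenate([x, h]) + self.b
            i, f, g, o = np.split(z, 4)
            i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)
            g = np.tanh(g)
            c = f * c + i * g                              # forget old memory, add new candidate
            h = o * np.tanh(c)                             # expose part of the cell state
        return float(sigmoid(self.w_out @ h + self.b_out))  # score from the last hidden state

model = TinyLSTMClassifier(vocab_size=7, hidden=4)        # S has 6 APIs, plus index 0 for padding
print(model.forward(pad([1, 2, 3, 6], 6)))
```

With random weights the score is close to 0.5; only after training would it separate malicious from benign routes.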


4- COMBINATION METHOD
The three detection models are far better than weak learners.
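The slides do not spell out the combination rule here. As an assumption for illustration only, one simple way to combine three binary detectors is a majority vote over their per-app predictions; the sketch below is that technique, not necessarily the paper's exact method, and the function name and sample predictions are hypothetical.

```python
def majority_vote(*model_predictions):
    """Combine per-app 0/1 predictions from an odd number of models by majority vote."""
    if len(model_predictions) % 2 == 0:
        raise ValueError("use an odd number of models to avoid ties")
    combined = []
    for votes in zip(*model_predictions):      # votes for one app, one per model
        combined.append(1 if sum(votes) > len(votes) / 2 else 0)
    return combined

# Hypothetical predictions of three models for four apps (1 = malicious).
print(majority_vote([1, 0, 1, 0], [1, 1, 0, 0], [0, 1, 1, 0]))  # [1, 1, 1, 0]
```

A vote needs no extra training, but it ignores how confident each model is; weighted or stacked combinations are common alternatives.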
1) ACCURACY COMPARISON
2) RESOURCE CONSUMPTION COMPARISON
Accuracy evaluation

You might also like