Malicious Application Detection Using Machine Learning
May 2021
BVRIT HYDERABAD College of Engineering for Women
(NBA Accredited EEE, ECE, CSE, IT B.Tech. Courses,
Accredited by NAAC with ‘A’ Grade)
(Approved by AICTE, New Delhi and Affiliated to JNTUH, Hyderabad)
Bachupally, Hyderabad – 500090
CERTIFICATE
This is to certify that the project work report entitled “Malicious Application Detection
Using Machine Learning” is a bonafide work carried out by Ms. B. Chinmai
(17WH1A0506), Ms. A. Meghana (17WH1A0558), Ms. K. Krishna Siri (17WH1A0560)
in partial fulfillment for the award of B.Tech degree in Computer Science & Engineering,
BVRIT HYDERABAD College of Engineering for Women, Bachupally, Hyderabad,
affiliated to Jawaharlal Nehru Technological University Hyderabad, Hyderabad under my
guidance and supervision.
The results embodied in the project work have not been submitted to any other University or
Institute for the award of any degree or diploma.
External Examiner
DECLARATION
We hereby declare that the work presented in this project work entitled “Malicious
Application Detection Using Machine Learning” submitted towards completion of Project
work in IV Year of B.Tech of CSE at BVRIT HYDERABAD College of Engineering for
Women, Hyderabad is an authentic record of our original work carried out under the
guidance of Dr. Ganti Naga Satish, Professor, Department of Computer Science and
Engineering.
We would like to express our sincere thanks to Dr. K. V. N. Sunitha, Principal, BVRIT
HYDERABAD College of Engineering for Women, for her support by providing the
Our sincere thanks and gratitude to Dr. Srinivasa Reddy Konda, Head of the Department of
CSE, BVRIT HYDERABAD College of Engineering for Women, for all timely support and
We are extremely thankful to our Internal Guide, Dr. Ganti Naga Satish, Professor, CSE,
BVRIT HYDERABAD College of Engineering for Women, for her constant guidance and
We would like to record our sincere thankfulness to the major project coordinator Dr. Ganti
Naga Satish, Professor, CSE, for his valuable guidance and skillful management.
ABSTRACT
LIST OF FIGURES
01 INTRODUCTION
1.1 Objective
1.2 Existing System
1.2.1 Disadvantages
1.3 Proposed System
1.3.1 Advantages
1.4 Methodology
1.5 Dataset
1.6 Permission
1.7 Combination of Permission
1.8 Feature Extraction
1.9 Classification
02 REQUIREMENTS
03 DESIGN
3.1 Introduction
3.2 Architecture diagram
3.3 UML diagrams
3.3.1 Usecase diagram
3.3.2 Sequence diagram
3.3.3 Activity diagram
3.3.4 Class diagram
04 IMPLEMENTATION
4.1 Coding
4.2 Data Collection and Balancing Data
4.3 Feature Selection
4.4 Building SVM Model
4.5 Building Decision Tree Classifier
4.6 Building Naive Bayes Model
4.7 SVM vs Decision Tree vs Naive Bayes
05 TESTING
5.1 Testing Strategies
5.1.1 Unit Testing
5.1.2 System Testing
5.1.3 Integrated Testing
5.1.4 Regression Testing
5.2 Test Cases
5.3 Execution Screenshots
ABSTRACT
Android plays a vital role in today's market. According to a recent survey, nearly 84.4% of
people use Android, which has rapidly become popular for both personal and business purposes.
Android applications are widely adopted because of their features and the benefits they offer
to users. Android places significant responsibility on application developers to design
applications with an understanding of the security risks involved. Malware protection is a
major security concern, since Android has been a major target of malicious applications. In
Android applications, permission control is one of the major security mechanisms. This project
explores the permission-induced risk in applications and the fundamentals of the Android
security architecture, and it also focuses on security ranking algorithms that are unique to
specific applications. We therefore propose a system that detects malware based on permissions
and suggests steps to limit access to unwanted permissions. It is also designed to reduce the
probability of attacks.
LIST OF FIGURES
20 5.3.1 Test Case
21 5.4.1 Execution Screenshots
22 5.4.2 Dashboard
23 5.4.3 Input
24 5.4.4 Output
1. INTRODUCTION
In recent years, smartphone usage has increased steadily, and the number of Android
application users has grown with it. Because of this growth, intruders create malicious
Android applications as tools to steal sensitive data and to commit identity theft and fraud
against mobile banking and mobile wallets. Many tools and software packages for detecting
malicious applications are available.
However, effective and efficient detection tools are needed to handle the new, complex
malicious apps created by intruders or hackers. In this project we use machine learning
approaches to detect malicious Android applications. First, we gather a dataset of past
malicious apps as a training set. Then, with the help of the Support Vector Machine and
decision tree algorithms, we train models on this dataset, compare them, and use them to
predict which Android apps are malware.
1.1 Objective:
The aim of the project is to improve permission-based detection of malicious Android mobile
applications using machine learning algorithms.
1.2.1 Disadvantages:
1.3.1 Advantages
Improves the detection rate for malicious applications.
Machine learning is more efficient than non-machine-learning algorithms.
Able to detect new Android malware applications.
Only 22 out of 135 permissions need to be considered, which improves the runtime
performance by 85.6%.
1.4 Methodology
To distinguish malicious applications from benign ones, a suitable dataset is required. The
dataset can be downloaded from the Drebin dataset. Our experiments cover 516 benign
applications and 528 malicious applications. The methodology followed is discussed in detail
in this section.
1.5 Dataset
A proper and sufficiently large dataset is required for the training and testing phases of
any classification research. The dataset for the experiment is downloaded from the Drebin
dataset, which contains different Android application permissions, listed by name, and their
values.
1.6 Permission
Previous work characterizes existing Android malware from various aspects, including the
permissions requested, and identifies individually the permissions that are widely requested
in both malicious and benign apps.
According to this work, malicious apps clearly tend to request SMS-related permissions, such
as ‘READ SMS’, ‘WRITE SMS’, ‘RECEIVE SMS’, and ‘SEND SMS’, more frequently. We found that
malicious apps tend to request more permissions than benign ones. We found no strong
correlation between application categories and requested permissions, and introduce a method
to visualize permission usage in different app categories.
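The comparison of permission request frequencies described above can be sketched in a few lines of Python. This is only an illustration: the four sample apps and their permission sets are invented and are not taken from the dataset, and `request_rate` is a hypothetical helper name.

```python
from collections import Counter

# Hypothetical sample: (requested permissions, label), 1 = malicious, 0 = benign.
apps = [
    ({"READ_SMS", "SEND_SMS", "INTERNET"}, 1),
    ({"READ_SMS", "RECEIVE_SMS", "INTERNET", "READ_CONTACTS"}, 1),
    ({"INTERNET"}, 0),
    ({"INTERNET", "CAMERA"}, 0),
]


def request_rate(apps, label):
    """Return {permission: fraction of apps with this label that request it}."""
    group = [perms for perms, y in apps if y == label]
    counts = Counter(p for perms in group for p in perms)
    return {p: n / len(group) for p, n in counts.items()}


malicious = request_rate(apps, 1)
benign = request_rate(apps, 0)

# Permissions requested more often by malicious apps than by benign ones.
suspicious = sorted(p for p in malicious if malicious[p] > benign.get(p, 0.0))
print(suspicious)
```

On real data the same rates could be computed per permission column, and the permissions with the largest gap between the two groups would be the natural candidates for features.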
The aim of their work is to classify Android applications into several categories such as
entertainment, society, tools, productivity, multimedia and video, communication, and puzzle
and brain games. It describes a method that analyses manifest files in Android applications
by extracting four types of keyword lists:
(1) Permission
1.9 Classification
By combining results from various classifiers, the approach can serve as a quick filter to
identify more suspicious applications. The proposed framework aims to develop a machine
learning-based malware detection system on Android that detects malware applications and
enhances the security and privacy of smartphone users. The system monitors various
permission-based features and events obtained from Android applications and analyses these
features using machine learning classifiers to decide whether an application is benign or
malware. The Support Vector Machine is trained online on a dedicated system, and only the
learned model is transferred to the smartphone for detecting malicious applications.
2. REQUIREMENTS
Software requirements is a field within software engineering that deals with establishing the
needs of stakeholders that are to be solved by software. The IEEE Standard Glossary of
Software Engineering Terminology defines a requirement:
Functional requirements: These are the requirements that the end user specifically
demands as basic facilities that the system should offer.
Non-functional requirements: These are the quality constraints that the
system must satisfy according to the project contract.
The most common set of requirements defined by any operating system or software
application is the physical computer resources, also known as hardware. A hardware
requirements list is often accompanied by a hardware compatibility list (HCL), especially in
the case of operating systems. An HCL lists tested, compatible, and sometimes incompatible
hardware devices for a particular operating system or application. The following sub-sections
discuss the various aspects of hardware requirements.
The hardware requirement specifies each interface of the software elements and the hardware
elements of the system. These hardware requirements include configuration characteristics.
Storage - 1TB
2.3 Technology Description
2.3.1 Python
Python features a dynamic type system and automatic memory management. It supports
multiple programming paradigms, including object-oriented, imperative, functional and
procedural, and has a large and comprehensive standard library.
Python is Interpreted: Python is processed at runtime by the interpreter. You do not need to
compile your program before executing it. This is similar to PERL and PHP.
Python is Interactive: You can sit at a Python prompt and interact with the interpreter
directly to write your programs.
Python also acknowledges that speed of development is important. Readable and terse code is
part of this, and so is access to powerful constructs that avoid tedious repetition of code.
Maintainability also ties into this: the number of lines of code may be an all but useless
metric, but it does say something about how much code you have to scan, read and/or
understand to troubleshoot problems or tweak behaviors. This speed of development, the ease
with which a programmer of other languages can pick up basic Python skills, and the huge
standard library are key to another area where Python excels. All its tools have been quick
to implement, saved a lot of time, and several of them have later been patched and updated by
people with no Python background without breaking.
2.3.2 Html
HTML stands for Hypertext Markup Language, and it is the most widely used language for
writing web pages.
Hypertext refers to the way in which web pages (HTML documents) are linked together. Thus,
the link available on a web page is called hypertext.
As its name suggests, HTML is a markup language, which means you use HTML to simply
"mark up" a text document with tags that tell a web browser how to structure it for display.
Originally, HTML was developed with the intent of defining the structure of documents
(headings, paragraphs, lists, and so forth) to facilitate the sharing of scientific
information between researchers.
Now, HTML is widely used to format web pages with the help of the different tags available
in the HTML language.
2.3.3 CSS
Cascading Style Sheets, fondly referred to as CSS, is a simple design language intended to
simplify the process of making web pages presentable.
CSS handles the look and feel part of a web page. Using CSS, you can control the color of the
text, the style of fonts, the spacing between paragraphs, how columns are sized and laid out,
what background images or colors are used, layout designs, and variations in display for
different devices and screen sizes as well as a variety of other effects.
CSS is easy to learn and understand, but it provides powerful control over the presentation
of an HTML document. Most commonly, CSS is combined with the markup languages HTML or
XHTML.
JavaScript
JavaScript was first known as LiveScript, but Netscape changed its name to JavaScript,
possibly because of the excitement being generated by Java. JavaScript made its first
appearance in Netscape in 1995 under the name LiveScript. The general-purpose core of the
language has been embedded in Netscape, Internet Explorer, and other web browsers.
The ECMA-262 specification defines a standard version of the core JavaScript language.
2.3.4 Pandas
Pandas is quite a game changer when it comes to analyzing data with Python, and it is one of
the most preferred and widely used tools in data wrangling, if not the most used one. Pandas
is open source.
What’s cool about Pandas is that it takes data (like a CSV or TSV file, or a SQL database)
and creates a Python object with rows and columns called a data frame, which looks very
similar to a table in statistical software (think Excel or SPSS, for example; people who are
familiar with R would see similarities to R too). This is much easier to work with than lists
and/or dictionaries processed through for loops or list comprehensions.
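As a small illustration of the data frame idea, the sketch below builds a table by hand instead of reading a file. The column names mirror the `name` and `malware` columns used later in this report, but the rows themselves are invented for the example.

```python
import pandas as pd

# Hypothetical permission table: one row per app, one 0/1 column per permission.
df = pd.DataFrame(
    {
        "name": ["app_a", "app_b", "app_c"],
        "READ_SMS": [1, 0, 1],
        "INTERNET": [1, 1, 1],
        "malware": [1, 0, 1],
    }
)

# Column-wise operations replace explicit loops over lists or dictionaries.
permission_counts = df.drop(columns=["name", "malware"]).sum()
malware_rows = df[df["malware"] == 1]

print(permission_counts)
print(malware_rows["name"].tolist())
```

The same two operations (summing columns and filtering rows by a condition) would need nested loops if the data were held in plain lists.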
2.3.5 Numpy
2.3.6 Matplotlib
2.3.7 Sklearn
2.3.8 Seaborn
Seaborn is a Python data visualization library based on matplotlib. It provides a high-level
interface for drawing attractive and informative statistical graphics.
Data visualization is the discipline of trying to understand data by placing it in a visual
context, so that patterns, trends and correlations that might not otherwise be detected can be
exposed. Python offers multiple great graphing libraries that come packed with lots of
different features. Whether you want to create interactive, live or highly customized plots,
Python has an excellent library for you.
The command line program conda is both a package manager and an environment manager. It
helps data scientists ensure that each version of each package has all the dependencies it
requires and works correctly.
2.3.10 Flask
MVC Architecture
MVC Components
Model
The Model component corresponds to all the data-related logic that the user works with. This
can represent either the data that is being transferred between the View and Controller
components or any other business logic-related data. For example, a Customer object will
retrieve the customer information from the database, manipulate it, and write the updated
data back to the database or use it to render data.
View
The View component is used for all the UI logic of the application. For example, the
Customer view will include all the UI components, such as text boxes and dropdowns, that the
final user interacts with.
Controller
Controllers act as an interface between the Model and View components to process all the
business logic and incoming requests, manipulate data using the Model component, and
interact with the Views to render the final output. For example, the Customer controller will
handle all the interactions and inputs from the Customer View and update the database using
the Customer Model. The same controller will be used to view the Customer data.
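The Customer example above can be sketched in plain Python to show how the three roles divide the work. The class and method names here are hypothetical, and a dictionary stands in for the database; this is not the structure of the Flask application in this report.

```python
class CustomerModel:
    """Model: owns the data and the logic for reading and updating it."""

    def __init__(self):
        self._customers = {}  # stands in for a database table

    def save(self, customer_id, name):
        self._customers[customer_id] = name

    def get(self, customer_id):
        return self._customers.get(customer_id)


class CustomerView:
    """View: turns data into the text the user sees."""

    def render(self, customer_id, name):
        if name is None:
            return f"Customer {customer_id} not found"
        return f"Customer {customer_id}: {name}"


class CustomerController:
    """Controller: handles requests, updates the Model, asks the View for output."""

    def __init__(self, model, view):
        self.model = model
        self.view = view

    def update(self, customer_id, name):
        self.model.save(customer_id, name)
        return self.show(customer_id)

    def show(self, customer_id):
        return self.view.render(customer_id, self.model.get(customer_id))


controller = CustomerController(CustomerModel(), CustomerView())
print(controller.update(1, "Asha"))
print(controller.show(2))
```

Because the View never touches the storage and the Model never formats text, either one can be replaced without changing the other.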
2.4 Classification Algorithms
A classification algorithm is a supervised learning technique that is used to identify the
category of new observations on the basis of training data. In classification, a program
learns from the given dataset or observations and then classifies new observations into a
number of classes or groups, such as Yes or No, 0 or 1, Spam or Not Spam, or cat or dog.
Classes can also be called targets, labels or categories. Unlike regression, the output
variable of classification is a category, not a value, such as "Green or Blue" or "fruit or
animal". Since classification is a supervised learning technique, it takes labeled input
data, which means each input comes with its corresponding output. In a classification
algorithm, a discrete output function (y) is mapped to the input variable (x).
In the proposed system, the SVM, decision tree and Naive Bayes classification algorithms are
used to find the algorithm that best separates malicious applications from benign ones.
2.4.1 Support Vector Machine Algorithm
Support Vector Machine, or SVM, is one of the most popular supervised learning algorithms and
is used for classification as well as regression problems. However, it is primarily used for
classification problems in machine learning.
The goal of the SVM algorithm is to create the best line or decision boundary that can
segregate n-dimensional space into classes so that we can easily put a new data point in the
correct category in the future. This best decision boundary is called a hyperplane.
SVM chooses the extreme points/vectors that help in creating the hyperplane. These extreme
cases are called support vectors, and hence the algorithm is termed Support Vector Machine.
Consider the diagram below, in which two different categories are classified using a decision
boundary or hyperplane.
Fig 2.4.2 Support Vector Machine Algorithm
Example: SVM can be understood with the following example. Suppose we see a strange cat that
also has some features of dogs. If we want a model that can accurately identify whether it is
a cat or a dog, such a model can be created by using the SVM algorithm. We first train our
model with lots of images of cats and dogs so that it can learn the different features of
cats and dogs, and then we test it with this strange creature. The SVM creates a decision
boundary between the two classes (cat and dog) and chooses the extreme cases (support
vectors) of cat and dog. On the basis of the support vectors, it will classify the creature
as a cat.
The SVM algorithm can be used for face detection, image classification and text
categorization.
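The idea of a hyperplane defined by a few extreme points can be tried out on a toy problem. The sketch below uses scikit-learn's `SVC` with a linear kernel; the six two-dimensional points are invented for illustration, and the large `C` value is an assumption chosen to approximate a hard margin.

```python
import numpy as np
from sklearn.svm import SVC

# Hypothetical 2-D points: class 0 near the origin, class 1 further away.
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
              [3.0, 3.0], [3.0, 4.0], [4.0, 3.0]])
y = np.array([0, 0, 0, 1, 1, 1])

clf = SVC(kernel="linear", C=1000.0)  # large C approximates a hard margin
clf.fit(X, y)

# The support vectors are the training points closest to the boundary.
print(clf.support_vectors_)

# A new point is classified by the side of the hyperplane it falls on.
predictions = clf.predict([[0.5, 0.5], [3.5, 3.5]])
print(predictions)
```

Only the points nearest the boundary end up in `support_vectors_`; moving any other training point slightly would leave the decision boundary unchanged.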
o Some popular applications of the Naïve Bayes algorithm are spam filtering, sentiment
analysis, and classifying articles.
Bayes' Theorem:
o Bayes' theorem, also known as Bayes' rule or Bayes' law, is used to determine the
probability of a hypothesis given prior knowledge. It depends on the conditional
probability.
o The formula for Bayes' theorem is given as:
$$P(A \mid B) = \frac{P(B \mid A)\, P(A)}{P(B)}$$
Where,
P(B|A) is the likelihood: the probability of the evidence B given that the hypothesis A
is true.
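A short worked example may help to see how Bayes' theorem updates a probability. Here the hypothesis A is "the app is malicious" and the evidence B is "the app requests READ_SMS". The prior uses the 528 malicious and 516 benign applications mentioned in the methodology; the two likelihood values are assumptions made up for this illustration, not measurements.

```python
# Prior from the class sizes reported in the methodology section.
n_malicious, n_benign = 528, 516
p_malicious = n_malicious / (n_malicious + n_benign)
p_benign = 1.0 - p_malicious

# Assumed likelihoods (hypothetical values, for illustration only).
p_sms_given_malicious = 0.6
p_sms_given_benign = 0.1

# P(B): total probability of seeing the evidence.
p_sms = p_sms_given_malicious * p_malicious + p_sms_given_benign * p_benign

# Bayes' theorem: P(A|B) = P(B|A) * P(A) / P(B).
posterior = p_sms_given_malicious * p_malicious / p_sms
print(round(posterior, 4))
```

With these assumed likelihoods, observing the permission raises the probability of malware from about one half to roughly 0.86, which is the kind of update a Naïve Bayes classifier performs for every feature.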
A decision tree is one way to display an algorithm that only contains conditional control
statements. Decision trees are commonly used in operations research, specifically in decision
analysis, to help identify a strategy most likely to reach a goal, but they are also a
popular tool in machine learning. A decision tree is a flowchart-like structure in which each
internal node represents a "test" on an attribute (e.g. whether a coin flip comes up heads or
tails), each branch represents the outcome of the test, and each leaf node represents a class
label (the decision taken after computing all attributes). The paths from root to leaf
represent classification rules.
In decision analysis, a decision tree and the closely related influence diagram are used as a
visual and analytical decision support tool, where the expected values (or expected utility)
of competing alternatives are calculated.
Working
o Step-1: Begin the tree with the root node, say S, which contains the complete dataset.
o Step-2: Find the best attribute in the dataset using an Attribute Selection Measure (ASM).
o Step-3: Divide S into subsets that contain the possible values of the best attribute.
o Step-4: Generate the decision tree node that contains the best attribute.
o Step-5: Recursively make new decision trees using the subsets of the dataset created in
Step-3. Continue this process until a stage is reached where the nodes cannot be classified
further; such a final node is called a leaf node.
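Step-2 above can be made concrete with a small sketch. The report does not say which Attribute Selection Measure is used, so information gain is assumed here as one common choice; the four sample rows and the function names are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(rows, labels, attribute):
    """Entropy reduction obtained by splitting the data on one attribute."""
    remainder = 0.0
    for value in set(row[attribute] for row in rows):
        subset = [y for row, y in zip(rows, labels) if row[attribute] == value]
        remainder += len(subset) / len(labels) * entropy(subset)
    return entropy(labels) - remainder


def best_attribute(rows, labels):
    """Step-2: pick the attribute with the highest information gain."""
    return max(rows[0], key=lambda a: information_gain(rows, labels, a))


# Hypothetical apps: 1 = permission requested; label 1 = malicious.
rows = [
    {"SEND_SMS": 1, "INTERNET": 1},
    {"SEND_SMS": 1, "INTERNET": 0},
    {"SEND_SMS": 0, "INTERNET": 1},
    {"SEND_SMS": 0, "INTERNET": 1},
]
labels = [1, 1, 0, 0]

chosen = best_attribute(rows, labels)
print(chosen)
```

In this toy data, splitting on SEND_SMS separates the two classes completely, so it becomes the root node; Steps 3 to 5 would then repeat the same selection on each resulting subset.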
3. DESIGN
3.1 Introduction
Software design sits at the technical kernel of the software engineering process and is
applied regardless of the development paradigm and area of application. Design is the first
step in the development phase for any engineered product or system. The designer’s goal is to
produce a model or representation of an entity that will later be built. Once system
requirements have been specified and analyzed, system design is the first of the three
technical activities (design, code and test) that are required to build and verify software.
The importance of design can be stated with a single word: “Quality”. Design is the place
where quality is fostered in software development. Design provides us with representations of
software that can be assessed for quality. Design is the only way that we can accurately
translate a customer’s view into a finished software product or system. Software design
serves as a foundation for all the software engineering steps that follow. Without a strong
design we risk building an unstable system: one that will be difficult to test, and one whose
quality cannot be assessed until the last stage.
During design, progressive refinements of data structure, program structure, and procedural
details are developed, reviewed and documented. System design can be viewed from either a
technical or a project management perspective. From the technical point of view, design
comprises four activities: architectural design, data structure design, interface design and
procedural design.
Web applications are by nature distributed applications, meaning that they are programs that
run on more than one computer and communicate through a network or server. Specifically, web
applications are accessed with a web browser and are popular because of the ease of using the
browser as a user client. For the enterprise, not having to install and maintain software on
potentially thousands of client computers is a key reason for their popularity. Web
applications are used for web mail, online retail sales, discussion boards, weblogs, online
banking, and more. One web application can be accessed and used by millions of people.
Like desktop applications, web applications are made up of many parts and often contain mini
programs, some of which have user interfaces. In addition, web applications frequently
require an additional markup or scripting language, such as HTML, CSS, or JavaScript. Also,
many applications use only the Python programming language, which is ideal because of its
versatility.
Fig 3.2.1 Architecture Diagram (stages include data pre-processing and data pruning)
To model a system, the most important aspect is to capture its dynamic behavior, that is, the
behavior of the system when it is running.
Static behavior alone is not sufficient to model a system; dynamic behavior is more
important. In UML, five diagrams are available to model the dynamic nature of a system, and
the use case diagram is one of them. Because the use case diagram is dynamic in nature, there
must be some internal or external factors that drive the interaction. These internal and
external agents are known as actors. Use case diagrams consist of actors, use cases and their
relationships. The diagram is used to model the system or subsystem of an application. A
single use case diagram captures a particular functionality of a system.
Hence, to model the entire system, a number of use case diagrams are used.
3.3.2 Sequence Diagram
Sequence diagrams represent the objects participating in the interaction horizontally and
time vertically. A use case is a kind of behavioral classifier that represents a declaration
of an offered behavior. Each use case specifies some behavior, possibly including variants,
that the subject can perform in collaboration with one or more actors.
Use cases define the offered behavior of the subject without reference to its internal structure.
These behaviors, involving interactions between the actor and the subject, may result in
changes to the state of the subject and communications with its environment. A use case can
include possible variations of its basic behavior, including exceptional behavior and error
handling.
3.3.3 Activity Diagram
3.3.4 Class Diagram
The class diagram is the main building block of object-oriented modelling. It is used for
general conceptual modelling of the structure of the application, and for detailed modelling
that translates the models into programming code. Class diagrams can also be used for data
modelling. The classes in a class diagram represent both the main elements and interactions
in the application and the classes to be programmed.
4. IMPLEMENTATION
4.1 Coding
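app.py
import pickle

import numpy as np
import pandas as pd
from flask import Flask, render_template, request

app = Flask(__name__)
dataset = pd.read_csv('android.csv')
# Assumption: the trained classifier was saved beforehand as 'model.pkl';
# the listing in this report does not show how the model is loaded.
with open('model.pkl', 'rb') as f:
    model = pickle.load(f)


@app.route('/')
def home():
    return render_template('index.html')


@app.route('/predict', methods=['POST'])
def predict():
    '''Classify the submitted permission values and render the result.'''
    # Assumption: every form field holds one numeric permission value.
    float_features = [float(x) for x in request.form.values()]
    final_features = [np.array(float_features)]
    prediction = model.predict(final_features)
    # Assumption: label 1 means malware and label 0 means benign.
    if prediction == 1:
        pred = 'Malware'
    elif prediction == 0:
        pred = 'Benign'
    output = pred
    # Assumption: index.html displays a 'prediction_text' variable.
    return render_template('index.html', prediction_text=output)


if __name__ == "__main__":
    app.run(debug=True)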
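android_malware.py
import warnings
warnings.filterwarnings('ignore')

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns
from sklearn import metrics
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.utils import shuffle

# Load the permission dataset and shuffle the rows.
data = pd.read_csv('android.csv')
data.shape
data = data.sample(frac=1).reset_index(drop=True)
data.head()

# Inspect the class balance of the 'malware' label.
sns.countplot(x='malware', data=data)
target_count = data.malware.value_counts()
print('Class 0:', target_count[0])
print('Class 1:', target_count[1])

# Balance the two classes.
df_class_0 = data[data['malware'] == 0]
df_class_1 = data[data['malware'] == 1]
# Assumption: the construction of df_test_over is not shown in the report;
# here the smaller class is resampled with replacement up to the size of
# the larger class (random oversampling).
if len(df_class_0) >= len(df_class_1):
    df_major, df_minor = df_class_0, df_class_1
else:
    df_major, df_minor = df_class_1, df_class_0
df_minor_over = df_minor.sample(len(df_major), replace=True)
df_test_over = pd.concat([df_major, df_minor_over], axis=0)
df_test_over.shape
sns.countplot(x='malware', data=df_test_over)

# Separate the features from the label and drop the app name column.
X = df_test_over.drop(columns='malware')
Y = df_test_over['malware']
X, Y = shuffle(X, Y)
X = X.drop(columns='name')
X.head()
Y.head()

# Score every permission with the chi-squared test and list the top 10.
bestfeatures = SelectKBest(score_func=chi2, k=10)
fit = bestfeatures.fit(X, Y)
dfscores = pd.DataFrame(fit.scores_)
dfcolumns = pd.DataFrame(X.columns)
featureScores = pd.concat([dfcolumns, dfscores], axis=1)
featureScores.columns = ['Specs', 'Score']
featureScores.nlargest(10, 'Score')

# Assumption: the train/test split is not shown in the report; an 80/20
# split is used here.
X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=0.2)
X_train.shape
X_train.head()
y_train.head()

# Support Vector Machine.
# Assumption: the SVC constructor call is not shown in the report;
# scikit-learn's default parameters are used here.
support = SVC()
support.fit(X_train, y_train)
y_pred = support.predict(X_test)
model1 = metrics.accuracy_score(y_test, y_pred)
print(model1)
cnf_matrix = confusion_matrix(y_test, y_pred)
labels = ['Good', 'Malware']
sns.heatmap(cnf_matrix, annot=True, cmap="YlGnBu", fmt=".3f", xticklabels=labels, yticklabels=labels)
plt.show()

# Decision tree.
tree = DecisionTreeClassifier()
tree.fit(X_train, y_train)
y_pred = tree.predict(X_test)
model2 = metrics.accuracy_score(y_test, y_pred)
print(model2)
cnf_matrix = confusion_matrix(y_test, y_pred)
labels = [0, 1]
sns.heatmap(cnf_matrix, annot=True, cmap="YlGnBu", fmt=".3f", xticklabels=labels, yticklabels=labels)
plt.show()

# Gaussian Naive Bayes, fitted on the training split only so that the test
# rows stay unseen.
clf = GaussianNB()
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)
model3 = metrics.accuracy_score(y_test, y_pred)
print(model3)
cnf_matrix = confusion_matrix(y_test, y_pred)
labels = [0, 1]
sns.heatmap(cnf_matrix, annot=True, cmap="YlGnBu", fmt=".3f", xticklabels=labels, yticklabels=labels)
plt.show()

# Compare the accuracy of the three classifiers in a bar chart.
objects = ('SVM', 'DecisionTreeClassifier', 'GaussianNB')
y_pos = np.arange(len(objects))
performance = [model1, model2, model3]
# Assumption: the plotting calls are not shown in the report; a plain bar
# chart of the three accuracy values is drawn here.
plt.bar(y_pos, performance, align='center', alpha=0.5)
plt.xticks(y_pos, objects)
plt.ylabel('Accuracy')
plt.title('SVM vs Decision Tree vs Naive Bayes')
plt.show()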
4.2 Data Collection and Balancing Data
We collected data comparing the true and false positive rates of our classifier, shown below.
4.3 Feature Selection
Using SelectKBest, we selected the best attributes (permissions) from the given attributes.
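To show what SelectKBest does with the chi-squared score, the sketch below applies it to a tiny hand-made permission matrix. The data is invented, and `k=1` is chosen only to keep the example small; the project listing uses `k=10`.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, chi2

# Hypothetical 0/1 permission matrix: 6 apps (rows) x 3 permissions (columns).
X = np.array([
    [1, 1, 0],
    [1, 1, 1],
    [1, 1, 0],
    [0, 1, 1],
    [0, 1, 0],
    [0, 1, 1],
])
y = np.array([1, 1, 1, 0, 0, 0])  # 1 = malicious, 0 = benign

selector = SelectKBest(score_func=chi2, k=1)
X_selected = selector.fit_transform(X, y)

# get_support() marks which columns were kept.
kept = selector.get_support()
print(kept)
print(X_selected.shape)
```

The first permission is requested only by the malicious apps, so it receives the highest score and is the one kept. The second is requested by every app and carries no information about the class.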
4.4 Building SVM Model
After building the SVM model, we trained and tested it on the data and obtained an accuracy
of 87.5%.
4.5 Building Decision Tree Classifier
After building the decision tree model, we trained and tested it on the data and obtained an
accuracy of 95.8%.
Fig 4.5.1 Building Decision Tree Classification Model
4.6 Building Naive Bayes Model
After building the Naive Bayes model, we trained and tested it on the data and obtained an
accuracy of 54.1%.
5. TESTING
Software testing is a critical element of software quality assurance and represents the
ultimate review of specification, design and coding. The increasing visibility of software as
a system element and the attendant costs associated with a software failure are motivating
factors for well-planned, thorough testing. Testing is the process of executing a program
with the intent of finding an error. The design of tests for software and other engineered
products can be as challenging as the initial design of the product itself.
There are basically two types of testing approaches.
One is black-box testing: knowing the specified function that a product has been designed to
perform, tests can be conducted that demonstrate that each function is fully operational.
The other is white-box testing: knowing the internal workings of the product, tests can be
conducted to ensure that the internal operation of the product performs according to
specifications and that all internal components have been adequately exercised.
White-box and black-box testing methods have been used to test this package. All the loop
constructs have been tested for their boundary and intermediate conditions. The test data was
designed with a view to checking all the conditions and logical decisions. Error handling has
been taken care of by the use of exception handlers.
5.1 Testing Strategies
Testing is a set of activities that can be planned in advance and conducted systematically. A
strategy for software testing must accommodate low-level tests that are necessary to verify
that a small source code segment has been correctly implemented, as well as high-level tests
that validate major system functions against customer requirements.
Software testing is one element of verification and validation. Verification refers to the
set of activities that ensure that software correctly implements a specific function.
Validation refers to a different set of activities that ensure that the software that has
been built is traceable to customer requirements.
The main objective of software testing is to uncover errors. To fulfill this objective, a
series of test steps (unit, integration, validation and system tests) are planned and
executed. Each test step is accomplished through a series of systematic test techniques that
assist in the design of test cases. With each testing step, the level of abstraction with
which software is considered is broadened.
Testing is the only way to assure the quality of software, and it is an umbrella activity
rather than a separate phase. It is an activity to be performed in parallel with the software
effort, and one that consists of its own phases of analysis, design, implementation,
execution and maintenance.
5.1.4 Regression Testing
Each time a new module is added as part of integration testing, the software changes.
Regression testing is an activity that helps to ensure that changes do not introduce
unintended behavior or additional errors.
Regression testing may be conducted manually by re-executing a subset of all test cases, or
by using automated capture/playback tools, which enable the software engineer to capture test
cases and results for subsequent playback and comparison. The regression suite contains
different classes of test cases:
A representative sample of tests that will exercise all software functions.
Additional tests that focus on software functions that are likely to be affected by the change.
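The capture-and-compare idea behind regression testing can be sketched as follows. The helper `label_for` and the recorded baseline are hypothetical and only illustrate the mechanism; they are not part of the project code.

```python
def label_for(prediction):
    """Hypothetical helper: map a classifier output to the text shown to the user."""
    return "Malware" if prediction == 1 else "Benign"


# Baseline results captured from a previous, trusted run of the software.
baseline = {1: "Malware", 0: "Benign"}


def run_regression_suite():
    """Re-run every recorded case and collect the ones whose output changed."""
    return [case for case, expected in baseline.items()
            if label_for(case) != expected]


failures = run_regression_suite()
print("regressions:", failures)
```

An empty list means the current code still reproduces every recorded result; any entry in the list points to a behavior that changed since the baseline was captured.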
5.2 Test Cases
Integrated and regression testing strategies are used in this application for testing.
In conclusion, our project can identify, with moderate success, applications that pose a
potential threat based on the permissions that they request. Our application can scan
applications on a phone at any time, and alerts the user to do so when an installation or app
update occurs. We believe that this is an important step in preventing Android malware,
because this application brings to the user’s attention all the possibly dangerous applications,
allowing them to scrutinize the applications that they trust more carefully. This in turn will
help users become more security-conscious overall.
Even so, this is only a first step. Future work for this project will include increasing the
accuracy of the classifier, migrating the Python portions of this project to Java, and integrating
more advanced methods of detecting malicious behavior such as looking at API calls (this
follows a "defense in depth" strategy). One benefit of the decision tree classifier is its speed. It
can serve as a preliminary screen for more advanced but slower methods, to focus the
applications they will inspect. Lastly, taking into account application categories such as being
a game or email-client would also help detect suspicious permissions and behaviors.