
Document For Final Project

The project report focuses on securing Android devices through machine learning-based malware detection, addressing the increasing prevalence of malware due to the open-source nature of the Android operating system. It aims to develop a deep learning model that accurately identifies malware-infected applications without installation, enhancing pre-installation security checks. The report reviews various machine learning techniques and their effectiveness in combating sophisticated Android malware attacks, ultimately contributing to improved mobile cybersecurity mechanisms.

Uploaded by

bborigarla

A

Project Report on
SECURING ANDROID DEVICES THROUGH MACHINE
LEARNING BASED ON MALWARE DETECTION
Submitted to
N.B.K.R. INSTITUTE OF SCIENCE AND TECHNOLOGY
(Autonomous)
Affiliated to JNTUA, Anantapuramu
in partial fulfillment of the requirements for the award of the Degree of

BACHELOR OF TECHNOLOGY
in
COMPUTER SCIENCE AND ENGINEERING

Submitted by
Batch No: A12

A. RAMESWAR (21KB5A0515)
J. VISHNU VARDHAN REDDY (21KB1A0561)
D. KUSUMANJALI (21KB1A0539)
K. ASRITHA (21KB1A0563)

Under the Guidance of


Mr. K. Raveendra Chaithanya M.Tech(Ph.D)
Assistant Professor

DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING


N.B.K.R. INSTITUTE OF SCIENCE AND TECHNOLOGY
(Autonomous)
VIDYANAGAR, TIRUPATI DIST, AP – 524413
April-2025
Website: www.nbkrist.org. Ph: 08624-228 247
Email: [email protected]. Fax: 08624-228 257

N.B.K.R. INSTITUTE OF SCIENCE & TECHNOLOGY


(Autonomous)
(Approved by AICTE: Accredited by NBA: Affiliated to JNTUA,Anantapuramu)
An ISO 9001-2000 Certified Institution
Vidyanagar -524 413, Tirupati District, Andhra Pradesh, India

BONAFIDE CERTIFICATE

This is to certify that the project work entitled “SECURING ANDROID DEVICES
THROUGH MACHINE LEARNING BASED ON MALWARE DETECTION” is a
bonafide work done by A. RAMESWAR (21KB5A0515), J. VISHNU VARDHAN REDDY
(21KB1A0561), D. KUSUMANJALI (21KB1A0539), K. ASRITHA (21KB1A0563) in the
Department of Computer Science & Engineering, N.B.K.R. Institute of Science &
Technology, Vidyanagar and is submitted to JNTUA, Anantapuramu in partial fulfillment
of the requirements for the award of the B.Tech degree in Computer Science & Engineering.
This work has been carried out under my supervision.

Dr. K. Raveendra Chaithanya Dr. A. Raja Sekhar Reddy


Assistant Professor Professor & Head
Department of CSE Department of CSE
NBKRIST, Vidyanagar NBKRIST, Vidyanagar

Submitted for the Viva-Voce Examination held on

Examiner-1 Examiner-2
DECLARATION

We hereby declare that the project report entitled “SECURING ANDROID
DEVICES THROUGH MACHINE LEARNING BASED ON MALWARE
DETECTION” was done by us under the guidance of Mr. K. Raveendra Chaithanya and is
submitted in partial fulfillment of the requirements for the award of the Bachelor’s degree in
Computer Science and Engineering. This project is the result of our own effort and it has
not been submitted to any other University or Institution for the award of any degree or
diploma other than specified above.

A. RAMESWAR (21KB5A0515)
J. VISHNU VARDHAN REDDY (21KB1A0561)
D. KUSUMANJALI (21KB1A0539)
K. ASRITHA (21KB1A0563)
ACKNOWLEDGEMENT
We are thankful to our guide Mr. K. Raveendra Chaithanya for his valuable
guidance and encouragement. His helping attitude and suggestions have helped us in the
successful completion of the project.
We would like to express our gratitude and sincere thanks to Dr. A. Raja Sekhar
Reddy, Head of the Department of COMPUTER SCIENCE AND ENGINEERING, for
his kind help and encouragement during the course of our study and in the successful
completion of the project work.
We have great pleasure in expressing our hearty thanks to our beloved Director Dr.
V. Vijaya Kumar Reddy, for spending his valuable time with us to complete this project.
Successful completion of any project cannot be done without proper support and
encouragement. We sincerely thank the Management for providing all the necessary
facilities during the course of study.
We would like to thank our parents and friends, who have made the greatest contributions
to all our achievements, for their great care and blessings in making us successful in all our
endeavors.

A. RAMESWAR (21KB5A0515)
J. VISHNU VARDHAN REDDY (21KB1A0561)
D. KUSUMANJALI (21KB1A0539)
K. ASRITHA (21KB1A0563)
TABLE OF CONTENTS
Chapter No. Description

Abstract
List of Figures

1 Introduction
2 Project Description
2.1 Problem Definition
2.2 Project Details
3 Computational Environment
3.1 Software Specification
3.2 Hardware Specification
3.3 Software Features
4 Feasibility Study
4.1 Technical Feasibility
4.2 Social Feasibility
4.3 Economical Feasibility
5 System Analysis
5.1 Existing System
5.1.1 Drawbacks of Existing System
5.2 Proposed System
5.2.1 Advantages of Proposed System
6 System Design
6.1 UML Diagrams
6.1.1 Class Diagram
6.1.2 Use Case Diagram
6.1.3 Sequence Diagram
6.1.4 Activity Diagram
6.1.5 Deployment Diagram
7 System Implementation
7.1 Implementation Process
7.2 Modules
8 Testing
8.1 Unit Testing
8.2 Integration Testing
8.3 System Testing
8.4 Acceptance Testing
9 Sample Source Code
10 Screen Layouts
11 Conclusion and Future Scope
12 Bibliography
References
Websites
ABSTRACT

Android has become the most widely used smartphone operating system. Its rapidly growing
adoption has brought a significant increase in the amount of malware compared with earlier
years.
Many anti-malware programs are designed to protect users’ sensitive data on mobile systems
from such attacks. Here, we examine different kinds of Android malware, the methods they use
to attack devices, and the antivirus programs that act against them to protect Android systems.
We then discuss deep learning-based Android malware detection techniques such as MalDozer,
DroidDetector, DroidDeepLearner, DeepFlow, DroidDelver and DroidDeep.

The ultimate aim of this study is to design and implement a deep learning-based model capable
of automatically and accurately identifying whether an Android application is malware-infected or not,
without the need for installation. Our approach seeks to enhance pre-installation security checks,
minimize false positives, and provide an efficient, scalable solution for Android malware detection.

Through this work, we hope to contribute towards the advancement of intelligent, adaptive
security mechanisms that can keep pace with the rapidly evolving landscape of mobile cybersecurity
threats.

LIST OF FIGURES

S.NO. FIGURE NO. DESCRIPTION

1 Fig. 3.1 Big data platform manifesto
2 Fig. 3.2 KDD process stages
3 Fig. 6.1 Use case diagram
4 Fig. 6.2 Sequence diagram
5 Fig. 6.3 Collaboration diagram
6 Fig. 6.4 Activity diagram
7 Fig. 6.5 Dataflow diagram
8 Fig. 10.1 Accuracy prediction through curve
9 Fig. 10.2 Malware detection through app

Securing Android Devices Through Machine Learning Based on Malware Detection

1. INTRODUCTION

Mobile applications have become an essential part of daily life, since countless services are
provided to us through them. Because apps are installed on most smart devices, they have changed
the way we communicate. Mobile devices carry sophisticated sensors such as cameras, gyroscopes,
microphones and GPS. These sensors open up a whole new world of applications for users and
generate massive quantities of highly complex data.
Security solutions are therefore needed to defend users from malicious applications that exploit
the complexity of smart devices and their data. Android powers a wide range of smart devices and
held the largest share of the mobile computing industry, about 85% in 2017, helped by its
open-source distribution.

The current defence against malware on Android platforms is a risk communication mechanism that
notifies users of the permissions required before each application is installed. This mechanism is
rather ineffective because it presents each application's permissions in isolation. To distinguish
malware from benign applications, the user needs too much technical knowledge.

Benign and malicious applications may request the same permissions, so the two cannot be
distinguished by this permission-based mechanism alone. In general, permission-based
approaches were not developed for detecting malware; they are used for risk assessment.

Android itself provides several security mechanisms, for example the Android permission system
and Google’s Bouncer, that make it harder for malware to be installed and executed and that
address increasingly widespread security threats. Every Android application needs to ask the
user for permission to perform certain tasks on the device, such as sending SMS messages, during
the installation process.

Most users grant the permissions without even considering what kinds of permissions are being
requested, which significantly weakens the Android permission system. As a result, it is very
difficult in practice to limit the spread of malicious apps through the Android permission system
alone.

DEPT OF CSE, N.B.K.R.I.S.T Page 1



2. PROJECT DESCRIPTION

2.1 PROBLEM DEFINITION

The open-source nature of the Android operating system has attracted wide adoption of the
system by many types of developers. This has in turn fostered an exponential proliferation of
devices running Android across different sectors of the economy. Although this development has
brought great technological advancement and made business (e-commerce) and social interaction
easier, these devices have also become a strong medium for the uncontrolled rise of cyberattacks
and espionage against business infrastructure and individual users. Different cyberattack
techniques exist, but attacks through malicious applications have taken the lead over other
methods such as social engineering. Android malware has evolved in sophistication and
intelligence to the point that it is highly resistant to existing detection systems, especially
signature-based ones. Machine learning techniques have emerged as a more capable choice for
combating the sophistication and novelty of emerging Android malware. Models created with
machine learning methods first learn the existing patterns of malware behaviour and then use
this knowledge to identify similar behaviour in unknown attacks. This report provides a
comprehensive review of machine learning techniques and their applications in contemporary
Android malware literature.

2.2 PROJECT DETAILS

Research has shown that Android malware analysis can be done in three different ways. The
first involves static [1] and dynamic [2] investigation of application code in order to spot
malicious components before the application is loaded onto any device. The second involves
modifying the Android system to add modules that monitor and intercept abnormal behaviour on
the device [3,4,5]. The third uses virtualization to separate domains, ranging from lightweight
isolation of an application on the device to running multiple instances of Android OS on the
same device [6,7]. However, recent studies have shown that machine learning or “anomaly
detection” approaches have emerged as a leading and more effective way of defeating Android
malware [8, 9, 10, 11].
Static analysis involves the manual examination of the AndroidManifest.xml file, source files
and Dalvik bytecode, and dynamic analysis involves running an application in a controlled
environment to study its behaviour. The machine learning approach, by contrast, learns general
rules and patterns from benign and malicious app samples and then makes data-driven predictions
or decisions, such as classification [12]. Machine learning methodologies largely depend on static
attributes extracted from an application [13]. The static components of an Android application
provide the baseline on which machine learning approaches are anchored, and these static
features are obtained through reverse engineering. Machine learning techniques have been applied
widely to the classification of applications, focusing mainly on generic malware detection. Applying
machine learning to Android malware detection helps eliminate the difficulty of manually crafting
and updating detection patterns [8]. Machine learning is a procedure that analyzes data using
software techniques (algorithms) to create a model, as shown in Fig. 1, which is useful for finding
patterns and regularities in datasets [14]. It is a process of making machines learn from past
experience (existing data) in order to make decisions about future events or data instances.
Feature vectors are essential elements of machine learning and are usually built for the
specific task the machine learning algorithm is intended to accomplish. The basic idea behind
machine learning is to estimate the probability distribution of the data.
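
To make the idea of static features and feature vectors concrete, the following is a minimal sketch, not the implementation used in this project. It assumes the manifest has already been decoded to plain XML by a reverse-engineering tool (inside an APK it is stored in a binary format), and the permission vocabulary, the sample manifest and the function names are hypothetical choices made for illustration.

```python
import xml.etree.ElementTree as ET

# Namespace prefix that ElementTree uses for android:* attributes.
ANDROID_NS = "{http://schemas.android.com/apk/res/android}"

# Hypothetical, hand-picked vocabulary; a real system would derive it
# from the permissions seen in its training corpus.
PERMISSION_VOCAB = [
    "android.permission.INTERNET",
    "android.permission.SEND_SMS",
    "android.permission.READ_CONTACTS",
    "android.permission.READ_PHONE_STATE",
]


def extract_permissions(manifest_xml):
    """Return the set of permission names requested in a decoded manifest."""
    root = ET.fromstring(manifest_xml)
    return {
        elem.get(ANDROID_NS + "name")
        for elem in root.iter("uses-permission")
        if elem.get(ANDROID_NS + "name")
    }


def to_feature_vector(permissions, vocab=PERMISSION_VOCAB):
    """Encode a permission set as a fixed-length binary feature vector."""
    return [1 if name in permissions else 0 for name in vocab]


# Illustrative manifest for a made-up package (assumption, not project data).
SAMPLE_MANIFEST = """<manifest
    xmlns:android="http://schemas.android.com/apk/res/android"
    package="com.example.demo">
    <uses-permission android:name="android.permission.INTERNET"/>
    <uses-permission android:name="android.permission.SEND_SMS"/>
</manifest>"""

perms = extract_permissions(SAMPLE_MANIFEST)
vector = to_feature_vector(perms)
print(vector)  # [1, 1, 0, 0]
```

Each position in the vector corresponds to one permission in the vocabulary, so every application is described by the same fixed number of features regardless of how many permissions it requests.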
Machine learning is divided into three main categories: supervised machine learning
[16, 17, 18], unsupervised machine learning [18] and reinforcement machine learning [19].
Three basic learning methods are commonly distinguished: classification, clustering, and
regression. Classification is used in supervised learning, where the data sets are labelled with
groups or classes; clustering is used in unsupervised learning for unlabelled data sets; and
regression, which is also a supervised method, is used when the expected result is a value to be
ranked, graded or estimated. Reinforcement learning, by contrast, learns from reward signals
rather than from labelled examples. A label is the name of the definite class or group a data
instance belongs to. In machine learning, data are represented by
a fixed number of features which can either be categorical, nominal, or continuous [20]. This report
gives a thorough review of the existing literature in the field of Android malware detection
using machine learning techniques.
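
As an illustration of classification on labelled data, the sketch below implements a one-nearest-neighbour classifier over binary feature vectors. It is a deliberately simple stand-in, not the deep learning model this project aims at, and the four training samples and their labels are invented toy data.

```python
def hamming(a, b):
    """Number of positions at which two equal-length vectors differ."""
    return sum(x != y for x, y in zip(a, b))


def predict_1nn(train, sample):
    """Return the label of the training vector closest to `sample`.

    `train` is a list of (feature_vector, label) pairs.
    """
    _, label = min(train, key=lambda pair: hamming(pair[0], sample))
    return label


# Toy labelled data (assumption): each vector marks which of four
# hypothetical permissions an application requests.
TRAIN = [
    ([1, 0, 0, 0], "benign"),
    ([1, 0, 1, 0], "benign"),
    ([1, 1, 0, 1], "malware"),
    ([0, 1, 1, 1], "malware"),
]

print(predict_1nn(TRAIN, [1, 1, 1, 1]))  # malware
```

The labels are what make this supervised learning: a clustering method would receive only the vectors and would have to group them without being told which group is which.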


Authors in [27, 28] showed in their works that malware attack methods can be characterized
as follows:
• Information Extraction: Malware in this category compromises a device and then steals
personal information such as the IMEI number and the user’s personal data.
• Automatic Calls and SMS: This group of malware increases a user’s phone bill by placing
automatic calls and sending SMS to premium numbers.
• Root Exploits: This set of malware seeks to gain system root privileges in order to take control
of the system and modify its configuration and other system information.
• Search Engine Optimization: The malware here artificially searches for a term and
simulates clicks on targeted websites in order to increase the revenue of a search engine or increase
the traffic on a website.
• Dynamically Downloaded Code: This technique enables an installed benign application to
download malicious code and deploy it on the mobile device without the user being aware.
• Covert and Overt Communication Channels: This is a vulnerability in a device that
facilitates information leaks between processes that are not supposed to share information.
This technique is seen as highly sophisticated.
• Botnets: This is a network of compromised mobile devices with a bot-master, controlled
through Command and Control (C&C) servers. It carries out spam delivery and DDoS
(Distributed Denial of Service) attacks on the host devices.
Malware authors use many techniques to evade detection. The work in [29] points out that these
concealment techniques include code obfuscation, encryption, requesting permissions that are
not needed by the application, requesting unneeded hardware, and download or update attacks in
which a benign application updates itself or another application with malicious payloads.


3. COMPUTATIONAL ENVIRONMENT

3.1 SOFTWARE SPECIFICATION

Operating System : Windows XP.

Platform : PYTHON TECHNOLOGY

Tool : Python 3.6

Front End : Python anaconda script

Back End : Spyder

3.2 HARDWARE SPECIFICATION

• System : Pentium IV 2.4 GHz.

• Hard Disk : 40 GB.

• Monitor : 15 inch VGA Color.

• Mouse : Logitech Mouse.

• Ram : 512 MB

• Keyboard : Standard Keyboard


3.3 SOFTWARE FEATURES

3.3.1 SOFTWARE ANACONDA

Anaconda is more than just a package manager; it's a comprehensive platform for data science
and machine learning workflows. Some key features and components of Anaconda include:

1. *Conda Package Manager*: Anaconda comes with Conda, a powerful package manager that
simplifies package installation, updates, and dependency management. Conda can install packages
from the Anaconda repository as well as from other channels such as conda-forge; packages that
are only published on PyPI can be added with pip inside a Conda environment.

2. *Anaconda Navigator*: A graphical user interface (GUI) that allows users to easily manage
environments, install packages, and launch applications. It provides a convenient way to navigate
through projects and environments.

3. *Jupyter Notebooks*: Anaconda includes Jupyter Notebook, an interactive web-based
computational environment for creating and sharing documents containing live code, equations,
visualizations, and narrative text. Jupyter Notebooks are widely used for data exploration,
visualization, and prototyping.

4. *Spyder IDE*: Anaconda comes with Spyder, an Integrated Development Environment (IDE)
designed specifically for scientific computing and data analysis in Python. Spyder provides features
such as code editing, debugging, variable exploration, and integrated IPython consoles.

5. *Hundreds of Pre-installed Packages*: Anaconda includes a wide range of popular data
science libraries and tools, such as NumPy, pandas, Matplotlib, and scikit-learn. These packages
are pre-installed and ready to use, saving users the time and effort of manually installing them.
Other libraries, such as TensorFlow and PyTorch, can be added through Conda when needed.


6. *Environment Management*: Anaconda allows users to create isolated Python
environments with specific package versions and dependencies. This helps avoid conflicts between
different projects and ensures reproducibility of results.

7. *Support for Multiple Platforms*: Anaconda is available for Windows, macOS, and Linux,
making it accessible to users across different operating systems.
Overall, Anaconda provides a convenient and powerful platform for data scientists, researchers, and
developers to work with Python and its ecosystem of libraries for data analysis, machine learning, and
scientific computing.
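
Before running project code in a given environment, it can be useful to confirm that the interpreter and libraries are actually present. The sketch below is one way to do that with the standard library only; the minimum version follows the Python 3.6 named in the software specification, while the package list is an assumed set of dependencies and `environment_report` is a hypothetical helper name.

```python
import importlib.util
import sys

# Version named in the software specification of this report.
REQUIRED_PYTHON = (3, 6)

# Import names of libraries the project is assumed to need.
PACKAGES = ["numpy", "pandas", "matplotlib", "sklearn"]


def environment_report(packages=PACKAGES):
    """Report whether the interpreter and each package are available."""
    report = {"python_ok": sys.version_info >= REQUIRED_PYTHON}
    for name in packages:
        # find_spec returns None when a top-level module cannot be found.
        report[name] = importlib.util.find_spec(name) is not None
    return report


for key, ok in environment_report().items():
    print(f"{key}: {'ok' if ok else 'missing'}")
```

Running this inside each Conda environment shows at a glance which environment is ready for the project and which still needs packages installed.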

3.3.2 SPYDER
Spyder is an Integrated Development Environment (IDE) primarily used for scientific
computing and data analysis in Python. While it's commonly known for its frontend interface, which
provides features such as code editing, variable exploration, and debugging, Spyder also has a robust
backend that facilitates these functionalities. Here's some more information about Spyder's backend:

1. *Code Editor*: Spyder's backend includes a sophisticated code editor with features like
syntax highlighting, code completion, code folding, and automatic indentation. The backend manages
these functionalities to provide a smooth coding experience for users.

2. *Debugger*: Spyder's backend integrates a powerful debugger that allows users to step
through code, set breakpoints, inspect variables, and analyze program execution. The backend handles
communication with the Python interpreter to provide debugging capabilities within the IDE.

3. *Variable Explorer*: Spyder includes a Variable Explorer that allows users to interactively
explore and manipulate variables in their Python environment. The backend manages the
synchronization of variables between the Python interpreter and the Variable Explorer interface.


4. *Integrated IPython Console*: Spyder's backend integrates an IPython console within the
IDE, allowing users to execute Python code interactively and access the full power of the IPython
interpreter. The backend handles communication between the console and the Python interpreter
running in the background.

5. *Code Analysis Tools*: Spyder's backend includes tools for static code analysis, such as
linting, code style checking, and code formatting. These tools help users write clean, consistent, and
error-free code by providing real-time feedback and suggestions.

6. *Integration with External Tools*: Spyder's backend can integrate with external tools and
libraries for specialized tasks, such as version control systems (e.g., Git), data visualization libraries
(e.g., Matplotlib), and scientific computing packages (e.g., NumPy, SciPy).

Overall, Spyder's backend plays a crucial role in providing a seamless development experience
for users working on scientific computing and data analysis projects in Python. It handles various
tasks behind the scenes to ensure that users can write, debug, and analyze code efficiently within the
IDE.

SRS

DATA MINING

Data mining is an interdisciplinary subfield of computer science. It is the computational
process of discovering patterns in large data sets ("big data") involving methods at the intersection of
artificial intelligence, machine learning, statistics, and database systems. The overall goal of the data
mining process is to extract information from a data set and transform it into an understandable
structure for further use. Aside from the raw analysis step, it involves database and data management
aspects, data pre-processing, model and inference considerations, interestingness metrics, complexity
considerations, post-processing of discovered structures, visualization, and online updating. Data
mining is the analysis step of the "knowledge discovery in databases" process, or KDD.


The actual data mining task is the automatic or semi-automatic analysis of large quantities of
data to extract previously unknown, interesting patterns such as groups of data records (cluster
analysis), unusual records (anomaly detection), and dependencies (association rule mining). This
usually involves using database techniques such as spatial indices. These patterns can then be seen as
a kind of summary of the input data, and may be used in further analysis or, for example, in machine
learning and predictive analytics.
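
Of the pattern types named above, anomaly detection is the easiest to show in a few lines. The sketch below flags values that lie far from the mean in units of standard deviation (a z-score test); the threshold of 2.0 and the daily SMS counts are illustrative assumptions, not measurements from this project.

```python
from statistics import mean, pstdev


def find_anomalies(values, threshold=2.0):
    """Return the values whose z-score exceeds `threshold`."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return []  # all values identical, nothing stands out
    return [v for v in values if abs(v - mu) / sigma > threshold]


# Made-up number of SMS messages sent per day by one application.
sms_per_day = [3, 4, 2, 5, 3, 4, 120, 3]
print(find_anomalies(sms_per_day))  # [120]
```

A single day with 120 messages stands out against a background of two to five, which is the kind of unusual record that an automatic-SMS malware sample would be expected to produce.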

For example, the data mining step might identify multiple groups in the data, which can then
be used to obtain more accurate prediction results by a decision support system. Neither the data
collection, data preparation, nor result interpretation and reporting is part of the data mining step, but
do belong to the overall KDD process as additional steps.

The related terms data dredging, data fishing, and data snooping refer to the use of data mining
methods to sample parts of a larger population data set that are (or may be) too small for reliable
statistical inferences to be made about the validity of any patterns discovered. These methods can,
however, be used in creating new hypotheses to test against the larger data populations.

Big Data concern large-volume, complex, growing data sets with multiple, autonomous
sources. With the fast development of networking, data storage, and the data collection capacity, Big
Data are now rapidly expanding in all science and engineering domains, including physical, biological
and biomedical sciences. This paper presents a HACE theorem that characterizes the features of the
Big Data revolution, and proposes a Big Data processing model, from the data mining perspective.
This data-driven model involves demand-driven aggregation of information sources, mining and
analysis, user interest modeling, and security and privacy considerations. We analyze the challenging
issues in the data-driven model and also in the Big Data revolution.

BIG DATA

Big data is a collection of data sets so large and complex that it becomes difficult to process
using on-hand database management tools. The challenges include capture, curation, storage, search,
sharing, analysis, and visualization.
The trend to larger data sets is due to the additional information derivable from analysis of a
single large set of related data, as compared to separate smaller sets with the same total amount of
data, allowing correlations to be found to "spot business trends, determine quality of research, prevent
diseases, link legal citations, combat crime, and determine real-time roadway traffic conditions."


Put another way, big data is the realization of greater business intelligence by storing,
processing, and analyzing data that was previously ignored due to the limitations of traditional data
management technologies.

The four dimensions of Big Data


• Volume: Large volumes of data

• Velocity: Quickly moving data

• Variety: structured, unstructured, images, etc.

• Veracity: Trust in and integrity of the data are both a challenge and a necessity, and are as
important for big data as for traditional relational DBs

• Big Data is about better analytics!


The Big Data Platform Manifesto

Fig. 3.1 Big data platform manifesto

Some concepts
• NoSQL (Not Only SQL): Databases that “move beyond” relational data models (i.e., no tables,
limited or no use of SQL)

– Focus on retrieval of data and appending new data (not necessarily tables)

– Focus on key-value data stores that can be used to locate data objects

– Focus on supporting storage of large quantities of unstructured data

– SQL is typically not the primary interface for storage or retrieval of data

– ACID (atomicity, consistency, isolation, durability) guarantees are often relaxed
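
The key-value idea in the list above can be sketched with an in-memory dictionary. This is only an illustration of the access pattern (locate an object by key, append new data), not a model of any particular NoSQL product; the class name, keys and stored records are hypothetical.

```python
class KeyValueStore:
    """Minimal in-memory key-value store: values are located by key only."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        """Store or overwrite the object kept under `key`."""
        self._data[key] = value

    def get(self, key, default=None):
        """Retrieve the object kept under `key`, or `default` if absent."""
        return self._data.get(key, default)

    def append(self, key, item):
        """Append `item` to the list kept under `key`, creating it if needed."""
        self._data.setdefault(key, []).append(item)


store = KeyValueStore()
# Values need no fixed schema: one key holds a dict, another holds a list.
store.put("apk:com.example.demo", {"permissions": ["INTERNET"], "label": "benign"})
store.append("scan-log", "com.example.demo scanned")
store.append("scan-log", "com.example.other scanned")

print(store.get("apk:com.example.demo")["label"])  # benign
print(len(store.get("scan-log")))                  # 2
```

There are no tables and no query language here: the only way to reach a record is through its key, which is what makes this style of store simple to distribute across many machines.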


Hadoop

• Hadoop is a distributed file system and data processing engine that is designed to handle
extremely high volumes of data in any structure.

• Hadoop has two components:

– The Hadoop distributed file system (HDFS), which supports data in structured
relational form, in unstructured form, and in any form in between

– The MapReduce programming paradigm for managing applications on multiple
distributed servers

• The focus is on supporting redundancy, distributed architectures, and parallel processing
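
The MapReduce paradigm mentioned above can be illustrated in a single process. The sketch below does not use Hadoop itself; it only mimics the three steps (map, shuffle, reduce) in plain Python, and the application records are invented for the example.

```python
from collections import defaultdict


def map_phase(record):
    """Map: emit a (permission, 1) pair for every permission an app requests."""
    return [(perm, 1) for perm in record["permissions"]]


def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: combine the values of one key into a single count."""
    return key, sum(values)


def map_reduce(records):
    pairs = [pair for record in records for pair in map_phase(record)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


# Made-up input records (assumption).
APPS = [
    {"name": "app_a", "permissions": ["INTERNET", "SEND_SMS"]},
    {"name": "app_b", "permissions": ["INTERNET"]},
    {"name": "app_c", "permissions": ["INTERNET", "READ_CONTACTS"]},
]

print(map_reduce(APPS))
# {'INTERNET': 3, 'SEND_SMS': 1, 'READ_CONTACTS': 1}
```

Because each call to `map_phase` looks at one record and each call to `reduce_phase` looks at one key, both steps could run on different servers in parallel, which is the property a distributed framework exploits.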

Some Hadoop Related Names to Know

• Apache Avro: designed for communication between Hadoop nodes through data serialization

• Cassandra and HBase: non-relational databases that can be used with Hadoop

• Hive: a query language similar to SQL (HiveQL) but compatible with Hadoop

• Mahout: an AI tool designed for machine learning; that is, to assist with filtering data for
analysis and exploration

• Pig Latin: A data-flow language and execution framework for parallel computation

• ZooKeeper: Keeps all the parts coordinated and working together

What to do with the data

The Knowledge Discovery in Databases (KDD) process is commonly defined with the stages:

(1) Selection

(2) Pre-processing

(3) Transformation


(4) Data Mining

(5) Interpretation/Evaluation.

Fig. 3.2 KDD process stages

Many variations on this theme exist, however, such as the Cross Industry Standard
Process for Data Mining (CRISP-DM), which defines six phases:
(1) Business Understanding

(2) Data Understanding

(3) Data Preparation

(4) Modeling

(5) Evaluation

(6) Deployment

A simplified process is also used, such as (1) pre-processing, (2) data mining, and (3)
results validation.
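
The five KDD stages listed earlier in this section can be read as a chain of functions, each consuming the output of the previous one. The sketch below is an assumption-laden toy: the records, the vocabulary and the very simple "mining" step (counting permissions per class) are chosen only to show how the stages connect.

```python
def select(records):
    """Selection: keep only records that carry a usable label."""
    return [r for r in records if r.get("label") in ("benign", "malware")]


def preprocess(records):
    """Pre-processing: normalise permission names and drop duplicates."""
    return [
        dict(r, permissions=sorted({p.strip().upper() for p in r["permissions"]}))
        for r in records
    ]


def transform(records, vocab):
    """Transformation: turn each record into (binary vector, label)."""
    return [
        ([1 if p in r["permissions"] else 0 for p in vocab], r["label"])
        for r in records
    ]


def mine(dataset, vocab):
    """Data mining: count how often each permission occurs per class."""
    counts = {p: {"benign": 0, "malware": 0} for p in vocab}
    for vector, label in dataset:
        for p, bit in zip(vocab, vector):
            counts[p][label] += bit
    return counts


def evaluate(counts):
    """Interpretation/evaluation: permissions seen only in malware samples."""
    return [p for p, c in counts.items() if c["malware"] > 0 and c["benign"] == 0]


VOCAB = ["INTERNET", "SEND_SMS", "READ_CONTACTS"]
RAW = [  # made-up raw records
    {"permissions": ["internet "], "label": "benign"},
    {"permissions": ["INTERNET", "send_sms", "SEND_SMS"], "label": "malware"},
    {"permissions": ["READ_CONTACTS"], "label": None},
]

result = evaluate(mine(transform(preprocess(select(RAW)), VOCAB), VOCAB))
print(result)  # ['SEND_SMS']
```

Keeping each stage as its own function mirrors the process description: the mining step can be swapped for a real learning algorithm without touching selection, cleaning or evaluation.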


4. FEASIBILITY STUDY

The feasibility study is an essential phase in the system development life cycle, focusing on
analyzing the viability of the proposed project. During this phase, a general plan is outlined, and
preliminary cost estimates are provided. The objective is to ensure that the proposed system is
practical, achievable, and will not impose an unnecessary burden on the organization.

A thorough feasibility study requires a clear understanding of the major system requirements,
ensuring that the solution aligns with the organization's resources, objectives, and constraints. The
feasibility analysis primarily revolves around three key considerations:

TECHNICAL FEASIBILITY

SOCIAL FEASIBILITY

ECONOMICAL FEASIBILITY

4.1 TECHNICAL FEASIBILITY

Technical feasibility focuses on assessing the technical resources and capabilities required to
develop and implement the proposed system. It ensures that the system's technical requirements are
within the organization's current capabilities and infrastructure.

A technically feasible system should not impose excessive demands on the existing resources.
If the system requires significant upgrades, it may not be practical. Hence, our proposed system has
been designed with modest technical requirements to minimize the need for extensive modifications
or additional infrastructure.

The technologies employed in the system are mainstream, well-supported, and scalable,
ensuring ease of integration and maintenance. This guarantees that the system can be effectively
deployed with minimal disruption to the organization's operations.


4.2 SOCIAL FEASIBILITY

Social feasibility evaluates the level of user acceptance and readiness to adapt to the new
system. The success of any system largely depends on how well it is received by its intended users.

This involves training users adequately so that they can operate the system efficiently without
hesitation or resistance. Efforts are made to ensure that users perceive the system as an enhancement
rather than a threat. Raising the confidence level of users is crucial — they should feel comfortable
providing feedback and suggesting improvements, fostering a sense of ownership and trust in the
system.

Effective communication, comprehensive training programs, and user involvement throughout
the development process are key strategies employed to achieve high social feasibility.

4.3 ECONOMICAL FEASIBILITY

Economic feasibility examines the cost-effectiveness of the proposed system. It ensures that
the benefits derived from the system outweigh the costs involved in its development, deployment, and
maintenance.

Given that organizational budgets are often constrained, it is vital that the system remains
within financial limits. In this project, economic feasibility has been carefully considered, and most of
the technologies and tools utilized are open-source or freely available, significantly reducing costs.

Only necessary customized components were procured, keeping expenditures well within the
allocated budget. The overall result is a cost-efficient, high-performing system that delivers strong
value to the organization without imposing a financial strain.


5. SYSTEM ANALYSIS

5.1 EXISTING SYSTEM

Google Bouncer scans each Android application for only a limited period of time, so a malicious
app can bypass it easily by doing nothing malicious during the scan phase.

A second evasion route is to keep malicious code out of the initial installer. An app that contains
no malicious code when Bouncer scans it has a much higher chance of avoiding detection.

Benign and malicious applications often require the same permissions, so a permission-based
system cannot reliably distinguish between them. In general, permission-based methodologies
were developed for risk assessment rather than for malware detection.

MalDozer is a Convolutional Neural Network (CNN) based Android malware detection system.
It has a simple design in which minimal preprocessing is used to obtain the assembly-level
method sequences. Feature extraction and detection/attribution are then handled by the neural
network itself.

The current methods for Android malware detection have significant limitations that leave devices
vulnerable to sophisticated attacks. One of the primary detection mechanisms is Google Bouncer,
which scans Android applications before they are made available on the Play Store. However, Bouncer
has several notable shortcomings:

• Limited Time Scanning: Bouncer analyzes applications only for a limited period. Malicious
apps can easily exploit this by remaining inactive during the scan, displaying no harmful
behavior and thereby evading detection.


• Installer-Based Evasion: During the initial scan phase, if the APK installer file does not
contain any obvious malicious code, the application can bypass Bouncer’s scrutiny. Malicious
functionality can be downloaded or activated later, once the app has been installed on a user's
device, making the initial scanning ineffective.
• Permission-Based Detection Limitations: Traditional systems often rely on analyzing
application permissions to assess risk. However, both benign and malicious apps frequently
request similar permissions, making it difficult to distinguish between them using permissions
alone. Furthermore, permission-based models are typically designed for risk assessment, not
active malware detection, reducing their effectiveness against more sophisticated threats.

Another notable existing system is MalDozer, a Convolutional Neural Network (CNN)
based Android malware detection framework. MalDozer represents a more modern, machine
learning-driven approach:

• Minimal Preprocessing: MalDozer simplifies the data preparation phase by minimizing
preprocessing steps. It focuses on extracting assembly-level instructions and processes from
APK files, providing raw and rich data to the deep learning model.
• Feature Extraction and Detection: Using the structure of convolutional neural networks,
MalDozer is capable of automatically learning feature representations from the extracted code
sequences. This helps in accurately detecting and even attributing malware activities to specific
categories or families.

Despite these improvements, there is still room for enhancing detection accuracy, reducing false
positives, and improving the adaptability of models to new, unseen malware variants. Therefore, there
is a growing need for more sophisticated and efficient malware detection systems based on deep
learning techniques, capable of operating proactively and reliably even against evolving threats.


5.1.1 DRAWBACKS OF EXISTING SYSTEM

The user who submitted an app receives a report containing the complete results of the integrity
check and of both analyses. Because new types of applications emerge constantly, two crawler
modules were designed: one crawls benign apps from the Google Play Store, and the other collects
malware from known malware sources.

Droid Deep Learner is an Android malware characterization and identification method. It uses
deep learning to address the current need for malware detection, and it requires a predefined set
of features for detection.

Features such as permissions, APIs, actions, intents, IP addresses, and URLs are stored in encoded
form in the .apk file. Using a decompilation tool, the authors construct a decoder that converts
apps into a readable format.

Using Droid Detector, which is available online as open source (Fig. 2), a user can determine
whether an Android app is infected with malware. The user first submits the .apk file to the
system. Droid Detector then checks its integrity, determining whether the application is genuine,
complete, and valid.

5.2 PROPOSED SYSTEM

The general architecture of the proposed Droid Deep Learner method is illustrated in Fig. 3.
The goal of this system is to leverage both permission-based and API function call-based features to
detect Android malware. To achieve this, the system first examines Android applications by extracting
their manifest files (.xml) and source code files (.java), as these contain the essential information for
feature extraction.

The system begins by parsing the Android app’s manifest file to extract relevant permissions,
such as internet access, location data access, or any other sensitive permissions that might indicate
potential malicious behavior. Permissions are key indicators of what an app is allowed to do, and they
play a crucial role in the detection process. For instance, an app requesting permissions that are
not typically necessary for its functionality may raise a red flag.
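
The manifest-parsing step can be illustrated with a minimal sketch. It assumes the manifest has already been decoded to plain XML (inside an APK it is stored in a binary form), and the sample manifest, the package name, and the SENSITIVE watch-list are hypothetical values chosen only for illustration, not the report's actual configuration.

```python
import xml.etree.ElementTree as ET

# Namespace used by the android: attributes in a decoded AndroidManifest.xml.
ANDROID_NS = "http://schemas.android.com/apk/res/android"

# Hypothetical watch-list of sensitive permissions (illustration only).
SENSITIVE = {
    "android.permission.SEND_SMS",
    "android.permission.READ_CONTACTS",
    "android.permission.ACCESS_FINE_LOCATION",
}


def extract_permissions(manifest_xml):
    """Return the set of permission names declared in <uses-permission> elements."""
    root = ET.fromstring(manifest_xml)
    name_attr = "{" + ANDROID_NS + "}name"
    return {
        elem.attrib[name_attr]
        for elem in root.iter("uses-permission")
        if name_attr in elem.attrib
    }


# Hypothetical decoded manifest of a simple flashlight app.
SAMPLE_MANIFEST = """<manifest xmlns:android="http://schemas.android.com/apk/res/android"
          package="com.example.flashlight">
    <uses-permission android:name="android.permission.INTERNET" />
    <uses-permission android:name="android.permission.SEND_SMS" />
    <application android:label="Flashlight" />
</manifest>"""

permissions = extract_permissions(SAMPLE_MANIFEST)
flagged = sorted(permissions & SENSITIVE)
print(flagged)  # ['android.permission.SEND_SMS']
```

A flashlight app that requests SEND_SMS is the kind of mismatch between functionality and permissions that the paragraph above describes as a red flag.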

Additionally, the system analyzes the Java source files of the app to identify API function calls.
These calls provide insights into the app's interactions with the operating system and third-party
services, which can be highly indicative of malicious behavior. For example, API calls related to
sending SMS messages, accessing personal data, or making network connections could point to
potentially harmful activities.
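
As a rough illustration of API-call extraction, the sketch below counts invocations of a few method names in Java source text. The SENSITIVE_APIS list and the sample source are assumptions made for this example. A regular expression is a deliberately naive stand-in for real source or bytecode analysis, since it also matches text inside comments and string literals.

```python
import re
from collections import Counter

# Hypothetical list of method names treated as sensitive in this sketch.
SENSITIVE_APIS = ("sendTextMessage", "getDeviceId", "openConnection")

# Matches ".methodName(" so that only method invocations are counted.
CALL_PATTERN = re.compile(r"\.(\w+)\s*\(")


def extract_api_calls(java_source):
    """Count invocations of the watched API methods in one Java source file."""
    calls = Counter(CALL_PATTERN.findall(java_source))
    return Counter({name: calls[name] for name in SENSITIVE_APIS if calls[name]})


# Hypothetical fragment of decompiled Java source.
SAMPLE_JAVA = """
SmsManager sms = SmsManager.getDefault();
sms.sendTextMessage(number, null, body, null, null);
sms.sendTextMessage(other, null, body, null, null);
URLConnection conn = url.openConnection();
"""

api_counts = extract_api_calls(SAMPLE_JAVA)
print(dict(api_counts))  # {'sendTextMessage': 2, 'openConnection': 1}
```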

Once both sets of features—permissions and API function calls—are extracted, they are
combined into a comprehensive feature set. This feature set serves as the input for the training and
testing phases of the Deep Belief Network (DBN)-based deep learning model. The DBN model, a type
of deep learning architecture, is particularly effective in learning from large amounts of unstructured
data, making it well-suited for malware detection in Android apps.
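
One simple way to combine the two feature sets is a fixed-length binary vector with one position per known permission and one per known API call. The sketch below assumes small hypothetical vocabularies. The report does not specify the actual encoding, so this is only one plausible choice.

```python
# Hypothetical fixed vocabularies; a real system would derive them from the training corpus.
PERMISSION_VOCAB = [
    "android.permission.INTERNET",
    "android.permission.SEND_SMS",
    "android.permission.READ_CONTACTS",
]
API_VOCAB = ["sendTextMessage", "getDeviceId", "openConnection"]


def build_feature_vector(permissions, api_calls):
    """Concatenate binary permission indicators and binary API-call indicators."""
    perm_bits = [1 if p in permissions else 0 for p in PERMISSION_VOCAB]
    api_bits = [1 if a in api_calls else 0 for a in API_VOCAB]
    return perm_bits + api_bits


vector = build_feature_vector(
    permissions={"android.permission.INTERNET", "android.permission.SEND_SMS"},
    api_calls={"sendTextMessage", "openConnection"},
)
print(vector)  # [1, 1, 0, 1, 0, 1]
```

Keeping the vocabulary order fixed means every app maps to a vector of the same length, which is what a model with a fixed-size input layer needs.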

The deep learning model classifies the applications based on the extracted features, ultimately
distinguishing between benign and malicious apps. This classification process involves training the
model on a labeled dataset containing both benign and malicious apps. During testing, the model's
accuracy is evaluated by comparing the predicted classifications against the actual labels, ensuring that
the model generalizes well to unseen data.
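
The comparison of predicted classifications against actual labels can be made concrete with a small sketch. The label lists below are made-up values, with 1 standing for malicious and 0 for benign. They are not results from this project.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, tn, fn) for binary labels, where 1 means malicious."""
    tp = fp = tn = fn = 0
    for actual, predicted in zip(y_true, y_pred):
        if actual == 1 and predicted == 1:
            tp += 1
        elif actual == 0 and predicted == 1:
            fp += 1
        elif actual == 0 and predicted == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def evaluate(y_true, y_pred):
    """Compute accuracy, precision, and recall from the confusion counts."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


# Made-up labels for eight test apps (illustration only).
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 0, 0, 1]
metrics = evaluate(y_true, y_pred)
print(metrics)  # {'accuracy': 0.75, 'precision': 0.75, 'recall': 0.75}
```

Reporting precision and recall alongside accuracy matters here because a missed malicious app (false negative) and a wrongly flagged benign app (false positive) have different costs.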

To gather the necessary datasets, the system crawls malware samples from identified malware
sources and benign applications from the Google Play Store using specialized crawlers. By regularly
crawling both categories of apps, the system ensures that it remains up-to-date with the latest trends in
both malware and legitimate app development. This dynamic crawling method allows the system to
adapt to the rapidly evolving landscape of Android threats.

One of the standout features of the proposed system, Droid Deep Learner, is its ability to maintain high
precision in detecting emerging and frequently evolving types of malware. By utilizing deep learning,
the system can continuously improve its ability to detect new malware variants that might otherwise
evade traditional detection methods. The ongoing training of the model with new samples allows it to
adapt and keep pace with the ever-changing Android ecosystem.


Furthermore, the system incorporates advanced techniques to handle challenges such as
obfuscation, which is commonly used by malware authors to hide malicious code. By analyzing API
calls and permissions in conjunction with the behavior of the app, Droid Deep Learner is capable of
detecting even obfuscated malware, providing a robust and reliable solution for Android malware
detection.

In summary, the proposed system combines the power of deep learning with detailed analysis
of app permissions and API function calls to provide an advanced solution for detecting malicious
Android apps. Its dynamic and evolving nature ensures it can adapt to new threats, offering both high
precision and adaptability in the detection process.

5.2.1 ADVANTAGES OF PROPOSED SYSTEM

The model has several hyper-parameters, such as the number of layers and the overall complexity
of the network. To ease deployment, the neural network model is kept as simple as possible.
MalDozer relies on convolution layers to automatically discover patterns in the raw method calls.
The input to the neural network is a sequence of vectors, i.e. an L×K shaped matrix. In the training
phase, based on the app vector sequences and their labels, MalDozer trains the neural network
parameters for:
(i) malicious or benign, for the detection task, and
(ii) malware families, for the attribution task.
In the deployment phase, the sequence of methods is extracted and the embedding model is used
to produce the vector sequence. Finally, the vector sequence is used to determine whether the
Android app is infected with malware.
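
To make the L×K input concrete, the sketch below turns a sequence of method names into a fixed-size matrix. It assumes that L is a fixed sequence length and K is the size of each embedding vector. The tiny embedding table and its values are hypothetical, and a real embedding model would be learned from data.

```python
L = 6  # assumed fixed sequence length
K = 4  # assumed size of each embedding vector

# Hypothetical embedding table; the values carry no meaning.
EMBEDDINGS = {
    "<pad>": [0.0, 0.0, 0.0, 0.0],
    "<unk>": [0.1, 0.1, 0.1, 0.1],
    "getDeviceId": [0.9, 0.2, 0.0, 0.4],
    "openConnection": [0.3, 0.8, 0.5, 0.1],
    "sendTextMessage": [0.7, 0.6, 0.9, 0.2],
}


def to_matrix(method_sequence):
    """Truncate or pad a method-call sequence to length L and look up each vector."""
    clipped = list(method_sequence[:L])
    padded = clipped + ["<pad>"] * (L - len(clipped))
    return [list(EMBEDDINGS.get(name, EMBEDDINGS["<unk>"])) for name in padded]


matrix = to_matrix(["getDeviceId", "openConnection", "sendTextMessage", "customHelper"])
print(len(matrix), len(matrix[0]))  # 6 4
```

Padding short sequences and truncating long ones gives every app the same input shape, which is what allows a single convolutional network to process apps of very different sizes.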


6. SYSTEM DESIGN

System design is the critical phase in the development process where the overall architecture,
components, modules, interfaces, and data flow of a system are defined in order to meet the specified
user requirements. It serves as the blueprint for constructing the system and is crucial for transforming
high-level requirements into a working solution. System design, in this context, refers to applying
systems theory to product development, ensuring that the system works efficiently and integrates
smoothly within the intended environment.

A comprehensive system design integrates several critical disciplines, including systems
analysis, systems architecture, and systems engineering. These areas collectively provide a structured
approach to designing and building systems, taking into account the technical, operational, and user-
specific needs. Within the larger framework of product development, system design bridges the gap
between the theoretical, conceptual aspects of the system and its practical realization. It can be likened
to the act of transforming marketing, design, and manufacturing information into a tangible product
that can be brought to life.

In the realm of Android malware detection, system design takes on a pivotal role in ensuring
the efficiency and accuracy of the detection model. It involves making key decisions on how the
various components of the malware detection system will interact, which tools and technologies will
be used, and how data will flow between different system modules. The system design is not just a set
of functional modules but also includes considerations of scalability, performance, security, and
maintainability.

The overall system design for Android malware detection revolves around multiple layers of
interaction and processing:

1. User Interface Layer: The system provides an intuitive web-based interface where users can
upload Android APK files for analysis. This layer is crucial for ensuring that end-users, such
as security analysts or app developers, can easily interact with the system. It also allows them
to receive the analysis results in an understandable format, such as a classification label
indicating whether the app is benign or malicious.


2. Feature Extraction Layer: This is the core processing layer where the system extracts key
features from the APK files. Features such as permissions (from the manifest file) and API
calls (from the source code) are gathered, processed, and organized. The efficiency of feature
extraction directly impacts the accuracy of the classification model, making it a crucial step in
the design process.
3. Data Preprocessing and Augmentation: Before the features are fed into the machine learning
model, they must undergo preprocessing to handle missing data, normalize values, or perform
feature scaling. Additionally, data augmentation techniques can be applied to enrich the dataset,
especially in the case of detecting rare or evolving types of malware.
4. Machine Learning and Deep Learning Model Layer: This layer is responsible for the actual
detection of malware. The system utilizes a Deep Belief Network (DBN)-based deep learning
model to classify apps as benign or malicious based on the extracted features. The model is
trained with a large dataset of labeled applications, including both benign and malware
samples. The system design must ensure that the model can be updated regularly with new data
to adapt to evolving malware types.
5. Malware Database and API: To enhance the system's capability to identify new threats, it
integrates with external databases or malware repositories. This allows the system to stay
updated with known malware samples and continuously improve its detection accuracy.
Additionally, the system may call upon external APIs to check for previously detected malware
signatures or perform reputation checks on certain behaviors exhibited by the app.
6. Security and Privacy Layer: Given that malware detection inherently deals with analyzing
potentially malicious apps, the system design must ensure secure handling of the uploaded
APK files. This involves ensuring that uploaded files are isolated from the system to avoid
accidental execution or exposure to other vulnerabilities. Additionally, user data and analysis
results should be kept confidential and secure from unauthorized access.
7. Output and Reporting Layer: Once the analysis is complete, the system generates detailed
reports outlining the classification results. These reports may include additional insights such
as the most suspicious permissions or API calls, which could help in further investigation or
debugging. The output can be provided in different formats, such as HTML or PDF, and may
include suggestions for remedial actions.


8. System Integration and Testing: System integration ensures that all modules work together
seamlessly. This involves testing the various components (feature extraction, machine learning
model, user interface) in a controlled environment before deployment. In addition, the design
must incorporate robust testing methods to evaluate the performance, security, and scalability
of the system. Regular testing is essential for identifying and resolving issues early, especially
when dealing with new and emerging threats.
9. Scalability and Adaptability: A critical aspect of the system design is ensuring that it can
scale to handle a large number of APK submissions while maintaining high accuracy and
performance. The system should be designed to accommodate future growth, both in terms of
user traffic and the number of supported malware signatures. Furthermore, adaptability is key
as new malware strains and techniques emerge rapidly. The design must include mechanisms
for regular model updates and new data integration.
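
The preprocessing step in layer 3 above (handling missing data and scaling features) can be sketched as follows. The example assumes numeric features, such as counts, with missing entries marked as None. Mean imputation and min-max scaling are illustrative choices rather than the methods confirmed by this report.

```python
def impute_missing(rows):
    """Replace None entries with the mean of the observed values in the same column."""
    n_cols = len(rows[0])
    means = []
    for c in range(n_cols):
        observed = [row[c] for row in rows if row[c] is not None]
        means.append(sum(observed) / len(observed) if observed else 0.0)
    return [
        [means[c] if row[c] is None else float(row[c]) for c in range(n_cols)]
        for row in rows
    ]


def min_max_scale(rows):
    """Scale every column to the range [0, 1]; constant columns become 0.0."""
    n_cols = len(rows[0])
    lows = [min(row[c] for row in rows) for c in range(n_cols)]
    highs = [max(row[c] for row in rows) for c in range(n_cols)]
    return [
        [
            (row[c] - lows[c]) / (highs[c] - lows[c]) if highs[c] > lows[c] else 0.0
            for c in range(n_cols)
        ]
        for row in rows
    ]


# Made-up feature rows: [number of permissions, number of API calls].
raw = [
    [2, 10],
    [None, 30],
    [6, 20],
]
clean = min_max_scale(impute_missing(raw))
print(clean)  # [[0.0, 0.0], [0.5, 1.0], [1.0, 0.5]]
```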

The system design process is iterative and involves continuously refining each layer based on
feedback from initial tests and ongoing monitoring. As the Android ecosystem evolves and new
malware techniques emerge, the system must be capable of adapting to these changes. This is why
designing a robust, flexible, and secure system is vital for ensuring the long-term success of Android
malware detection.

In conclusion, system design is the backbone of creating a reliable and effective malware detection
solution. It provides the necessary structure and framework to integrate various technologies and
methodologies, ensuring that the final system meets user requirements while remaining adaptable to
future challenges.

6.1 UML DIAGRAMS


The Unified Modelling Language (UML) is a standard language for specifying, visualizing,
constructing, and documenting the artifacts of software systems, as well as for business modelling and
other non-software systems. The UML represents a collection of best engineering practices that have
proven successful in the modelling of large and complex systems. The UML is a very important part
of developing object-oriented software and the software development process. The UML uses mostly
graphical notations to express the design of software projects. Using the UML helps project teams
communicate, explore potential designs, and validate the architectural design of the software.


As the strategic value of software increases for many companies, the industry looks for techniques
to automate the production of software and to improve quality and reduce cost and time-to-market.
These techniques include component technology, visual programming, patterns and frameworks.
Businesses also seek techniques to manage the complexity of systems as they increase in scope and
scale. In particular, they recognize the need to solve recurring architectural problems, such as physical
distribution, concurrency, replication, security, load balancing and fault tolerance. Additionally, the
development for the World Wide Web, while making some things simpler, has exacerbated these
architectural problems. The Unified Modelling Language (UML) was designed to respond to these
needs. Simply put, systems design refers to the process of defining the architecture, components,
modules, interfaces, and data for a system to satisfy specified requirements, which can be done
easily through UML diagrams.

Contents of UML

In general, a UML diagram consists of the following features:

• Entities: These may be classes, objects, users, or system behaviours.

• Relationship lines: These model the relationships between entities in the system.
    • Generalization: a solid line with an arrow that points to a higher abstraction of the
      present item.
    • Association: a solid line that represents that one entity uses another entity as part of
      its behaviour.
    • Dependency: a dotted line with an arrowhead that shows one entity depends on the
      behaviour of another entity.

In this project, five basic UML diagrams are explained:

1) Class Diagram

2) Use Case Diagram

3) Sequence Diagram

4) Activity Diagram

5) Deployment Diagram


6.1.1 Class Diagram

UML class diagrams model static class relationships that represent the
fundamental architecture of the system. Note that these diagrams describe the
relationships between classes, not those between specific objects instantiated from those
classes. Thus the diagram applies to all the objects in the system.

A class diagram consists of the following features:

• Classes: These titled boxes represent the classes in the system and contain
information about the name of the class, its fields, methods, and access specifiers.
Abstract roles of the class in the system can also be indicated.
• Interfaces: These titled boxes represent interfaces in the system and contain
information about the name of the interface and its methods.
• Relationship lines: These model the relationships between classes and interfaces
in the system.
• Dependency: A dotted line with an open arrowhead that shows one entity
depends on the behavior of another entity. Typical usages are to represent that
one class instantiates another or that it uses the other as an input parameter.
• Aggregation: Represented by an association line with a hollow diamond at the
tail end. An aggregation models the notion that one object uses another object
without "owning" it and thus is not responsible for its creation or destruction.
• Inheritance: A solid line with a hollow triangular arrowhead that points from a
sub-class to a super class or from a sub-interface to its super-interface.
• Implementation: A dotted line with a hollow triangular arrowhead that points
from a class to the interface that it implements.
• Composition: Represented by an association line with a solid diamond at the
tail end. A composition models the notion of one object "owning" another and
thus being responsible for the creation and destruction of another object.
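
The relationship types listed above map onto code in a fairly direct way. The following sketch uses hypothetical class names (FeatureExtractor, PermissionExtractor, Report, Scanner) that are not taken from this project's class diagram. It only illustrates how implementation, aggregation, composition, and dependency typically appear in an object-oriented program.

```python
from abc import ABC, abstractmethod


class FeatureExtractor(ABC):
    """Plays the role of an interface: it only declares what must be provided."""

    @abstractmethod
    def extract(self, apk_path):
        """Return a list of feature names for the given APK path."""


class PermissionExtractor(FeatureExtractor):
    """Implementation: fulfils the FeatureExtractor contract."""

    def extract(self, apk_path):
        # Placeholder result; a real extractor would parse the manifest.
        return ["android.permission.INTERNET"]


class Report:
    def __init__(self, apk_path, features):
        self.apk_path = apk_path
        self.features = features


class Scanner:
    def __init__(self, extractor):
        # Aggregation: the extractor is created elsewhere and only used here.
        self.extractor = extractor
        # Composition: reports are created and owned by the scanner.
        self.reports = []

    def scan(self, apk_path):
        report = Report(apk_path, self.extractor.extract(apk_path))
        self.reports.append(report)
        return report


def summarize(report):
    """Dependency: this function merely takes a Report as an input parameter."""
    return f"{report.apk_path}: {len(report.features)} feature(s)"


scanner = Scanner(PermissionExtractor())
summary = summarize(scanner.scan("sample.apk"))
print(summary)  # sample.apk: 1 feature(s)
```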


6.1.2 Use case diagram


A use case diagram in the Unified Modelling Language (UML) is a type of
behavioural diagram defined by and created from a use-case analysis. Its purpose is to
present a graphical overview of the functionality provided by a system in terms of actors
and their goals (represented as use cases). A use case is a methodology used in system
analysis to identify, clarify, and organize system requirements. The use case is made up of
a set of possible sequences of interactions between systems and users in a particular
environment and related to a particular goal. It consists of a group of elements (for
example, classes and interfaces) that can be used together in a way that will have an effect
larger than the sum of the separate elements combined. The use case should contain all
system activities that have significance to the users.

A use case can be thought of as a collection of possible scenarios related to a particular goal;
indeed, the use case and goal are sometimes considered to be synonymous.

The main purpose of a use case diagram is to show what system functions are performed
for which actor.

[Figure: use case diagram. Actor: User. Use cases: Import Dataset, Train Data set, Feature
Extraction, Recognition, Feature Matching, Malware Recognition.]

Fig 6.1: Use case Diagram


Parts of Use cases

A use case describes a sequence of actions that provide something of measurable
value to an actor and is drawn as a horizontal ellipse.

Actors

An actor is a person, organization, or external system that plays a role in one or
more interactions with the system.

System boundary boxes (optional)

A rectangle is drawn around the use cases, called the system boundary box, to
indicate the scope of the system. Anything within the box represents functionality that is in
scope, and anything outside the box is not.

Include
In one form of interaction, a given use case may include another. "Include is a
Directed Relationship between two use cases, implying that the behavior of the included
use case is inserted into the behavior of the including use case."

The first use case often depends on the outcome of the included use case. This
is useful for extracting truly common behaviours from multiple use cases into a single
description. The notation is a dashed arrow from the including to the included use case,
with the label "«include»". This usage resembles a macro expansion where the included
use case behavior is placed inline in the base use case behavior. There are no parameters
or return values. To specify the location in a flow of events at which the base use case
includes the behavior of another, you simply write include followed by the name of the use
case you want to include.

Extend

In another form of interaction, a given use case (the extension) may extend
another. This relationship indicates that the behavior of the extension use case may be
inserted in the extended use case under some conditions. The notation is a dashed arrow
from the extension to the extended use case, with the label "«extend»".


Generalization
In the third form of relationship among use cases, a generalization/
specialization relationship exists. A given use case may share common behaviours,
requirements, constraints, and assumptions with a more general use case. In this case,
describe them once in the general use case and describe only the differences in the
specialized cases. The notation is a solid line ending in a hollow triangle drawn from
the specialized to the more general use case (following the standard generalization
notation).

Associations
Associations between actors and use cases are indicated in use case diagrams by
solid lines. An association exists whenever an actor is involved with an interaction
described by a use case. Associations are modelled as lines connecting use cases and
actors to one another, with an optional arrowhead on one end of the line. The arrowhead
is often used to indicate the direction of the initial invocation of the relationship or to
indicate the primary actor within the use case. The arrowheads imply control flow and
should not be confused with data flow.

STEPS TO DRAW USE CASES

• Identify the actors

• Identify the use cases

• Review the use cases for completeness

Sequence Diagram

A sequence diagram in the Unified Modelling Language (UML) is a kind of
interaction diagram that shows how processes operate with one another and in what
order. It is a construct of a Message Sequence Chart. A Sequence diagram depicts the
sequence of actions that occur in a system. The invocation of methods in each object,
and the order in which the invocation occurs is captured in a Sequence diagram. This
makes the Sequence diagram a very useful tool to easily represent the dynamic
behaviour of a system.


Elements of sequence diagram

The sequence diagram is used primarily to showcase the
interaction that occurs between multiple objects. This interaction is shown over
a certain period of time. Because of this, the first symbol that is used is one that
symbolizes the object.

Lifeline
A lifeline will generally be generated, and it is a dashed line that sits vertically,
and the top will be in the form of a rectangle. This rectangle is used to indicate both the
instance and the class. If the lifeline must be used to denote an object, it will be
underlined.
Messages
To showcase an interaction, messages are used. Messages are drawn as
horizontal arrows, with the message text written above the arrow. A solid arrow with a
full (filled) head denotes a synchronous call. A solid arrow with a stick head denotes an
asynchronous call. Dashed arrows with stick heads are used to represent return
messages.

Objects
Objects can also call methods upon themselves, and in doing so they
can add new activation boxes. Because of this, they can communicate with others to
show multiple levels of processing. Whenever an object is destroyed or erased from
memory, an "X" is drawn at the end of its lifeline, and the dashed line is not drawn
beneath it. This will often occur as a result of a message.

A message sent from outside the diagram can be represented by a message that
originates from a filled-in circle. Within a UML based model, a Super step is a
collection of steps which result from outside stimuli.


Steps to Create a Sequence Diagram


• Set the context for the interaction, whether it is a system, subsystem, operation or
class.

• Set the stage for the interaction by identifying which objects play a role in interaction.

• Set the lifeline for each object.

• Start with the message that initiates the interaction. Visualize the nesting of messages
or the points in time during actual computation.

• To specify time and space constraints, adorn each message with a timing mark and attach
suitable time or space constraints.

• To specify the flow of control more formally, attach pre- and postconditions to each
message.

[Figure: sequence diagram. Lifelines: Import Data, Train Data set, Feature Extraction, Feature
Matching, Malware Recognition, Store in DataBase. Messages: An unlabelled set of Malware data;
Template of a legitimate user; Current biometric data similar enough; Collect malware Data is
built in order; FNMR obtained its value; Minimize variation in malware expression.]

Fig 6.2: Sequence Diagram


Collaboration Diagram

[Figure: collaboration diagram. Objects: Import Data, Train Data set, Feature Extraction, Feature
Matching, Malware Recognition, Store in DataBase. Messages: 1: An unlabelled set of Malware
data; 2: Template of a legitimate user; 3: Current biometric data similar enough; 4: Collect
malware Data is built in order; 5: FNMR obtained its value; 6: Minimize variation in malware
expression.]

Fig 6.3: Collaboration diagram

Activity Diagram

The activity diagram is another important diagram in UML for describing the dynamic
aspects of the system. An activity diagram is basically a flow chart that represents the flow
from one activity to another. An activity can be described as an operation of
the system, so the control flow is drawn from one operation to another. This flow can
be sequential, branched, or concurrent. Activity diagrams deal with all types of flow
control by using different elements such as fork and join.
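
The three kinds of flow can be mirrored in a few lines of code. In the sketch below, the activity names (upload, extraction, classify) are hypothetical stand-ins for this system's operations. A thread pool plays the role of the fork, and collecting both results plays the role of the join.

```python
from concurrent.futures import ThreadPoolExecutor


def extract_permissions(apk_name):
    # Stand-in for a real activity; it only returns a label.
    return f"permissions({apk_name})"


def extract_api_calls(apk_name):
    # Stand-in for a real activity; it only returns a label.
    return f"api_calls({apk_name})"


def run_activity(apk_name):
    """Run a small flow with a sequential step, a branch, and a fork/join."""
    steps = ["upload"]                      # sequential flow
    if not apk_name.endswith(".apk"):       # branched flow (decision node)
        steps.append("reject")
        return steps
    with ThreadPoolExecutor(max_workers=2) as pool:
        # Fork: both extraction activities start concurrently.
        perm_future = pool.submit(extract_permissions, apk_name)
        api_future = pool.submit(extract_api_calls, apk_name)
        # Join: the flow continues only after both activities have finished.
        steps.extend([perm_future.result(), api_future.result()])
    steps.append("classify")
    return steps


print(run_activity("demo.apk"))
# ['upload', 'permissions(demo.apk)', 'api_calls(demo.apk)', 'classify']
```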

How to draw Activity Diagram?


Activity diagrams are mainly used as flow charts consisting of the activities
performed by the system. But an activity diagram is not exactly a flow chart, as it has
some additional capabilities, including branching, parallel flow and swim lanes.
Before drawing an activity diagram we must have a clear understanding of the
elements used in it. The main element of an activity diagram is the activity itself. An
activity is a function performed by the system. After identifying the activities we need
to understand how they are associated with constraints and conditions. So before
drawing an activity diagram we should identify the following elements.

• Activities
• Association
• Conditions
• Constraints

The following are the basic notational elements that can be used to make up a diagram:
Initial state
An initial state represents a default vertex that is the source for a single transition
to the default state of a composite state. There can be at most one initial vertex in a
region. The outgoing transition from the initial vertex may have a behavior, but not a
trigger or guard. It is represented by a filled circle with an arrow pointing to the initial state.
Final state
A special kind of state signifying that the enclosing region is completed. If the
enclosing region is directly contained in a state machine and all other regions in the
state machine are also completed, then the entire state machine is completed. It is
represented by a hollow circle containing a smaller filled circle.

Rounded rectangle
It denotes a state. The top of the rectangle contains the name of the state. It can contain
a horizontal line in the middle, below which the activities performed in that state are
listed.
Arrow
It denotes a transition. The name of the event (if any) causing the transition labels the
arrow body.

Steps To Construct Activity Diagram


• Identify the preconditions of the workflow

• Collect the abstractions that are involved in the operations

• Beginning at the operation’s initial state, specify the activities and actions.

• Use branching to specify conditional paths and iterations

• Use forking & joining to specify parallel flows of control.

Fig 6.4: Activity Diagram

DATA FLOW DIAGRAM

Fig 6.5: Data Flow Diagram

7. SYSTEM IMPLEMENTATION

7.1 IMPLEMENTATION PROCESS

Implementation is the stage of the project in which the theoretical design is turned
into a working system. It can therefore be considered the most critical stage in
achieving a successful new system and in giving the user confidence that the new
system will work and be effective.

The implementation stage involves careful planning, investigation of the existing
system and its constraints on implementation, design of methods to achieve
changeover, and evaluation of changeover methods.

7.2 MODULES

1. Android Security.
2. Malware Detection Technique.

7.2.1 Module Description

Android Security
Malware seriously threatens Android security. An attacker can monitor a user's information,
such as messages, contacts, bank mTANs and locations. Here we survey deep learning based
Android malware detection techniques such as MalDozer, Droid Detector, Droid Deep Learner,
Deep Flow, Droid Delver and Droid Deep. We aim to implement a deep learning model that can
automatically identify whether an Android application is malware infected or not, without
installing it.

Malware Detection Technique

Malware is malicious code developed to harm a computer or network. The number of malware
samples is growing so fast that computer security researchers must keep inventing new methods
to protect computers and networks. There are three main methods of malware detection:
signature based, behavioral based and heuristic. Signature based malware detection is the most
common method used by commercial antivirus products, but it can be used only in cases that
are completely known and documented. Behavioral malware detection was introduced to cover
the deficiencies of the signature based method.
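
The limitation of the signature based method can be illustrated with a minimal sketch. It assumes, purely for illustration, that a signature is the SHA-256 digest of a file's bytes; the signature set and the sample payloads are hypothetical and are not part of this project's code.

```python
import hashlib

# Hypothetical signature database: SHA-256 digests of samples that are
# already known and documented as malicious (illustrative values only).
KNOWN_MALWARE_SHA256 = {
    hashlib.sha256(b"example-malicious-payload").hexdigest(),
}


def is_known_malware(data: bytes, signatures=KNOWN_MALWARE_SHA256) -> bool:
    """Return True only if the exact digest of `data` is in the signature set."""
    return hashlib.sha256(data).hexdigest() in signatures


original = b"example-malicious-payload"
variant = original + b"\x00"  # a single appended byte changes the digest

print(is_known_malware(original))  # True
print(is_known_malware(variant))   # False
```

The second call returns False even though the payload is almost identical, which is why a previously unseen or slightly modified sample escapes a purely signature based check.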

8. TESTING

The purpose of testing is to uncover errors and ensure that the system functions as expected
under various conditions. Testing is a crucial phase in the software development lifecycle, as it helps
identify faults, weaknesses, or bugs in the system that could compromise its effectiveness, security, or
performance. In essence, testing serves as a means to evaluate whether the software meets its
requirements, adheres to user expectations, and behaves correctly under all anticipated conditions.

Testing is the process of systematically exercising software to ensure that it works as intended
and does not fail in an unacceptable manner. It is a vital step in the software development lifecycle,
enabling developers and stakeholders to identify and correct errors before the software is released to
users. The main goal of testing is to ensure that the system meets its functional and non-functional
requirements, while also providing confidence that it is reliable, secure, and performs as expected.

Testing is not a one-time activity but an ongoing process that takes place throughout the
development life cycle. It helps identify issues at various levels, from individual components to the
entire system, ensuring that the software delivers the expected outcomes without unexpected failures.
Through rigorous testing, developers can guarantee that the system operates as intended under normal
and extreme conditions, providing end users with a stable, secure, and high-performing solution.

Key Objectives of Testing:

• Verification and Validation: Ensuring that the software meets its specifications (verification)
and meets user needs and expectations (validation).
• Fault Detection: Identifying defects or vulnerabilities that could affect the functionality,
security, or usability of the system.
• Reliability: Ensuring that the system performs correctly and consistently, even under stressful
or unforeseen conditions.
• Performance Evaluation: Testing the system's ability to handle large workloads, maintain
speed, and operate efficiently over time.
• User Confidence: Providing evidence that the system is ready for deployment and that it is
robust enough to be trusted by users.

TYPES OF TESTS

Unit Testing

Unit testing involves the design of test cases that validate that the internal program logic is
functioning properly and that program inputs produce valid outputs. All decision branches and
internal code flow should be validated. It is the testing of individual software units of the
application, and it is done after the completion of an individual unit, before integration. This is
structural testing that relies on knowledge of the unit's construction and is invasive. Unit tests
perform basic tests at component level and test a specific business process, application, and/or
system configuration. Unit tests ensure that each unique path of a business process performs
accurately to the documented specifications and contains clearly defined inputs and expected results.
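
As a minimal sketch of such a test, the example below uses Python's standard `unittest` module. The helper `encode_label` is hypothetical: it is assumed here only to mirror the 'benign' to 0 and 'malware' to 1 label mapping applied to the dataset, and it is not taken from the project code.

```python
import unittest

# Hypothetical helper, assumed for illustration: it mirrors the
# 'benign' -> 0 / 'malware' -> 1 mapping applied to the dataset labels.
LABELS = {"benign": 0, "malware": 1}


def encode_label(label: str) -> int:
    """Map a class name to its numeric label, rejecting unknown names."""
    if label not in LABELS:
        raise ValueError(f"unknown label: {label!r}")
    return LABELS[label]


class EncodeLabelTest(unittest.TestCase):
    def test_valid_inputs_produce_expected_outputs(self):
        self.assertEqual(encode_label("benign"), 0)
        self.assertEqual(encode_label("malware"), 1)

    def test_invalid_input_is_rejected(self):
        with self.assertRaises(ValueError):
            encode_label("unknown")


# Run the test case programmatically so the script does not exit the interpreter.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(EncodeLabelTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
print("all tests passed:", result.wasSuccessful())
```

Each test method covers one decision branch of the unit (the accepted labels and the rejected one), with clearly defined inputs and expected results.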

Integration Testing

Integration testing is designed to verify the interaction between different software
components and ensure that they work together as a unified system. While unit testing verifies the
correctness of individual components, integration testing focuses on testing how these components
interact when combined. The goal is to ensure that the software behaves as expected when different
modules, services, or layers are integrated, addressing potential issues that could arise when
components are combined.

Integration testing is event-driven, meaning it typically focuses on the behavior of the system in
response to user actions or system events. It involves checking the flow of data between modules and
the proper functioning of interconnected components. Since integration testing typically occurs after
unit testing, where each individual component is verified in isolation, this phase helps to validate that
the components will work together as expected in a real-world scenario.

Key Objectives of Integration Testing:

• To verify that multiple components, which were unit tested individually, now work together
when integrated.

• To identify issues that may arise due to the interactions between modules, such as incorrect
data exchange, miscommunication between services, or failures when components interact in
different environments.
• To ensure that the system works as a whole, meeting the functional and non-functional
requirements.

Integration Testing Focus:

1. Interaction Between Modules: Ensuring that modules communicate correctly, share data, and
provide expected outputs when integrated.
2. Consistency: Verifying that the combined components work consistently and follow a
predictable behavior pattern.
3. End-to-End Functionality: Testing workflows or use cases that span across multiple
components or modules, ensuring that complex functionality works as expected.
4. Error Handling: Ensuring that errors or failures in one module are correctly handled by other
modules, maintaining system stability.

Functional test
Functional tests provide systematic demonstrations that functions tested are available as specified by
the business and technical requirements, system documentation, and user manuals.
Functional testing is centered on the following items:

Valid Input : identified classes of valid input must be accepted.

Invalid Input : identified classes of invalid input must be rejected.

Functions : identified functions must be exercised.

Output : identified classes of application outputs must be exercised.

Systems/Procedures : interfacing systems or procedures must be invoked.

Organization and preparation of functional tests is focused on requirements, key functions, or
special test cases. In addition, systematic coverage of identified business process flows, data
fields, predefined processes, and successive processes must be considered for testing. Before
functional testing is complete, additional tests are identified and the effective value of current tests is
determined.

System Test
System testing ensures that the entire integrated software system meets requirements. It tests a
configuration to ensure known and predictable results. An example of system testing is the
configuration oriented system integration test. System testing is based on process descriptions and
flows, emphasizing pre-driven process links and integration points.

System testing typically covers:

• Functional Testing: Verifying that the system performs all the functions it is expected to do,
based on the user requirements.
• Non-functional Testing: This includes testing for performance, security, usability, and
reliability.
• Configuration Testing: Ensuring that the system works correctly across different
environments and configurations (e.g., different operating systems, hardware setups).
• End-to-End Testing: Validating that the system works end-to-end, including the interaction
of various subsystems and external systems.

White Box Testing


White Box Testing is testing in which the software tester has knowledge of the inner
workings, structure and language of the software, or at least its purpose. It is used to test
areas that cannot be reached from a black box level.

White box testing typically involves:

• Code Coverage: Ensuring that all the code paths are tested, including conditional statements, loops,
and branches.

• Path Testing: Verifying the different execution paths through the software.

• Unit Testing: Often used in conjunction with unit testing to validate the correctness of individual
functions and methods.

Black Box Testing


Black Box Testing is testing the software without any knowledge of the inner workings,
structure or language of the module being tested. Black box tests, like most other kinds of tests,
must be written from a definitive source document, such as a specification or requirements
document. The software under test is treated as a black box: you cannot "see" into it. The test
provides inputs and responds to outputs without considering how the software works.

Key characteristics of black box testing:

• Focus on Inputs and Outputs: Testers provide inputs based on requirements or user stories
and check if the system produces the correct outputs.
• No Knowledge of Internal Code: Testers are not concerned with the internal code structure
or logic; they focus purely on the functionality.
• Specification-based Testing: Black box tests are written based on the software's
specifications, requirements, or use cases, ensuring that the software meets the defined
functional requirements.

8.1 Unit Testing


Unit testing is usually conducted as part of a combined code and unit test phase of the software
lifecycle, although it is not uncommon for coding and unit testing to be conducted as two distinct
phases.

Test strategy and approach
Field testing will be performed manually and functional tests will be written in detail.

Test objectives
• All field entries must work properly.

• Pages must be activated from the identified link.

• The entry screen, messages and responses must not be delayed.

Features to be tested
• Verify that the entries are of the correct format

• No duplicate entries should be allowed

• All links should take the user to the correct pages

8.2 Integration Testing


Software integration testing is the incremental integration testing of two or more integrated software
components on a single platform to produce failures caused by interface defects.
The task of the integration test is to check that components or software applications (for example,
components in a software system or, one step up, software applications at the company level)
interact without error.
Test Results: All the test cases mentioned above passed successfully. No defects encountered.

8.3 Acceptance Testing

User Acceptance Testing is a critical phase of any project and requires significant participation by the
end user. It also ensures that the system meets the functional requirements.
Test Results: All the test cases mentioned above passed successfully. No defects encountered.

9. SAMPLE SOURCE CODE


import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import time

from keras.models import Sequential
from keras.layers import Dense
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix, roc_curve, auc

start_time = time.time()

# Load dataset
dataset = pd.read_csv("/content/Malware dataset.csv")

# Drop the 'hash' column and encode labels
dataset = dataset.drop(columns=['hash'])
dataset['classification'] = dataset['classification'].map({'benign': 0, 'malware': 1})

# Split into features and target
X = dataset.drop(columns=['classification']).values
y = dataset['classification'].values

# Train/test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# Build the ANN model
model = Sequential()
model.add(Dense(units=64, kernel_initializer='uniform', activation='relu', input_dim=X.shape[1]))
model.add(Dense(units=32, kernel_initializer='uniform', activation='relu'))
model.add(Dense(units=1, kernel_initializer='uniform', activation='sigmoid'))

# Compile the model
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

# Train the model
model.fit(X_train, y_train, batch_size=100, epochs=50)

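# Evaluate on the held-out test set
y_pred = model.predict(X_test) > 0.5
cm = confusion_matrix(y_test, y_pred)
# Score on the test split; scoring on X_train would only measure fit to the training data
scores = model.evaluate(X_test, y_test)
print("\nTest accuracy: %.2f%%" % (scores[1] * 100))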

# ROC curve
y_proba = model.predict(X_test).ravel()
# Compute false and true positive rates from the predicted probabilities
fpr, tpr, thresholds = roc_curve(y_test, y_proba)
roc_auc = auc(fpr, tpr)

plt.figure()
plt.plot([0, 1], [0, 1], 'k--')
plt.plot(fpr, tpr, label='ANN (AUC = {:.3f})'.format(roc_auc))
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.title('ROC Curve')
plt.legend(loc='best')
plt.show()

# Save the model architecture (JSON) and the trained weights
model_json = model.to_json()
with open("SmartAM.json", "w") as json_file:
    json_file.write(model_json)
# Newer Keras versions expect weight filenames to end in '.weights.h5'
model.save_weights("SmartAM_weights.weights.h5")
print("Model saved successfully.")

end_time = time.time()
print(f"Execution time: {round(end_time - start_time, 2)} seconds")
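
The confusion matrix `cm` computed in the training script can be turned into detection metrics beyond accuracy. The sketch below assumes the scikit-learn layout for labels 0 (benign) and 1 (malware), that is `[[TN, FP], [FN, TP]]`. The function name `detection_metrics` and the example counts are illustrative only and are not results from this project.

```python
def detection_metrics(cm):
    """Derive accuracy, precision, recall and false positive rate
    from a 2x2 confusion matrix laid out as [[TN, FP], [FN, TP]]."""
    (tn, fp), (fn, tp) = cm
    total = tn + fp + fn + tp
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
        "false_positive_rate": fp / (fp + tn) if (fp + tn) else 0.0,
    }


# Made-up counts, used only to show the calculation.
example_cm = [[90, 10], [5, 95]]
for name, value in detection_metrics(example_cm).items():
    print(f"{name}: {value:.3f}")
```

For malware detection, recall (how many malicious apps are caught) and the false positive rate (how many benign apps are wrongly flagged) are usually more informative than accuracy alone, especially when the two classes are imbalanced.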

APPLICATION CODE:
# Requires Flask (pip install Flask)
import os
from flask import Flask, request, flash, redirect, render_template, url_for
from werkzeug.utils import secure_filename
# 'classifier' is the project's own module (not listed in this report);
# it must provide classify(filepath, algorithm_index)
import classifier

app = Flask(__name__)
app.config['UPLOAD_FOLDER'] = './static/upload/'
app.config['SECRET_KEY'] = 'd3Y5d5nJkU6CdwY'

# Make sure the upload directory exists before the first request
if not os.path.exists(app.config['UPLOAD_FOLDER']):
    os.makedirs(app.config['UPLOAD_FOLDER'])
    print("Directory created")
else:
    print("Directory exists")

@app.route("/", methods=["GET", "POST"])
def home():
    algorithms = {'KNN': '92.26 %', 'Support Vector Classifier': '89 %'}
    result = accuracy = name = sdk = size = ''

    if request.method == "POST":
        if 'file' not in request.files:
            flash('No file part')
            return redirect(request.url)
        file = request.files['file']
        if file.filename == '':
            flash('No selected file')
            return redirect(request.url)
        if file and file.filename.endswith('.apk'):
            # Save the uploaded APK under a sanitized filename
            filename = secure_filename(file.filename)
            print(filename)
            filepath = os.path.join(app.config['UPLOAD_FOLDER'], filename)
            file.save(filepath)

            # Classify the saved file with the algorithm chosen in the form
            if request.form['algorithm'] == 'KNN':
                accuracy = algorithms['KNN']
                result, name, sdk, size = classifier.classify(filepath, 0)
            elif request.form['algorithm'] == 'Support Vector Classifier':
                accuracy = algorithms['Support Vector Classifier']
                result, name, sdk, size = classifier.classify(filepath, 1)

    return render_template("index.html", result=result, algorithms=algorithms.keys(),
                           accuracy=accuracy, name=name, sdk=sdk, size=size)

if __name__ == "__main__":
    app.run(debug=True)
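
The upload handler accepts a file based on its `.apk` extension alone. Since an APK is a ZIP archive that contains an `AndroidManifest.xml` entry, a stricter check is possible with the standard `zipfile` module. The following is a sketch under that assumption; the function `looks_like_apk` and the helper `make_archive` are hypothetical names and are not part of the application code.

```python
import io
import zipfile


def looks_like_apk(file_obj) -> bool:
    """Return True if file_obj is a ZIP archive containing AndroidManifest.xml."""
    if not zipfile.is_zipfile(file_obj):
        return False
    file_obj.seek(0)
    with zipfile.ZipFile(file_obj) as archive:
        return "AndroidManifest.xml" in archive.namelist()


def make_archive(names):
    """Build an in-memory ZIP archive with placeholder entries (demo only)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        for name in names:
            archive.writestr(name, b"placeholder")
    buffer.seek(0)
    return buffer


print(looks_like_apk(make_archive(["AndroidManifest.xml", "classes.dex"])))  # True
print(looks_like_apk(make_archive(["notes.txt"])))                           # False
print(looks_like_apk(io.BytesIO(b"this is plain text, not a ZIP archive")))  # False
```

This only confirms that the container has the expected shape. It says nothing about whether the application is benign, so it complements the classifier rather than replacing it.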

10. SCREEN LAYOUTS

Accuracy shown through graph

Fig 10.1: Accuracy prediction through curves

Malware detection through app

Fig 10.2: Detection through app

11. CONCLUSION AND FUTURE SCOPE

11.1 CONCLUSION

In this report we have discussed different Android malware detection techniques that use
various deep learning methods. Because of the open nature of Android, countless malware samples
are hidden among the large number of benign apps in Android markets. This malware seriously
threatens Android security. An attacker can monitor a user's information, such as messages,
contacts, bank mTANs and locations.
We surveyed Android malware detection techniques such as MalDozer, Droid Detector,
Droid Deep Learner and Deep Flow. MalDozer uses a convolutional neural network for malware
detection. It relies on static analysis and uses API method calls as the feature for deciding
whether an application is malware infected.
Droid Detector uses a Deep Belief Network for detection. It combines static and dynamic
analysis, with features such as permissions, APIs and dynamic behavior. Droid Deep Learner
also uses a Deep Belief Network for malware detection.

11.2 FUTURE SCOPE

Droid Deep Learner also uses static analysis, with features such as permissions and APIs,
for malware detection. Deep Flow likewise uses a Deep Belief Network with static analysis,
taking API method calls as features for Android malware detection. However, all of these methods
work only after the application has been installed on the device or uploaded to their model. To
overcome this problem, we are trying to implement a deep learning model that can automatically
identify whether an application is malicious before installation.
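
To illustrate how static features such as permissions could be fed to a model before installation, the sketch below encodes the permissions requested by an app as a fixed-length binary vector. The short vocabulary and the function `permissions_to_vector` are assumptions made for illustration; a real vocabulary would be built from the training set.

```python
# Hypothetical, abbreviated vocabulary of Android permissions.
PERMISSION_VOCABULARY = [
    "android.permission.INTERNET",
    "android.permission.READ_CONTACTS",
    "android.permission.READ_SMS",
    "android.permission.SEND_SMS",
    "android.permission.ACCESS_FINE_LOCATION",
]


def permissions_to_vector(requested, vocabulary=PERMISSION_VOCABULARY):
    """Return a 0/1 list with one entry per vocabulary permission.

    Permissions outside the vocabulary are ignored.
    """
    requested_set = set(requested)
    return [1 if permission in requested_set else 0 for permission in vocabulary]


sample = [
    "android.permission.INTERNET",
    "android.permission.READ_SMS",
    "android.permission.CAMERA",  # not in the vocabulary, so it is ignored
]
print(permissions_to_vector(sample))  # [1, 0, 1, 0, 0]
```

Because the vector is derived from the declared permissions alone, it can be computed from the application package without running or installing the app.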
