Measurement: Sensors 28 (2023) 100785
Improved chimp optimization algorithm (ICOA) feature selection and deep
neural network framework for internet of things (IOT) based android
malware detection
Tirumala Vasu G *, Samreen Fiza, ATA. Kishore Kumar, V. Sowmya Devi, Ch Niranjan Kumar, Afreen Kubra
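a Department of Electronics and Communication Engineering, Presidency University, Karnataka, India
b Department of Electronics and Communication Engineering, Sree Vidyanikethan Engineering College, Mohan Babu University, Tirupati, Andhra Pradesh, India
c Department of Computer Science and Engineering, [institution name illegible], Telangana, India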
Keywords: Improved chimp optimization algorithm (ICOA); Deep neural network framework (DNNF); Long short-term memory (LSTM); Android
The Internet of Things (IoT) is extensively implemented using Android applications, which makes detecting malicious Android applications essential. The Android platform is often left vulnerable to new and undiscovered malware because the increasing quantity and variety of Android malware has considerably undermined the efficiency of the traditional defence systems. Several data-driven malware detection techniques have been put forward. However, these techniques rely strongly on past knowledge or individual workers. This research work proposes a system for detecting malware that begins by learning rich characteristics: an improved chimp optimization algorithm (ICOA) based feature selection and Deep Neural Network Framework (DNNF) for accurate detection of Android malware. The widespread use of deep learning methods in the industry for feature representation learning served as an inspiration for this framework. Raw feature preprocessing, feature representation learning, and malware detection are the three key components of the proposed DNNF. The second phase is to learn high-level discriminative features to fuel malware identification using the ICOA. Finally, a DNNF based on Long Short-Term Memory (LSTM) is created for efficient malware detection in Android applications. Based on the findings of the simulations, it is seen that the [text missing in source] considered to be state of the art.
1. Introduction
With over 85% market share, Android is now the most widely used platform for smart mobile devices. Over 65 billion application downloads have been made via Google Play as of this writing, which now offers access to approximately 3 million different apps [2]. As Android becomes more popular, cybercriminals become more interested in developing harmful applications that may attack mobile devices and steal critical data. In contrast to rival smart-mobile device systems, Android users may download software from unreliable websites and shops, such as file-sharing websites and third-party app stores. The malware infiltration issue has become severe: according to a recent survey, Android devices are the target of 97% of all mobile malware [3]. More than 3.25 million new malicious Android applications were found in 2016 alone. In general, this means that every 10 s, a new dangerous Android app is released [4]. These malicious applications are designed to launch a variety of attacks employing viruses, Trojan horses, worms, exploits, and other malware. It may be quite difficult to identify all of the versions of certain well-known malicious programmes, some of which have more than 50 (see Table 1).
Malware detection has typically depended on manually evaluating the behaviour and/or decompiled code of known malware programmes in order to manually develop malware signatures. This was done in order to avoid any potential biases introduced by automatic analysis [5]. Given the static nature of signature-based malware detection, which makes it possible for new malware to be created to avoid current signatures, this technique is difficult to scale to handle huge numbers of applications [6]. Many solutions have been presented based on an examination of the application's dynamic behaviour, the permissions it requests, and the n-grams of its bytecode [7]. However, in many of these techniques the distinguishing characteristics that are supplied to the machine learning system that makes the final classification decision are designed using expert analysis.
* Corresponding author.
E-mail address: [email protected] (T. Vasu G).
https://doi.org/10.1016/j.measen.2023.100785
Received 15 December 2022; Received in revised form 26 February 2023; Accepted [day illegible] May 2023
Available online 19 May 2023
2665-9174/© 2023 Published by Elsevier Ltd. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
Machine learning technologies have seen substantial application across these three types of detection. Growth in collected data and ongoing advances in computing power offer an additional viewpoint for effective and automated Android malware detection [8]. The machine learning-based methodologies for detecting malware on Android devices comprise four processes. First, the dataset is constructed by gathering both benign and malicious Android apps. Second, feature engineering is done in order to describe the Android apps. Third, machine learning models for malware detection are developed. Fourth, the trained models' performance is assessed by predicting test samples [9]. Feature engineering is one of the most important aspects of machine learning; nonetheless, it is a laborious, time-consuming manual process that often calls for a great deal of domain-specific expertise. For instance, human rules are needed to extract relevant information from binary files when using machine learning-based static detection approaches, which often involve the use of specialised decompilation tools [10]. In order to collect behavioural characteristics like function calls, permission requests, and network connections using dynamic detection techniques, the target programme must be run in isolation. Researchers are concentrating more on methods that allow autonomous feature learning in order to
list = new ArrayList<DataPair>();
list.add(new DataPair("IMEI", imei));
httpPost.setEntity(new UrlEncodedFormEntity(list, HTTP.UTF_8));
}
In this study, maximum overall accuracy (OA) and the minimum number of features were employed as two objective functions for classifying land cover. The two objective functions were combined using the weighted sum approach. Hence, Equation (1) was used to generate the fitness function:
$$\text{Objective Function}_j = w \cdot OA_j + (1 - w) \cdot \frac{N - n_j}{N} \tag{1}$$
where N = 101 features, n_j is the number of features chosen by the jth chimp, Objective Function_j is the fitness function of the jth chimp, and OA_j is the jth chimp's overall accuracy. w is the weight parameter, set to 0.92. Trial and error was used to calibrate the value of w: the remaining variables were held constant, and the objective function was evaluated with various parameter values to identify the optimal value for each of the parameters. The fitness function's value was regarded as the primary criterion for calibrating the parameters.
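As an illustration of the weighted-sum idea, the following minimal Python sketch combines overall accuracy with a feature-count term. The exact form of the second term is an assumption here (the fraction of features removed, (N - n_j) / N); the function name and argument names are hypothetical and not taken from the paper.

```python
def weighted_sum_fitness(overall_accuracy, n_selected, n_total=101, w=0.92):
    """Weighted-sum fitness of one chimp (higher is better).

    ASSUMPTION: the feature-count objective is expressed as the fraction of
    features removed, (n_total - n_selected) / n_total. Only the weight
    w = 0.92 and n_total = 101 come from the text.
    """
    if not 0 < n_selected <= n_total:
        raise ValueError("n_selected must be in 1..n_total")
    if not 0.0 <= overall_accuracy <= 1.0:
        raise ValueError("overall_accuracy must be in [0, 1]")
    reduction = (n_total - n_selected) / n_total
    return w * overall_accuracy + (1.0 - w) * reduction


# Two candidate subsets with equal accuracy: the smaller one scores higher.
small = weighted_sum_fitness(0.90, n_selected=50)
large = weighted_sum_fitness(0.90, n_selected=101)
```

With w close to 1, accuracy dominates and the feature-count term mainly acts as a tie-breaker between subsets of similar accuracy.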
3.3. Feature selection using improved chimp optimization algorithm (ICOA)
The Chimp Optimization Algorithm is a mathematical model based on intelligent diversity. Four distinct kinds of chimps (attackers, barriers, chasers, and drivers) drive, chase, block, and attack the prey. The four hunting steps are finished in two parts: the exploration stage comes first, followed by the exploitation stage. Driving, obstructing, and chasing the prey are all parts of the exploration phase. Feature representation learning is an effective way to selectively describe an object's intrinsic essence. In this work the feature selection process is done by the Improved chimp optimization algorithm (ICOA). The process is explained briefly in the section below. In the feature selection process, population positions are mapped to 0 and 1 using a logical function. A feature is chosen if its entry is 1 and is not chosen if its entry is 0. As the choice of kernel function and its parameters affects how well SVM performs, ICOA is also required to optimise the parameters in order to improve classification accuracy.
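The mapping from a continuous chimp position to a 0/1 feature mask can be sketched as follows. The paper only says that a "logical function" is used; the sigmoid transfer function, the 0.5 threshold, and the fallback that keeps at least one feature are assumptions made for this example, and the function names are hypothetical.

```python
import numpy as np


def position_to_mask(position, threshold=0.5):
    """Map a continuous position vector to a boolean feature mask.

    ASSUMPTION: a sigmoid transfer function followed by a fixed threshold
    stands in for the unspecified "logical function".
    """
    probs = 1.0 / (1.0 + np.exp(-np.asarray(position, dtype=float)))
    mask = probs > threshold
    if not mask.any():
        # Assumed safeguard: never return an empty feature subset.
        mask[np.argmax(probs)] = True
    return mask


def select_features(X, mask):
    """Keep only the columns of X whose mask entry is True (i.e. 1)."""
    return np.asarray(X)[:, mask]


mask = position_to_mask([-2.0, 0.5, 3.0])      # third and second entries exceed 0.5
X_small = select_features(np.arange(6).reshape(2, 3), mask)
```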
3.3.1. Chimp optimization algorithm
ChOA is a heuristic optimization technique that draws inspiration from the foraging behaviours of a chimpanzee population. According to the variety of intelligence and skills that individuals in the chimp population exhibit during hunting, the chimpanzees are divided into four groups: "Attacker," "Barrier," "Chaser," and "Driver." Each kind of chimp has the capacity for autonomous thought, which it uses to investigate and predict the location of prey using its unique search approach. In addition to their jobs, they are socially driven to seek out sex and other advantages as the hunt comes to a close [24]. Disordered individual hunting behaviour takes place throughout this procedure. In conventional ChOA, if there are N chimpanzees and the ith chimp is at position Xi, then "Attacker" is the best solution, followed by "Barrier," the second best, and then "Chaser" and "Driver." The chimpanzees' behaviours as they approach and surround their prey are described by the following position update equations:
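$$D = \left| C \cdot X_{prey}(t) - m \cdot X_{chimp}(t) \right| \tag{2}$$
$$X_{chimp}(t+1) = X_{prey}(t) - A \cdot D \tag{3}$$
$$A = 2 f r_1 - f, \quad C = 2 r_2 \tag{4}$$
$$m = \text{chaotic\_value} \tag{5}$$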
where r1 and r2 are random vectors with values in [0, 1]. As the number of iterations increases, the non-linear decay factor f decreases from 2.5 to 0, and t denotes the iteration count. A is a random vector whose values lie in [-f, f]. The chaotic factor m represents the impact of incentives for sexual behaviour on each chimpanzee's position. C is a random variable in the range [0, 2] that controls how strongly the location of the prey affects each chimpanzee's individual position. The positions of the Attacker, Barrier, Chaser, and Driver in the population are used to calculate the positions of the other chimpanzees in the population. The equations that are used to update positions are as follows:
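$$D_{Attacker} = \left| C_1 \cdot X_{Attacker} - m_1 \cdot X \right| \tag{6}$$
$$D_{Barrier} = \left| C_2 \cdot X_{Barrier} - m_2 \cdot X \right| \tag{7}$$
$$D_{Chaser} = \left| C_3 \cdot X_{Chaser} - m_3 \cdot X \right| \tag{8}$$
$$D_{Driver} = \left| C_4 \cdot X_{Driver} - m_4 \cdot X \right| \tag{9}$$
$$X_1 = X_{Attacker} - A_1 \cdot D_{Attacker} \tag{10}$$
$$X_2 = X_{Barrier} - A_2 \cdot D_{Barrier} \tag{11}$$
$$X_3 = X_{Chaser} - A_3 \cdot D_{Chaser} \tag{12}$$
$$X_4 = X_{Driver} - A_4 \cdot D_{Driver} \tag{13}$$
$$X(t+1) = \frac{X_1 + X_2 + X_3 + X_4}{4} \tag{14}$$
where C1, C2, C3 and C4 are similar to C; A1, A2, A3 and A4 are similar to A; and m1, m2, m3 and m4 are similar to m.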
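The leader-guided update described above (a distance to each of the four leaders, a candidate position per leader, then the average of the four candidates) can be sketched in NumPy as below. This is an illustrative sketch rather than the authors' implementation: the function name is hypothetical, and a uniform random number is used as a stand-in for the chaotic value m, which the paper does not specify.

```python
import numpy as np


def choa_step(X, leaders, f, rng, m=None):
    """One ChOA position update for the whole population.

    X        : (N, d) array of current chimp positions
    leaders  : (4, d) array with Attacker, Barrier, Chaser, Driver positions
    f        : current decay factor (decreases from 2.5 to 0 over the run)
    rng      : numpy.random.Generator
    m        : optional chaotic factor; ASSUMPTION: uniform noise if omitted
    """
    X = np.asarray(X, dtype=float)
    leaders = np.asarray(leaders, dtype=float)
    n, d = X.shape
    candidates = np.empty((4, n, d))
    for k, leader in enumerate(leaders):
        r1 = rng.random((n, d))
        r2 = rng.random((n, d))
        A = 2.0 * f * r1 - f              # values in [-f, f]
        C = 2.0 * r2                      # values in [0, 2]
        mk = rng.random((n, d)) if m is None else m
        D = np.abs(C * leader - mk * X)   # distance to this leader
        candidates[k] = leader - A * D    # candidate position guided by it
    return candidates.mean(axis=0)        # average of the four candidates


rng = np.random.default_rng(0)
X = rng.random((5, 3))
leaders = rng.random((4, 3))
X_next = choa_step(X, leaders, f=2.5, rng=rng)
```

When f reaches 0 the coefficient A vanishes, so every chimp collapses onto the mean of the four leaders; this makes the loss of diversity late in the run easy to see.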
3.3.2. Principle of light refraction
According to the theory of light refraction, whenever light travels through one medium and then enters another medium at an angle, the propagation direction shifts. The speed of light changes at the boundary between the two media, which in turn causes a deviation of the path [25]. Fig. 3 depicts light travelling from point P in medium 1 to point Q in medium 2 through refraction point O. Assume that media 1 and 2 have refractive indices n1 and n2, respectively, that the velocities of light in the two media are v1 and v2, respectively, and that a and b are the angles of incidence and refraction. The ratio of the sine of the angle of incidence to the sine of the angle of refraction equals the ratio of the velocities of light in the two media, which is the reciprocal of the ratio of their refractive indices, as shown by Formula (15):
$$\frac{\sin a}{\sin b} = \frac{v_1}{v_2} = \frac{n_2}{n_1} \tag{15}$$
Equation (14) states that "Attacker," "Barrier," "Chaser," and "Driver" lead the chimp population in successfully completing the hunting process. The fitter chimpanzees help the population as a whole to update its positions. As a result, the population's prey search and guidance are mostly carried out by the fittest chimpanzees, so these dominant chimpanzees need to have high fitness. With this method of position updating, the population may be attracted to the fitter chimpanzees more rapidly, but the population's diversity suffers. As a consequence, ChOA tends to be confined to a local optimum.
The ChOA chimp population first initialises and then starts to search the whole space. ChOA may then gather a large number of chimpanzees into the promising search region under the leadership of "Attacker," "Barrier," "Chaser," and "Driver." However, if the present leaders are at a local optimum, then the whole population is likely to be trapped there as well, owing to the weak exploratory abilities of the other chimpanzees in the population. To put it another way, the population's capacity for global exploration is largely dependent on the leaders. ChOA's capacity for global exploration therefore still has room for improvement. In order to avoid premature convergence to locally optimal solutions, the population leaders must innovate.
3.3.3. Distribution strategy (DS)
In standard ChOA, the Attacker, Barrier, Chaser, and Driver are principally in charge of the search, guiding the population's other chimpanzees towards the location of prey according to Equations (5) to (14). However, as they look for the best prey in the immediate region, the Attacker, Barrier, Chaser, and Driver direct the whole population to congregate close to these four sorts of chimpanzees. As a result, population diversity decreases and the algorithm converges to a local optimum; that is, it exhibits "prematurity." This article tackles this issue by outlining a novel learning strategy based on the idea of light refraction. This strategy may assist the population in escaping the local optimum while preserving its diversity. Fig. 4 depicts the fundamental idea of refraction learning.
Fig. 3. Principle of light refraction.
In Fig. 4, the incident and refracted rays are denoted by l and l', respectively, and the interval [a, b] on the x-axis denotes the search space for the solution. The normal is represented by the y-axis, while X and X' stand for the points where the light is incident and where it is refracted, respectively. α and β stand for the incidence and refraction angles, respectively. From Fig. 4, the line segments are related geometrically as follows:
$$\sin\alpha = \frac{(a+b)/2 - X}{l} \tag{16}$$
$$\sin\beta = \frac{X' - (a+b)/2}{l'} \tag{17}$$
It is known that δ = sin α / sin β from the refractive index definition. Equations (16) and (17) may be combined with this definition to get the formula shown below:
$$\delta = \frac{\sin\alpha}{\sin\beta} = \frac{\left((a+b)/2 - X\right)/l}{\left(X' - (a+b)/2\right)/l'} \tag{18}$$
With k = l/l', Equation (18) may be rewritten as follows for an n-dimensional optimization problem:
$$X'_j = \frac{a_j + b_j}{2} + \frac{a_j + b_j}{2k\delta} - \frac{X_j}{k\delta} \tag{19}$$
where X_j and X'_j are, respectively, the initial and post-refraction values in the jth dimension, and a_j and b_j are the lower and upper bounds of the jth dimension in the n-dimensional space. Equation (19) has the form shown below when k = 1 and δ = 1:
$$X'_j = a_j + b_j - X_j \tag{20}$$
It can be seen that DS is a specific kind of refraction learning, since Equation (20) represents a typical DS.
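A minimal sketch of the refraction update, X' = (a + b)/2 + (a + b)/(2kδ) - X/(kδ) applied per dimension, is given below. The function name and the final clipping to the search bounds are assumptions added for the example; the paper does not say how out-of-range values are handled.

```python
import numpy as np


def refraction_learning(x, lower, upper, k=1.0, delta=1.0):
    """Refraction-based candidate for a solution x inside [lower, upper].

    k     : ratio of the incident to the refracted ray length
    delta : refractive index ratio (sin(alpha) / sin(beta))
    ASSUMPTION: the result is clipped to the bounds; this safeguard is not
    described in the paper.
    """
    x = np.asarray(x, dtype=float)
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    centre = (lower + upper) / 2.0
    x_new = centre + centre / (k * delta) - x / (k * delta)
    return np.clip(x_new, lower, upper)


# With k = delta = 1 the update reduces to the mirror point a + b - x.
mirror = refraction_learning([1.0, 4.0], [0.0, 0.0], [10.0, 10.0])
# Larger k * delta pulls the candidate towards the centre of the interval.
softer = refraction_learning([1.0, 4.0], [0.0, 0.0], [10.0, 10.0], k=2.0)
```

In an ICOA-style loop the candidate would typically replace the current best solution only if its fitness is better, which is a common design choice for opposition-style operators and is stated here as an assumption.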
Algorithm 2. Process of the modified chimp optimization algorithm
Input: the population size N and the maximum number of iterations MaxIt
Output: the best fitness value F(Xbest) and the best location Xbest
t = 0;
Initialize the chimp population Xi (i = 1, 2, 3, ..., N) by Equation (9-12);
Initialize the parameters A, C, and f;
foreach Xi do
    Calculate the fitness F(Xi);
end
XAttacker = the best solution;
XBarrier = the second best solution;
XChaser = the third best solution;
XDriver = the fourth best solution;
while t < MaxIt do
    foreach chimp do
        Update the chimp's location using Equation (14);
    end
    Update the parameters A, C, and f;
    Update DAttacker, DBarrier, DChaser, and DDriver using Equations (6) to (9);
    Update XAttacker, XBarrier, XChaser, and XDriver using Equations (10) to (13);
    Carry out the refraction learning strategy for the current optimal solution using Equation (19);
    t = t + 1;
end
3.4. Deep neural network framework
Deep learning enables computers to learn representations of data with various degrees of abstraction using models with several processing layers. For the purpose of detecting malware, this paper introduces an LSTM-based deep learning model.
3.4.1. Long short-term memory (LSTM)
The LSTM replaces each standard hidden layer node with a memory cell, allowing for long-term storage of and access to the data. LSTM networks are commonly employed in time series prediction applications including machine translation, air pollution forecasting, weather forecasting, and voice recognition [26]. The LSTM's architecture is shown in Fig. 5, where the hidden layer is made up of memory cells and memory blocks. The input gate, forget gate, and output gate are the three gate components that make up the memory block. Multiplicative gate units are used to prevent adverse effects brought on by unrelated inputs.
The forget gate determines how much data will be saved or deleted in the memory cell. The forget gate's activation is calculated as shown below:
$$f_t = \sigma\left(W_{fh} h_{t-1} + W_{fx} x_t + b_f\right) \tag{21}$$
where b_f is the bias vector, h_{t-1} is the output of the previous block, and x_t is the input sequence; W_{fh} and W_{fx} are the forget-gate weight matrices applied to the output of the preceding cell and the input of the current cell, respectively. σ is the sigmoid function:
$$\sigma(x) = \frac{1}{1 + e^{-x}} \tag{22}$$
Fig. 5. Gating mechanisms in the LSTM block diagram.
The input gate uses the current input vector to decide what data may be stored in the memory cell:
$$i_t = \sigma\left(W_{ih} h_{t-1} + W_{ix} x_t + b_i\right) \tag{23}$$
Which outputs may be passed on in the current time step is determined by the output gate layer, which is specified as
$$o_t = \sigma\left(W_{oh} h_{t-1} + W_{ox} x_t + b_o\right) \tag{24}$$
The current state of the cell is calculated by integrating the forget gate and the input gate:
$$C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t \tag{25}$$
where C_t is the state of the cell at time step t and ⊙ denotes the element-wise vector multiplication known as the Hadamard product. f_t ⊙ C_{t-1} and i_t ⊙ \tilde{C}_t determine, respectively, what content should be carried over from the prior cell state and from the current input. The candidate state \tilde{C}_t is established in accordance with the tanh activation function:
$$\tilde{C}_t = \tanh\left(W_{ch} h_{t-1} + W_{cx} x_t + b_c\right) \tag{26}$$
Fig. 6. Results of the proposed and existing approaches' precision comparison.
The hidden state is produced by multiplying the output gate by the tanh of the current cell state:
$$h_t = o_t \odot \tanh(C_t) \tag{27}$$
As a result, the DNN model is unable to categorise the new test sample because of its lack of generalisation abilities.
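To make the gate computations concrete, here is a minimal NumPy sketch of a single LSTM time step using the standard forget, input, and output gates, a tanh candidate state, and a Hadamard-product state update. The parameter names in the dictionary (`W_fh`, `W_fx`, `b_f`, and so on) are naming assumptions chosen for this example; the sketch is not the authors' network and omits training entirely.

```python
import numpy as np


def sigmoid(z):
    """Logistic function, applied element-wise."""
    return 1.0 / (1.0 + np.exp(-z))


def lstm_step(x_t, h_prev, c_prev, p):
    """One LSTM time step; p is a dict of weight matrices and bias vectors."""
    f_t = sigmoid(p["W_fh"] @ h_prev + p["W_fx"] @ x_t + p["b_f"])      # forget gate
    i_t = sigmoid(p["W_ih"] @ h_prev + p["W_ix"] @ x_t + p["b_i"])      # input gate
    o_t = sigmoid(p["W_oh"] @ h_prev + p["W_ox"] @ x_t + p["b_o"])      # output gate
    c_tilde = np.tanh(p["W_ch"] @ h_prev + p["W_cx"] @ x_t + p["b_c"])  # candidate state
    c_t = f_t * c_prev + i_t * c_tilde    # element-wise (Hadamard) products
    h_t = o_t * np.tanh(c_t)              # new hidden state
    return h_t, c_t


def init_params(n_in, n_hidden, rng):
    """Small random weights and zero biases (illustrative initialisation)."""
    p = {}
    for gate in "fioc":
        p[f"W_{gate}h"] = 0.1 * rng.standard_normal((n_hidden, n_hidden))
        p[f"W_{gate}x"] = 0.1 * rng.standard_normal((n_hidden, n_in))
        p[f"b_{gate}"] = np.zeros(n_hidden)
    return p


rng = np.random.default_rng(0)
params = init_params(n_in=4, n_hidden=3, rng=rng)
h, c = np.zeros(3), np.zeros(3)
for x_t in rng.random((5, 4)):            # a toy sequence of five feature vectors
    h, c = lstm_step(x_t, h, c, params)
```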
4. Results and discussion
The suggested framework's effectiveness on the database is verified in this section. More than 17,000 samples were collected via MUDFLOW for the first database, which comprises 3733-dimensional dataflow characteristics [27]. This database was made using 815 innocuous samples from the Google Play Store [29], 2013 pieces of malware from Virus Share [25], and 2013 benign samples from 2019 to 2020. To assess the efficiency of the recommended ICOA-DNNF method for detecting malware, the commonly used evaluation criteria are employed: Accuracy (Acc), Specificity (Spe), Precision (Pre), F1-score (F1), and Sensitivity (Sen). Their definitions are given below.
Precision is the proportion of properly identified positive observations to all of the predicted positive observations:
$$\text{Precision} = \frac{TP}{TP + FP} \tag{28}$$
Sensitivity, or recall, is the proportion of properly detected positive observations to all actual positive observations:
$$\text{Recall} = \frac{TP}{TP + FN} \tag{29}$$
The F-measure combines recall and precision:
$$F\text{-measure} = \frac{2 \times \text{Recall} \times \text{Precision}}{\text{Recall} + \text{Precision}} \tag{30}$$
Accuracy is calculated from the positives and negatives as follows:
$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{31}$$
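These four metrics can be computed directly from confusion-matrix counts, as in the short sketch below. The function name and the convention of returning 0.0 when a denominator is zero are assumptions for the example; the counts in the usage lines are made-up numbers, not results from the paper.

```python
def classification_metrics(tp, fp, tn, fn):
    """Precision, recall, F-measure and accuracy from confusion counts.

    ASSUMPTION: a metric whose denominator is zero is reported as 0.0.
    """
    total = tp + fp + tn + fn
    if total == 0:
        raise ValueError("at least one observation is required")
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall:
        f_measure = 2.0 * precision * recall / (precision + recall)
    else:
        f_measure = 0.0
    accuracy = (tp + tn) / total
    return {
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
        "accuracy": accuracy,
    }


# Hypothetical counts for a malware (positive) vs benign (negative) test set.
scores = classification_metrics(tp=90, fp=10, tn=80, fn=20)
```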
Fig. 6 illustrates how well the suggested ICOA-DNNF performed in the precision comparison. The results demonstrate that characteristics may be extracted using ICOA to predict malware with high accuracy. The performance of the linear transformation is not significantly impacted by the number of usable features in the proposed ICOA. Since there is no need to laboriously adjust the regularisation value in the classifier, this is an attractive quality. For the purpose of resolving the classification issue, the suggested ICOA provides a very successful method.
The recall results of the suggested ICOA-DNNF based classifier are shown in Fig. 7. According to the findings, the suggested ICOA-DNNF method yields a high recall of 91.74%, in contrast to the existing methods, which yield lower recall: 89.68% for SHLMD [30] and 87.25% for 1-D CNN [31]. As shown in Fig. 8, the suggested ICOA-DNNF performs very well in terms of the malware detection rate, far superior to the 1-D CNN and SHLMD. The qualitative analysis conducted using deep learning and the quantitative findings in terms of F-measure are in accord. The accuracy of the suggested ICOA-DNNF for a specific database is contrasted with other cutting-edge classification methods.
Fig. 7. Recall outcomes of the comparison between the suggested and existing approaches (methods: 1-D CNN, SHLMD, ICOA-DNNF).
Fig. 8. Results of the proposed and current approaches' comparisons using F-measure (%) (methods: 1-D CNN, SHLMD, ICOA-DNNF).
Fig. 9. Accuracy comparison results of the proposed and existing methods (methods: 1-D CNN, SHLMD, ICOA-DNNF).
Fig. 9 demonstrates that, compared to the existing classifiers, the suggested ICOA-DNNF provides higher accuracy. Consequently, compared with other classifiers constructed on earlier developed models, the classifier's efficiency will be greater.
5. Conclusion and future work
A novel deep neural network-based Android malware detection technique is presented in this paper. Compared to previous cutting-edge methodologies, this deep learning approach to malware analysis has shown strong performance and promise and has been verified using datasets for Android malware in IoT devices. A malware detection framework was suggested in this study for effective Android malware detection that begins with learning rich features using an improved chimp optimization algorithm (ICOA) based feature selection. The three main elements of the proposed DNNF are raw feature preprocessing, feature representation learning, and malware detection. The second phase is to learn high-level discriminative features to fuel malware identification using the ICOA. Then a deep neural network architecture based on LSTM is designed for effective malware detection in Android apps. Given simply the raw opcode sequences of a large number of labelled malware samples, the proposed system is capable of learning to conduct feature selection and malware classification concurrently. The proposed model's benefits include being built to operate on mobile devices' GPUs and being considerably more computationally efficient than n-gram-based malware classification methods. The results demonstrate that the suggested ICOA-DNNF method yields a high recall of 91.74%, in contrast to the existing methods, which yield lower recall: 89.68% for SHLMD and 87.25% for 1-D CNN. Future work should expand the proposed approach to include both dynamic and static malware analysis across many platforms.
CRediT authorship contribution statement
Tirumala Vasu G: Conceptualization and design of study, Approval of the version of the manuscript to be published. Samreen Fiza: Data curation, Approval of the version of the manuscript to be published. ATA. Kishore Kumar: Formal analysis, Writing – original draft of the manuscript, Approval of the version of the manuscript to be published. V. Sowmya Devi: Formal analysis, Approval of the version of the manuscript to be published. Ch Niranjan Kumar: Writing – review & editing, critically for important intellectual content, Approval of the version of the manuscript to be published. Afreen Kubra: Writing – review & editing, critically for important intellectual content, Approval of the version of the manuscript to be published.
Declaration of competing interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Data availability
The data that has been used is confidential.
Acknowledgements
The author, with a deep sense of gratitude, would like to thank the supervisor for his guidance and constant support rendered during this research.
References
[1] K. Liu, S. Xu, G. Xu, M. Zhang, D. Sun, H. Liu, A review of Android malware detection approaches based on machine learning, IEEE Access 8 (2020).
[2] J. Qiu, J. Zhang, W. Luo, L. Pan, S. Nepal, Y. Xiang, A survey of Android malware detection with deep neural models, ACM Comput. Surv. 53 (6) (2020).
devetion using state analysts IEEE Access 8 (2020) 116363-11637
4 P- Chan,W Sang, State detcion of Andra alwarey sing pemisons
Se 20 .
‘2012 European tateligeace and Secusty taforaatics Coaference, 2012,
16] TES. la, NCJ. Chi, Analysis of ended malware detection performance wing
Converge ETC, 20 pp. 490-95,
SY Yer See, Meas Mut Ane anode detest
Conference oa Advanced Information Networking abd Application (AINA), 2013
Sota te Spon on cry and Prey, 2072 9p. 9610 ”
Cnerence New Teconagny Moly snd Sucrty, NTS), 2016 9p. 15
‘approach, i: 2015 TERE 2nd intenatianal Conference an Cyber Security and Choad
TTL Sun Yn, 2. UW ren Ye, Sinn emis otfeton
ier omhinelearingasd endo malware eteton, EE Tat 18 7)
(2018) saieszzs
‘bed on fctriation machine, IEEE Aecess 7 (2019) 184008-184019,
thao, Zod Gong, Ne Zhang. Wang, Quik apd acute ald
tlwar deetion bse on serve At 018 Eton one
Soar itt of Things Ses, EEE 2016, Aust, p. 148-18.
teed om. Compa 10 (6) 019) HSS SO
“king 8 Kang MoS Sexe, Rn, A sind deep ering mtr
1G) ao1e) 773798
£2 en, Wo, @ Ning. Hosa Che, aden malware dteston or
Sa TT dvs ug deep ers. A Hoe New 20) (2020), 102096
Inve, Hum. Comput, 10 (8) (2019) 3035-9043. -
Forensics Secu 16 (2020) 1563-1578
Framework fer Ane are deletion SEE Trane. a Data Eg 3 (12)
(Gear) sssessra
Maule, Howat reise cnet, a, ebjeceseastve wd eye
"ar nin ase for sno app, ACM Sgn Nat 4 (6) 201) 298-26,
1M sake ACR Man, Chin opieton ier, Hep Sys. App 148,
(a0) 1-8,
a Neha AA. Moun Mat md hi
Googe Sere py ancecomcr 2020,
(Goat) sssess70 °
Proceming & its Applications (CSPA, IEEE, 2018, March, p. 99-102