Analysis of Cyber Security Threats Using
ABSTRACT- Malware detection is a problem that researchers have tried to solve for many years using a wide variety of methods. The behaviors of two given malware variants remain similar even though their signatures may be distinct. The proposed project concentrates on classifying malware families by considering the malware API sequence or API commands. This type of classification helps analysts gain better insight into the functioning of the malware.

KEYWORDS- Malware Detection, Malware Family Detection, KNN, SVM, API Calls Argument.

I. INTRODUCTION

A cyber security threat is a malicious act that seeks to damage data, steal data, or disrupt digital life in general. Cyber-attacks include threats such as computer viruses, data breaches, and Denial of Service (DoS) attacks. There are several kinds of computer security threats, such as Trojans, viruses, adware, malware, rootkits, hackers, and many more.

Malware detection refers to the process of detecting the presence of malware on a host system, or of determining whether a specific program is malicious or benign. To protect a computer from infection, or to remove malware from a compromised system, it is essential to detect malware accurately. The proposed project concentrates on classifying malware families by considering the malware API sequence or API commands. This type of classification helps analysts gain better insight into the functioning of the malware: just by knowing the class or family of a malware sample, they can form an idea of how to devise sanitization and detection techniques for it. Knowing the family to which a malware sample belongs also gives a general idea of its behavior, which helps malware analysts share data.

Malware analysts normally employ two kinds of detection techniques, static and dynamic detection. Static detection is based on specific strings from the disassembled code, without executing the binary file. This analysis can quickly capture the syntax, but it is easily disturbed by code obfuscation and encryption technology. The second kind is dynamic detection. It analyzes malware behavior, such as network activity, system calls, and file operations, by executing the malware. This approach can detect newly created malware; however, it requires more execution time.

II. RELATED WORK

Egele et al. [2] describe automated dynamic malware analysis techniques and tools. Automated dynamic analysis delivers a report for each malware program that details its run-time activities. The information produced by these analysis tools clarifies the behavior of the malware program and enables the timely and appropriate deployment of countermeasures.

Tsyganok et al. [3] report a grouping error ranging from almost 9 percent to 22 percent and a classification error ranging from nearly 19 percent to 22 percent.

Wang et al. [4] created and used sequences of two to three API calls to describe eight suspicious behaviors. The analysis used a Bayes algorithm to classify whether a program was malicious and achieved 95 percent when 80% of the data, comprising 879 samples of which 553 were malicious, was used for training.

Liu et al. [5] examined MapReduce as a way to reduce overhead time and improve efficiency by about 30 percent. For the identification of Trojans, malware, worms, and spyware, the reported experimental result for accuracy was 45% (from 50% to 89%).

Ding et al. [6] use a dynamic taint analysis technique to mark system call parameters with taint tags. They then build the system call dependency graph by tracing the propagation of the tainted data, representing malware behaviors as dependency graphs in order to discover the dependency relations between system calls. They propose an algorithm that extracts the common behavior graph, which is used to represent the behavioral features of a malware family, from the dependency graphs of malware samples.

Aafer et al. [7] carried out a thorough analysis to extract relevant features from malware behavior captured at the API level, and evaluated different classifiers
using the generated feature set. Their findings show that the KNN classifier can achieve an accuracy as high as 98 percent and a false positive rate as low as 3 percent.

Souri et al. [8] observe that the surveyed techniques do not appear to be sufficient, since the nature and advanced design of malware keep evolving and malware is therefore becoming harder to detect. They argue that a systematic and careful review of intrusion detection techniques that use data mining approaches is needed. They also divide malware detection approaches into two main classes, signature-based methods and behavior-based detection. From their results they conclude that the SVM method has the largest share among malware detection approaches with 29%, while J48 has 17%, decision tree has 14%, NB has 9%, BF has 5%, and the remaining methods have only 3 percent usage in the data mining results.

Koundel et al. [9] propose an approach that uses data mining to classify an application as malware or as a benign application. They use different attributes of an application for classification: (i) the permissions used by the application, (ii) the permissions-enabled battery usage rating, and (iii) the rating of the application on the Android market.

Gavriluţ et al. [10] outline the ideas behind their framework by working primarily with cascade one-sided perceptrons, which were successfully tested on medium-sized datasets.

Lin et al. [11] combine the selection and the extraction of features, which significantly reduces the dimensionality of the training and classification features. Based on malware behaviors collected from a sandbox environment, their method proceeds in five stages, including (a) extracting data from behavior logs in the n-gram feature space and (b) constructing a behavior log. Experiments on a real-world data set of 4,288 samples from nine families showed the effectiveness and efficiency of their method.

Sun et al. [1] surveyed a method for detecting worms and other malware that uses sequences of WinAPI calls and depends on fixed API call addresses. Egele et al. [2] developed automated dynamic malware analysis techniques and tools; automated dynamic analysis provides a report for each malware program that describes its run-time behavior. The information yielded by these analysis tools elucidates malware program behaviors, facilitating the timely and appropriate implementation of countermeasures. Wang et al. [4] developed and used two to three API function call sequences to describe eight suspicious behaviors. The experiment used a Bayes algorithm to classify whether a program was malicious and achieved 93.98% when 80% of the data was used for training, on 914 samples of which 453 were malicious.

III. METHODOLOGY

The study is based on a quantitative method in which the accuracy of the proposed system is measured using KNN and SVM. The proposed system is shown in Figure 1.

[Figure 1 components: Main menu; Malware/Benign Classification (Data sets 1, 2, 3); Family Classification (Data sets 1, 2, 3); Single Malware check with best of SVM or KNN]
Figure 1: Proposed System for Malware Detection and Malware Family Classification

An accurate and sufficient number of features and cases in the dataset is critical for accurate classification results. Malware detection must therefore be automatic, efficient, effective, and accurate. Malware can be detected and analyzed by either static or dynamic analysis, using two techniques:
a) Code analysis without executing the software (signature based)
b) Behavioral analysis (anomaly based)
Researchers have used a diversity of techniques for detecting malware, regardless of how they handled the results. Figure 2 illustrates some of these techniques.

In the proposed system, the malware datasets, which consist of malware API sequences, are collected from different well-known websites. As technology has advanced, malware authors have developed malicious code that is hard for researchers to analyze and detect. For example, malware writers have created malicious code that implements new mutation techniques, which causes an enormous growth in the number of malware variants.
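Because the datasets consist of malware API sequences, each sample has to be turned into a fixed-length numeric vector before KNN or SVM can be applied. The paper does not state how this is done, so the following is only a minimal sketch under one common assumption: counting n-grams of consecutive API calls. The function names (`api_ngrams`, `vectorize`) and the two example traces are hypothetical and are not taken from the paper's dataset.

```python
from collections import Counter


def api_ngrams(calls, n=2):
    """Count the n-grams (tuples of n consecutive calls) in one API call sequence."""
    return Counter(tuple(calls[i:i + n]) for i in range(len(calls) - n + 1))


def vectorize(sequences, n=2):
    """Map every sequence to a count vector over a vocabulary shared by all sequences."""
    counts = [api_ngrams(seq, n) for seq in sequences]
    # Sorted union of all observed n-grams gives a stable column order
    vocab = sorted(set().union(*counts))
    vectors = [[c[gram] for gram in vocab] for c in counts]
    return vocab, vectors


# Hypothetical API traces, for illustration only
traces = [
    ["CreateFileW", "WriteFile", "CloseHandle"],
    ["CreateFileW", "WriteFile", "WriteFile", "CloseHandle"],
]
vocab, vectors = vectorize(traces, n=2)
```

Each row of `vectors` can then serve as the feature vector of one sample. Larger values of `n` capture longer behavior patterns at the cost of a larger and sparser vocabulary.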
A. K-nearest Neighbor
K-Nearest Neighbors (KNN) is one of the simplest yet accurate machine learning algorithms. KNN is a non-parametric algorithm, meaning that it does not make any assumptions about the structure of the data. In real-world problems, data rarely obeys general theoretical assumptions, which makes non-parametric algorithms a good solution for such problems. The KNN model representation is as simple as the dataset itself: no learning is required, and the entire training set is stored. KNN can be used for both classification and regression problems. In both cases, the prediction is based on the k training instances that are closest to the input instance. In a KNN classification problem, the output is the category to which the input instance belongs, predicted by the majority vote of the k closest neighbors.

B. Support Vector Machines
Support Vector Machines (SVM) is another machine learning algorithm that is generally used for classification problems. The main idea is to find a hyperplane that separates the classes in the best way. The term 'support vectors' refers to the points lying closest to the hyperplane, which would change the position of the hyperplane if removed. The distance between the support vectors and the hyperplane is referred to as the margin. Intuitively, the further from the hyperplane our classes lie, the more accurate the predictions we can make. That is why, although multiple separating hyperplanes can be found for a problem, the goal of the SVM algorithm is to find the hyperplane that results in the maximum margin.

The proposed work consists of three main phases:

1) Malware/Benign Classification
In this phase, training and testing proportions are chosen based on the dataset attributes (e.g., 80 samples of each class for training and 40 samples of each class for testing, out of 120 malware cases), and SVM and KNN are used to classify each sample as malware or benign.

2) Family Classification
Although many malware classes are listed above, for experimental purposes we consider a total of 4 classes.

Benign

Dridex
Dridex is malicious software (malware) that targets banking and financial access by leveraging macros in Microsoft Office to infect systems. Once a computer has been infected, Dridex attackers can steal banking credentials and other personal information on the system to gain access to the financial records of a user.

Darkcomet
DarkComet is a Remote Access Trojan (RAT) application that may run in the background and silently collect information about the system, connected users, and network activity. DarkComet may attempt to steal stored credentials, usernames and passwords, and other personal and confidential information. This information may be transmitted to a destination specified by the author.

Cybergate
CyberGate is one of many remote access tools (RATs) that allow users to control other connected computers remotely. Cyber criminals often use these programs for malicious purposes, such as stealing personal, sensitive information and misusing it to generate revenue. People whose computers are infected with programs like CyberGate should uninstall them immediately.

3) Single Malware Check
This step extracts the features of the given file and applies whichever of the SVM and KNN algorithms achieved the best accuracy. Here we check, in a single test, whether the sample file is benign or malware (of any family): features are extracted from the given file and used to make a prediction using KNN or SVM, whichever gave the best accuracy on Dataset 1. This step is used as the Knowledge Base/Single Test.

IV. IMPLEMENTATION

During this step the research plan is designed so that it can be implemented in practice. The whole implementation process can be outlined in the following steps.
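As a starting point for the implementation, the K-nearest-neighbor majority vote described in the methodology can be sketched in a few lines. This is an illustrative sketch and not the authors' code: it assumes Euclidean distance and numeric feature vectors, neither of which is specified in the paper, and the function name `knn_predict` and the toy data are hypothetical.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, x, k=3):
    """Predict the label of x by majority vote among its k nearest training points."""
    # KNN stores the whole training set; "training" is just keeping train_X and train_y.
    # Distance metric is an assumption: Euclidean distance via math.dist (Python 3.8+).
    neighbors = sorted(
        (math.dist(row, x), label) for row, label in zip(train_X, train_y)
    )
    # Count the labels of the k closest neighbors and return the most frequent one
    votes = Counter(label for _, label in neighbors[:k])
    return votes.most_common(1)[0][0]


# Toy feature vectors, for illustration only
train_X = [[1, 1, 0], [1, 1, 1], [0, 0, 3], [0, 1, 3]]
train_y = ["benign", "benign", "malware", "malware"]
prediction = knn_predict(train_X, train_y, [0, 0, 2], k=3)
```

An odd `k` avoids tied votes in the two-class malware/benign phase. For the four-class family phase ties remain possible, and this sketch resolves them in favor of the label that appears first among the nearest neighbors.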
Dataset     Training samples   Testing samples
Dataset 1   80                 40
Dataset 2   60                 60
Dataset 3   20                 100
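The three dataset configurations can be compared with a small evaluation loop. The sketch below assumes that the two numbers listed for each dataset are the training and testing sample counts per class out of 120, which matches the 80/40 example given for the first phase. The data is synthetic placeholder data, and a single-nearest-neighbor function stands in for the KNN and SVM models; none of the names come from the paper.

```python
import math

# Assumed meaning of the listed numbers: (training, testing) samples per class
SPLITS = {"Dataset 1": (80, 40), "Dataset 2": (60, 60), "Dataset 3": (20, 100)}


def split_per_class(samples_by_class, n_train):
    """Use the first n_train samples of each class for training and the rest for testing."""
    train, test = [], []
    for label, samples in samples_by_class.items():
        train += [(s, label) for s in samples[:n_train]]
        test += [(s, label) for s in samples[n_train:]]
    return train, test


def nearest_neighbor(train, x):
    """Stand-in classifier: return the label of the single closest training sample."""
    return min(train, key=lambda pair: math.dist(pair[0], x))[1]


def accuracy(train, test, predict=nearest_neighbor):
    """Fraction of test samples whose predicted label equals the true label."""
    correct = sum(predict(train, x) == label for x, label in test)
    return correct / len(test)


# Synthetic placeholder data: 120 two-dimensional samples per class
data = {
    "benign": [(i % 10, 0.0) for i in range(120)],
    "malware": [(i % 10, 5.0) for i in range(120)],
}

results = {}
for name, (n_train, n_test) in SPLITS.items():
    train, test = split_per_class(data, n_train)
    assert len(test) == 2 * n_test  # two classes in this toy setup
    results[name] = accuracy(train, test)

# Configuration with the highest accuracy (the first one wins on ties)
best = max(results, key=results.get)
```

Passing a different `predict` function to `accuracy` allows KNN and SVM to be compared on the same splits, which is how the best model for the single malware check could be selected.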
Figure 4: Accuracy Analysis of Dataset2
CONFLICTS OF INTEREST
The authors declare that they have no conflicts of interest.
REFERENCES
[1] H. Sun, Y. Lin, and M. Wu, “Api monitoring system for
defeating worms and exploits in ms-windows system,” in
Proceedings of the 11th Australasian Conference on
Information Security and Privacy, 2006, pp. 159-170.
[2] M. Egele, T. S. Scholte, E. Kirda, and C. Kruegel, “A survey
on automated dynamic malware-analysis techniques and tools,”
ACM Computing Surveys, Vol. 44, 2012, pp. 6:1-6:42.
[3] K. Tsyganok, E. Tumoyan, M. Anikeev, and L. Babenko,
“Classification of polymorphic and metamorphic malware
samples based on their behavior,” in Networks, 2012, pp. 111-
116.
[4] C. Wang, J. Pang, R. Zhao, W. Fu, and X. Liu, “Malware
detection based on suspicious behavior identification,” in
Proceedings of the 1st International Workshop on Education
Technology and Computer Science, 2009, pp. 198-202.
[5] S. Liu, H. Huang, and Y. Chen, “A system call analysis
method with mapreduce for malware detection,” in
Proceedings of the 17th IEEE International Conference on
Parallel and Distributed Systems, 2011, pp. 631-637.
[6] Ding Yuxin, Xia Xiaoling, Chen Sheng, and Li Ye, “A malware
detection method based on family behavior graph,” Computers
& Security, 2017.
[7] Yousra Aafer, Wenliang Du, and Heng Yin, “DroidAPIMiner:
Mining API-level features for robust malware detection in
Android,” Dept. of Electrical Engineering & Computer Science,
Syracuse University, New York, USA.
[8] Souri and Hosseini, “A state of the art survey of malware
detection approaches using data mining techniques,” Hum.
Cent. Comput. Inf. Sci. (2018) 8:3,
https://doi.org/10.1186/s13673-018-0125-x.
[9] Deepak Koundel, Suraj Ithape, Vishakha Khobaragade, and Rajat
Jain, B.E. Computer Science, JSPM’s JSCOE, Pune, India,
“Malware classification using Naïve Bayes classifier for
Android OS,” The International Journal of Engineering and
Science (IJES), Volume 3, Issue 4, pp. 59-63, 2014, ISSN (e):
2319-1813, ISSN (p): 2319-1805.
[10] Dragoş Gavriluţ, Mihai Cimpoeşu, Dan Anton, and Liviu
Ciortuz, Faculty of Computer Science, University of Iasi,
Romania, and BitDefender Research Lab, Iasi, Romania, “Malware
detection using machine learning,” conference paper,
November 2009, DOI: 10.1109/IMCSIT.2009.5352759.
Source: IEEE Xplore.
[11] Chih-Ta Lin, Nai-Jian Wang, Han Xiao, and Claudia Eckert,
Department of Electrical Engineering, National Taiwan