Malware Detection and Classification Based On Extraction of API Sequences
Abstract: With the substantial growth of the IT sector in the 21st century, the
need for system security has become unavoidable. While developments in the IT
sector bring innumerable advantages, attacks on websites and computer systems
are increasing as well. One such attack is the zero day malware attack, which
poses a great challenge for security testers. Malware pen testers can use
bypass techniques such as compression, code obfuscation and encryption to
easily deceive present day antivirus scanners. This paper presents a novel
malware identification approach based on extracting unique aspects of API
sequences. The proposed feature selection method, based on N-grams and odds
ratio selection, captures unique and distinct API sequences from the extracted
API calls, thereby increasing classification accuracy. Next, a model is built
by the classification algorithms using active machine learning techniques to
categorize malicious and benign files.

Keywords: Malware, API sequence, API call gram

I. INTRODUCTION
MALWARE is a contraction of the two words Malicious Software, and it poses
substantial security threats in the computer world. Viruses, Worms, Trojans,
Spyware, Adware and other similar attack mechanisms all fall under the broad
term MALWARE. “Malware, in general speaking, are the set of instructions
inserted in the application software, with the intention to threaten the
security of the system, without the knowledge of its owner.” Nowadays, whenever
software (an application program) runs, there is no guarantee that it will
generate the desired output. It may produce harmful results or damage the
system; in other words, the system has been attacked by some malicious
activity. According to the Panda Security Annual Report 2013 [1], an average of
82000 new malware samples were reported per day. Also, Kaspersky Lab detected
200000 malicious programs every day in 2012 [2]. These figures are large where
system security is concerned.

Malware attackers have made the Windows portable executable (PE) their
preferred prey for carrying out malware based attacks. Therefore, our research
work is focused on analyzing portable executables. Malware analysis can be
accomplished in two ways: static analysis and dynamic analysis. Static analysis
can be defined as the method of examining software without executing it. This
technique uses the concept of pattern (byte code signature) recognition.
Anti-Virus Scanners (AVS) follow the traditional signature based detection
method to identify malware. Signatures are sequences of bytes stored in a
database. Although signature based examination can discern malware with low
false positive rates, this technique is ineffective for malware whose
signatures are not listed in the database. Also, one should be well versed in
the minutiae of the software for static analysis, which is not usually
possible. Malware detection using static analysis can be easily circumvented by
malware writers through obfuscation techniques. Thus, signature based detection
fails to uncover zero day malware attacks.

Dynamic analysis is defined as the method of monitoring the working of an
application by analyzing its runtime performance. Malware is becoming more
advanced day by day, and vigilant enough to halt its execution as soon as it
detects that the execution is being simulated for security analysis using
emulators or a virtual environment. In this way it can easily evade malware
detection setups. Dynamic analysis can be preferred over static analysis
because it does not require a deep technical understanding of the software. In
addition, dynamic analysis is capable of detecting malware whose signatures are
not known. At present, dynamic analysis is the most commonly used way to detect
malware, but it is not adequate on its own because it has some drawbacks. Some
malware exhibits self modification, and this type of behavior cannot be
unearthed by runtime analysis. In addition, malware that depends on certain
trigger conditions for its execution, and that alters its behavior when these
conditions are met, is not exposed by dynamic analysis because all possible
execution paths cannot be probed in a single pass.

A. Motivation
Zero day malware attacks by malware assailants are increasing rapidly.
Moreover, malware attackers have made the Windows portable executable their
prime prey. Code obfuscation techniques simply deceive antivirus scanners,
thereby decreasing the detection rate for obfuscated malware. Therefore, our
research work has focused on analyzing portable executables by extracting
aspects of API calls.
978-1-4799-3080-7/14/$31.00 © 2014 IEEE
[Figure: two-stage workflow. Stage 1: apply the proposed feature extraction
algorithm to capture distinct API sequences. Stage 2: classify the sample as
benign or malware.]
Distinct API sequences were extracted using the proposed feature selection
algorithm and fed as input to all of these classification algorithms. These
classifiers require supervised learning for training, so we used K-fold cross
validation with all of them to evaluate each classifier on data held out from
training. In K-fold cross validation, training and testing were performed for
different values of K, and the findings indicate that error estimation is best
for K = 10, that is, for 10 folds (out of the total data, one tenth is used for
testing and nine tenths for training). The performance of all the applied
classification algorithms under K-fold cross validation is shown in Fig. 2; the
essential inference is that SVM provides the best accuracy among all the
classifiers.

Program Category    Number of samples
Trojan              35
Worm                30
Virus               30
Rootkit             25

B. Evaluation Metrics
Evaluation metrics can be constructed from TP (True Positives), TN (True
Negatives), FP (False Positives) and FN (False Negatives), where True Positives
are the malware samples correctly recognized as malware, True Negatives are the
benign samples correctly recognized as benign, False Positives are the benign
samples incorrectly recognized as malware, and False Negatives are the malware
samples incorrectly recognized as benign.
Accuracy (Acc) is the extent to which malware and
benign samples are recognized correctly and can be
calculated as
Acc = (TP + TN) / (TP + TN + FP + FN)
The accuracy of the discussed algorithms is shown in Fig. 3.
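As an illustration of the accuracy formula, the following minimal sketch counts TP, TN, FP and FN from a list of true and predicted labels and then applies Acc = (TP + TN) / (TP + TN + FP + FN). The function names and the five example labels are assumptions made for illustration; they are not data or code from the experiments.

```python
def confusion_counts(y_true, y_pred, positive="malware"):
    """Return (TP, TN, FP, FN), treating `positive` as the malware class."""
    tp = tn = fp = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive and pred == positive:
            tp += 1  # malware recognized as malware
        elif truth != positive and pred != positive:
            tn += 1  # benign recognized as benign
        elif truth != positive and pred == positive:
            fp += 1  # benign recognized as malware
        else:
            fn += 1  # malware recognized as benign
    return tp, tn, fp, fn


def accuracy(tp, tn, fp, fn):
    """Acc = (TP + TN) / (TP + TN + FP + FN)."""
    total = tp + tn + fp + fn
    if total == 0:
        raise ValueError("accuracy is undefined for an empty sample set")
    return (tp + tn) / total


# Made-up labels, for illustration only.
y_true = ["malware", "malware", "benign", "benign", "malware"]
y_pred = ["malware", "benign", "benign", "malware", "malware"]
print(accuracy(*confusion_counts(y_true, y_pred)))  # prints 0.6
```

Here two malware samples and one benign sample are recognized correctly out of five, so the accuracy is 3/5.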
C. Analysis Results
In this experimental research, the designed program captures the API calls
from the input sample to obtain the API sequences. Next, the proposed feature
selection algorithm extracts the distinct set of API sequences. These API
sequences are then fed as input to the classification algorithms to separate
malicious samples from benign samples. Our experimental analysis found that
SVM produced the best results among all the classification algorithms.
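The feature selection step can be pictured with the sketch below. It is not the authors' algorithm, whose details are not given in this section; it assumes a common formulation in which each API call trace is split into N-grams, each N-gram is scored by a smoothed odds ratio of its presence in malware versus benign traces, and the highest scoring N-grams are kept. The function names, the smoothing constant of 0.5, and the four short traces are all assumptions for illustration.

```python
from collections import Counter


def api_ngrams(calls, n):
    """Set of distinct length-n API call sequences in one trace."""
    return {tuple(calls[i:i + n]) for i in range(len(calls) - n + 1)}


def odds_ratio_scores(malware_traces, benign_traces, n=3, smoothing=0.5):
    """Score each N-gram by a smoothed odds ratio (assumed formulation).

    a, b = malware traces with / without the N-gram
    c, d = benign traces with / without the N-gram
    score = (a * d) / (b * c); values above 1 lean towards malware.
    """
    mal_sets = [api_ngrams(t, n) for t in malware_traces]
    ben_sets = [api_ngrams(t, n) for t in benign_traces]
    mal_df = Counter(g for s in mal_sets for g in s)
    ben_df = Counter(g for s in ben_sets for g in s)
    scores = {}
    for gram in set(mal_df) | set(ben_df):
        a = mal_df[gram] + smoothing
        b = len(mal_sets) - mal_df[gram] + smoothing
        c = ben_df[gram] + smoothing
        d = len(ben_sets) - ben_df[gram] + smoothing
        scores[gram] = (a * d) / (b * c)
    return scores


def top_features(scores, k):
    """Keep the k N-grams with the highest odds ratio (ties broken by name)."""
    return sorted(scores, key=lambda g: (-scores[g], g))[:k]


# Made-up traces, for illustration only.
malware_traces = [
    ["OpenProcess", "WriteProcessMemory", "CreateRemoteThread"],
    ["OpenProcess", "WriteProcessMemory", "CreateRemoteThread", "CloseHandle"],
]
benign_traces = [
    ["CreateFileW", "ReadFile", "CloseHandle"],
    ["CreateFileW", "WriteFile", "CloseHandle"],
]
scores = odds_ratio_scores(malware_traces, benign_traces, n=2)
print(top_features(scores, 2))
```

The selected N-grams would then be turned into a presence/absence feature vector per sample before being passed to a classifier. The smoothing term only avoids division by zero for N-grams seen in one class alone.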
A comparison of the accuracy of the proposed method with other malware
detection methods is listed in Table III.
Fig. 2 Performance of four different algorithms using k-fold cross validation (k = 2 to 10)
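The K-fold procedure behind Fig. 2 can be sketched as follows. This is a minimal standard-library illustration, not the experimental code: the splitting strategy (shuffle, then deal indices into k folds), the seed, and the `train_fn` / `predict_fn` callbacks are assumptions standing in for whichever classifier is being evaluated.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_indices, test_indices) for each of the k folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and the number of samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds of near-equal size.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


def cross_validate(samples, labels, k, train_fn, predict_fn, seed=0):
    """Mean accuracy over k folds for a hypothetical train/predict pair."""
    fold_accuracies = []
    for train, test in k_fold_indices(len(samples), k, seed):
        model = train_fn([samples[i] for i in train],
                         [labels[i] for i in train])
        correct = sum(predict_fn(model, samples[i]) == labels[i] for i in test)
        fold_accuracies.append(correct / len(test))
    return sum(fold_accuracies) / len(fold_accuracies)


# With k = 10, each fold tests on one tenth of the data and trains on the rest.
for train, test in k_fold_indices(20, 10):
    assert len(train) == 18 and len(test) == 2
```

Repeating `cross_validate` for k from 2 to 10 with each classifier would give one accuracy value per classifier and per k, which is the kind of comparison the figure summarizes.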