International Journal of Computer Applications (0975-8887)
Volume 120 - No. 5, June 2015

Comparative Analysis of Feature Extraction Methods of Malware Detection

Smita Ranveer
PG Student, Dept. of Computer Engineering, Sinhgad College of Engineering, Pune
Savitribai Phule Pune University, India

Swapnaja Hiray
Associate Professor, Dept. of Computer Engineering, Sinhgad College of Engineering, Pune
Savitribai Phule Pune University, India

ABSTRACT

Recent years have seen massive growth in malware, which poses a severe threat to modern computers and internet security. Existing malware detection systems are confronted with unknown malware variants. Recently developed malware detection systems have found that the diverse forms of malware exhibit similar patterns in their structure, with minor variations. Hence, it is necessary to distinguish the types of features extracted for detecting malware, so that the potential of a malware detection system can be leveraged to combat unfamiliar malware. We mainly focus on the categorization of features based on malware analysis. This paper highlights the general framework of a malware detection system and pinpoints the strengths and weaknesses of each method. Finally, we present an overview of the performance of present malware detection systems based on features.

Keywords: Feature Extraction, Malware Detection, Opcodes, Static Analysis, Dynamic Analysis, Machine Learning.

1. INTRODUCTION

The proliferation of modern computers, internet users, and communication infrastructure in every field has been followed by a multiplicative increase in malware and the cyber-attacks it causes. Malware variants evolve to gain unauthorized access to systems and to obtain economic benefits by illegal means. The propagation of malware wreaks havoc on internet security, commercial companies, the privacy of users, and governments.

The term malware is derived from "malicious software". It is an instance of malicious code intended to subvert the function of a system, with the potential to harm a computer or network. It covers a range of threats such as viruses, trojans, adware, and spyware. They replicate themselves and enter the system in different ways: either through various media or, most commonly, by being downloaded into the system disguised as a genuine application. Different malware detection systems have therefore been introduced to date to counter the attacks caused by malware. A malware detection system identifies malware and defends the system so that it can perform its function.

Existing methods for tackling malware are primarily based on two complementary approaches, classified according to the type of features they use for discovering malware activity. The signature-based detection approach relies on the identification of unique string patterns in the binary code. It uses static techniques for creating signatures and cannot cope with previously unseen malware, known as zero-day attacks. When a novel malware family is observed, an instance of the malware must be analysed, a signature generated for it, and the signature inserted into the malware database as a reference for inducing a classifier. During this period, an instance of the malware might attack several systems or networks. As a result, the signature-based detection approach has become inefficient and intractable. Knowing this weakness of detection systems, malware designers developed code obfuscation techniques such as code reordering, garbage insertion, and variable renaming to disguise their content.

Following this intuition, the heuristic-based approach was introduced as an automated classification system. It is based on rules determined by experts and relies on dynamic analysis of malicious behavior that deviates significantly from normal behavior. It deals specifically with the discovery of unknown malware. However, this detection approach generates more false alarms than signature-based detection, because not every suspicious executable file is malware. Each of the two approaches has thus been observed to have limitations. Antivirus vendors subsequently attempted to use individual as well as hybrid analysis approaches for mining features and tackling newly emerging malware [1]. They achieved a precise detection rate and low false positives compared to existing malware detection methods. The works in [2, 3, 4] investigated malware detection systems based on integrated static and dynamic analysis features using data mining approaches. An appropriate determination of a malware variant depends on the feature type employed for discovering malicious activity. The performance of the system depends on choosing the feature type that is the best indicator of malware and requires the least time for quantifying the correlation between malicious activities. Hence, we seek to give a summarized view of the earlier malware detection systems propounded by researchers. Table 1 summarizes the aforementioned malware detection systems with their pros and cons in light of emerging threats. Here we synthesize a subset of up-to-date malware detection systems incorporating static, dynamic and hybrid analysis approaches. The rest of the paper is structured as follows: Section 2 gives the motivation and Section 3 explores the general
framework of malware detection using a machine learning approach, followed by malware analysis in Section 4. In this vein, we mainly focus on feature categorization based on malware analysis. These features are described in Section 5. Section 6 analyzes the performance of existing malware detection systems based on feature extraction techniques on a standard dataset, as summarized in Table 2, and Section 7 gives concluding remarks.

2. MOTIVATION

Network security has always been a major concern for everyone involved with the internet and for everyone using a computer system. According to the Sophos Security Threat Report 2014 [5], malware and related IT security threats have grown and matured, and the developers of malicious software have become far more creative in camouflaging their work. In 2013 there was a rise of vicious new versions of trojans and spyware. McAfee Security Labs catalogues nearly 100,000 malware versions every day, i.e. approximately one new threat per second. It is therefore urgent to understand how to counter the propagation of malware. Most previous surveys describe malware types and detection techniques. In [6], Saeed et al. gave an overview of malware and its detection systems, while in [1] Shabtai et al. presented a state-of-the-art survey on machine learning techniques employing static features. Here we give an abstract view of the recently formulated malware detection systems. The prime motivation of our survey is to summarize the types of features widely used for malware detection.

3. GENERAL FRAMEWORK OF MALWARE DETECTION SYSTEM

An extensive survey has been made of the detection methods propounded by the research community. Malware research can be categorized in terms of static as well as dynamic analysis, and in terms of how the features of malware are processed after extraction. We observed that the general framework for malware detection using machine learning exhibits three distinct stages: feature extraction, feature selection (sometimes followed by dimensionality reduction techniques), and classification using a machine learning algorithm. This flow of the malware detection process is shown in Fig. 1. Each stage involves different measures and methods used in existing work. First, a dataset consisting of malware and benign executables is prepared. These files are preprocessed depending on the feature extraction (FE) method. Next, feature selection is performed to quantify the correlation of features, improving performance and reducing the number of computations to increase learning speed. After the feature capability has been generalized, a classifier is trained on the filtered results of feature selection. Researchers have adopted the supervised machine learning approach, using classifiers such as decision trees, support vector machines (SVM), naive Bayes, Bayesian networks, and the k-nearest neighbors (KNN) algorithm, as mentioned in [6, 7, 1]. The classifier chosen is the one that gives the clearest margin and reduces interference and misclassification between malicious and benign executables. The dataset is then tested against the trained classifier, and files are labeled as malicious or benign software. The outcomes are evaluated with the corresponding performance metrics.

Fig. 1: General framework of malware detection system

4. MALWARE ANALYSIS

Malware analysis is a technique for studying malware behavior and structure by extracting features that describe its malevolent intention. Several techniques have been introduced to detect unseen variants of malware. The domain of features is characterized by the way executables are analyzed. Traditionally, features are categorized on the basis of static and dynamic analysis of program files. To attain efficiency and robustness, a system adheres to the feature type that best explores a meaningful corpus of malware. In static analysis, the expected behavior of a program is determined from observations of its binary code or the internal structure of its files, without actually executing it [6]. A static feature uniquely identifies the signature of a malware sample or malware family. Static analysis is vulnerable to code obfuscation techniques. Dynamic analysis tests the program in real time by actually executing it in a controlled environment. The behavior of malicious software is monitored in an emulated environment, and traces are obtained from the reports generated by the sandbox. It can deal with code evasion techniques [8]. However, it is resource consuming and time intensive. Malware detection systems have further utilized a hybrid approach, which integrates static and dynamic analysis. A variety of features has been devised by combining the static and dynamic approaches. This taxonomy of features based on malware analysis is depicted in Fig. 2.

5. FEATURE EXTRACTION METHODS

The first crucial stage of a malware detection mechanism is to determine the representation of malicious software files. Various representation patterns of malware files have been mentioned in the literature. Transforming a large, vague collection of inputs into a set of features is called feature extraction. Feature extraction is performed when the input data to an algorithm is abundant and tends to be redundant and irrelevant. It is required to obtain precise measurements of the features that influence the classification of an input as benign or malicious. The feature extraction process thus transforms the input into an organized, more manageable subset of information. It also reduces the dataset to be processed, resulting in low computational overhead [1]. The outcome of the feature extraction phase is a vector containing
the frequencies of the extracted features. Features are chosen such that maximum classification accuracy is attained. The time required to obtain features from the input dataset also depends on the feature extraction method.

The feature extraction method affects the performance of the system in terms of efficiency, robustness, and accuracy. Schultz et al. [9] first introduced the notion of applying machine learning techniques to the detection of malware based on the respective representation of files from the dataset. They employed three FE methods, and later researchers extended this idea of feature extraction to improve the performance and accuracy of the system. Following this research background, the features are described as follows:

Fig. 2: Taxonomy of feature extraction methods

(1) Byte n-gram Features
Byte n-gram features are sequences of n bytes extracted from malware and used as signatures for recognizing malware. Although this type of feature does not provide meaningful information, it yields high accuracy in detecting new malware. Abou-Assaleh et al. [10] extracted byte n-gram features from the binary code of the file, where the L most frequent n-grams of each class in the training set are selected to form the profile of the class. Every new instance is assigned to the class with the closest profile using the K-Nearest Neighbors (KNN) algorithm. Their experiments achieved 98% accuracy on a dataset of benign and malware files. In [11], byte n-grams in combination with opcode n-grams are used as features. The authors provide an extensive evaluation using a test collection of more than 30,000 files. Different settings of opcode and byte sequence n-gram representations and five types of classifiers yielded an accuracy of up to 99%. In [12], Li et al. propounded a method for detecting file types by analyzing the n-gram sequences of their binary content. This method represents a compact "fileprint" for each file type and uses the Mahalanobis distance to determine the closest file type model based on the centroids obtained.

(2) Opcode n-gram Features
Previous studies reported that opcode feature extraction was more efficient and successful for classification. Opcodes reveal statistical differences between malicious and legitimate software. Some rare opcodes are better predictors of malicious behavior. First, all executable files in the dataset are disassembled and the opcodes are extracted. An opcode, short for operational code, is the part of an assembly language instruction that describes the operation to be performed. An instruction contains an opcode and, optionally, operands upon which the operation acts. Depending on the CPU architecture, operands may be registers or values stored in memory and on the stack. The actions of opcodes include arithmetic, logical, and data manipulation operations. Opcodes can be used to statistically capture the variability between malicious and legitimate software.

Moskovitch et al. [16] presented the mean accuracy of combinations of opcode n-gram sequences. They stated that the 2-gram opcode sequence was the best n-gram sequence in terms of classification accuracy; for sequences longer than bigrams, the accuracy decreased. Santos et al. [13, 15] used opcode sequences for categorizing malicious and benign files with different feature selection and classification algorithms. In [13, 28], opcode sequences of 1-grams and 2-grams were used for detecting new variants of malware families. Histograms were used for each n-gram sequence, calculating the frequency of the similarity ratio for each malware instance. Sekar et al. [29] examined the n-gram approach and compared its performance with a Finite State Automaton (FSA) approach. They evaluated the two approaches on the httpd, ftpd, and nfsd servers, where the FSA approach resulted in a lower false positive rate than the n-gram approach.

(3) Portable Executables
These features are extracted from certain parts of EXE files. Portable Executable (PE) features are extracted by static analysis using the structural information of the PE. These meaningful features indicate that the file was manipulated or infected to perform malicious activity. In [17], Shafiq et al. propounded a real-time approach for malware detection based on structural features mined from the PE. They tested performance on two datasets, Malfease and VXheavens [30], and reported that PE features have low processing overheads. These features may include some of the following pieces of information [1]:
1. File pointer: denotes the position within the file as it is stored on disk; CPU type.
2. Import section: which functions from which DLLs were used, object files, and the list of DLLs that the executable can import.
3. Exports section: describes which functions are exported.
4. Data extracted from the PE header that describes the physical and logical structure of a PE binary, which may include features such as code size, debug size, creation time, and file size.
5. Resource directory: indexed by a multiple-level binary-sorted tree structure; resources such as dialogs and cursors used by a given file.

(4) String Features
These features are based on plain text encoded in executables, such as windows, getversion, getstartupinfo, getmodulefilename, messagebox, library, etc. These strings are consecutive printable characters encoded in PE as well as non-PE executables. String features were used by Schultz et al. [9] and provided 97.11% accuracy compared to PE features and byte n-grams. String features are not very robust, as they can easily be modified at any time. The work in [19] proposed a malware detection system, SBMDS, which

Feature Type | Classification Method | Strengths | Weaknesses
Static [13, 14, 15, 11, 16, 17] | Gain ratio, Fisher score, ANN, DT, NB, BNB, BDT, SVM [11, 16]; Information gain, DT, KNN, SVM, RF, NB, Bayesian network [13, 14] | Accuracy, unknown malware | Imbalance problem, packed executables
Static | Mutual information, TF, cosine similarity [15] | Detects malware variant families | Packed executables, accuracy
Static | Game theory, genetic algorithm, SVM [18] | Time | Dimensionality problem, false positives
Hybrid [2, 3, 4, 19] | DT, KNN, SVM, RF, NB, Bayesian network | Hybrid approach, scalable, automation, robust | Manifests single program behavior
Hybrid | SVM ensemble with bagging [19] | Defends against polymorphism and metamorphism | Large-size data
Dynamic [20, 21, 22, 23, 24, 25, 26, 27] | Behavioral analysis using phylogenetic trees [25] | Flexible, automation, zero-day malware | False positives
Dynamic | Behavior analysis using SVM, DT, IB1, RF [20] + KNN, NB [22, 24] | Reduces runtime and memory overheads, automation | Incomplete picture of malware activity
Dynamic | Behavior graph matching [23, 24] | False positives, fast generation of behavior graphs | Accuracy, latest malware, testing time

Table 1: SUMMARY OF MALWARE DETECTION SYSTEMS
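
As an illustration of the static features summarized in Table 1, the byte n-gram profile idea described under item (1) can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the implementation used in [10]: the function names (byte_ngrams, class_profile, feature_vector), the toy byte strings, and the parameters n and top_l are all hypothetical, and ties among equally frequent n-grams are broken by order of first occurrence.

```python
from collections import Counter
from typing import Iterable, List


def byte_ngrams(data: bytes, n: int) -> Counter:
    """Count every overlapping n-byte window in a binary blob."""
    return Counter(data[i:i + n] for i in range(len(data) - n + 1))


def class_profile(samples: Iterable[bytes], n: int, top_l: int) -> List[bytes]:
    """Return the top_l most frequent n-grams over all samples of one class."""
    total: Counter = Counter()
    for blob in samples:
        total.update(byte_ngrams(blob, n))
    return [gram for gram, _ in total.most_common(top_l)]


def feature_vector(data: bytes, vocabulary: List[bytes], n: int) -> List[int]:
    """Frequency vector of one file over a fixed n-gram vocabulary."""
    counts = byte_ngrams(data, n)
    return [counts[gram] for gram in vocabulary]


# Toy byte strings standing in for executables (hypothetical data).
malware_samples = [b"\x90\x90\xeb\xfe\x90\x90", b"\x90\x90\xcc\x90\x90"]
profile = class_profile(malware_samples, n=2, top_l=1)
vector = feature_vector(b"\x90\x90\x90", profile, n=2)
```

Frequency vectors of this kind are what a classifier such as KNN would consume. For real files, the toy byte strings would be replaced by the raw contents of each executable read in binary mode.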

classifies malware using an SVM based on interpretable string features. It outperformed existing antivirus software and achieved better accuracy and efficiency using string features.

(5) Function Based Features
Function based features are extracted from the runtime behavior of the program file. They use the functions that reside in a file for execution to produce various attributes representing the file. Dynamically analyzed function calls include system calls, Windows application programming interface (API) calls, their parameter passing, information flow tracking, instruction sets, etc. These functions increase code reusability and maintainability. This is a semantically richer representation. Any malicious software, in order to execute or replicate, invokes some kernel-level system call to communicate with the operating system; this is a sign of malicious activity. The works in [22, 21, 25] addressed automatic behavior analysis in which Windows API calls, instruction sets, control flow graphs, function parameter analysis and system calls are used as features.

The work in [31] presented an automated malware detection system which classifies malware into families by monitoring its network behavior. It creates a behavior graph from the network traces obtained, which represents network activities and their network flow dependencies. The graph structure, the in-degree and out-degree of nodes, and the root denote the features of malware activity. As per [31, 24], J48 decision trees gave better TPR, FPR and accuracy results in comparison with other classifiers. Firdausi et al. [24] propounded a malware detection system which monitors the behavior of malicious files in a controlled environment using a free online dynamic analysis tool named Anubis. The generated results are then parsed into a vector model for classification by the trained classifier. The performance was tested on a small dataset of benign and malicious files with and without feature selection. The accuracy of 92.3% and 96.8% with and without feature selection, respectively, achieved by the J48 classifier was better than that of the other classifiers: SVM, KNN, and naive Bayes. In [20], Tian et al. presented an automated classification system which uses API call sequences as features to discriminate malware from cleanware; it achieved an accuracy of 97% over a dataset of malware and cleanware.

Biley et al. [26] investigated an antivirus (AV) technique which eliminates the drawbacks of earlier AV products and satisfies consistency, conciseness and completeness across malware. System state changes describe the malware behavior fingerprint in terms of files, registry, process creation, network flows, etc. It uses clustering and classification of malware samples. However, the virtualized environment was static. An automatic behavior analysis system proposed by Rieck et al. [20] gives an incremental and timely defense method for clustering and classifying malware binaries with similar behavior and for identifying novel classes of malware using machine learning. It avoids runtime overhead and gives accurate discrimination of novel malware.

Park et al. [23] presented a malware detection system which uses system calls and their parameter values as features and generates a directed subgraph for each program's behavior during execution. It creates a maximal behavior subgraph for measuring the similarity between programs and known malware families. They evaluated performance over 6 known malware families and obtained fair dissimilarity rates while keeping false positives low; still, the accuracy needed to be improved, as some malware succeeded in obtaining kernel privileges. Lee et al. [27] proposed a similar technique of clustering malware families using a supervised machine learning technique. It also analyzes the behavior of sample datasets according to system calls and parameters in a virtual environment, generating a behavior profile for network activities. They then computed similarities between those profiles and grouped the different samples by applying k-medoids clustering.

(6) Hybrid Analysis Features
These features are obtained by combining both techniques,

Performance metrics: high accuracy, high TPR, and low FPR are better.

Feature Type | Feature Signature | TPR | Accuracy (%) | FPR
Static | Opcode n-gram + Byte code n-gram [16] | - | 95 | 0.06
Static | Opcode n-gram [11] | - | 99 | 0.03
Static | Opcode n-gram [13] | 0.95 | 92 | 0.03
Static | Byte code n-gram + Opcode n-gram [14] | 0.95 | 96 | 0.1
Static | Portable Executable Header [17] | - | 99 | 0.05
Hybrid | Opcode n-gram + Application Programming Interface function calls [2] | 0.97 | 96.22 | 0.07
Hybrid | Function Length Frequency + Printable String Information + Application Programming Interface calls [3] | 0.98 | 97.05 | 0.055
Hybrid | Application Programming Interface function calls + Portable Executable Header + String [19] | - | 93.7 | 0.15
Hybrid | Function Length Frequency + Printable String Information [4] | - | 98.86 | -
Dynamic | System call [24] | 0.95 | 96.8 | 0.04
Dynamic | System state change [26] | - | 91.6 | -

Table 2: PERFORMANCE EVALUATION OF MALWARE DETECTION SYSTEMS
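
The TPR, FPR and accuracy columns of Table 2 can be related to raw classification counts. The following Python sketch uses the TPR and FPR definitions given in Section 6; the accuracy formula (correct decisions over all test files) is the standard definition and is an assumption here, since the text does not spell it out. The class name ConfusionCounts and the example counts are hypothetical and do not correspond to any row of the table.

```python
from dataclasses import dataclass


@dataclass
class ConfusionCounts:
    tp: int  # malware files correctly detected as malware
    fn: int  # malware files missed (labeled benign)
    fp: int  # benign files wrongly detected as malware
    tn: int  # benign files correctly labeled benign


def tpr(c: ConfusionCounts) -> float:
    """Correctly detected malware / total malware in the testing set."""
    return c.tp / (c.tp + c.fn)


def fpr(c: ConfusionCounts) -> float:
    """Benign files detected as malware / total benign files in the testing set."""
    return c.fp / (c.fp + c.tn)


def accuracy(c: ConfusionCounts) -> float:
    """Assumed standard definition: correct decisions / all test files."""
    return (c.tp + c.tn) / (c.tp + c.fn + c.fp + c.tn)


# Hypothetical test set: 100 malware files and 100 benign files.
counts = ConfusionCounts(tp=95, fn=5, fp=4, tn=96)
metrics = (tpr(counts), fpr(counts), accuracy(counts))
```

Note that Table 2 reports accuracy as a percentage, whereas accuracy() above returns a fraction between 0 and 1.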

static analysis as well as dynamic analysis. This reduces the effect of the countermeasures against each of the static and dynamic techniques for analyzing malware, and improves performance and detection rates. Islam et al. [3] extracted static features of functions, namely function length frequency (FLF) and printable string information (PSI), based on the functions of different lengths and the number of distinct printable strings present in unpacked malware executables. They further extracted Application Programming Interface (API) function calls and parameters by dynamic analysis. Combining the function based features and string features provided superior results in terms of accuracy. Similarly, a combination of string and function features is used for classification of malware in [4]. Using different function length frequency ranges and printable string information performed better over the seen malware set. Santos et al. [2] introduced a hybrid approach that eliminates the need for separate static and dynamic malware analysis, using both emulation (Qemu) and simulation (Wine) techniques to attain transparency without interference with the system. They extracted opcode sequences statically and Windows API calls dynamically, characterizing behavior in groups such as system information, persistence, file creation, process or thread creation, added registry keys, errors, etc. This method employed classification algorithms such as KNN, SVM, decision trees, and Bayesian networks to discriminate between malware and benign software. It provided more accurate results, leading to a notable increase in performance metrics.

6. PERFORMANCE EVALUATION

Every malware detection system is obliged to provide a timely defense, with high precision, against cyber-attacks caused by malware. Performance is evaluated using classical metrics such as classification accuracy, False Positive Rate (FPR) and True Positive Rate (TPR), together with the least processing time. TPR is the ratio of the number of correctly detected malware files to the total number of malware files in the testing set. FPR is the ratio of the number of benign files detected as malware to the total number of benign files in the testing set. The efficiency and robustness of a system are defined by high accuracy, high TPR and low FPR; such a system is effective in real-life scenarios.

The aforementioned studies evaluated their systems on a standard dataset consisting of two sets of executables, benign and malicious. The malicious executables were downloaded from the VXheavens website [30], which covers malware such as viruses, adware, worms, and Trojan horses. Here we provide a comparative assessment of performance measures over the results generated by these systems on the malware dataset. Table 2 gives an overview of the referenced malware detection systems. Our review yielded the following insights. First, we observed that systems using opcode and PE features achieve a low FPR and high accuracy, i.e. above 95% with some fluctuations [11, 14, 17, 18]. They were unable to cope with packed executables, and disassembly of executables is not always feasible.

The PE-Miner approach in [17] was robust and reliable against packed executables in real time with low processing overheads. Behavioral features such as API call and system call tracing are effective on zero-day malware, but they increase the FPR, which can undermine the efficacy of the system. Combining the features in a single method steps up performance and provides accuracy of up to 99% along with a high TPR while keeping the FPR low. Features based on dynamic analysis are less vulnerable to code evasion techniques. Although features based on dynamic analysis are the best indicators of malware, they are time consuming and resource intensive. Hence, precise and effective results are achieved by the hybrid approach, which eliminates the loopholes of each method. In [2, 3, 4], malware detection systems employing hybrid features showed high accuracy and TPR in comparison with those using only static or dynamic features.

7. CONCLUSION

This paper gives an overview of malware detection techniques based on static, dynamic and hybrid analysis of executables. We presented a comparative assessment of features and illuminated their effect on the performance of the system. We found that high accuracy and TPR can be achieved by selecting an appropriate feature extraction method. Although opcode and PE features enhance the speed and accuracy of a malware detection system, they give rise to false positives. Hybrid analysis features maintain a low false positive rate and yield precise results in the least processing time. The methods used for malware classification should be able to
deal with the huge number of daily emerging malware variants while preserving the performance and accuracy of the system in real time.

8. REFERENCES

[1] A. Shabtai, R. Moskovitch, Y. Elovici, C. Glezer, Detection of malicious code by applying machine learning classifiers on static features: A state-of-the-art survey, Information Security Technical Report 14, 2009.

[2] I. Santos, J. Devesa, F. Brezo, J. Nieves, P. G. Bringas, OPEM: A Static-Dynamic Approach for Machine Learning Based Malware Detection, Proceedings of International Conference CISIS12-ICEUTE12, Special Sessions, Advances in Intelligent Systems and Computing, vol. 189, pp. 271-280, 2013.

[3] R. Islam, R. Tian, L. M. Batten, S. Versteeg, Classification of malware based on integrated static and dynamic features, Journal of Network and Computer Applications, vol. 36, pp. 646-656, 2013.

[4] R. Islam, R. Tian, L. Batten, S. Versteeg, Classification of malware based on string and function feature selection, Cybercrime and Trustworthy Computing Workshop (CTC), pp. 9-17, 2010.

[5] Sophos labs, Security Threat Report 2014.

[6] I. A. Saeed, A. Selamat, A. M. A. Abuagoub, A Survey on Malware and Malware Detection Systems, International Journal of Computer Applications, vol. 67, no. 16, April 2013.

[7] K. Mathur, S. Hiranwai, A Survey on Techniques in Detection and Analyzing Malware Executables, International Journal of Advanced Research in Computer Science and Software Engineering, vol. 3, pp. 422-428, 2013.

[8] E. Gandotra, D. Bansal, S. Sofat, Malware Analysis and Classification: A Survey, Journal of Information Security, vol. 5, pp. 56-64, April 2014.

[9] M. Schultz, E. Eskin, F. Zadok, Stolfo, Data mining methods for detection of new malicious executables, in Proceedings of the 22nd IEEE Symposium on Security and Privacy, pp. 38-49, 2001.

[10] T. Abou-Assaleh, N. Cercone, V. Keselj, R. Sweidan, Detection of new malicious code using n-grams signatures, in Proceedings of Second Annual Conference on Privacy, Security and Trust, pp. 193-196, 2004.

[11] R. Moskovitch, C. Feher, N. Tzachar, E. Berger, M. Gitelman, S. Dolev, Y. Elovici, Unknown Malcode Detection Using OPCODE Representation, Proc. of the 1st European Conference on Intelligence and Security Informatics (EuroISI08), 2008.

[12] W. Li, K. Wang, S. Stolfo, B. Herzog, Fileprints: Identifying file types by n-gram analysis, Proc. of the IEEE Workshop on Information Assurance and Security, 2005.

[13] R. Moskovitch, D. Stopel, C. Feher, N. Nissim, Y. Elovici, Unknown malcode detection via text categorization and the imbalance problem, in IEEE Intelligence and Security Informatics, Taiwan, 2008.

[14] I. Santos, F. Brezo, X. Ugarte-Pedrero, P. G. Bringas, Opcode sequences as representation of executables for data-mining-based unknown malware detection, Information Sciences, vol. 231, pp. 64-82, 2013.

[15] A. Shabtai, R. Moskovitch, C. Feher, S. Dolev, Y. Elovici, Detecting unknown malicious code by applying classification techniques on opcode patterns, Security Informatics, vol. 1, pp. 1-22, 2012.

[16] I. Santos, F. Brezo, J. Nieves, Y. K. Penya, B. Sanz, C. Laorden, P. G. Bringas, Opcode-sequence-based malware detection, in Proc. 2nd Int. Symp. Eng. Secure Software and Syst. (ESSoS), Pisa, Italy, vol. LNCS 5965, pp. 35-43, Feb. 3-4, 2010.

[17] M. Z. Shafiq, S. M. Tabish, F. Mirza, M. Farooq, PE-Miner: Mining structural information to detect malicious executables in realtime, in Proceedings of the 12th International Symposium on Recent Advances in Intrusion Detection, ser. RAID 09. Berlin, Heidelberg: Springer-Verlag, 2009, pp. 121-141.

[18] M. Zolotukhin, T. Hamalainen, Support Vector Machine Integrated with Game-Theoretic Approach and Genetic Algorithm for the Detection and Classification of Malware, Globecom 2013 IEEE Workshop - First International Workshop on Security and Privacy in Big Data.

[19] Y. Ye, L. Chen, D. Wang, T. Li, Q. Jiang, M. Zhao, SBMDS: an interpretable string based malware detection system using SVM ensemble with bagging, Journal in Computer Virology, vol. 5, no. 4, pp. 283-293, 2009.

[20] K. Rieck, P. Trinius, C. Willems, T. Holz, Automatic Analysis of Malware Behavior Using Machine Learning, Journal of Computer Security, vol. 19, pp. 639-668, 2011.

[21] R. Tian, L. Batten, R. Islam, S. Versteeg, An automated classification system based on the strings of Trojan and virus families, in Proceedings of the 4th International Conference on Malicious and Unwanted Software (MALWARE 2009), pp. 23-30, 2009.

[22] R. Tian, M. R. Islam, L. Batten, S. Versteeg, Differentiating Malware from Cleanwares Using Behavioral Analysis, Proceedings of 5th International Conference on Malicious and Unwanted Software (Malware), Nancy, October 2010, pp. 23-30.

[23] Y. Park, D. Reeves, V. Mulukutla, Sundaravel, Fast Malware Classification by Automated Behavioral Graph Matching, Proceedings of the 6th Annual Workshop on Cyber Security and Information Intelligence Research, Article No. 45, 2010.

[24] I. Firdausi, C. Lim, Erwin, Analysis of Machine Learning Techniques Used in Behavior Based Malware Detection, Proceedings of 2nd International Conference on Advances in Computing, Control and Telecommunication Technologies (ACT), Jakarta, 2010, pp. 201-203.

[25] G. Wagener, R. State, A. Dulaunoy, Malware behaviour analysis, Journal in Computer Virology, vol. 4, no. 4, pp. 279-287, 2008.

[26] M. Biley, J. Oberheid, J. Andersen, Z. Morley Mao, F. Jahanian, Nazario, Automated Classification and Analysis of Internet Malware, Proceedings of the 10th International Conference on Recent Advances in Intrusion Detection, vol. 4637, pp. 178-197.

[27] T. Lee, J. J. Mody, Behavioral Classification, Proceedings of the European Institute for Computer Antivirus Research Conference (EICAR2006).

[28] D. Bilar, Opcodes as predictor for malware, International Journal of Electronic Security and Digital Forensics, pp. 156-168, 2007.

[29] R. Sekar, M. Bendre, D. Bollineni, and Bollineni, R. Needham and M. Abadi, Eds., A fast automaton-based method for detecting anomalous program behaviors, in Proc. 2001 IEEE Symp. Security and Privacy, IEEE Comput. Soc., Los Alamitos, CA, USA, 2001, pp. 144-155.

[30] VXheavens website: http://vx.netlux.org.

[31] S. Nari, Ghorbani, Automated Malware Classification Based on Network Behavior, Proceedings of International Conference on Computing, Networking and Communications (ICNC), San Diego, 28-31 January 2013, pp. 642-647.
