
Evaluating Automatically Generated YARA Rules and Enhancing Their Effectiveness


Nitin Naik¹, Paul Jenkins², Roger Cooke³, Jonathan Gillett³ and Yaochu Jin⁴
¹School of Informatics and Digital Engineering, Aston University, United Kingdom
²School of Computing, University of Portsmouth, United Kingdom
³Defence School of Communications and Information Systems, Ministry of Defence, United Kingdom
⁴Department of Computer Science, University of Surrey, United Kingdom

Abstract: Emerging as a widely accepted technique for malware analysis, YARA rules, owing to their flexible and customisable nature, allow malware analysts to develop rules according to the requirements of a specific security domain. YARA rules can be automatically generated using tools; however, such rules may require post-processing for their optimisation, and may not be effective for the specific security domain. This creates the need to enhance automatically generated YARA rules and increase their effectiveness for malware analysis without increasing computational overheads. Reflecting on this requirement, this paper first evaluates automatically generated YARA rules using three YARA tools: yarGen, yaraGenerator and yabin. These are Python-based open-source tools that generate YARA rules automatically using different underlying techniques. Subsequently, it proposes a method to enhance automatically generated YARA rules using a fuzzy hashing method. The proposed enhancement method can improve the effectiveness of YARA rules irrespective of the YARA tool used to generate them, which is demonstrated through several experiments on samples of collected malware and goodware.

Index Terms: Malware Analysis; YARA Rules; Fuzzy Hashing; yarGen; yaraGenerator; yabin; Ransomware; Indicator of Compromise; IoC String.

I. INTRODUCTION

The accelerating rate of malware incidents on a daily basis indicates the magnitude of the problem in malware analysis. While malware analysts detect many malware attacks and incidents, keeping pace with the number and different types of attacks poses a significant challenge to them. There is no silver bullet with respect to malware: no single malware analysis technique is capable of treating all malware incidents, so analysts select the most suitable technique for the specific security incident under consideration [1]. In recent years, the YARA rules technique has emerged as a widely accepted technique for malware analysis due to its flexible and customisable nature, allowing malware analysts to develop YARA rules according to their specific requirements in targeting specific types of threats [2]. YARA rules are generated based on reverse engineering of malware samples, so that they include the most common Indicator of Compromise (IoC) strings from those samples and can find similar types of malware.

The success of YARA rules depends on the effectiveness of the generated rules, which is determined by the types and the number of IoC strings they use [3]. Therefore, generating the most effective YARA rules is the biggest challenge in applying them to malware analysis [4]. YARA rules can be generated either manually or automatically. Generating YARA rules manually requires a highly specialised skill set in a specific security area, whereas generating them automatically using a tool is a relatively easy task [5]. However, automatically generated YARA rules have several issues: they require post-processing operations for their optimisation, and even then they may not become very effective for certain types of threats [4], [5]. This drives the requirement to enhance YARA rules and make them more effective for malware analysis. There are a number of ways to achieve this aim; however, any chosen mechanism should not increase the computational overheads, as certain types of YARA rules may slow down the operation when applied to a large sample of malware [2], [6], [7]. Reflecting on this requirement and the further enhancement of YARA rules, this paper first evaluates automatically generated YARA rules using three YARA tools: yarGen, yaraGenerator and yabin. These are Python-based open-source tools that generate YARA rules automatically using different underlying techniques. Subsequently, it proposes a method to enhance automatically generated YARA rules using a fuzzy hashing method. The proposed enhancement method can improve the effectiveness of YARA rules irrespective of the YARA tool used to generate them [8], which is demonstrated through several experiments on the collected malware and goodware samples.

The paper is divided into the following sections: Section II discusses YARA rules and fuzzy hashing as the underlying methods. Section III describes the three employed tools for automatically generating YARA rules: yarGen, yaraGenerator and yabin. Section IV explains the collection and verification
process of ransomware and goodware samples. Section V evaluates automatically generated YARA rules using the yarGen, yaraGenerator and yabin tools. Section VI presents the proposed enhancement process of automatically generated YARA rules using the above YARA tools by employing the fuzzy hashing method SSDEEP. Section VII explores the advantages and limitations of YARA rules. Lastly, Section VIII concludes the paper and outlines some future work.

II. YARA RULES AND FUZZY HASHING

A. YARA Rules
YARA rules are developed to detect malware by matching its signatures/strings with existing malware signatures/strings [3], [9]. These rules contain predetermined signatures/strings related to known malware, which are matched against the targeted files, folders or processes [10]. YARA rules consist of three sections: meta, strings and condition, as shown in Figs. 1 and 2. Strings can be classified into three types: text strings, hexadecimal strings and regular expression strings. Text strings are generally readable text complemented with modifiers (e.g., nocase, ascii, wide and fullword) to control the matching more precisely [11]. Hexadecimal strings are sequences of raw bytes complemented with three flexible formats: wild-cards, jumps and alternatives [11]. Regular expression strings, available since version 2.0, are written like text strings as readable text complemented with modifiers, and they increase the capability of YARA rules [11]. Text strings and regular expressions can also express sequences of raw bytes through the use of escape sequences. The final part of a YARA rule is the condition, which specifies how many signatures/strings must match the target for the sample to be declared malware [12]. Conditions determine whether or not the rule is triggered; they are Boolean expressions similar to those used in other programming languages [11].

B. Fuzzy Hashing
Fuzzy hashing is used to determine the similarity between digital files, which makes it a very useful method for malware analysis, as several pieces of malware and their variants share some similarity with each other. Such similarity is not detected by a cryptographic hash, whose outcome is binary, i.e., either the two files are exactly identical or they are not [13], [14]. In a fuzzy hashing technique, the file of interest is split into several blocks, the hash of each block is calculated separately, and finally the hashes of all the blocks are concatenated to obtain the fuzzy hash of that file (see Fig. 3). A number of factors affect the size of the fuzzy hash of a file, comprising the block size, the size of the file and the output size of the chosen hash function [15]. Fuzzy hashing methods are divided into different types, namely: Context-Triggered Piecewise Hashing (CTPH), Statistically-Improbable Features (SIF), Block-Based Hashing (BBH) and Block-Based Rebuilding (BBR) [16], [17], [18]. Forensic analysis of malware requires a thorough knowledge of the degree of similarity between known malware and inert files to assess files for their threat potential [19]. This is especially important when analysing and clustering suspected malware in order to discover new variants [20], [21]. As a result, the similarity-preserving property of fuzzy hashing is useful when comparing unknown files with known malware families during malware analysis, where samples possess similar functionality yet different cryptographic hash values [22].
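To make the block-splitting idea concrete, the following Python sketch implements a toy CTPH: a rolling sum over the last 7 bytes decides where each block ends, each block is hashed with 32-bit FNV-1a, and one base64 character per block is concatenated into the signature; two signatures are then compared with a plain edit distance scaled to a score from 0 to 100. The window length, block size, FNV-1a and the scoring formula are all assumptions chosen for readability. This is not the SSDEEP algorithm; a real analysis would call an SSDEEP implementation (for example the `hash` and `compare` functions of the python-ssdeep binding).

```python
# Simplified context-triggered piecewise hashing (CTPH), for illustration only.
# NOT the real ssdeep algorithm: ssdeep uses its own rolling hash, derives the
# block size from the file length, emits two signatures and compares them with
# a weighted edit distance.

B64 = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"
WINDOW = 7  # rolling-window length in bytes (assumed)


def fnv1a(block: bytes) -> int:
    """32-bit FNV-1a hash of one block."""
    h = 0x811C9DC5
    for b in block:
        h ^= b
        h = (h * 0x01000193) & 0xFFFFFFFF
    return h


def ctph(data: bytes, block_size: int = 64) -> str:
    """Cut data at content-defined boundaries and concatenate one base64
    character (the low 6 bits of the block hash) per block."""
    sig, start, rolling = [], 0, 0
    for i, b in enumerate(data):
        rolling += b                      # sum of (up to) the last WINDOW bytes
        if i >= WINDOW:
            rolling -= data[i - WINDOW]
        # The trigger depends only on local content, so an edit moves block
        # boundaries near the edit but not elsewhere in the file.
        if rolling % block_size == block_size - 1:
            sig.append(B64[fnv1a(data[start:i + 1]) & 0x3F])
            start = i + 1
    if start < len(data):                 # hash the trailing block
        sig.append(B64[fnv1a(data[start:]) & 0x3F])
    return "".join(sig)


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance (insert, delete and substitute all cost 1)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def similarity(h1: str, h2: str) -> int:
    """Score from 0 (unrelated) to 100 (identical signatures)."""
    longest = max(len(h1), len(h2))
    if longest == 0:
        return 100
    return round(100 * (1 - edit_distance(h1, h2) / longest))


if __name__ == "__main__":
    original = bytes(range(256)) * 8
    variant = original[:1000] + b"X" + original[1000:]  # one inserted byte
    print(ctph(original))
    print(similarity(ctph(original), ctph(variant)))
```

Running it on 2 KB of patterned data and on a copy with a single inserted byte shows the intended property: only the block containing the edit changes, so the two signatures remain highly similar, whereas the cryptographic hashes of the two inputs would simply differ.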

Fig. 1. YARA Rules: Syntax
Fig. 2. YARA Rules: Example
Fig. 3. Fuzzy Hash generation process in Fuzzy Hashing [10]

III. EMPLOYED TOOLS FOR AUTOMATICALLY GENERATED YARA RULES: YARGEN, YARAGENERATOR AND YABIN
Generating YARA rules automatically is a popular way of employing YARA rules in malware analysis. In this work, three different tools, yarGen, yaraGenerator and yabin, are used to generate YARA rules automatically in order to evaluate their effectiveness. These three tools are explained below, together with their advantages and drawbacks.

A. yarGen Tool
yarGen is a Python-based tool for generating YARA rules, developed by Florian Roth [23]. It generates YARA rules using techniques such as fuzzy regular expressions, a Naive Bayes classifier and a Gibberish Detector [24]. The generated YARA rules include those strings and opcodes from the malware that do not match the provided goodware databases [23]. These rules contain a predefined number of strings (generally up to 20), chosen by highest score, to maintain a reasonable operational speed. The tool generates two types of rules depending on the malware sample types: basic rules, which generally target a specific malware sample, and super rules, which target a set of malware samples or a malware family.
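The string-selection steps described above (extract candidate strings, drop those that occur in goodware, keep the highest-scoring ones, and emit a rule with meta, strings and condition sections) can be sketched in Python as follows. This is a hypothetical simplification, not yarGen's code: the scoring function, the marker words, the 6-character minimum length and the "N of them" condition are illustrative assumptions, whereas yarGen itself relies on fuzzy regular expressions, a Naive Bayes classifier and a gibberish detector.

```python
from __future__ import annotations

import re

# Hypothetical, simplified yarGen-style generator (NOT yarGen's real scoring).
MIN_LEN = 6       # assumed minimum length of a printable-string candidate
MAX_STRINGS = 20  # yarGen keeps up to about 20 highest-scoring strings
MARKERS = (".onion", "bitcoin", "decrypt", "ransom")  # illustrative only


def extract_strings(data: bytes, min_len: int = MIN_LEN) -> set[str]:
    """Return printable ASCII runs of at least min_len bytes."""
    pattern = rb"[\x20-\x7e]{%d,}" % min_len
    return {m.decode("ascii") for m in re.findall(pattern, data)}


def score(s: str) -> int:
    """Placeholder score: longer strings and ransomware-like markers rank higher."""
    return len(s) + 10 * sum(k in s.lower() for k in MARKERS)


def yara_escape(s: str) -> str:
    """Escape backslashes and double quotes for a YARA text string."""
    return s.replace("\\", "\\\\").replace('"', '\\"')


def build_rule(name: str, malware: bytes, goodware_strings: set[str],
               max_strings: int = MAX_STRINGS, min_matches: int = 3) -> str:
    # 1. Candidate IoC strings that never occur in the goodware set.
    candidates = extract_strings(malware) - goodware_strings
    if not candidates:
        raise ValueError("no candidate strings left after goodware filtering")
    # 2. Keep the highest-scoring strings (ties broken alphabetically).
    chosen = sorted(candidates, key=lambda s: (-score(s), s))[:max_strings]
    # 3. Emit the three rule sections: meta, strings and condition.
    lines = [f"rule {name}", "{",
             "    meta:",
             '        description = "illustrative auto-generated rule"',
             "    strings:"]
    lines += [f'        $s{i} = "{yara_escape(s)}" ascii'
              for i, s in enumerate(chosen)]
    lines += ["    condition:",
              f"        {min(min_matches, len(chosen))} of them",
              "}"]
    return "\n".join(lines)


if __name__ == "__main__":
    sample = (b"MZ\x90\x00This program cannot be run in DOS mode\x00"
              b"Send bitcoin to decrypt files\x00xyz.onion/pay\x00")
    print(build_rule("Ransom_Sketch", sample,
                     {"This program cannot be run in DOS mode"}))
```

In the demo, the DOS-stub string is discarded because it also appears in the (hypothetical) goodware set, so only the two ransomware-like strings reach the strings section. Capping the rule at a fixed number of strings mirrors the speed trade-off mentioned above: every extra string is one more pattern to search for in each scanned file.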

Fig. 4. yarGen Generated YARA Rule

1) Advantages of the yarGen Tool:
• It allows generation of YARA rules based on both opcodes and strings.
• It supports the use of the PE (portable executable) module; PE is the format used by the Windows operating system for executables such as EXE and DLL files.
• It can be integrated with other anti-malware software for more effective use.
• It reduces false positives by checking all strings against the strings of goodware databases.
• The Python script is simple and easy to use through a command line interface.

2) Drawbacks of the yarGen Tool:
• It requires post-processing of rules to increase their effectiveness.
• It requires significant resources for opcode-based rules and for loading goodware files.
• The rule generation process is slow.
• The creation of super rules could cause duplication of rules and redundancy.
• It requires installation of all dependencies and built-in databases to work successfully.

Fig. 5. yaraGenerator Generated YARA Rule

B. yaraGenerator Tool
yaraGenerator is a Python-based tool for generating YARA rules, developed by Chris Clark [25]. It generates YARA rules with completely different signatures for different types of files, such as EXEs, PDFs and emails, using string prioritisation logic and code refactoring [25]. The generated YARA rules consist of strings only, namely those strings from the malware that do not match the provided blacklist of strings [25]. The tool uses a database of 30,000 blacklisted strings divided by file format. Its YARA rules contain a large number of strings (depending on the types of samples) selected randomly, as it does not compute a score or weighting for strings.

1) Advantages of the yaraGenerator Tool:
• It can generate specialised rules for a specific file format.
• It supports the use of the PE (portable executable) module; PE is the format used by the Windows operating system for executables such as EXE and DLL files.
• It reduces false positives by checking all strings against the strings of blacklist files.
• The Python script is simple and easy to use through a command line interface.

2) Drawbacks of the yaraGenerator Tool:
• It requires post-processing of rules to increase their effectiveness.
• It generates YARA rules based on a random selection of strings, which in many cases may not select the most appropriate strings.
• It does not support the inclusion of opcodes.
• The project was developed as a work in progress and has not been updated since.

C. yabin Tool
yabin is another Python-based tool for generating YARA rules, developed by the AlienVault Open Threat Exchange (OTX) community [26]. It generates YARA rules by finding rare functions in certain malware samples or families [26]. It recognises functions by checking function prologues, which define the start of functions; for example, the byte sequence 55 8B EC (push ebp; mov ebp, esp) usually marks the start of a function in programs compiled by Microsoft Visual Studio. The generated YARA rules include those strings from the malware that do not match the provided whitelist of common library functions [26]. This whitelist, obtained from 100 Gb of non-malicious software, is used to omit common library functions [26]. These YARA rules contain a list of hexadecimal strings that are compared against suspected malware files to find similarity in their byte sequences.

Fig. 6. yabin Generated YARA Rule

1) Advantages of the yabin Tool:
• It can be used to cluster malware samples based on the reuse of their code.
• The search patterns can be extended during the post-processing operation.
• It provides a large whitelist obtained from numerous non-malicious software packages to omit common library functions.
• The Python script is simple and easy to use through a command line interface.

2) Drawbacks of the yabin Tool:
• It requires post-processing of rules to make them more effective.
• It may not work on some specific file formats.
• It only uses functions and does not use other types of strings.
• It can only work with unpacked executables.
• It is not designed to work on .NET executables, Java files and Microsoft documents.
• It was mainly developed for testing purposes rather than for production use.

IV. COLLECTION OF MALWARE AND GOODWARE SAMPLES
In this implementation, ransomware, one of the most prevalent types of malware, was selected to perform all analysis and evaluation of the effectiveness and performance of the proposed techniques. Ransomware was selected for the experiment as it is one of the most relevant and damaging examples of malware, exploiting victims for financial gain, business disruption and market share. Numerous types of ransomware have been created and used in cyberattacks; however, some ransomware categories are worthy of greater focus due to their historical significance, severity of attack and financial loss. Based on primary research, four ransomware categories were targeted for this work: WannaCry/WannaCryptor, Locky, Cerber and CryptoWall [27], [28], [29]. Thousands of malware samples were acquired from two sources, Hybrid Analysis [30] and Malshare [31]. These samples were then verified for their credibility, as numerous samples were simply bogus. It was critical to select only credible samples of a specific category as a reference to test all selected malware analysis methods and the proposed techniques successfully. The samples were investigated based on the information available on VirusTotal [32]. To determine that every sample was indeed genuine malware or ransomware and a member of a specific ransomware category, the criterion was set that it must be identified as malware by at least 40 detection engines on VirusTotal. To check the ransomware category of the collected samples, their category (WannaCry/WannaCryptor, Locky, Cerber or CryptoWall) was verified manually on the recognised detection engines on VirusTotal. This sample collection and verification process was both lengthy and time-consuming, leading to 1000 ransomware samples being selected out of several thousand; these were divided equally into 250 samples for each of the four ransomware categories WannaCry/WannaCryptor, Locky, Cerber and CryptoWall. The four different categories of ransomware were chosen to evaluate how each employed YARA tool and its corresponding enhanced method works on the different categories of ransomware.

In addition to the collection of malware (ransomware) samples, an equal number of goodware samples was collected to balance this analysis. These 1000 goodware samples were files collected from ten commonly used software packages: JAVA, MS OFFICE, Google Chrome, MySQL, R, NMAP, McAfee,
MATLAB, Python and Snort. These 10 software packages were chosen to encompass a wide range of benign programs and to evaluate how each employed YARA tool and its corresponding enhanced method functions on different types of benign program. In total, 2000 samples were used to perform all the experiments applying all employed YARA tools and their corresponding enhanced methods.

V. EVALUATING AUTOMATICALLY GENERATED YARA RULES USING YARA TOOLS
In this section, the three selected tools that automatically generate YARA rules, yarGen, yaraGenerator and yabin, are evaluated. All the tools are applied to the collected and verified ransomware samples of the four ransomware corpora WannaCry/WannaCryptor, Locky, Cerber and CryptoWall to generate YARA rules. The generated YARA rules from each tool are used to perform malware analysis and determine their malware detection success rate, as explained in the following subsections. The experiment aims to illustrate the similarity detection success rate of each YARA tool for each ransomware category, separately and collectively. Each sample of a category is expected to hold some similarity to other samples in that category. Therefore, the experiments evaluate how many samples within one category are matched with at least one other sample of the same category by the YARA rules generated through each tool.

A. Evaluation Procedure of Automatically Generated YARA Rules
All three tools, yarGen, yaraGenerator and yabin, are Python-based; therefore, the YARA rule generation procedure was quite similar for all of them, using the command line interface. However, the generated YARA rules are quite different, as they are based on different methodologies. Using all three tools with their default settings and databases, YARA rules are generated for each of the four ransomware corpora WannaCry/WannaCryptor, Locky, Cerber and CryptoWall separately. If the default settings are changed and the number of strings and attributes is increased or decreased, the same tool may produce different YARA rules. Furthermore, the automatically generated rules require post-processing to make them more effective. However, for a fair evaluation of the three YARA tools, the generated YARA rules were evaluated without any post-processing operation.

B. Evaluation Results of Automatically Generated YARA Rules
Once the YARA rules are generated using all three tools for all four ransomware categories separately, they are used to detect the similarity within each ransomware category. The detection results for all four ransomware categories and all three YARA tools are shown in Table I. The results show that the YARA rules generated by the yarGen tool outperformed those generated by the other two tools, yaraGenerator and yabin. This indicates that, for the same malware samples, different tools generate different rules which produce very different results. This result is based on the default settings of each tool; if the default settings are changed and the number of strings and attributes is increased or decreased, the same tool may produce different analysis results. Importantly, significantly increasing the number of strings and attributes adversely affects the performance of YARA rules, since malware analysis is typically performed on a large sample size.

These three tools are further evaluated based on the values of False Positives and False Negatives. This evaluation is based on four standard evaluation metrics (Accuracy, Precision, Recall and F1-Score), whose values are shown in Table II. The overall result of the YARA rules generated by the yarGen tool is better than that of the YARA rules generated by the two other tools, yaraGenerator and yabin. Moreover, to evaluate the efficiency of any tool decisively, a balance of Precision and Recall is very important; therefore, the F1-Score, which combines both, may be more helpful in determining a relatively better tool. The F1-Score of the YARA rules generated by the yarGen tool is 75.49%, which is better than the F1-Score of the YARA rules generated by the other two tools. This shows that the YARA rules generated by yarGen are more efficient than those generated by yaraGenerator and yabin. Despite this relatively better result, yarGen's result is not sufficient to consider it a generic method for analysis in this particular case. Therefore, further investigation is required to improve the efficiency of these YARA rules.

TABLE I
DETECTION RESULTS OF AUTOMATICALLY GENERATED YARA RULES USING TOOLS YARGEN, YARAGENERATOR AND YABIN FOR WANNACRY, LOCKY, CERBER AND CRYPTOWALL RANSOMWARE SAMPLES

Ransomware Category     yarGen-YARA Rules   yaraGenerator-YARA Rules   yabin-YARA Rules
                        Detection Rate      Detection Rate             Detection Rate
WannaCry Ransomware     89.6%               28.8%                      44.8%
Locky Ransomware        54.4%               6.8%                       11.2%
Cerber Ransomware       77.2%               10.4%                      17.2%
CryptoWall Ransomware   27.6%               5.2%                       9.6%

TABLE II
EVALUATION METRICS FOR AUTOMATICALLY GENERATED YARA RULES USING TOOLS YARGEN, YARAGENERATOR AND YABIN

Evaluation Metric   yarGen-YARA Rules   yaraGenerator-YARA Rules   yabin-YARA Rules
Accuracy            79.80%              56.4%                      60.35%
Precision           95.99%              78.53%                     88.09%
Recall              62.20%              12.8%                      20.7%
F1-Score            75.49%              22.01%                     33.52%
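The metrics in Table II follow the standard confusion-matrix definitions: Precision = TP/(TP + FP), Recall = TP/(TP + FN) and F1 = 2PR/(P + R). Because every ransomware category holds 250 samples, Recall is simply the number of detected ransomware samples divided by 1000, i.e. the mean of the four per-category rates in Table I. The Python sketch below checks this arithmetic for the yarGen column; the helper names are illustrative. Precision additionally depends on how many of the 1000 goodware samples were falsely flagged, and the tables do not list those counts directly, so the example takes Precision from Table II.

```python
from __future__ import annotations

SAMPLES_PER_CATEGORY = 250
RANSOMWARE_TOTAL = 1000


def metrics(tp: int, fp: int, tn: int, fn: int) -> dict[str, float]:
    """Standard confusion-matrix metrics."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# yarGen detection rates from Table I (WannaCry, Locky, Cerber, CryptoWall).
yargen_rates = [0.896, 0.544, 0.772, 0.276]
detected = [round(r * SAMPLES_PER_CATEGORY) for r in yargen_rates]
recall = sum(detected) / RANSOMWARE_TOTAL

if __name__ == "__main__":
    print(detected)                                # [224, 136, 193, 69]
    print(f"Recall = {recall:.2%}")                # Recall = 62.20%
    print(f"F1 = {f1_score(0.9599, recall):.2%}")  # F1 = 75.49%
```

The reproduced Recall (622 of 1000 samples, 62.20%) and F1-Score (75.49%) match the yarGen column of Table II, which makes explicit why the F1-Score is dominated by the comparatively low Recall rather than by the high Precision.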
VI. ENHANCING THE EFFECTIVENESS OF AUTOMATICALLY GENERATED YARA RULES WITH FUZZY HASHING METHOD

A. Enhancing Procedure of Automatically Generated YARA Rules
Irrespective of the YARA rule generation tool selected, all YARA rules contain strings which are matched against the strings of the examined malware samples. The number and types of strings determine the success of the generated YARA rules. Nonetheless, threat actors understand such mechanisms too, and they frequently attempt evasion through intelligent modifications to their malware. If only a few or none of the selected strings are found in the examined samples, YARA rules do not flag those samples as malware even though they may be malware. To enhance the effectiveness of YARA rules, the number of strings in them can be increased; however, adding a large number of strings may increase the computational complexity and overheads, significantly affecting the performance of YARA rules. Additionally, writing such complex YARA rules or modifying automatically generated rules requires a high degree of expertise in cyber security [2], [6], [33]. Consequently, it is essential to find a simpler solution to make YARA rules more effective without incurring the complexities stated earlier.

Therefore, mechanisms other than strings need to be explored to enhance YARA rules. Fuzzy hashing is a compact, fast and resource-optimised malware analysis method which may not be effective on its own; nonetheless, it can complement YARA rules, enhancing their effectiveness without significantly affecting complexity [22]. Fuzzy hashing attempts to find structural similarity between two entire files in circumstances where the selected strings cannot be found in the sample [34]. Therefore, the two mechanisms can complement each other, each catching matches that the other misses. Additionally, fuzzy hashing can provide the degree of similarity of each matched sample alongside the outcome of the YARA rules, which is not achievable with YARA rules alone. Thus, the combined search result can increase the accuracy and confidence level of the malware analysis. The operational flow of the proposed fuzzy hashing aided enhanced YARA rules is shown in Fig. 7.

Fig. 7. Fuzzy Hashing Aided Enhanced YARA Rules

B. Enhancing Results of Automatically Generated YARA Rules
The YARA rules generated using the three selected tools, yarGen, yaraGenerator and yabin, are adapted to incorporate the fuzzy hashing method SSDEEP, and their effectiveness is evaluated on all four ransomware corpora WannaCry/WannaCryptor, Locky, Cerber and CryptoWall. The reason for selecting SSDEEP over other fuzzy hashing methods (e.g., SDHASH and mvHASH-B) is explained in detail in [8], [14]: SSDEEP is more compact, faster and more resource-optimised than the other fuzzy hashing methods [35]. Here, SSDEEP fuzzy similarity scores greater than 30% are used for all three YARA tools [8]. The detection results of the enhanced YARA rules using the SSDEEP fuzzy hashing method for all three tools on the four ransomware categories are shown in Table III. Noticeably, the enhanced YARA rules of all three tools (yarGen, yaraGenerator and yabin) show an improvement in detection compared to the original YARA rules generated by these tools. The detection results show that the enhanced YARA rules generated by the yarGen tool again outperformed the enhanced YARA rules generated by the other two tools, yaraGenerator and yabin.

The enhanced YARA rules generated using fuzzy hashing and the three tools are further evaluated based on the values of False Positives and False Negatives. Similarly, this evaluation is based on four standard evaluation metrics (Accuracy, Precision, Recall and F1-Score), whose values are shown in Table IV. The overall result of the enhanced YARA rules generated by the yarGen tool is again better than that of the enhanced YARA rules generated by the other two tools, yaraGenerator and yabin. Moreover, to evaluate the efficiency of any tool decisively, a balance of Precision and Recall is very important; therefore, the F1-Score, which combines both, may be more helpful in determining a relatively better tool. The F1-Score of yarGen's enhanced YARA rules is 79.08%, which is better than the F1-Score of the enhanced YARA rules generated by the other two tools. This shows that the enhanced YARA rules generated by yarGen are again more efficient than those generated by yaraGenerator and yabin.

Finally, the three selected tools yarGen, yaraGenerator and yabin are compared based on their operational and functional parameters, as shown in Table V. This shows that yarGen is a relatively better tool in terms of features, functionality and accuracy; however, due to its comprehensive features and functionality, it requires greater resources and computational overheads, resulting in slower performance.
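The decision logic of the enhanced rules can be summarised in a few lines. The sketch below assumes, as Section VI-A suggests, that the two mechanisms are combined with a logical OR, so fuzzy hashing can only add detections that the strings missed, and it reports the degree of similarity alongside the YARA outcome. The names `enhanced_verdict` and `Verdict` are hypothetical, the fuzzy scores are passed in rather than computed, and the exact flow of Fig. 7 may include steps not modelled here.

```python
from __future__ import annotations

from dataclasses import dataclass

# SSDEEP scores range from 0 to 100; Section VI-B uses scores greater than 30.
FUZZY_THRESHOLD = 30


@dataclass
class Verdict:
    flagged: bool          # final decision for the sample
    yara_matched: bool     # outcome of the YARA rules alone
    best_similarity: int   # highest fuzzy score against the reference samples


def enhanced_verdict(yara_matched: bool, fuzzy_scores: list[int],
                     threshold: int = FUZZY_THRESHOLD) -> Verdict:
    """Hypothetical combination: flag the sample if the YARA rules match OR
    its best fuzzy-hash similarity to a known sample exceeds the threshold.
    fuzzy_scores would come from comparing the sample's SSDEEP hash with the
    hashes of reference samples of the same ransomware category."""
    best = max(fuzzy_scores, default=0)
    return Verdict(flagged=yara_matched or best > threshold,
                   yara_matched=yara_matched,
                   best_similarity=best)


if __name__ == "__main__":
    # The strings evaded the YARA rules, but the sample is 45% similar to a
    # known sample, so fuzzy hashing recovers the missed detection.
    print(enhanced_verdict(False, [12, 45, 8]))
    # Neither mechanism fires: 30 is not strictly greater than the threshold.
    print(enhanced_verdict(False, [30]))
```

Under this OR combination the enhanced rules can only gain detections relative to the original rules, which is consistent with the rise in Recall from Table II to Table IV; the threshold is what keeps the extra fuzzy matches from inflating false positives.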
TABLE III
DETECTION RESULTS OF ENHANCED YARA RULES GENERATED USING FUZZY HASHING AND TOOLS YARGEN, YARAGENERATOR AND YABIN FOR WANNACRY, LOCKY, CERBER AND CRYPTOWALL RANSOMWARE SAMPLES

Ransomware Category     yarGen-Fuzzy Hash     yaraGenerator-Fuzzy Hash   yabin-Fuzzy Hash
                        Enhanced YARA Rules   Enhanced YARA Rules        Enhanced YARA Rules
                        Detection Rate        Detection Rate             Detection Rate
WannaCry Ransomware     93.2%                 90.8%                      90.8%
Locky Ransomware        59.6%                 41.6%                      41.6%
Cerber Ransomware       77.2%                 33.6%                      33.6%
CryptoWall Ransomware   38.4%                 28%                        28%

TABLE IV
EVALUATION METRICS FOR ENHANCED YARA RULES GENERATED USING FUZZY HASHING AND TOOLS YARGEN, YARAGENERATOR AND YABIN

Evaluation Metric   yarGen-Fuzzy Hash     yaraGenerator-Fuzzy Hash   yabin-Fuzzy Hash
                    Enhanced YARA Rules   Enhanced YARA Rules        Enhanced YARA Rules
Accuracy            83.55%                74.25%                     74.25%
Precision           96.27%                93.27%                     94.54%
Recall              67.10%                48.5%                      48.5%
F1-Score            79.08%                63.81%                     64.11%

TABLE V
COMPARISON OF OPERATIONAL AND FUNCTIONAL PARAMETERS OF YARGEN, YARAGENERATOR AND YABIN TOOLS

Operational and Functional Parameter   yarGen                          yaraGenerator                       yabin
IoC Strings                            Text, Hex, Regular              Text, Hex, Regular Expressions      Function Prologues
                                       Expressions, Opcodes
Weighting Scores of IoC Strings        Yes                             No                                  No
Portable Executable (PE) Module        Yes                             Yes                                 No
Use of Machine Learning Methods        Yes                             No                                  No
Underlying Methods                     Fuzzy Regular Expressions,      String Prioritization Logic and     Finding rare functions by
                                       Naive Bayes Classifier and      Code Refactoring                    checking function prologues
                                       Gibberish Detector
Language                               Python                          Python                              Python
Databases                              good-exports.db,                blacklist.txt, regexblacklist.txt   whitelist database (db.db)
                                       good-imphashes.db,
                                       good-opcodes.db,
                                       good-strings.db
Malware Clustering                     Yes                             No                                  Yes
Resource Requirement                   Highest                         Lower                               Lowest
Speed                                  Slowest                         Slower                              Fastest
Accuracy                               Most Accurate                   Least Accurate                      Less Accurate
Open-Source                            Yes                             Yes                                 Yes

VII. ADVANTAGES AND LIMITATIONS OF YARA RULES

A. Advantages of YARA Rules
YARA rules offer several advantages over other malware analysis techniques; some of the most notable are:
• YARA rules offer an easy and efficient way of writing flexible and custom rules according to the requirements of a specific security domain.
• YARA is open source and works on most of the major platforms, such as Windows, Linux and Mac OS.
• YARA rules can be easily integrated into Python and C/C++ programs.
• YARA rules can be used for both static and dynamic malware analysis.
• Several tools are available to generate YARA rules easily and efficiently.
• Several public repositories of YARA rules offer readily available rules for malware analysis.

B. Limitations of YARA Rules
YARA rules are one of the most established malware analysis techniques; however, they have some limitations, the most notable of which are:
• YARA rules are commonly written based on IoC strings; however, attackers can easily manipulate, replace or encrypt these IoC strings to evade them, which could make these rules less effective.
• IoC strings are extracted from existing malware and their families through a reverse engineering process, which requires a highly specialised skill set in a specific security domain.
• The success of YARA rules depends on the types and number of IoC strings included in the rules; however, balancing both is a challenging task, as inappropriate IoC strings or an inappropriate number of them could adversely affect the performance of YARA rules.
• YARA rules can be automatically generated using tools; however, such rules may require post-processing for their optimisation, and may not be as effective as manually generated YARA rules.
• YARA rules are effective in detecting malware that resembles existing malware and their families; however, they may miss new and unique malware variants.

VIII. CONCLUSION
This paper presented an evaluation of automatically generated YARA rules using three YARA tools, yarGen, yaraGenerator and yabin, including a technique to enhance their effectiveness using a fuzzy hashing method. These three tools were applied to the collected ransomware samples of the four ransomware corpora WannaCry/WannaCryptor, Locky, Cerber and CryptoWall to generate YARA rules. The generated YARA rules from each tool were used to perform malware analysis
and determine their malware detection success rate. Here, [17] C. Sadowski and G. Levin, “Simhash: Hash-based similarity detection,”
the yarGen tool provided relatively better detection results as 2007. [Online]. Available: www.webrankinfo.com/dossiers/wp-content/
uploads/simhash.pdff
compared to other two tools yaraGenerator and yabin. Later, [18] V. Gayoso Martı́nez, F. Hernández Álvarez, and L. Hernández Encinas,
the generated YARA rules using three selected tools yarGen, “State of the art in similarity preserving hashing functions,” 2014.
yaraGenerator and yabin are enhanced by incorporating the [Online]. Available: http://digital.csic.es/bitstream/10261/135120/1/
Similarity preserving Hashing functions.pdf
fuzzy hashing method SSDEEP and their effectiveness on [19] N. Naik, C. Shang, P. Jenkins, and Q. Shen, “D-FRI-Honeypot: A
all four ransomware corpora is re-evaluated. This proposed secure sting operation for hacking the hackers using dynamic fuzzy rule
enhancement improved detection results for all three tools, interpolation,” IEEE Transactions on Emerging Topics in Computational
Intelligence, 2020.
however, yarGen performed relatively better as compared to [20] N. Naik, P. Jenkins, N. Savage, and L. Yang, “A computational intel-
the other two tools yaraGenerator and yabin. In the future, two ligence enabled honeypot for chasing ghosts in the wires,” Complex &
important analyses should be performed: generating YARA Intelligent Systems, 2020.
[21] ——, “Cyberthreat Hunting- Part 2: Tracking Ransomware Threat
rules by adapting various parameters of each tool and eval- Actors using Fuzzy Hashing and Fuzzy C-Means Clustering,” in IEEE
uation of additional YARA tools and their generated YARA International Conference on Fuzzy Systems (FUZZ-IEEE). IEEE, 2019.
rules. [22] ——, “Cyberthreat Hunting- Part 1: Triaging Ransomware using Fuzzy
Hashing, Import Hashing and YARA Rules,” in IEEE International
Conference on Fuzzy Systems (FUZZ-IEEE). IEEE, 2019.
ACKNOWLEDGEMENT [23] F. Roth. (2018) yarGen is a generator for YARA rules. [Online].
The authors gratefully acknowledge the support of Hybrid- Available: https://github.com/Neo23x0/yarGen
[24] ——. (2017) How to post-process YARA rules generated
Analysis.com, Malshare.com and VirusTotal.com for this re- by yarGen. [Online]. Available: https://medium.com/@cyb3rops/
search work. how-to-post-process-yara-rules-generated-by-yargen-121d29322282
[25] C. Clark. (2013) yaraGenerator: Automatic YARA rule generation.
R EFERENCES [Online]. Available: https://github.com/Xen0ph0n/YaraGenerator
[26] C. Doman. (2018) yabin: A YARA rule generator for finding
[1] K. Baker. (2020) Malware Analysis. [Online]. Available: https: related samples and hunting. [Online]. Available: https://github.com/
//www.crowdstrike.com/epp-101/malware-analysis/ AlienVault-OTX/yabin
[2] C. S. Culling. (2018) Which YARA Rules : Basic or Advanced? [27] K. Savage, P. Coogan, and H. Lau, “The evolution of ransomware -
[Online]. Available: https://vt-gtm-wp-media.storage.googleapis.com/ Symantec,” pp. 1–57, 2015.
2.0-Which-YARA-Rules-Rule-Basic-or-Advanced-1.pdf [28] Y. Klijnsma. (2019) The history of Cryptowall: a large scale
[3] N. Naik, P. Jenkins, N. Savage, L. Yang, K. Naik, and J. Song, “Em- cryptographic ransomware threat. [Online]. Available: https://www.
bedding fuzzy rules with YARA rules for performance optimisation of cryptowalltracker.org/
malware analysis,” in IEEE International Conference on Fuzzy Systems [29] Malwarebytes. (2019) Ransomware. [Online]. Available: https:
(FUZZ-IEEE). IEEE, 2020. //www.malwarebytes.com/ransomware/
[4] D. French. (2012) Writing effective YARA signatures to identify [30] Hybrid-Analysis. (2019) Hybrid Analysis. [Online]. Available: https:
malware. [Online]. Available: https://insights.sei.cmu.edu/sei blog/ //www.hybrid-analysis.com/
2012/11/writing-effective-yara-signatures-to-identify-malware.html [31] Malshare. (2019) A free Malware repository providing researchers
[5] Intezer.com. (2019) Generate advanced YARA rules based on code access to samples, malicious feeds, and YARA results. [Online].
reuse. [Online]. Available: https://intezer.com/wp-content/uploads/ Available: https://malshare.com/index.php
2019/06/Intezer YARA White Paper.pdf [32] VirusTotal. (2019) Virustotal. [Online]. Available: https://www.
[6] V. Alvarez. (2019) YARA Documentation, Release 3.10. 0. virustotal.com/#/home/upload
[Online]. Available: https://buildmedia.readthedocs.org/media/pdf/yara/ [33] R. Dias. (2014) Intelligence-Driven Incident Response with YARA.
latest/yara.pdf [Online]. Available: https://www.sans.org/reading-room/whitepapers/
[7] F. Roth. (2019) YARA performance guidelines. [Online]. Available: forensics/intelligence-driven-incident-response-yara-35542
https://gist.github.com/Neo23x0/e3d4e316d7441d9143c7 [34] N. Naik, P. Jenkins, N. Savage, L. Yang, T. Boongoen, and N. Iam-
[8] N. Naik, P. Jenkins, N. Savage, L. Yang, K. Naik, J. Song, T. Boongoen, On, “Fuzzy-Import Hashing: A malware analysis approach,” in IEEE
and N. Iam-On, “Fuzzy hashing aided enhanced YARA rules for mal- International Conference on Fuzzy Systems (FUZZ-IEEE). IEEE, 2020.
ware triaging,” in IEEE Symposium Series on Computational Intelligence [35] N. Naik, P. Jenkins, J. Gillett, H. Mouratidis, K. Naik, and J. Song,
(SSCI). IEEE, 2020. “Lockout-Tagout Ransomware: A detection method for ransomware
[9] VirusTotal. (2019) YARA in a nutshell. [Online]. Available: https: using fuzzy hashing and clustering,” in IEEE Symposium Series on
//virustotal.github.io/yara/ Computational Intelligence (SSCI), 2019.
[10] N. Naik, P. Jenkins, N. Savage, L. Yang, K. Naik, and J. Song,
“Augmented YARA rules fused with fuzzy hashing in ransomware
triaging,” in IEEE Symposium Series on Computational Intelligence
(SSCI), 2019.
[11] V. Alvarez. (2019) Writing YARA rules. [Online]. Available:
https://yara.readthedocs.io/en/v3.4.0/writingrules.html
[12] Readthedocs. (2019) Writing YARA rules. [Online]. Available:
https://yara.readthedocs.io/en/v3.5.0/writingrules.html
[13] J. Kornblum, “Identifying almost identical files using context triggered
piecewise hashing,” Digital investigation, vol. 3, pp. 91–97, 2006.
[14] N. Naik, P. Jenkins, and N. Savage, “A ransomware detection method
using fuzzy hashing for mitigating the risk of occlusion of information
systems,” in 2019 IEEE International Symposium on Systems Engineer-
ing (ISSE), 2019.
[15] A. Tridgell, “Efficient algorithms for sorting and synchronization,” Ph.D.
dissertation, Australian National University Canberra, 1999.
[16] F. Breitinger and H. Baier, “A fuzzy hashing approach based on random
sequences and hamming distance,” in Annual ADFSL Conference on
Digital Forensics, Security and Law. 15, 2012. [Online]. Available:
https://commons.erau.edu/adfsl/2012/wednesday/15
