Article 7
Abstract: Nowadays, the security of all systems connected to the public network is severely tested. Most users try to protect themselves against abusive practices by using various security tools to keep their privacy safe. Information technology security involves many branches that address the prevention of and protection against malicious software. One of those branches is the analysis of malicious files; this work focuses specifically on the static analysis of malware. In static analysis, a suspicious sample is not executed and observed as in dynamic analysis. Instead, various tools and methods are used to extract meaningful character strings from the sample, data from the header of the executable file format, information about the type of compression, the type of compiler used to create the file, and, last but not least, the application code. This work provides an initial insight into the complex subject of static analysis.
Keywords: Forensic analysis; Static analysis; Malware; Portable executable; String; PE header; Extractor; Obfuscation;
Compression.
Science & Military 1/2020
One of the main differences between static and dynamic analysis is that static analysis is somewhat safer, because it does not execute the malicious code. Therefore, we do not need to worry too much about becoming a victim of dangerous malware techniques. The risk of accidental execution of malware can be reduced further by using a virtual machine (VMware, VirtualBox, etc.), by analysing the malware on an operating system for which it was not made, or by raising the level of User Account Control (so that confirmation is required to run a program). Another advantage of static analysis is that it can reveal potential functions of malicious software that may not be observed during dynamic analysis. Although static analysis is more thorough, it is also more time-consuming, and many of its methods increase the time required to analyse code. Nowadays, almost all malware is obfuscated, which means that parts of the program are replaced by functionally equivalent parts that are encoded, compressed or intentionally padded with random or confusing code. For this reason, security teams do not use such detailed analysis when dealing with a large number of incidents. Limited capabilities, resources and time do not allow each incident to be resolved by slow methods, so security teams tend to use automatic, partially less informative methods. Even after a comprehensive analysis of the code, they may not be able to identify all the functions that the software could potentially perform (for example, external communication with websites or servers, or receiving encryption keys from the environment) [4], [5].

3 ONLINE ANTIVIRUS SCAN

The first step in analysing files is to check whether the sample is detected as malicious code by the available antivirus (abbr. AV) tools. Online multi-AV scanners provide a quick and clear picture of an unknown file that may be dangerous. In many cases these services are very easy to use thanks to their intuitive and user-friendly interface. Some online scanners also allow their services to be used with their own tools and scripts that let the user automate and speed up repetitive tasks.

Before we begin, the risks associated with using these services should be understood. False positives and false negatives will always be a problem. Even if 100 % of antivirus products indicate that a file is safe, that does not necessarily mean the file is safe, and the same applies the other way around. In addition, unless a private instance of the service is used, files uploaded to public websites may be automatically shared with other vendors and third parties. This is generally good, because the vendors need samples to build new signatures. However, targeted malware may contain hard-coded usernames, passwords, domain names or Internet Protocol (abbr. IP) addresses of internal systems that should not be distributed to vendors, let alone to the public [1].

Probably the best-known online tool for analysing malware is VirusTotal. It allows you to upload a suspicious file, check a suspicious Uniform Resource Locator (abbr. URL), search for an already uploaded file by its hash, and so on. It then scans the uploaded file with more than 60 antivirus engines, as shown in Fig. 1. The result of such a scan is a set of simple pieces of information quickly obtained by many methods of static and dynamic analysis. On the other hand, a disadvantage of this tool is that its source code is closed. Similar features are provided by other online scanners such as VirSCAN and Jotti [3], [6].

As previously mentioned, using a multi-AV online service is quite simple. All you need to know is the URL of the specific online tool (e.g. www.virustotal.com, www.virscan.org or virusscan.jotti.org). After opening the website, you only need to drag the suspicious file onto the page, or enter the full path to the file, and it will be uploaded and scanned. For these online scanners there are also sophisticated scripts and custom applications that speed up their use and automate certain tasks. The script virt.py, created by Xiaokui Shu, was used to illustrate the use of the VirusTotal service. By modifying the Windows registry, an auxiliary batch script has been added to the right-click context menu:
REM Run as administrator from a .bat file in the directory that contains script_check_file.bat
REM (%CD% is the current directory; inside a batch file %%1 is written to the registry as %1)
REG ADD "HKEY_CLASSES_ROOT\*\shell\Scan with VirusTotal" /f
REM Set the (Default) value of the command key so that Explorer calls the helper script
REG ADD "HKEY_CLASSES_ROOT\*\shell\Scan with VirusTotal\command" /ve /t REG_SZ /d "\"%CD%\script_check_file.bat\" \"%%1\"" /f

The code of the auxiliary batch script (script_check_file.bat) looks like this:

@echo off
REM Enter the directory which contains our scripts
cd /d "%~dp0"
REM Execute the script with the parameter -s (send file) and the input file
python.exe virt.py -s "%~1"
REM Wait for the online scanner to process the file
timeout /T 15 /NOBREAK
REM Execute the script with the parameter -r (retrieve report) and the input file
python.exe virt.py -r "%~1"

A community that uses online multi-AV scanner services raises the global level of IT security by sharing the results of scanned malicious files and URLs. However, such openness is also a major disadvantage of online scanners, which makes them unusable in some cases. Specifically, the biggest problem is that any user may retrieve the report of any sample at any time. Malware authors usually modify their malware so that it has a unique hash fingerprint (one that no sample analysed by VirusTotal has had yet). When an analyst uploads such a sample to VirusTotal, the author of the malware can immediately learn, simply by searching for that hash, that the malware has been found and is being analysed by a forensic analyst. Because of this, the attacker may change the behavioural strategy, turn off the sample and so on. Although the tool provides helpful features and integrates many analytical methods and tools, it is not advisable to use VirusTotal during an analysis conducted by a security team such as a Computer Security Incident Response Team (abbr. CSIRT) [1], [3].

Extraction of strings (sequences of Unicode and ASCII (American Standard Code for Information Interchange) characters) from suspicious software is another method used by analysts when analysing malicious files. It is probably the simplest method of revealing some features of a program. The method searches binary files for meaningful text strings, i.e. sequences of bytes with values in the range of printable characters, typically terminated by a zero byte. It is basically trivial data mining of binary files, yet it is often quite effective. It yields a huge number of artefacts, some of which may be crucial for forensic analysis. Such crucial artefacts include IP addresses and URLs with which the malware is able to communicate, registry keys and their values, commands that the malware uses for communication over the Internet (for example, the Internet Relay Chat protocol is easily recognized by its text commands), names and paths of files that the malware works with, and decryption keys for the encrypted parts of the code. Although strings do not give a clear picture of the purpose and capabilities of a file, they can hint at what the malware is capable of doing [4].

This approach will not work with encrypted strings, and the output may also contain a significant number of strings that do not carry any meaningful information. Malware authors often use tools and methods that prevent reverse engineering, as well as encoding or compression that makes analysis and detection more complicated. Software without malicious code almost always contains a large number of strings, while compressed malware contains only a few. Therefore, if we encounter software containing only a small number of strings, it is probably compressed and may contain malicious code. String extraction can then be used again after the hidden part of the code has been unpacked.

4.1 Tools Strings, HexDive and BinText

Specialized software such as Strings, HexDive or BinText can be used to search for strings stored in a program. All of these programs search for Unicode or ASCII characters and list all strings of at least a pre-set length. Strings from Windows Sysinternals is a basic tool that implements string extraction, and its main advantage is its wide compatibility. Once it has been downloaded, it is a good idea to copy the tool to a directory that is included in the environment variable named Path (the content of the variable can be displayed by executing the command "set" or "echo %Path%"), or to add its directory to that variable, so that Strings can be run from the command line [3], [7].

To list strings of seven or more characters from a suspicious file, use the command shown in Fig. 2.

HexDive is an intelligent extractor that speeds up the analysis of strings obtained from executable files. It does so by displaying only the strings relevant to malware analysis (its output is about two-thirds smaller than the output of Strings) [5], [8].

Finally, BinText presents information about strings in an intuitive graphical interface with options to search, filter and save the output data in a table, as depicted in Fig. 3. In the Windows operating system, the application or a shortcut to it may be copied to the folder C:\Users\<username>\AppData\Roaming\Microsoft\Windows\SendTo (in Explorer also accessible via the address shell:sendto). This way, it will always be available for quick use from the Send to context menu [9].
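The extraction performed by these tools can be illustrated with a short Python sketch. It is a simplified model, not the algorithm of Strings, HexDive or BinText: it assumes that "printable" means bytes 0x20 to 0x7E, reports ASCII strings and UTF-16LE strings (the form in which Windows usually stores Unicode text), does not require a terminating zero byte, and by default uses a minimum length of seven characters to match the Strings example. The function name extract_strings is hypothetical.

```python
import re


def extract_strings(data: bytes, min_len: int = 7) -> list:
    """Return printable ASCII and UTF-16LE strings of at least min_len characters.

    Assumptions of this sketch: printable means bytes 0x20-0x7E, and no
    terminating zero byte is required. ASCII strings are listed first,
    then UTF-16LE ("wide") strings, each group in file order.
    """
    # A run of at least min_len printable ASCII bytes
    ascii_run = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    # In UTF-16LE, each ASCII-range character is followed by a zero byte
    wide_run = re.compile(rb"(?:[\x20-\x7e]\x00){%d,}" % min_len)
    found = [m.group().decode("ascii") for m in ascii_run.finditer(data)]
    found += [m.group().decode("utf-16-le") for m in wide_run.finditer(data)]
    return found
```

For a file on disk, the call would be along the lines of extract_strings(open(path, "rb").read()). As with the real tools, the result will include many meaningless strings that the analyst still has to filter.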
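The rule of thumb that compressed malware contains only a few strings can be turned into a rough triage check. The following sketch counts printable ASCII strings per KiB of a file. The five-character minimum and the value of LOW_DENSITY_THRESHOLD are arbitrary assumptions (the text gives no figures) that would have to be calibrated on known files, and a low value is only a hint of compression, not proof of malicious intent. The function names string_density and few_strings are hypothetical.

```python
import re

# Hypothetical threshold (strings per KiB): the article gives no number,
# so this placeholder would have to be calibrated on known-good files.
LOW_DENSITY_THRESHOLD = 1.0


def string_density(data: bytes, min_len: int = 5) -> float:
    """Number of printable-ASCII runs of at least min_len characters per KiB."""
    if not data:
        return 0.0
    count = sum(1 for _ in re.finditer(rb"[\x20-\x7e]{%d,}" % min_len, data))
    return count / (len(data) / 1024)


def few_strings(data: bytes, min_len: int = 5,
                threshold: float = LOW_DENSITY_THRESHOLD) -> bool:
    """True if the data has suspiciously few strings (a hint, not proof, of compression)."""
    return string_density(data, min_len) < threshold
```

A file flagged this way is a candidate for unpacking, after which string extraction can be repeated as described above.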
5 PORTABLE EXECUTABLE FILE FORMAT

In static analysis, other very useful information can be obtained from the headers and sections of the Portable Executable (abbr. PE) file format, such as the list of all Dynamic-link Libraries (abbr. DLL) and functions that the file imports. Binary executable files (usually with extensions such as exe, dll, sys, acm, mui and others) used in all versions of the Windows operating system (abbr. OS) are nowadays mostly in the PE file format (some legacy file formats are used only rarely), which is defined by an exact data structure. This data structure contains the information that the Windows OS loader needs to manage the wrapped executable code. As the name implies, the PE file format is portable across versions of Windows regardless of the processor architecture. Therefore the same format is used for 32-bit programs as well as for 64-bit programs, whose variant of the format is called PE32+ [3].

Apart from the actual application code and application data, the data structure of the PE file format also defines the header, where detailed information about the program can be found. Excluding the program code itself, the file header is one of the main sources of information in static analysis, mainly because it is available immediately at the start of the analysis and can provide a first insight into the parameters and features of the analysed malware. Fig. 4 shows the structure of a PE file. It begins with headers containing information about the code, the type of application, the required library functions, the space the program requires, the creation date and much more. The list of used libraries and function calls alone can reveal many features of the program [5].

A PE file consists of a number of headers and sections. To maintain compatibility with the old Microsoft Disk Operating System (abbr. MS-DOS), each PE file begins with a header intended for that system, known as IMAGE_DOS_HEADER. It is followed by a small MS-DOS stub program which, in most cases, only prints the message "This program cannot be run in DOS mode." The first field of the header, e_magic, is especially interesting from an analyst's point of view, because it is always at the beginning of each executable file and has the fixed value of two characters (MZ). Therefore, if the analyst knows that a file is an executable but there are no MZ characters at its beginning, the file may be encrypted. In addition, these characters, together with the MS-DOS message, can help the analyst find the encryption key and decrypt the program, because the original values of both are known [3], [4].

The IMAGE_FILE_HEADER (PE Header) structure contains basic information about the file, such as the date and time the file was created by the linker (TimeDateStamp), the number of sections that follow the headers (NumberOfSections), the processor architecture for which the program is intended (Machine), etc. Such pieces of information are very important in the
static analysis. For example, the time stamp can indicate whether the file is an old sample or a new one that antivirus products may not have encountered yet. A value stored in TimeDateStamp may also make no sense at all (referring to the future or the distant past). Such an artefact usually deepens the suspicion that the file may be malicious [4].

Moreover, the PE header includes a structure called IMAGE_OPTIONAL_HEADER (Optional PE Header), which contains additional information for static analysis. An important field called AddressOfEntryPoint contains the address (relative to the image base) of the entry point, at which program execution starts. The ImageBase field is also essential: it determines the preferred address in memory at which the image of the program should be loaded. Its default value for 32-bit executables is 0x00400000 (0x10000000 for DLLs; the 64-bit defaults are 0x140000000 and 0x180000000), and, as with TimeDateStamp, a different value can be a sign of something potentially malicious.

The headers are followed by the section table and the sections themselves, which are an excellent source of information for forensic analysis. Here we are interested in the sizes of individual sections. The virtual size (VirtualSize) specifies how much space should be reserved for the section when it is loaded into memory. The field named SizeOfRawData contains the size of the section (its initialized data) on disk. These two sizes are usually approximately the same, with small variations. If the virtual size is much larger than the size of the raw data, it might indicate that the file has been compressed [4].

One of the most useful pieces of information that we can gather about an executable is the list of functions that it imports. Imports are functions used by one program that are actually stored in a different program, such as code libraries that contain functionality common to many programs. Code libraries are connected to the main executable by linking. Programmers link imports to their programs so that they do not need to re-implement certain functionality in multiple programs. The information we can find in the PE file header depends on how the library code has been linked. Code libraries can be linked statically, at runtime, or dynamically [2].

Static linking is the process of copying the entire code of imported functions directly into the body of the program, which may result in a huge increase in file size. Because of this, static linking is not used very often in Windows programs. In the field of malware analysis, dynamic and runtime linking are crucial [4].

When dynamic linking is used, the functions to be imported are specified when the program is built, but their code is not stored in the program; only a reference to them is stored in the header of the PE file, and the Windows loader resolves it when the program is loaded. The import directory table, which is often located in the .idata section, contains an entry for every DLL loaded by the executable. In the first stage of the analysis, the import table allows the analyst to figure out some of the application's functions, such as the ability to connect to the Internet or to work with other files or resources. Based on this, we can search for other artefacts such as IP addresses, domain names, file or application paths and so on [3].

Runtime linking is a speciality of malware developers. With runtime linking, functions are resolved during program execution, at the moment a specific function is requested; they are neither imported at link time nor embedded directly into the program code. Such functions are usually obtained through system functions (known as the system API) such as LoadLibrary, GetProcAddress, LdrLoadDll or LdrGetProcedureAddress, either by name or by ordinal (each exported function has an assigned number). Consequently, the import directory table contains these system functions, while the functions actually used may not appear in it at all. Runtime linking is usually used in programs that are encoded, compressed or encrypted and whose code serves as a malware loader that extracts or decrypts the code of the application itself, which then loads the required libraries at runtime. Malware authors take advantage of compression or encryption to hide program functionality, but these techniques are sometimes found in legitimate applications as well [2], [4].

5.1 Tools CFF Explorer, GT2 and Dependency Walker

The tool called CFF Explorer allows you to extract metadata from the PE file header, as can be seen in Fig. 5. In addition, it offers a basic translation of machine code into assembly language, although in practice specialized tools are preferred for this. The main advantage of this tool is that it presents the results completely and precisely, including offsets, hexadecimal values with their meaning and other values a field might contain. On the other hand, the program expects the user to be experienced and able to interpret the listed values correctly. It does not warn the user about any anomalies, nor does it rank results by their significance for forensic analysis; it simply presents all results in the order in which they were found [10].

GT2 is a command-line program that is able to identify most executable files and archives by their binary signatures. It therefore differs from standard Windows file type detection, since by default it does not consider the file's extension. In addition, it can also read and analyse metadata obtained from the file header [5], [11].

The frequently used tool Dependency Walker can list all DLLs used by executable programs (this feature is also included in the tools mentioned above). Dependency Walker also displays a recursive tree of all the dependencies of the executable file (all the files it requires to run), as shown in Fig. 6 [12].
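The header fields discussed in this section can be read with nothing more than Python's struct module. The sketch below assumes a well-formed file and reads only the fields mentioned above: the MZ and PE signatures, Machine and TimeDateStamp from IMAGE_FILE_HEADER, AddressOfEntryPoint and ImageBase from the optional header (PE32 or PE32+), and VirtualSize and SizeOfRawData of each section. The function names parse_pe_header and header_anomalies, the 1990 cut-off for a time stamp in the distant past and the size ratio of 10 are illustrative assumptions; tools such as CFF Explorer handle many malformed cases that this sketch ignores.

```python
import struct
import time

MACHINES = {0x014C: "i386", 0x8664: "AMD64", 0xAA64: "ARM64"}
# Microsoft linker defaults: 32-bit EXE, 32-bit DLL, 64-bit EXE, 64-bit DLL
DEFAULT_IMAGE_BASES = {0x00400000, 0x10000000, 0x140000000, 0x180000000}
DISTANT_PAST = 631152000  # 1990-01-01 00:00:00 UTC, an arbitrary illustrative cut-off


def parse_pe_header(data: bytes) -> dict:
    """Read a few fields of the DOS header, file header, optional header and section table."""
    if len(data) < 0x40 or data[:2] != b"MZ":
        raise ValueError("missing MZ signature (not an executable, or possibly encrypted)")
    (e_lfanew,) = struct.unpack_from("<I", data, 0x3C)  # offset of the PE signature
    if data[e_lfanew:e_lfanew + 4] != b"PE\x00\x00":
        raise ValueError("missing PE signature")
    # IMAGE_FILE_HEADER: Machine, NumberOfSections, TimeDateStamp, PointerToSymbolTable,
    # NumberOfSymbols, SizeOfOptionalHeader, Characteristics
    machine, n_sections, stamp, _, _, opt_size, _ = struct.unpack_from(
        "<HHIIIHH", data, e_lfanew + 4)
    opt = e_lfanew + 24  # start of the optional header
    (magic,) = struct.unpack_from("<H", data, opt)
    (entry_point,) = struct.unpack_from("<I", data, opt + 16)  # AddressOfEntryPoint (RVA)
    if magic == 0x10B:    # PE32: 4-byte ImageBase at offset 28
        (image_base,) = struct.unpack_from("<I", data, opt + 28)
    elif magic == 0x20B:  # PE32+: 8-byte ImageBase at offset 24
        (image_base,) = struct.unpack_from("<Q", data, opt + 24)
    else:
        raise ValueError(f"unknown optional header magic {magic:#x}")
    sections = []
    for i in range(n_sections):  # 40-byte IMAGE_SECTION_HEADER entries
        off = opt + opt_size + 40 * i
        name = data[off:off + 8].rstrip(b"\x00").decode("ascii", "replace")
        virtual_size, _, raw_size = struct.unpack_from("<III", data, off + 8)
        sections.append({"name": name, "virtual_size": virtual_size, "raw_size": raw_size})
    return {
        "machine": MACHINES.get(machine, f"{machine:#06x}"),
        "timestamp": stamp,
        "timestamp_utc": time.strftime("%Y-%m-%d %H:%M:%S", time.gmtime(stamp)),
        "pe32_plus": magic == 0x20B,
        "entry_point": entry_point,
        "image_base": image_base,
        "sections": sections,
    }


def header_anomalies(info: dict, now=None, max_ratio: float = 10.0) -> list:
    """List the header anomalies described in the text; thresholds are illustrative."""
    now = time.time() if now is None else now
    warnings = []
    if info["timestamp"] > now:
        warnings.append("TimeDateStamp lies in the future")
    elif info["timestamp"] < DISTANT_PAST:
        warnings.append("TimeDateStamp lies in the distant past")
    if info["image_base"] not in DEFAULT_IMAGE_BASES:
        warnings.append(f"non-default ImageBase {info['image_base']:#x}")
    for s in info["sections"]:
        # a section with no raw data is compared as if it had 1 byte
        if s["virtual_size"] > max_ratio * max(s["raw_size"], 1):
            warnings.append(f"section {s['name']}: VirtualSize much larger than SizeOfRawData")
    return warnings
```

A warning such as a section whose VirtualSize is many times its SizeOfRawData (typical of packers that unpack into an initially empty section) is a reason to look closer, not a verdict; some toolchains also write fixed or non-time values into TimeDateStamp.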
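The remark that the MZ characters and the MS-DOS message can help recover an encryption key describes a known-plaintext attack. The sketch below assumes, purely for illustration, that the whole file was encrypted with a repeating XOR key of at most 16 bytes and that the standard stub message is present; other ciphers would need different methods. It slides the known message over the beginning of the file, XORs ciphertext and plaintext to obtain candidate key bytes, keeps only candidates that repeat with the assumed key length, and accepts a key only if it also turns the first two bytes into MZ. The names recover_xor_key and xor_bytes are hypothetical.

```python
# Known plaintext found in the MS-DOS stub of most PE files (assumption: stub not altered)
DOS_MESSAGE = b"This program cannot be run in DOS mode."


def xor_bytes(data: bytes, key: bytes) -> bytes:
    """Apply a repeating XOR key (encryption and decryption are the same operation)."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))


def recover_xor_key(blob: bytes, max_key_len: int = 16, search_limit: int = 0x200):
    """Try to recover a repeating XOR key using MZ and the MS-DOS message as known plaintext.

    Returns the key as bytes, or None if no consistent key was found.
    max_key_len should stay well below len(DOS_MESSAGE) for the periodicity test to be meaningful.
    """
    n = len(DOS_MESSAGE)
    last_offset = min(search_limit, len(blob) - n)
    for key_len in range(1, max_key_len + 1):
        for off in range(0, last_offset + 1):
            # ciphertext XOR known plaintext = keystream, if the message really is at 'off'
            stream = bytes(c ^ p for c, p in zip(blob[off:off + n], DOS_MESSAGE))
            # a repeating key of length key_len produces a keystream with that period
            if any(stream[i] != stream[i + key_len] for i in range(n - key_len)):
                continue
            # put each keystream byte at its position within the key
            key = bytearray(key_len)
            for i in range(key_len):
                key[(off + i) % key_len] = stream[i]
            # confirm with the second known plaintext: the file must start with MZ
            if xor_bytes(blob[:2], bytes(key)) == b"MZ":
                return bytes(key)
    return None
```

Once a key is found, xor_bytes(blob, key) yields the decrypted file, which can then be examined like any other PE file.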
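GT2's approach of identifying a file by its binary signature instead of its extension can be illustrated with a small lookup table. The sketch below knows only a handful of well-known signatures at offset 0 (GT2 itself recognizes far more formats and also reads header metadata), the sets of acceptable extensions are assumptions, and the function names identify and extension_mismatch are hypothetical.

```python
import os

# (signature at offset 0, description, extensions consistent with it)
# A deliberately small, illustrative table; the extension sets are assumptions.
SIGNATURES = [
    (b"MZ", "PE/MS-DOS executable", {"exe", "dll", "sys", "acm", "mui", "scr", "cpl", "ocx"}),
    (b"PK\x03\x04", "ZIP archive", {"zip", "jar", "apk", "docx", "xlsx", "pptx"}),
    (b"Rar!\x1a\x07", "RAR archive", {"rar"}),
    (b"7z\xbc\xaf\x27\x1c", "7-Zip archive", {"7z"}),
    (b"\x1f\x8b", "gzip stream", {"gz", "tgz"}),
    (b"\x7fELF", "ELF executable", {"", "so", "elf"}),
    (b"%PDF-", "PDF document", {"pdf"}),
]


def identify(data: bytes):
    """Return (description, allowed_extensions) for a known signature, else None."""
    for magic, description, extensions in SIGNATURES:
        if data.startswith(magic):
            return description, extensions
    return None


def extension_mismatch(path: str, data: bytes) -> bool:
    """True if the detected content type does not fit the file name extension."""
    match = identify(data)
    if match is None:
        return False  # unknown content: nothing to compare against
    ext = os.path.splitext(path)[1].lower().lstrip(".")
    return ext not in match[1]
```

A mismatch, for example an executable saved as invoice.pdf, is a typical reason to examine a file more closely.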
6 COMPRESSION OF MALWARE
References
[1] LIGH, M. H., ADAIR, S., HARTSTEIN, B., RICHARD, M.: Malware Analyst's Cookbook. Indianapolis : Wiley Publishing, Inc., 2011. 746 p. ISBN 978-0-470-61303-0.
[2] SIKORSKI, M., HONIG, A.: Practical Malware Analysis. San Francisco : No Starch Press, Inc., 2012. 802 p. ISBN-10: 1-59327-290-1.
[3] KRÁL, B.: Forenzní analýza malware [Forensic analysis of malware]. Brno : Vysoké učení technické v Brně, 2018. 63 p.
[4] DANILOV, M.: Metody a nástroje malwarové analýzy [Methods and tools of malware analysis]. Praha : Vysoká škola ekonomická v Praze, 2016. 85 p.
[5] FUJTÍK, O.: Zjišťování podobnosti malware [Detecting malware similarity]. Brno : Masarykova univerzita, 2014. 72 p.
[6] VirusTotal. [Online]. [accessed 20 July 2019]. Retrieved from: <https://www.virustotal.com/gui>
[7] Strings – Windows Sysinternals. [Online]. [accessed 20 July 2019]. Retrieved from: <https://docs.microsoft.com/en-us/sysinternals/downloads/strings>
[8] HexDive 0.6. [Online]. [accessed 20 July 2019]. Retrieved from: <http://www.hexacorn.com/blog/category/software-releases/hexdive>
[9] BinText. [Online]. [accessed 20 July 2019]. Retrieved from: <https://www.aldeid.com/wiki/BinText>
[10] Explorer Suite. [Online]. [accessed 25 July 2019]. Retrieved from: <https://ntcore.com/?page_id=388>
[11] GT2 0.34. [Online]. [accessed 25 July 2019]. Retrieved from: <http://www.helger.com/gt/gt2.htm>
[12] Dependency Walker 2.2. [Online]. [accessed 25 July 2019]. Retrieved from: <http://www.dependencywalker.com>

Prof. Dipl. Eng. Jozef ŠTULRAJTER, CSc.
Armed Forces Academy of General M. R. Štefánik
Department of Computer Science
Demänová 393
031 01 Liptovský Mikuláš
Slovak Republic

Andrej Fedák was born in Žiar nad Hronom in 1994. He received his engineering degree from the Armed Forces Academy of General M. R. Štefánik in the field of Military Communication and Information Systems. Nowadays, he is an officer for aeronautical ground information systems at the Air Force Headquarters. His research focuses on computer networks, information systems, and information and cyber security.

Jozef Štulrajter works as a professor at the Department of Informatics, Armed Forces Academy of General M. R. Štefánik in Liptovský Mikuláš. He graduated (Ing.) from the Military Technical College in 1974. He obtained the CSc. degree in Theoretical Electrical Engineering - Theory of Circuits and Systems from the Military Academy in Liptovský Mikuláš in 1992. His research interests include Information and Communication Technology (ICT), computer architectures, image coding and computer security.