Mas 3
1. Introduction
Welcome to the third article in the MAS (Malware Analysis Series). The first two articles, hopefully,
provided you with an initial foundation and motivation for malware analysis, so let’s
move forward and learn other interesting aspects of malicious Windows binaries from well-known samples,
which are available to download from public sandboxes.
In case you haven’t read the first two articles, you can get them from the links below:
▪ MAS_1: https://exploitreversing.com/2021/12/03/malware-analysis-series-mas-article-1/
▪ MAS_2: https://exploitreversing.com/2022/02/03/malware-analysis-series-mas-article-2/
I will not review all the concepts presented in my last two articles, so I recommend reading
them when possible. Of course, in practical terms and over time, several techniques and
approaches already explained will be repeated over and over again to give you more experience
with the proposed topics.
Several professionals have asked me about the purpose of this series, so it’s
time to make it clear: the purpose is to show several malware analysis techniques, approaches, contexts
and concepts associated with the topic, as already mentioned in previous articles.
Regarding the lab setup, readers can use the procedure and tools that I mentioned in the last two
articles and, if need be, I’ll point out any tool that we haven’t used previously. In case you
need it, I recommend that you read the previous articles in this series.
Anyway, before proceeding, it’s recommended to take a snapshot of your virtual machines and turn off any
network communication and shared folders. Although we aren’t handling a ransomware case, avoid exposing
your virtual machines to the local network when analyzing malware samples. Additionally, I’ll be using
REMnux and Windows 8.1/10 (64-bit) to perform the analysis. Thus, if you have the lab configured as
proposed in the last article, you can re-use it.
Now we’re ready to start our analysis.
This time, we are analyzing this sample:
SHA 256: ed22dd68fd9923411084acc6dc9a2db1673a2aab14842a78329b4f5bb8453215
You can easily get it by using Malwoverview (https://github.com/alexandreborges/malwoverview) and
downloading it from Malware Bazaar as shown in the command below:
https://exploitreversing.com
▪ malwoverview.py -b 5 -B ed22dd68fd9923411084acc6dc9a2db1673a2aab14842a78329b4f5bb8453215
2. Gathering information
As usual, our first step is to collect enough information about the given malware threat. There’re
several tools to accomplish this task, so let’s start by checking it against Virus Total:
[Figure 01] First evaluation of the malware sample against Virus Total using Malwoverview.
Great! Using the same Malwoverview, it’s quite simple to search for our sample on Triage and gather
further information, as shown in the figures below:
3. Unpacking
Using the DiE (Detect It Easy) tool (https://github.com/horsicq/Detect-It-Easy) to check further information on the sample, we
have the following points:
▪ The sample was compiled using MS Visual C++ 2005.
▪ It includes the MFC library.
▪ Apparently there is a simple anti-debugging trick (IsDebuggerPresent( )).
▪ The entropy is high for the .rsrc and .text sections, but high entropy alone isn’t enough to confirm that the
sample is packed.
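To make the entropy remark concrete, here is a minimal sketch of the Shannon entropy calculation that tools like Detect It Easy report per section. The function name and the sample buffers are illustrative assumptions; this is not the tool’s actual implementation.

```python
import math
from collections import Counter

def shannon_entropy(data: bytes) -> float:
    """Return the Shannon entropy of data in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)
    # Sum p * log2(1/p) over every byte value that occurs in the buffer.
    return sum((n / total) * math.log2(total / n) for n in counts.values())

# Illustrative buffers (assumptions, not data taken from the sample):
uniform = bytes(range(256)) * 16   # every byte value equally likely
repetitive = b"A" * 4096           # a single repeated byte value

print(f"uniform buffer:    {shannon_entropy(uniform):.2f} bits/byte")
print(f"repetitive buffer: {shannon_entropy(repetitive):.2f} bits/byte")
```

Values close to 8 bits per byte suggest compressed or encrypted content, but legitimate resources (images, archives) can score just as high, which is why entropy is only a hint.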
Using PE-Bear (https://github.com/hasherezade/bearparser) we are able to collect further information:
▪ It’s a 32-bit sample (from FileHdr tab).
▪ It doesn’t statically load any DLL related to network communication using WinSock2, WinINet,
COM and so on. This fact might be strange because, unless it’s a wiper, common malware threats
usually establish network communication with their creator. Thus, either the malware loads
network APIs dynamically or it might be packed.
▪ The malware imports some resource-related APIs, which could indicate that resources
contain configuration data and other useful information. Some of these APIs are:
▪ FindResource
▪ FindResourceExA
▪ LoadResource
▪ SizeofResource
▪ LockResource
▪ FreeResource
▪ DllRegisterServer
▪ DllUnregisterServer
Using the Resource Hacker tool (http://www.angusj.com/resourcehacker/), we confirm that there is some
data within resources, but it might not have any relation to the real payload:
[Figure 06] Examining the resource content using Resource Hacker tool
So far, we aren’t sure whether the sample is packed or not, so we have to use a debugger to confirm it.
Remember that it is a DLL, so we need to debug rundll32.exe and provide, as arguments, the DLL and
one of the exported functions, which is DllRegisterServer( ) (function #1).
As I mentioned in the first article of this series, there’re many ways to unpack malware samples,
some of which are semi-manual (using debuggers), automatic (pe-sieve and hollows_hunter) and even
completely manual through scripts.
Whatever your choice, start up your virtual machine (Windows 8.1 or Windows 10), open up
x32dbg (it’s a 32-bit DLL -- https://x64dbg.com/) and load rundll32.exe
(C:\Windows\SysWOW64\rundll32.exe) for debugging. Go to File → Change Command Line and type a
similar line, providing the DLL and the first exported function (or its respective ordinal number):
▪ "C:\Windows\SysWOW64\rundll32.exe" C:\Users\Administrator\Desktop\MAS_3\mas_3.bin, #1
Press CTRL + F2 to reload the binary with the provided argument and, likely, the debugger will stop at
the System Breakpoint. Press F5 once and you’ll stop at the Entry Point.
Before proceeding, double-check to be sure that the virtual machine doesn’t have any shared folder and
network communication with any system (internal or external to your lab). Typically I disable any
network interface.
[Figure 07] Unpacking and extracting the PE binary during a x32dbg session
Open up the dumped sample in PE-Bear and you’ll notice that the section headers are messed up:
▪ .rdata size - .text size == 23000 – 1000 == 22000, so fill .text size with this value.
▪ .data size - .rdata size == 24000 – 23000 == 1000, so fill .rdata size with this value.
▪ .reloc size - .data size == 26000 – 24000 == 2000, so fill .data size with this value.
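The arithmetic above can be reproduced with a few lines of Python. This is only a sketch: it assumes the values in the list are hexadecimal section start addresses taken from the dumped image, and the helper name is made up for illustration.

```python
# Section start addresses from the list above (assumed to be hexadecimal).
section_starts = [
    (".text", 0x1000),
    (".rdata", 0x23000),
    (".data", 0x24000),
    (".reloc", 0x26000),
]

def sizes_from_starts(starts):
    """Size of each section = start of the next section - its own start."""
    sizes = {}
    for (name, start), (_, next_start) in zip(starts, starts[1:]):
        sizes[name] = next_start - start
    return sizes

for name, size in sizes_from_starts(section_starts).items():
    print(f"{name}: {size:#x}")
```

The last section (.reloc) has no successor in this list, so its size has to come from another source, such as the size of the dumped image.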
Another simpler approach to unpack the malware is through the hollows_hunter tool, which is available in
x86 and x64 versions (https://github.com/hasherezade/hollows_hunter). In this case, you should:
a. run hollows_hunter64 in a loop to ensure you catch any implant in memory: hollows_hunter64 /loop
b. run the malware: rundll32 mas_3.bin,#1 (take care: the malware is going to remove itself, so keep
a backup of it)
Hollows_hunter64 will provide two DLLs of almost the same size, but you should prefer the larger one. Of
course, you can observe the injected DLL in the running regsvr32.exe process:
[Figure 13] Hunting URLs and IP addresses through regex on Process Hacker
4. Reversing
As usual, let’s start our reversing session using IDA Pro 7.7.x and, just in case you don’t have this version,
you could follow the reversing session using IDA Home 7.7.x (https://hex-rays.com/ida-home/). We’re
going to keep the focus on a few objectives such as:
▪ Renaming variables and functions.
▪ Decrypting strings
▪ Extracting C2 data configuration
▪ Handling hashed functions
▪ Extracting eventual public keys
▪ Fixing calling conventions if necessary
▪ Creating C++ structures if they are necessary and make our understanding easier.
Unlike the last article, my intention is not to go into deep details and I’ll try to keep this article short.
Some professionals asked why I don’t use dynamic analysis. Actually, it’s a matter of
personal preference for static analysis, although I think dynamic analysis is very useful and I also use it in
several stages such as:
1. Understanding a network protocol communication for writing an emulation script.
2. Confirming whether a given function of the malware works as I think it does.
3. Unpacking (in general).
4. Handling specific shellcode analysis.
5. Analyzing first stages in .NET format (next article).
In the same way, in several consulting services, I usually extract valuable information by performing memory
analysis (Volatility) and gathering important indicators, information and artifacts such as:
a. Created services (persistence)
b. Network communication information (C2)
c. Code injected (evasion)
d. Hooked functions (evasion)
e. Detecting callbacks (rootkits)
f. Unpacked binary (unpacking)
g. Created/Changed Registry entries (persistence)
Certainly I could write a large section using memory analysis before starting the reversing phase, but this
article would become too long, so it’s a good opportunity for the near future.
During this section, we’ll use the same IDA plugins presented in the second article of the Malware Analysis
Series (MAS), though there’re other good ones I’d like to show you in the next articles of this series:
▪ Flare Capa Explorer: http://github.com/mandiant/capa.git
▪ ApplyCalleeType: https://github.com/mandiant/flare-ida
▪ StructTyper: https://github.com/mandiant/flare-ida
▪ HashDB: https://github.com/OALabs/hashdb-ida
▪ Findcrypt-yara: https://github.com/polymorf/findcrypt-yara.git
Please, if you don’t know how to install all of these plugins, read the second article of this series,
where I showed further details about how to do it.
Open up the unpacked binary on IDA Pro and go to View → Open Subviews → Type Libraries (SHIFT+F11
hotkey) and insert important libraries (INS hotkey) such as:
▪ mssdk_win7 (already inserted automatically)
▪ ntapi or ntapi_win7
▪ ntddk_win7
▪ vc10 (not always)
Although it is not necessary and doesn’t make a difference in this article, it’s always advisable to add some
signatures, which will help you in most reversing cases, by going to View → Open Subviews →
Signatures (SHIFT+F5) and inserting (INS hotkey) a few library modules such as:
▪ vc32rtf
▪ vc32ucrt
▪ vcseh
As we’re going to use the decompiler, it’s also recommended to decompile the entire file first to avoid
misunderstandings while analyzing code. Thus, go to File → Produce File → Create C File (CTRL+F5) and
save the .c file in the same directory as the unpacked malware. The decompiling process takes some
seconds to finish. Now open up a Pseudo Code window, set it up side by side with the Assembly View
window and synchronize it with the IDA View (right click → Synchronize with).
To collect contextualized information, go to Edit → Plugins → Flare Capa Explorer and start the analysis
of our first findings, but this time against the assembly code:
[Figure 14] Evaluating malware capabilities through Flare Capa Explorer on IDA Pro
Unfortunately we didn’t get much information, but we learned that:
▪ There’s parsing of a PE header, which is used for hashing functions and shellcode.
▪ There’s a possible Base64 manipulation.
▪ There’re some XOR operations.
▪ Finally, a subroutine (sub_B963F0) might be using a hashing algorithm named murmur3.
Of course, the recommendation is to always check all information presented by Flare Capa Explorer, but
if the malware is really using a hash function such as murmur3, we know that:
▪ It’s a well-known non-cryptographic hash function.
▪ It produces a 32-bit or 128-bit hash value.
▪ We’re able to find its implementation in several programming languages on the Internet.
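For reference, a compact sketch of the public 32-bit variant of murmur3 (MurmurHash3_x86_32) in Python follows. It only illustrates what the capa rule is looking for; whether sub_B963F0 really implements this algorithm still has to be confirmed against the disassembly.

```python
def murmur3_32(data: bytes, seed: int = 0) -> int:
    """Sketch of MurmurHash3 (x86, 32-bit variant) over a bytes object."""
    c1, c2, mask = 0xCC9E2D51, 0x1B873593, 0xFFFFFFFF
    h = seed & mask
    body_len = len(data) & ~3          # largest multiple of 4 <= len(data)

    def mix_k(k: int) -> int:
        k = (k * c1) & mask
        k = ((k << 15) | (k >> 17)) & mask   # rotate left by 15
        return (k * c2) & mask

    # Body: process the input in 4-byte little-endian blocks.
    for i in range(0, body_len, 4):
        h ^= mix_k(int.from_bytes(data[i:i + 4], "little"))
        h = ((h << 13) | (h >> 19)) & mask   # rotate left by 13
        h = (h * 5 + 0xE6546B64) & mask

    # Tail: the remaining 1 to 3 bytes, if any.
    tail = data[body_len:]
    if tail:
        h ^= mix_k(int.from_bytes(tail, "little"))

    # Finalization (avalanche).
    h ^= len(data)
    h ^= h >> 16
    h = (h * 0x85EBCA6B) & mask
    h ^= h >> 13
    h = (h * 0xC2B2AE35) & mask
    h ^= h >> 16
    return h

print(hex(murmur3_32(b"DllRegisterServer")))
```

The constants 0xCC9E2D51, 0x1B873593 and 0xE6546B64 are what signature-based tools usually match on, so searching for them in the binary is a quick way to validate the capa finding.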
There’re other weird points about this sample:
▪ IDA Pro only shows three strings (SHIFT+F12).
▪ There aren’t imported functions, so possibly all of them are resolved dynamically.
▪ There’re the native DllEntryPoint( ) and only one user function exported: DllRegisterServer( )
As readers already know, strings usually offer a good guide during reversing tasks, but this time we don’t
have any here. If we jump to DllRegisterServer( ), the first impression is not good because there’re
many XOR and ADD operations with hexadecimal numbers and, initially, we don’t have any clue about
what they are and do:
[Figure 15] Hexadecimal constant being manipulated (xor, add) against a structure
The next issue is that, in the decompiler, most constants are represented in decimal format instead of
hexadecimal format, as shown in the sub_B91FD0 method:
It’s recommended to produce a new C file again (File → Produce File → Create C File (CTRL+F5)) and, if
you still see decimal representation, just refresh the pseudo code representation by pressing the F5 hotkey.
Maybe you might think that there’s something very strange in Figure 16 and, indeed, there’re some
obfuscation techniques being deployed. If you want to have an overview of what’s happening, it’s
enough to get a graph (View → Open SubViews → Proximity Browser) to see “messed up” control flows:
[Figure: control-flow graph labeled with the Entry Point and the Dispatcher]
As readers might have realized, depending on the entry point (state variable), the dispatcher decides
which block is executed. The concept behind the Control-Flow Flattening technique is also used by protectors
that virtualize a function’s code. If you remember the first article of this series (MAS), modern obfuscators
have some interesting characteristics:
a. They have special focus on 64-bit code (but some of them also cover 32-bit code).
b. Not all instructions are virtualized.
c. Strings are encrypted (obviously)
d. Native instructions are translated to virtualized ones (RISC virtual machine instruction set).
e. DLLs and APIs are renamed or hashed.
f. Obfuscation is stack-based.
g. There’re fake push instructions.
h. They use code re-ordering.
i. There are thousands of dead-code instructions.
j. They use Control Flow Flattening.
k. Virtualized code is polymorphic, so one native instruction can be translated to many different
virtualized instruction representations, where one or another could be used anytime.
l. There are usually critical context switches during the transition from native execution to virtualized
execution and vice versa.
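To make the Control-Flow Flattening idea concrete, the sketch below rewrites a trivial summing loop as a dispatcher driven by a state variable. The state constants and the function name are arbitrary assumptions chosen for illustration; they are not taken from the sample.

```python
def flattened_sum(values):
    """Sum a list using a flattened state machine instead of a plain loop."""
    state = 0x1A2B                 # entry state (arbitrary constant)
    index, total = 0, 0
    while True:                    # the dispatcher loop
        if state == 0x1A2B:        # entry block: initialization
            index, total = 0, 0
            state = 0x77F0
        elif state == 0x77F0:      # loop condition block
            state = 0x3C4D if index < len(values) else 0x9E01
        elif state == 0x3C4D:      # loop body block
            total += values[index]
            index += 1
            state = 0x77F0
        elif state == 0x9E01:      # exit block
            return total

print(flattened_sum([1, 2, 3, 4]))
```

Every original basic block becomes a branch of the dispatcher, and the real control flow is hidden in the way each block updates the state variable. Recovering the original flow means tracking which state each block writes.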
The execution cycle is composed of fetching, decoding (translation from x86 to RISC context), dispatcher
(depending on the instruction, a determined handler is executed) and handler (the implementation of the
virtual machine instruction set). Therefore, given a decoded instruction, the dispatcher decides which
handler will be executed:
Only to supplement the previous explanation (it isn’t related to the Emotet sample), in cases of malware using
a virtualized instruction set, these instructions are usually stored in an array (in encrypted form) and, to
execute any virtualized instruction, an index is provided, which refers to the array’s slot. Then the instruction is
decrypted and the retrieved opcode points to a function pointer (handler) that is finally executed, as
shown below:
[Figure: an index (1 ... n) selects a slot in the array of encrypted instructions (encr_1, encr_2, encr_3, ..., encr_n); recovering and decrypting functions turn it into the decrypted instructions (vm_add, vm_sub, vm_xor, vm_push, vm_pop, ..., vm_n).]
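The fetch, decrypt, dispatch and handler cycle described above can be modeled in a few lines of Python. This is a toy sketch: the single-byte XOR key, the opcode numbering and the handler set are all hypothetical and far simpler than a real protector.

```python
KEY = 0x5A  # hypothetical single-byte key used to hide the opcodes

def vm_push(stack, operand):
    stack.append(operand)

def vm_add(stack, _operand):
    b, a = stack.pop(), stack.pop()
    stack.append((a + b) & 0xFFFFFFFF)

def vm_sub(stack, _operand):
    b, a = stack.pop(), stack.pop()
    stack.append((a - b) & 0xFFFFFFFF)

def vm_xor(stack, _operand):
    b, a = stack.pop(), stack.pop()
    stack.append(a ^ b)

# Opcode -> handler table (the "function pointers" of the virtual machine).
HANDLERS = {0: vm_push, 1: vm_add, 2: vm_sub, 3: vm_xor}

def encrypt_program(program):
    """Helper for this sketch only: hide each opcode with the XOR key."""
    return [(opcode ^ KEY, operand) for opcode, operand in program]

def run(encrypted_program):
    stack = []
    for index in range(len(encrypted_program)):          # fetch by index
        encrypted_opcode, operand = encrypted_program[index]
        opcode = encrypted_opcode ^ KEY                   # decrypt
        handler = HANDLERS[opcode]                        # dispatch
        handler(stack, operand)                           # execute the handler
    return stack

# (7 + 5) ^ 3, expressed as virtual instructions
program = [(0, 7), (0, 5), (1, None), (0, 3), (3, None)]
print(run(encrypt_program(program)))
```

In a real sample the handler table, the key schedule and the opcode encoding usually change per build, so the reversing work concentrates on identifying the dispatcher and labeling each handler.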
▪ sub_B91FD0
▪ sub_B8BA9C
Going inside the first one (sub_B91FD0), there’re many calls to subroutines and it is a large function.
Anyway, there’re some methods that could be interesting:
▪ sub_B9ACFF (called many times)
▪ sub_B8B9D7 (called many times)
▪ sub_B9D14C → sub_B84BB4 (called many times)
▪ sub_B86A8D
o sub_B9BFF0 (called many times)
o sub_B9B558 (PE parsing)
▪ sub_BA1AE9 (DLL related)
▪ sub_B9B558 (called many times)
▪ sub_B86A8D (called many times)
Please, I’d like to remind you that I’m showing real steps during a malware analysis because it’d be very
practical (but unnatural) to go to the “right functions” without providing a reasonable and rational
explanation of the decisions taken. Furthermore, I’m always focused on explaining how to accomplish the most
important reversing steps instead of only showing you the final reversed function, so be patient, please.
Certainly I won’t reverse the entire malware sample in this article (not even close), but I hope I can show
a few relevant steps that could help you in your studies. Don’t worry: this series (MAS -- Malware Analysis
Series) will be composed of many articles and we have enough time to discuss different concepts, analyses
and details related to reverse engineering and, mainly, malware analysis.
If readers are wondering how to get the number of cross references to each function call, there are two
obvious alternatives:
▪ readers can manually parse each subroutine call and get its cross-references (X hotkey).
▪ readers can write a script to do it automatically.
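A minimal sketch of the scripted alternative is shown below. The counting logic receives the cross-reference provider as a parameter so it can be tried outside IDA; inside IDA Pro you would pass something like lambda ea: idautils.XrefsTo(ea, 0). The helper names and the stand-in database (with dummy caller addresses) are assumptions made for illustration only.

```python
def count_xrefs(addresses, xrefs_to):
    """Map each address to the number of cross-references reported for it."""
    return {ea: sum(1 for _ in xrefs_to(ea)) for ea in addresses}

def rank_by_xrefs(counts):
    """Return (address, count) pairs, most referenced first."""
    return sorted(counts.items(), key=lambda item: item[1], reverse=True)

# Stand-in for the IDA database: function address -> dummy caller addresses.
# Inside IDA, replace the lambda below with:
#   lambda ea: idautils.XrefsTo(ea, 0)
fake_db = {
    0xB9BFF0: [0x10, 0x20, 0x30],
    0xB9B558: [0x40],
    0xBA1AE9: [],
}

counts = count_xrefs(fake_db, lambda ea: fake_db[ea])
for ea, number in rank_by_xrefs(counts):
    print(f"sub_{ea:X}: {number} xrefs")
```

Functions with unusually high reference counts are good candidates for string decryption or API resolution routines, which is exactly the kind of lead we follow next.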
Before proceeding, you can list the available segments (sections) of the malware in IDA Pro by going to
View → Open Subviews → Segments or pressing SHIFT + F7 hotkey:
That’s a good result because we’ve confirmed that some encrypted data related to a string, API name or
DLL name is stored there (we don’t know exactly what) and, additionally, there’re many other data cross
references (DATA XREF) around the given address. If we expand our search and look at the start of the .text
section by listing segments (SHIFT + F7 or CTRL + S) and double-clicking the .text section, we have the
content shown in the figure below:
Before renaming variables and methods, we have the following context from line 18 onward:
▪ v3 seems to be an array of bytes.
▪ On line 18, (char *)(v3 + 2) points 8 positions ahead. This value is assigned to v4. Additionally,
the cast to (char *) is a strong indication that v4 represents the decrypted string.
▪ On line 19, the first four bytes are XORed with the next four bytes (*v3 ^ v3[1]), and stored into
v5.
▪ Notice that, on line 20, v15 is set with v3 content, so *v15 = *v3.
▪ If you examine the remainder of the subroutine below, v12 is set to v4’s content, so *v12 = *v4.
▪ On line 39, v15 (holding v3 content) is XORed with v12 (holding the v4 content). Therefore, so far
v3 (first 4 bytes) seems to be the key and v12 (v4) seems to be the encrypted content.
▪ What’s the encrypted data’s length? Probably it’s v3[1], but the real value is hidden under a
XOR operation (line 19), so we have to execute this XOR operation before getting the real length.
▪ Perform a XOR operation between key and the resulting xored string length. It will be the plain text
string’s length.
▪ Use the key to decode the encrypted data from byte 8 onward.
▪ At a second moment, alter the script to create comments next to referring instructions.
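The decryption steps listed above can be sketched in plain Python, without IDA, before adapting them to IDAPython. The sketch assumes little-endian dwords, a layout of [key][key ^ length][encrypted data] and a key that repeats every four bytes; encrypt_string( ) exists only to build test data and is not part of the malware.

```python
import struct

def decrypt_string(blob: bytes) -> bytes:
    """Decrypt one blob laid out as [key: 4][key ^ length: 4][data: length]."""
    key, xored_length = struct.unpack_from("<II", blob, 0)
    length = key ^ xored_length             # recover the plain text length
    key_bytes = struct.pack("<I", key)      # key as 4 little-endian bytes
    encrypted = blob[8:8 + length]          # data starts at byte 8
    return bytes(b ^ key_bytes[i % 4] for i, b in enumerate(encrypted))

def encrypt_string(plain: bytes, key: int) -> bytes:
    """Inverse helper, used only to produce sample input for this sketch."""
    key_bytes = struct.pack("<I", key)
    body = bytes(b ^ key_bytes[i % 4] for i, b in enumerate(plain))
    return struct.pack("<II", key, key ^ len(plain)) + body

# The key below is an arbitrary value chosen for the demonstration.
blob = encrypt_string(b"urlmon.dll", 0x11223344)
print(decrypt_string(blob))
```

Once this logic matches what you see in the debugger or the decompiler, the same function can be fed with bytes read from the .text section and its result used to create comments next to the referring instructions.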
Once again, readers can use any development program or environment to write their Python scripts and
one of the available options is to use Jupyter notebook to make drafts while programming because it
offers good debugging messages and support, which are useful mainly at this draft stage. To install and use it,
execute the following steps:
1. pip install jupyterlab
2. execute: jupyter-lab
3. Choose Python 3 Notebook (right side)
4. Rename the document (left side)
As I’m going to use IDC/IDA Python functions, I will be using IDA Pro’s own script environment
available in File → Script Command (SHIFT+F2). The following script is well-commented, but I’ll leave some
additional comments after it:
An educational experiment can be done here. I commented out line 150 (fix_operand(xref.to, final_string))
of our script (Figure 33).
If readers run the script, they will have the following piece of code including strings used as comments
next to instructions and the result will be similar to the one shown below:
[Figure 36] Code commented with decrypted strings and data references renamed
As readers are able to notice, this time we can see data references renamed using the name of decrypted
strings and, additionally, we kept all comments. Of course, we don’t need both ones, but I left them here
to show you the final effect. If you return to the .text section for any of these strings (for example,
urlmon.dll), you’ll see the following:
▪ We could use byte arrays but, in this case, I chose to keep everything as strings to keep
the code simple.
▪ If readers don’t know about the repr( ) function on line 22, which is a Python built-in function,
read about it on: https://docs.python.org/3/library/functions.html#repr
▪ The data extraction code (lines 27 to 32) is exactly the same as in the second article of this series, but
it was adapted to extract data from the .text section.
▪ On lines 85 and 87 the script uses struct.unpack( ). Python struct is a powerful resource to
interpret bytes as packed binary data and it’s able to do this interpretation according to the byte
order (little-endian, big-endian or even native). You can read a bit more about Python structs and
learn from examples on https://docs.python.org/3/library/struct.html.
▪ On line 101, the decrypter( ) routine is called and, once the result is returned, all single quotes are
removed and, soon after it, any sequence “\r\n” is converted to “\n”.
▪ On line 102, Unicode characters were removed because we don’t understand them and, for this
specific purpose, they won’t be useful.
▪ From line 115 to 121, we have the routine used to patch the binary and change the data
reference names to a name represented by the decrypted string. The sequence should be clear:
we converted decrypted strings to byte representation, appended “\x00” to the end of the
sequence of bytes to get a well-formed string, patched the provided address with each letter of
the decrypted strings and, finally, we created a new string using an IDC function named strlit(long
ea, long len ). Note that we could have specified the length of the string, but we chose to use a
string delimiter: https://hex-rays.com/products/ida/support/idadoc/207.shtml.
▪ On line 134, the idautils.XrefsTo( ) function is used to get all references to the address of the given
encrypted string, so we are able to get all instructions’ addresses referring to the encrypted string:
https://hex-rays.com/products/ida/support/idapython_docs/idautils.html#idautils.XrefsTo
▪ On lines 138 and 139, it’s worth highlighting that xref.to provides the address of the encrypted
string and xref.frm provides the address of the instruction referring to the encrypted string.
▪ On line 146, the set_cmt( ) function is used to set an indented comment:
https://hex-rays.com/products/ida/support/idadoc/204.shtml
▪ On line 150 the script calls fix_operand( ), which is responsible for patching the idb database by
replacing the string reference with the string itself.
▪ Finally, on line 157, the script offers the option to decrypt only one given encrypted string, but it’s
necessary to comment out the whole while loop between lines 133 and 153.
▪ Readers are able to get both start and end addresses of encrypted strings by examining the .text
section (CTRL+S) as shown below:
[Figure 39] Getting start and end addresses of the encrypted strings
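Two of the notes above, the use of struct.unpack( ) with an explicit byte order and the clean-up of the decrypted result, can be tried in isolation. The snippet below is only an approximation of what the script does: the sample bytes and the clean( ) helper are assumptions made for illustration.

```python
import struct

# Eight sample bytes: a 4-byte value followed by a 4-byte length field.
raw = bytes.fromhex("78563412" "0d000000")

# "<" forces little-endian and ">" forces big-endian interpretation.
key_le, length_le = struct.unpack("<II", raw)
(key_be,) = struct.unpack(">I", raw[:4])
print(hex(key_le), length_le, hex(key_be))

def clean(decrypted: bytes) -> str:
    """Approximate the post-processing described in the notes above."""
    text = decrypted.decode("latin-1")        # one character per byte
    text = text.replace("'", "")              # remove single quotes
    text = text.replace("\r\n", "\n")         # normalize line endings
    return "".join(ch for ch in text if ord(ch) < 128)  # drop non-ASCII

print(repr(clean(b"Content-Type: x\r\n\xe9'")))
```

Keeping the byte order explicit in the format string avoids surprises when the same script runs on machines with a different native endianness.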
Now that we have decrypted strings, let’s move forward. Return to the beginning of the malware, which is the
exported DllRegisterServer( ), and go to the sub_B91FD0( ) subroutine, which is effectively the first one to be
called within DllRegisterServer( ), as shown below:
▪ The sub_B9BFF0( ) subroutine call (line 5) has several arguments, where the last one seems to be a
hash and, usually, this is expected when analyzing malware samples with obfuscation techniques.
▪ Returning a value/string to a local variable (v1) is an indication that there’s something related to
hash resolution (DLL or API hashing) and, as readers are going to see in this case, an API hashing
name resolution.
▪ Finally, it seems that v1 contains the name of a function (API) because on line 6 v1’s content
is used as the name of the called function, which includes several (fake) arguments.
▪ Reading all 7 lines, the general idea is that the function on line 1 is a wrapper/proxy, where first an
API name is resolved for a given hash and, after being resolved, it’s called. As this wrapper
function doesn’t have any useful arguments, the call on line 6 doesn’t have any
concrete argument either.
▪ Readers can easily confirm that the v1( ) call (line 6) doesn’t have arguments by checking the
assembly code and, as you’re able to see, between sub_B9BFF0( ) on line 5 and this one on line 6,
there’s only a stack adjustment:
In any malware analysis case where there’s API hash resolution, the responsible routine is called many
times (once for each function hash), so the obvious step is to check how many times the sub_B9BFF0( )
subroutine is called (X hotkey):
▪ a possible array (lines 6, 9 and 11), which seems to be used to hold API names.
▪ a call to sub_BA1AE9, whose v4 argument comes from stack (ecx).
▪ a call to sub_B9B558 using the v5 local variable (returned from sub_BA1AE9) as argument and the a4
argument that, according to Figure 44 (sub_B8645E subroutine), is a hexadecimal value and possibly an
API hash.
Going into sub_BA1AE9( ) subroutine we see:
▪ On the same line 6, the _PEB struct has a field named Ldr (offset 0xC), which is a pointer to the
PEB_LDR_DATA structure (https://www.nirsoft.net/kernel_struct/vista/PEB.html):
▪ You can see the same _PEB structure on IDA Pro by going to the Structure tab (SHIFT+F9), pressing the
INSERT key, clicking on Add Standard Structure, searching for the _PEB structure and adding it:
▪ If you want to learn a bit more about PEB and navigate within its fields, a good reference follows:
https://processhacker.sourceforge.io/doc/struct___p_e_b.html.
▪ The _PEB_LDR_DATA structure, pointed to by the Ldr field from the _PEB structure, holds information about
the DLL modules loaded in the process. Its internal composition has the following content according
to IDA Pro, which readers can access by repeating the same method mentioned above: go to the
Structure tab (SHIFT+F9) → press Insert → go to Add Standard Structure and search for the
PEB_LDR_DATA structure (alternatively, you can check the same information, but presented in a
different format, on: https://www.nirsoft.net/kernel_struct/vista/PEB_LDR_DATA.html):
▪ Although readers probably already know the meaning of these fields, it’s worth remembering them here:
o InLoadOrderModuleList: it’s a doubly linked list that organizes all modules (DLLs) in the exact
order that they were loaded into a process in memory.
o InMemoryOrderModuleList: it’s a doubly linked list that organizes all modules (DLLs) in the order
that they appear in the process’s memory.
▪ All these fields from the _PEB_LDR_DATA structure are heads of lists of LDR_DATA_TABLE_ENTRY
structures (shown below), each of which represents a loaded DLL module:
▪ There’re several and quite interesting fields such as FullDllName (it holds the full path of DLL on
disk), BaseDllName (it holds the DLL name), LoadCount (contains the number of times this DLL was
loaded using LoadLibrary( )) and, finally, DllBase (contains the base address of the DLL).
▪ Therefore, the basic idea is: the code parses all modules loaded in memory, gets the respective
DLL name, calculates the associated hash by using the sub_B940AF subroutine, performs an XOR
operation with the given key (0x23FECA30) and compares each result with the calculated hash. If
there’s a match, the DLL’s base address is returned.
▪ In fact, all hashing mechanisms have a similar modus operandi and, basically, they change only the
algorithm (obviously) and eventually have a further logical operation; in this case, an additional
step is done by performing a XOR instruction with an extra XOR key.
Now that readers have refreshed a few important concepts, the altered code, after minimal work on
it, follows:
▪ Parses each letter (indexed by k) of the given name and calculates the hash by summing up three
operations: hash << 0x10, hash << 6 and (ptr_provided_name[k] – hash).
▪ Checks whether a letter of the DLL name is in upper case and, if it is, changes it to lower case before
continuing the iteration over each remaining letter.
▪ Finally, it returns the calculated hash.
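The hashing routine and the module lookup described so far can be modeled in Python as follows. This is a sketch under assumptions: the hash is truncated to 32 bits, the comparison is (hash XOR key) == constant, and the module list with its base addresses is invented for the example rather than read from a real PEB.

```python
MASK32 = 0xFFFFFFFF
DLL_XOR_KEY = 0x23FECA30   # key mentioned in the text for DLL hashing

def name_hash(name: str, fold_case: bool) -> int:
    """hash = (hash << 0x10) + (hash << 6) + (char - hash), kept to 32 bits."""
    h = 0
    for ch in name:
        c = ord(ch)
        if fold_case and 0x41 <= c <= 0x5A:   # 'A'..'Z' -> lower case
            c += 0x20
        h = ((h << 0x10) + (h << 6) + (c - h)) & MASK32
    return h

def find_module_base(modules, target):
    """modules: iterable of (dll_name, base_address) pairs."""
    for dll_name, base in modules:
        if (name_hash(dll_name, fold_case=True) ^ DLL_XOR_KEY) == target:
            return base
    return None

# Hypothetical loaded-module list (names and bases are illustrative only).
loaded = [("ntdll.dll", 0x77000000), ("KERNEL32.DLL", 0x76000000)]
wanted = name_hash("kernel32.dll", fold_case=True) ^ DLL_XOR_KEY
print(hex(find_module_base(loaded, wanted)))
```

The three summed terms are the classic form of the sdbm string hash, which helps when searching for the algorithm in hash databases.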
Let’s go back up two levels to the sub_B9BFF0 subroutine and move inside the sub_B9B558 subroutine that
supposedly has 4 arguments (of course, it doesn’t):
▪ The XOR key is 0x32C9DB43, which is different from the previous one used for DLL hashing.
Before proceeding in our analysis, we need to add structures which will be necessary to improve our
reversing experience, so go to Structure view (SHIFT+F9) → Insert key → Add standard structure and add
the following ones:
▪ _IMAGE_DOS_HEADER
▪ _IMAGE_NT_HEADERS
▪ _IMAGE_EXPORT_DIRECTORY
▪ _IMAGE_FILE_HEADER (automatically loaded by the first three ones)
▪ _IMAGE_OPTIONAL_HEADERS32 (automatically loaded by the first three ones)
▪ _IMAGE_DATA_DIRECTORY (automatically loaded by the first three ones)
A good reference on the Windows executable structure is available at:
https://github.com/corkami/pics/blob/master/binary/pe102/pe102.pdf.
First I’m going to show the final code and, afterwards, I’ll explain all the steps necessary for readers
to get the same result:
▪ If a4 argument changed its type, click on it, press Y hotkey and change it back to int a4.
▪ On the same a4 argument, rename it to dll_base_address.
▪ On line 16, press Y on v6 local variable and change its type to _IMAGE_EXPORT_DIRECTORY*.
▪ On line 16, rename v6 local variable to ptr_IMAGE_EXPORT_DIRECTORY.
▪ On line 19, rename v12 local variable to ptr_AddressOfFunctions.
▪ On line 20, rename v7 local variable to ptr_AddressOfNames.
▪ On line 21, rename v10 local variable to ptr_AddressOfNames (again).
▪ On line 22, rename v11 local variable to ptr_AddressOfNameOrdinals.
▪ On line 28, rename v5 local variable to counter.
▪ On line 29, as v4’s content is a pointer to the API name, so rename it to ptr_api_name.
▪ On line 38, rename sub_B9B384 to mw_w_api_hash_resolving because, basically, it’s the usage of
the recent defined routines.
▪ On line 25 is the call to sub_B8B099, which is the actual hashing function that calculates the hash
value and is almost identical to the respective DLL hashing function. Thus, rename it to
mw_api_resolving_algo, which is shown below:
▪ As readers can notice, it’s the same algorithm as the DLL hashing one, but without the lines that
convert eventual upper case letters to lower case.
▪ Finally, let’s return to the sub_B9B558 subroutine and rename it to mw_api_hash_resolving.
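The logic behind the renamed variables can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the hash is assumed to be an sdbm-style routine (the real one is shown in the figure and may differ), the export table is a toy one, and all names other than the XOR key 0x32C9DB43 are hypothetical. The point is how AddressOfNames, AddressOfNameOrdinals and AddressOfFunctions cooperate, and that only the DLL variant lowercases its input.

```python
API_XOR_KEY = 0x32C9DB43  # XOR key for API hashing mentioned in the article

def sdbm_hash(name: bytes, lowercase: bool = False) -> int:
    """sdbm-style hash (assumed algorithm); lowercase=True mimics the DLL variant."""
    h = 0
    for c in name:
        if lowercase and 0x41 <= c <= 0x5A:   # 'A'..'Z' -> 'a'..'z'
            c += 0x20
        h = (c + (h << 6) + (h << 16) - h) & 0xFFFFFFFF
    return h

def resolve_api(names, name_ordinals, functions, dll_base, wanted_hash):
    """Walk the export names and return the address whose hashed name matches."""
    for counter, api_name in enumerate(names):
        if (sdbm_hash(api_name) ^ API_XOR_KEY) == wanted_hash:
            # AddressOfNameOrdinals maps the name index to an index
            # into AddressOfFunctions, which holds RVAs
            return dll_base + functions[name_ordinals[counter]]
    return None

# Toy export table (hypothetical values)
names = [b"CreateFileW", b"LoadLibraryW", b"VirtualAlloc"]
name_ordinals = [2, 0, 1]
functions = [0x1100, 0x1200, 0x1300]
dll_base_address = 0x10000000

wanted = sdbm_hash(b"LoadLibraryW") ^ API_XOR_KEY
address = resolve_api(names, name_ordinals, functions, dll_base_address, wanted)
print(hex(address))
```

Notice that the name index is never used directly against AddressOfFunctions; the ordinal array sits in between, which is why three pointers appear in the decompiled code.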
We’ve finished our quick analysis of the subroutines directly related to DLL and API hashing, but so far
we have covered only a very small piece of the puzzle because, for example, we don’t have any API name yet.
There are two ways to handle this issue:
▪ We can use a plugin like HashDB to help us and, no doubt, it will save time during our analysis.
▪ We could write our own script to handle all API hashes and find the associated names.
Using a plugin is the recommended approach, mainly during working days. However, there are small side
effects that might make it unsuitable for you:
▪ You need an Internet connection for the HashDB plugin to communicate with OALabs servers and,
in critical premises, this access might not be available or allowed.
▪ HashDB might not have the desired algorithm for that particular sample / malware family and you
would need to write a script to manage API hash resolving anyway.
In this article I’m going to continue using HashDB, but in future articles I will show how to write your own
script to calculate the hashes and mark up the IDA Pro idb file.
Returning to the sub_B8645E subroutine, we have the following:
▪ Go to Edit → Plugins → HashDB and set the XOR key (shown below).
▪ Click on the Import button and wait a few seconds until the hash import task has finished.
▪ If you go to Enumerations view (SHIFT + F10) you’ll see something similar to the following image:
▪ Put the cursor on the sub_B9BFF0 subroutine, press the Y hotkey and change the type of the last
argument, which is the API hash, to hashdb_strings_emotet (the name of the enumeration as
shown in the figure above):
▪ Press F5.
[Figure 65] All API hashes resolved and respective subroutines renamed
Certainly, the markup routine (resolving all these API hashes and renaming functions) takes a lot of time,
but it makes the analysis much easier. Additionally, I looked up all the necessary APIs
on MSDN and renamed the arguments of each resolved API to the correct names:
[Figure 66] All API hashes resolved and respective subroutines renamed
Following the same steps we took to extract and decode strings, let’s examine the content of the .data
section by using the CTRL+S hotkey and jumping to it:
It’s interesting to notice that the data blob ends with a double “\x00”, so we’re going to use this fact
later.
Following the data cross-reference at the start of the .data section, we have lines of code from this
subroutine (sub_BA225A) as shown below:
[Figure 70] Further lines of code from sub_BA225A, which holds references to .data section
This is typical code for IP address formatting, and the string “%u.%u.%u.%u” on line 51 confirms
that we’re right. Therefore, all bytes stored from 0x00ba4000 to 0x00ba4208 in the .data section are
encrypted/encoded IP addresses.
I wrote a simple Python script, without using IDA Python or IDC instructions, to extract, decode and format
all IP addresses used as C2 by Emotet. This script assumes that the encrypted IP addresses are stored at the
start of the .data section and, in case that changes, it’s quite simple to adapt.
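Independently of the script shown in the figures, the two basic building blocks (decoding the blob and formatting each address) can be sketched as follows. Everything specific here is an assumption made only for illustration: the dword-wise XOR scheme, the key value, the byte order of the address and the sample data (documentation-range IPs) are hypothetical, and only the “%u.%u.%u.%u” format comes from the decompiled code.

```python
import struct

def xor_dwords(blob: bytes, key: int) -> bytes:
    """XOR each little-endian dword of blob with key (assumed decoding scheme)."""
    usable = len(blob) - (len(blob) % 4)      # ignore a trailing partial dword
    out = bytearray()
    for (dword,) in struct.iter_unpack("<I", blob[:usable]):
        out += struct.pack("<I", dword ^ key)
    return bytes(out)

def format_ipv4(raw: bytes) -> str:
    """Render four bytes with the same "%u.%u.%u.%u" format seen in the sample."""
    return "%u.%u.%u.%u" % tuple(raw[:4])

HYPOTHETICAL_KEY = 0x11223344                  # placeholder, not the real key

# Build a fake "encrypted" blob from two documentation-range addresses
plain = bytes([192, 0, 2, 1]) + bytes([198, 51, 100, 7])
encoded = xor_dwords(plain, HYPOTHETICAL_KEY)

decoded = xor_dwords(encoded, HYPOTHETICAL_KEY)
ips = [format_ipv4(decoded[i:i + 4]) for i in range(0, len(decoded), 4)]
print(ips)
```

If the real sample stores the octets in the opposite order, reversing the four bytes before formatting is the only change needed.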
[Figure 71] Script to extract, decrypt and format C2 IP Addresses (first part)
[Figure 72] Script to extract, decrypt and format C2 IP Addresses (second and last part)
[Figure 73] Extracted C2 IP addresses: exactly equal to Triage’s output (Figure 03 – page 03)
5. Conclusion
This article follows the same educational path as the first articles, and Emotet was chosen because
it offers interesting concepts and tasks such as extracting and decrypting strings and C2 IP addresses.
On the other hand, analyzing the entire malware can take significant time because of control flow
flattening obfuscation, but it isn’t hard. We’ll probably return to this topic in the future when analyzing
similar malware samples.
My goal continues to be offering a review of malware analysis so that reverse engineers
can learn something new and have a sort of guideline to follow and a source of research whenever it’s
necessary. Of course, this isn’t a course on malware analysis, but I think it can be helpful by offering
something really applied and practical, which tries to explain the decisions taken and how to proceed when
analyzing similar contexts.
I could have chosen a more complex malware sample, but that wasn’t the idea. If the final objective is to write
a series of articles explaining important concepts, strategies, techniques and approaches used during
malware analysis of different threats, then proposing hard samples wouldn’t help anyone and would be
useless, in my opinion.
This article probably has errors, but that isn’t a big deal. As soon as I find them, I’ll release a new revision
of this document.
6. Acknowledgments
I’d like to publicly thank Ilfak Guilfanov (@ilfak) and Hex-Rays (@HexRaysSA) for supporting this project
by providing me with a personal license of IDA Pro.
My gratitude is endless because I certainly couldn’t keep writing this series without a personal license (not
depending on corporate licenses). Honestly, I don’t have enough words to say how happy I was on
JAN/06/2022 when he replied to my message and agreed to support this project. As I promised him, I will keep
writing this series of articles over the coming months and years.
Once again: thank you for everything, Ilfak.
Alexandre Borges