https://exploitreversing.com

Malware Analysis Series (MAS): Article 3
by Alexandre Borges
release date: MAY/05/2022 | rev: A

1. Introduction
Welcome to the third article in the MAS (Malware Analysis Series). The first two articles hopefully
provided you with an initial foundation and some motivation for malware analysis, so let’s
move forward and learn other interesting aspects of malicious Windows binaries from well-known samples,
which are available to download from public sandboxes.
In case you haven’t read the first two articles, you can get them from the
links below:
▪ MAS_1: https://exploitreversing.com/2021/12/03/malware-analysis-series-mas-article-1/
▪ MAS_2: https://exploitreversing.com/2022/02/03/malware-analysis-series-mas-article-2/
I will not review all the concepts presented in my last two articles, so I recommend reading
them when possible. Of course, in practical terms and over time, several techniques and
approaches already explained will be repeated over and over again to provide you with more experience
on the proposed topics.
I received several questions from professionals asking about the purpose of this series, so it’s
time to make it clear: the purpose is to show several malware analysis techniques, approaches, contexts
and concepts associated with the topic, as already mentioned in previous articles.
For the lab setup, readers can follow the procedure and use the tools that I mentioned in the last two
articles; I’ll point out any tool that we haven’t used previously. If you need more details,
I recommend reading the previous articles in this series.
Before proceeding, it’s recommended to take a snapshot of your virtual machines and turn off any
network communication and shared folders. Although we aren’t handling a ransomware case, avoid exposing
your virtual machines to the local network when analyzing malware samples. Additionally, I’ll be using
REMnux and Windows 8.1/10 (64-bit) to perform the analysis. Thus, if you have the lab configured as
proposed in the last article, you can re-use it.
Now we’re ready to start our analysis.
This time, we are analyzing this sample:
SHA 256: ed22dd68fd9923411084acc6dc9a2db1673a2aab14842a78329b4f5bb8453215
You can easily get it by using Malwoverview (https://github.com/alexandreborges/malwoverview) and
downloading it from Malware Bazaar as shown in the command below:

▪ malwoverview.py -b 5 -B ed22dd68fd9923411084acc6dc9a2db1673a2aab14842a78329b4f5bb8453215
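Once the sample is downloaded, it is worth confirming that the file on disk really matches the hash above before going any further. The following is a minimal Python 3 sketch; the helper names are illustrative, and the path you pass is whatever name you gave the downloaded file (for example, the mas_3.bin name used later in the debugger command line):

```python
import hashlib

# SHA-256 of the sample analyzed in this article.
EXPECTED_SHA256 = "ed22dd68fd9923411084acc6dc9a2db1673a2aab14842a78329b4f5bb8453215"


def sha256_of_file(path, chunk_size=65536):
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected=EXPECTED_SHA256):
    """True when the file at 'path' has the expected SHA-256 digest."""
    return sha256_of_file(path) == expected.lower()
```

Calling matches_expected("mas_3.bin") inside the lab virtual machine should return True for the right sample; anything else means the download is not the binary discussed here.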

2. Gathering information
As usual, our first step is to collect enough information about the given malware threat. There are
several tools to accomplish this task, so let’s start by checking the sample against VirusTotal:

[Figure 01] First evaluation of the malware sample against Virus Total using Malwoverview.
Great! Using the same Malwoverview, it’s quite simple to search for our sample on Triage and gather
further information, as the figures below show:

[Figure 02] Determine the task ID on Triage using Malwoverview



[Figure 03] Summarized information collected from Triage by using Malwoverview


Finally, we can try Capa from Mandiant (https://github.com/mandiant/capa/releases/tag/v3.2.0), which
brings valuable information about the binary:

[Figure 04] First information and MITRE tactics presented by Capa

[Figure 05] Gathering malware’s capabilities information using Capa



So far we have the following important information:


▪ The binary seems to be Emotet.
▪ The botnet is Epoch 5.
▪ There’s a long list of C2 IP addresses (the listing above was truncated).
▪ This Emotet sample seems to be using Elliptic Curve Cryptography.
▪ The combination of EnumeratesProcess + WriteProcessMemory can indicate that the malware is
looking for a target process to perform code injection (we need to confirm it later).
▪ There’s an indication of RC4 (a symmetric algorithm) being used to encrypt information in the .data
section.
▪ The malware also enumerates PE sections.
There is a relevant point: part of the information collected so far might be associated with the packer itself,
so the next step is to determine whether the malware is packed in order to confirm some of
these gathered facts.

3. Unpacking
Using the DiE (Detect It Easy) tool (https://github.com/horsicq/Detect-It-Easy) to check further information
on the sample, we have the following points:
▪ The sample was compiled using MS Visual C++ 2005.
▪ It includes the MFC library.
▪ Apparently there is a simple anti-debugging trick (IsDebuggerPresent( )).
▪ The entropy is high for the .rsrc and .text sections, but high entropy alone isn’t solid evidence that the
sample is packed.
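To make the entropy remark concrete, the sketch below computes the Shannon entropy of a byte buffer in Python 3. It is only an illustration of the metric (the function name is mine, and it is not how DiE is implemented): values close to 8 bits per byte suggest compressed or encrypted content, which is consistent with packing but does not prove it.

```python
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Shannon entropy of a buffer, in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)  # occurrences of each byte value
    return -sum((n / total) * math.log2(n / total) for n in counts.values())
```

Feeding it the raw bytes of each section (for example, the .text and .rsrc sections extracted with any PE parser) gives numbers that can be compared side by side.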
Using PE-Bear (https://github.com/hasherezade/bearparser) we are able to collect further information:
▪ It’s a 32-bit sample (from FileHdr tab).

▪ It’s a DLL sample (from FileHdr tab).

▪ It doesn’t statically load any DLL related to network communication using WinSock2, WinINet,
COM and so on. This fact might be strange because, unless it’s a wiper, common malware threats
usually establish network communication with their operators. Thus, either the malware loads
network APIs dynamically or it might be packed.

▪ The malware imports some resource-related APIs, which could indicate that resources
contain configuration data and other useful information. Some of these APIs are:

▪ FindResource
▪ FindResourceExA
▪ LoadResource
▪ SizeofResource
▪ LockResource

▪ FreeResource

▪ The malicious binary exports two functions:

▪ DllRegisterServer
▪ DllUnregisterServer
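The two FileHdr facts above (32-bit and DLL) can also be checked without a GUI. The following Python 3 sketch reads only the DOS header and the COFF file header using the standard library; the function name and the returned dictionary keys are mine, and it is a simplified reader, not a replacement for PE-Bear:

```python
import struct

IMAGE_FILE_MACHINE_I386 = 0x014C  # Machine value for 32-bit x86
IMAGE_FILE_DLL = 0x2000           # Characteristics flag set for DLLs


def pe_basic_info(data: bytes) -> dict:
    """Parse the DOS and COFF file headers of a PE image held in memory."""
    if data[:2] != b"MZ":
        raise ValueError("missing MZ signature")
    (e_lfanew,) = struct.unpack_from("<I", data, 0x3C)  # offset of the PE header
    if data[e_lfanew:e_lfanew + 4] != b"PE\x00\x00":
        raise ValueError("missing PE signature")
    # IMAGE_FILE_HEADER: Machine, NumberOfSections, TimeDateStamp,
    # PointerToSymbolTable, NumberOfSymbols, SizeOfOptionalHeader, Characteristics
    machine, sections, _, _, _, _, characteristics = struct.unpack_from(
        "<HHIIIHH", data, e_lfanew + 4
    )
    return {
        "is_32bit_x86": machine == IMAGE_FILE_MACHINE_I386,
        "is_dll": bool(characteristics & IMAGE_FILE_DLL),
        "sections": sections,
    }
```

Reading the sample with open(path, "rb").read() and passing the bytes to pe_basic_info should report a 32-bit DLL for the binary analyzed here, matching what PE-Bear shows.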
Using the Resource Hacker tool (http://www.angusj.com/resourcehacker/), we confirm that there is some
data within resources, but it might not have any relation to the real payload:

[Figure 06] Examining the resource content using Resource Hacker tool
So far, we aren’t sure whether the sample is packed or not, so we have to use a debugger to confirm it.
Remember that it is a DLL, so we need to debug rundll32.exe and provide, as arguments, the DLL and
one of the exported functions, which is DllRegisterServer( ) (function #1).
As I mentioned in the first article of this series, there are many ways to unpack malware samples:
semi-manual (using debuggers), automatic (pe-sieve and hollows_hunter) and even
completely manual through scripts.
Whichever you choose, start up your virtual machine (Windows 8.1 or Windows 10), open
x32dbg (it’s a 32-bit DLL -- https://x64dbg.com/) and load rundll32.exe
(C:\Windows\SysWOW64\rundll32.exe) for debugging. Go to File → Change Command Line and type a
line similar to the one below, providing the DLL and the first exported function (or its respective ordinal number):
▪ "C:\Windows\SysWOW64\rundll32.exe" C:\Users\Administrator\Desktop\MAS_3\mas_3.bin, #1
Press CTRL + F2 to reload the binary with the provided argument; the debugger will likely stop at
the System Breakpoint. Run the program once (F9 is the default Run hotkey in x64dbg) and you’ll stop at the Entry Point.
Before proceeding, double-check that the virtual machine doesn’t have any shared folder or
network communication with any system (internal or external to your lab). Typically I disable every
network interface.

Let’s set up breakpoints on the following functions: VirtualAlloc( ), WriteProcessMemory( ),
CreateProcessInternalW( ) and ResumeThread( ).
The breakpoint on VirtualAlloc( ) is going to be hit soon after the entry point, but take care: each
section will be copied into the allocated region one by one. In other words, you won’t have the entire malware at
the first hit, so pay attention to the addresses; there will likely be four or five hits in a row from the
start until the entire “unpacked binary” is complete in memory. Right-click the dump area and pick
“Follow in Memory Map”. Right-click the memory region and go to “Dump Memory to File”. Give it a name
and save the unpacked sample.

[Figure 07] Unpacking and extracting the PE binary during a x32dbg session
Open the dumped sample in PE-Bear and you’ll notice that the section headers are messed up:

[Figure 08] Messed up section headers


As I mentioned in the second article of this series, you can fix them by copying the values from
Virtual Address to Raw Address (this is a mapped file and the .text section starts at 0x1000) and adjusting the
sizes so that Raw Size and Virtual Size hold the same values. If you don’t understand how to do the math, it’s very
simple: each section’s size is the difference between the next section’s address and its own address (values in hexadecimal):

▪ .rdata address - .text address == 23000 – 1000 == 22000, so fill .text size with this value.
▪ .data address - .rdata address == 24000 – 23000 == 1000, so fill .rdata size with this value.
▪ .reloc address - .data address == 26000 – 24000 == 2000, so fill .data size with this value.
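The same arithmetic can be written as a tiny Python 3 helper. This is only a sketch of the subtraction above, assuming the four virtual addresses taken from the bullets (in hexadecimal); the size of the last section (.reloc) cannot be derived this way and is left out:

```python
def section_sizes(virtual_addresses):
    """Size of each section = next section's address - its own address.

    'virtual_addresses' maps section names to addresses, in layout order.
    The last section is skipped because nothing follows it.
    """
    names = list(virtual_addresses)
    sizes = {}
    for current, following in zip(names, names[1:]):
        sizes[current] = virtual_addresses[following] - virtual_addresses[current]
    return sizes


# Addresses taken from the list above (hexadecimal).
layout = {".text": 0x1000, ".rdata": 0x23000, ".data": 0x24000, ".reloc": 0x26000}

for name, size in section_sizes(layout).items():
    print(f"{name}: raw size = virtual size = {size:#x}")
```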

[Figure 09] Fixing section headers using PE-Bear


The final result is not clean, but it can be fixed by resizing the binary using the button pointed out below:

[Figure 10] Resizing the PE binary through PE-Bear


If you open it again in PE-Bear, you’ll have a clean binary in terms of section headers. Save the clean
binary.

[Figure 11] Section headers list



Another, simpler approach to unpack the malware is the hollows_hunter tool, which has
x86 and x64 versions (https://github.com/hasherezade/hollows_hunter). In this case, you should:
a. run hollows_hunter64 in a loop to make sure it catches any implant in memory: hollows_hunter64 /loop
b. run the malware: rundll32 mas_3.bin,#1 (take care: the malware is going to remove itself, so keep
a backup of it)
Hollows_hunter64 will provide two DLLs of almost the same size, but you should prefer the larger one. Of
course, you can observe the injected DLL in the running regsvr32.exe process:

[Figure 12] Examining injected code on memory through Process Hacker


Listing strings and, afterwards, using a regular expression
( (?:(?:\d|[01]?\d\d|2[0-4]\d|25[0-5])\.){3}(?:25[0-5]|2[0-4]\d|[01]?\d\d|\d)(?:\/\d{1,2})? )
to search for IP addresses (and URLs containing them) in the memory of the potentially
malicious process makes it possible to discover good information:
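The same expression can be applied outside Process Hacker, for example to strings already dumped to a file. Below is a minimal Python 3 sketch using the standard re module; the helper name and the sample input strings are hypothetical. Note that the pattern is not anchored, so it can also match inside things like version numbers, and the hits still need manual review:

```python
import re

# Same pattern used in Process Hacker: IPv4 address with an optional /prefix.
IPV4_RE = re.compile(
    r"(?:(?:\d|[01]?\d\d|2[0-4]\d|25[0-5])\.){3}"
    r"(?:25[0-5]|2[0-4]\d|[01]?\d\d|\d)(?:\/\d{1,2})?"
)


def find_ipv4(strings):
    """Return the unique IPv4-looking matches, in order of first appearance."""
    hits = []
    for text in strings:
        for match in IPV4_RE.findall(text):
            if match not in hits:
                hits.append(match)
    return hits


# Hypothetical strings, only to show the call.
sample_strings = ["connect 192.168.10.5:8080", "route 10.0.0.1/24", "no address here"]
print(find_ipv4(sample_strings))
```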

[Figure 13] Hunting URLs and IP addresses through regex on Process Hacker

4. Reversing
As usual, let’s start our reversing session using IDA Pro 7.7.x and, in case you don’t have this version,
you can follow the reversing session using IDA Home 7.7.x (https://hex-rays.com/ida-home/). We’re
going to keep the focus on a few objectives such as:
▪ Renaming variables and functions.
▪ Decrypting strings.
▪ Extracting C2 configuration data.
▪ Handling hashed functions.
▪ Extracting any public keys.
▪ Fixing calling conventions where necessary.
▪ Creating C++ structures where they are necessary and make our understanding easier.
Unlike in the last article, my intention is not to go into deep detail, and I’ll try to keep this article short.
Some professionals asked why I don’t use dynamic analysis. Actually, it’s a matter of
personal preference for static analysis, although I think dynamic analysis is very useful and I also use it in
several stages such as:
1. Understanding a network protocol communication for writing an emulation script.
2. Confirming whether a given function of the malware works as I think it does.
3. Unpacking (in general).
4. Handling specific shellcode analysis.
5. Analyzing first stages in .NET format (next article).
In the same way, in several consulting engagements, I usually extract valuable information by performing memory
analysis (Volatility) and gathering important indicators and artifacts such as:
a. Created services (persistence)
b. Network communication information (C2)
c. Code injected (evasion)
d. Hooked functions (evasion)
e. Detecting callbacks (rootkits)
f. Unpacked binary (unpacking)
g. Created/Changed Registry entries (persistence)
Certainly I could write a large section on memory analysis before starting the reversing phase, but this
article would become too big; it’s a good topic for the near future.
During this section, we’ll use the same IDA plugins presented in the second article of the Malware Analysis
Series (MAS), though there are other good ones I’d like to show you in the next articles of this series:
▪ Flare Capa Explorer: http://github.com/mandiant/capa.git
▪ ApplyCalleeType: https://github.com/mandiant/flare-ida
▪ StructTyper: https://github.com/mandiant/flare-ida
▪ HashDB: https://github.com/OALabs/hashdb-ida
▪ Findcrypt-yara: https://github.com/polymorf/findcrypt-yara.git


If you don’t know how to install these plugins, please read the second article of this series,
where I showed further details about how to do it.
Open the unpacked binary in IDA Pro, go to View → Open Subviews → Type Libraries (SHIFT+F11
hotkey) and insert important libraries (INS hotkey) such as:
▪ mssdk_win7 (already inserted automatically)
▪ ntapi or ntapi_win7
▪ ntddk_win7
▪ vc10 (not always)
Although it is not necessary and doesn’t make a difference in this article, it’s always advisable to add some
signatures, which will help you in most reversing cases, by going to View → Open Subviews →
Signatures (SHIFT+F5) and inserting (INS hotkey) a few library modules such as:
▪ vc32rtf
▪ vc32ucrt
▪ vcseh
As we’re going to use the decompiler, it’s also recommended to decompile the entire file first to avoid
misunderstandings while analyzing code. Thus, go to File → Produce File → Create C File (CTRL+F5) and
save the .c file in the same directory as the unpacked malware. The decompiling process takes some
seconds to finish. Now open a Pseudocode window, set it up side by side with the Assembly View
window and synchronize it with the IDA View (right click → Synchronize with).
To collect contextualized information, go to Edit → Plugins → Flare Capa Explorer and start the analysis
of our first findings, but this time against the assembly code:

[Figure 14] Evaluating malware capabilities through Flare Capa Explorer on IDA Pro
Unfortunately we didn’t get much information, but we learned that:
▪ There’s parsing of a PE header, which is used for hashing functions and shellcode.
▪ There’s possible Base64 manipulation.
▪ There are some XOR operations.
▪ Finally, a subroutine (sub_B963F0) might be using a hashing algorithm named murmur3.

Of course, the recommendation is to always check all information presented by Flare Capa Explorer, but
if the malware is really using a hash function such as murmur, then we know that:
▪ It’s a well-known non-cryptographic hash function.
▪ It produces a 32-bit or 128-bit hash value.
▪ We’re able to find its implementation in several programming languages on the Internet.
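For reference, the 32-bit variant of the public MurmurHash3 algorithm (x86_32) fits in a few lines of Python 3. This is a sketch of the standard algorithm, not code recovered from the sample: whether sub_B963F0 really follows it, and with which seed, still has to be confirmed during reversing.

```python
MASK32 = 0xFFFFFFFF


def rotl32(value, count):
    """Rotate a 32-bit value left by 'count' bits."""
    return ((value << count) | (value >> (32 - count))) & MASK32


def murmur3_32(data: bytes, seed: int = 0) -> int:
    """Standard MurmurHash3 x86_32 of 'data' (result is an unsigned 32-bit int)."""
    c1, c2 = 0xCC9E2D51, 0x1B873593
    h = seed & MASK32
    nblocks = len(data) // 4
    for i in range(nblocks):  # body: one little-endian dword at a time
        k = int.from_bytes(data[4 * i:4 * i + 4], "little")
        k = rotl32((k * c1) & MASK32, 15)
        k = (k * c2) & MASK32
        h = rotl32(h ^ k, 13)
        h = (h * 5 + 0xE6546B64) & MASK32
    tail = data[4 * nblocks:]
    if tail:  # 1 to 3 remaining bytes
        k = int.from_bytes(tail, "little")
        k = rotl32((k * c1) & MASK32, 15)
        k = (k * c2) & MASK32
        h ^= k
    h ^= len(data) & MASK32
    # finalization mix (fmix32)
    h ^= h >> 16
    h = (h * 0x85EBCA6B) & MASK32
    h ^= h >> 13
    h = (h * 0xC2B2AE35) & MASK32
    h ^= h >> 16
    return h
```

When a routine in the decompiler shows the same constants (0xCC9E2D51, 0x1B873593, 0xE6546B64, 0x85EBCA6B, 0xC2B2AE35), that is usually a strong hint that Capa’s murmur3 match is correct.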
There are other weird points about this sample:
▪ IDA Pro only shows three strings (SHIFT+F12).
▪ There aren’t any imported functions, so possibly all of them are resolved dynamically.
▪ There’s the native DllEntryPoint( ) and only one exported user function: DllRegisterServer( )
As readers already know, strings usually offer a good guide during reversing tasks, but this time we don’t
have any here. If we jump to DllRegisterServer( ), the first impression is not good because there are
many XOR and ADD operations with hexadecimal numbers and, initially, we don’t have any clue about
what they are and do:

[Figure 15] Hexadecimal constant being manipulated (xor, add) against a structure


A further issue is that, in the decompiler, most constants are represented in decimal format instead of
hexadecimal format, as shown in the sub_B91FD0 method:

[Figure 16] Constant represented as decimal instead of having them as hexadecimal


If it’s suitable, we can change a decimal representation to hexadecimal by pressing the H hotkey,
but it takes too much time to do it for each decimal found in the code, so it’d be better to go to Edit → Plugins
→ Hex-Rays Decompiler → Options and change the default radix from 0 to 16, as shown below:

[Figure 17] Change Decompiler Representation


It’s recommended to produce a new C file again (File → Produce File → Create C File (CTRL+F5)) and, if
you still see decimal representation, just refresh the pseudocode by pressing the F5 hotkey.
You might think that there’s something very strange in Figure 16 and, indeed, there are some
obfuscation techniques being deployed. If you want an overview of what’s happening, it’s
enough to get a graph (View → Open SubViews → Proximity Browser) to see the “messed up” control flows:

[Figure 18] IDA Pro graph showing several decision branches


Unfortunately, Emotet uses state variables, which establish the next piece of code to be executed.
Additionally, the technique used by Emotet and represented in the graph above is known as Control
Flow Flattening (also known as Code Flattening), which might be considered a sort of state machine
controlled by one or many state variables. In very few and imprecise words, Control Flow Flattening
transforms a linear execution into a multi-branched execution. This technique is used by many
packers and, mainly, by modern protectors that virtualize functions. As examples, obfuscators such as
Obfuscator-LLVM (https://github.com/obfuscator-llvm/obfuscator/wiki), malware from threat actors such as FIN7
(https://malpedia.caad.fkie.fraunhofer.de/actor/fin7) and Emotet use this technique.
A simple representation of Control Flow Flattening would be:

[Figure 19] Control Flow Flattening representation (Entry Point → Dispatcher → block 1, block 2, block 3, ..., block 5)



As readers might have realized, depending on the state variable set at the entry point, the dispatcher decides
which block to execute. The concept of Control Flow Flattening is also used by protectors
that virtualize a function’s code. If you remember the first article of this series (MAS), modern obfuscators
have some interesting characteristics:
a. They have a special focus on 64-bit code (but some of them also cover 32-bit code).
b. Not all instructions are virtualized.
c. Strings are encrypted (obviously).
d. Native instructions are translated to virtualized ones (RISC virtual machine instruction set).
e. DLLs and APIs are renamed or hashed.
f. Obfuscation is stack-based.
g. There are fake push instructions.
h. They use code re-ordering.
i. There are thousands of dead-code instructions.
j. They use Control Flow Flattening.
k. Virtualized code is polymorphic, so one native instruction can be translated to many different
virtualized instruction representations, where one or another could be used at any time.
l. There is usually a critical context switch during the transition from native execution to virtualized
execution and vice versa.
The execution cycle is composed of fetching, decoding (translation from the x86 to the RISC context), dispatching
(depending on the instruction, a specific handler is executed) and handling (the implementation of the
virtual machine instruction set). Therefore, given a decoded instruction, the dispatcher decides which
handler will be executed:

[Figure 20] Virtualized Instruction Execution (Initialization: RVA → RVA + process base address and other tasks; Fetch: instructions are stored in an encrypted format; Decode: instruction decoder, opcodes from a custom instruction set; DISPATCHER; handlers A, B, C, ... such as handler_add, handler_sub, handler_push...)


Only to supplement the previous explanation (it isn’t related to the Emotet sample), in cases of malware using
a virtualized instruction set, these instructions are usually stored in an array (in encrypted form) and, to
execute any virtualized instruction, an index is provided, which refers to a slot of the array. The instruction is
then decrypted and the retrieved opcode points to a function pointer (handler) that is finally executed, as
shown below:

[Figure 21] Virtualized Instruction Execution – part 2 (indexes 1, 2, 3, ..., n select encrypted instructions encr_1, encr_2, ..., encr_n; recovering and decrypting functions produce the decrypted instructions vm_add, vm_sub, vm_xor, vm_push, vm_pop, ..., vm_n; each opcode maps to a function pointer and its handler through a function pointer table, likely encrypted)
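The cycle described above (index → encrypted slot → decryption → opcode → function pointer → handler) can be illustrated with a toy interpreter in Python 3. Everything here is hypothetical and deliberately tiny: the single-byte XOR key, the opcode numbers and the four stack-based handlers are mine and do not come from any real protector.

```python
XOR_KEY = 0x5A  # hypothetical key protecting the stored instructions


def vm_push(stack, operand):
    stack.append(operand)


def vm_add(stack, _operand):
    b, a = stack.pop(), stack.pop()
    stack.append((a + b) & 0xFFFFFFFF)


def vm_sub(stack, _operand):
    b, a = stack.pop(), stack.pop()
    stack.append((a - b) & 0xFFFFFFFF)


def vm_xor(stack, _operand):
    b, a = stack.pop(), stack.pop()
    stack.append(a ^ b)


# "Function pointer table": opcode -> handler.
HANDLERS = {0x01: vm_push, 0x02: vm_add, 0x03: vm_sub, 0x04: vm_xor}


def encrypt_program(program):
    """Store each (opcode, operand) pair in encrypted form."""
    return [(opcode ^ XOR_KEY, operand ^ XOR_KEY) for opcode, operand in program]


def run(encrypted_program):
    """Fetch by index, decrypt, dispatch to the handler, repeat."""
    stack = []
    for index in range(len(encrypted_program)):
        enc_opcode, enc_operand = encrypted_program[index]              # fetch
        opcode, operand = enc_opcode ^ XOR_KEY, enc_operand ^ XOR_KEY   # decrypt/decode
        handler = HANDLERS[opcode]                                      # dispatch
        handler(stack, operand)                                         # execute handler
    return stack


# (7 + 5) ^ 3, expressed in the toy instruction set.
program = [(0x01, 7), (0x01, 5), (0x02, 0), (0x01, 3), (0x04, 0)]
print(run(encrypt_program(program)))
```

An analyst facing a real virtualized sample has to recover exactly these three pieces: how slots are decrypted, how opcodes map to handlers, and what each handler does.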


Of course, this topic is really fascinating, but it’s out of the scope of this article and there are lots of complex
details involved in each concept shown. I gave an introductory presentation at DEF CON China (2019)
and, in case readers are curious to examine the slides, they are available at:
▪ abstract: https://www.defcon.org/html/dc-china-1/dc-cn-1-speakers.html#Borges
▪ slides: https://exploitreversing.files.wordpress.com/2021/12/defcon_china_alexandre-2.pdf
Returning to our Emotet code analysis, we also have Control Flow Flattening here: the dispatcher is
represented by a switch-case construction and, depending on the state variable, the next block of code is
chosen to be executed.
The next picture shows the same function as Figure 15, but expanded and including further instructions,
where you can notice the mentioned state variable:


[Figure 22] Emotet Control Flow Flattening (state variable)


Readers may have realized that v0 is the state variable, which is used in several lines of the code and,
depending on its value, different switch-case branches determine the next block of code (function calls and
state variable assignment) to be executed.
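To see why a flattened function looks so different from its source, compare the two Python 3 functions below. They compute the same value, but the second one is rewritten by hand in the flattened style: a state variable, a dispatcher loop and one branch per basic block. The functions and the state constants are invented for illustration and have nothing to do with Emotet’s actual code.

```python
def checksum_linear(values):
    """Straightforward version: a loop followed by a final XOR."""
    total = 0
    for value in values:
        total = (total + value) & 0xFFFF
    return total ^ 0x1234


def checksum_flattened(values):
    """Same logic after manual control flow flattening."""
    state = 0x4A3F91  # arbitrary constants play the role of the state variable values
    total = 0
    index = 0
    while True:  # dispatcher
        if state == 0x4A3F91:    # block: initialization
            total, index = 0, 0
            state = 0x1B77C2
        elif state == 0x1B77C2:  # block: loop condition
            state = 0x9E0D15 if index < len(values) else 0x33AA08
        elif state == 0x9E0D15:  # block: loop body
            total = (total + values[index]) & 0xFFFF
            index += 1
            state = 0x1B77C2
        elif state == 0x33AA08:  # block: exit
            return total ^ 0x1234
```

In the decompiler the blocks usually appear in an arbitrary order, so following the assignments to the state variable is the practical way to recover the original sequence.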
Is it possible to improve the code representation? Yes, it is. Nonetheless, in my opinion, it isn’t
worth investing so much time for this malware sample, and we’re able to proceed even with this
ugly code. Of course, we’ll handle this scenario in future articles.
Starting our analysis, we have only two calls inside the exported DllRegisterServer function:


▪ sub_B91FD0
▪ sub_B8BA9C
Going inside the first one (sub_B91FD0), we find a large function with many calls to subroutines.
Anyway, there are some methods that could be interesting:
▪ sub_B9ACFF (called many times)
▪ sub_B8B9D7 (called many times)
▪ sub_B9D14C → sub_B84BB4 (called many times)
▪ sub_B86A8D
o sub_B9BFF0 (called many times)
o sub_B9B558 (PE parsing)
▪ sub_BA1AE9 (DLL related)
▪ sub_B9B558 (called many times)
▪ sub_B86A8D (called many times)
I’d like to remind you that I’m showing the real steps of a malware analysis because it’d be very
convenient (and unnatural) to go straight to the “right functions” without providing a reasonable and rational
explanation of the decisions taken. Furthermore, I’m always focused on explaining how to accomplish the most
important reversing steps instead of only showing you the final reversed function, so be patient, please.
Certainly I won’t reverse the entire malware sample in this article (not even close), but I hope I can show
a few relevant steps that could help you in your studies. Don’t worry: this series (MAS -- Malware Analysis
Series) will be composed of many articles and we have enough time to discuss different concepts, analyses
and details related to reverse engineering and, mainly, malware analysis.
If readers are wondering how to get the number of cross-references to each function call, there are two
obvious alternatives:
▪ readers can manually visit each subroutine call and get its cross-references (X hotkey).
▪ readers can write a script to do it automatically.

[Figure 23] Getting number of references using IDA Python/IDC
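As a rough idea of what such a script can look like, here is a minimal IDAPython-style sketch. It is not the script from the figure: it ranks every function in the database by the number of code references reaching it, instead of looking only at the calls made from sub_B91FD0. The counting helper is plain Python 3 (the name count_references is mine), and the IDA-specific part only runs when the idautils module can be imported, that is, inside IDA Pro.

```python
from collections import Counter


def count_references(targets, refs_to):
    """Map each target address to the number of references returned by refs_to(ea)."""
    return Counter({ea: sum(1 for _ in refs_to(ea)) for ea in targets})


try:
    import idautils  # only importable inside IDA Pro
except ImportError:
    idautils = None

if idautils is not None:
    import idc

    # Count code references to the start of every function in the database.
    counts = count_references(
        idautils.Functions(), lambda ea: idautils.CodeRefsTo(ea, 0)
    )
    for ea, hits in counts.most_common(15):
        print(f"{idc.get_func_name(ea)}: {hits} references")
```

Functions at the top of this ranking are natural candidates for string decryption or API resolution helpers, which is the reasoning followed in the next paragraphs.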


The result is shown below:


[Figure 24] Result: number of references to each call instruction


Returning to our problem, a few considerations follow:
▪ In this first quick overview, I wasn’t concerned with examining every method called from
sub_B91FD0 (a matter of restricted time).
▪ My first goal was finding methods that are called many times (reasons follow below).
▪ I quickly inspected only one subroutine or another and, when I noticed something useful, I wrote
it down.
▪ Obviously I missed many good functions and important details, but they don’t matter for now.
This sloppy approach is usual when I start an analysis because I don’t know what to expect, but it can
take us to the next step, so pay attention to the following key facts:
▪ We have three strings (SHIFT+F12) throughout the sample (only two in .rdata and one in .data
sections), which suggests that there are one or more subroutines that perform string decryption.
▪ We don’t have any explicit DLL or function name in the code, so probably there are one or more
subroutines responsible for resolving these names.
▪ Considering that malware threats usually have many strings, and one or more related subroutines
would be called to decode them, these subroutines would likely be called several times.
▪ In the same way, one or more methods would be called many times to decode DLL and API
names.
▪ Strings, DLL names and API names are usually stored somewhere inside sections.
▪ Code involved with PE parsing might be an additional indicator of hashing.

Before proceeding, you can list the available segments (sections) of the malware in IDA Pro by going to
View → Open Subviews → Segments or pressing SHIFT + F7 hotkey:

[Figure 25] Segments shown in the IDA Pro


The unpacked sample has only three sections and, curiously, it doesn’t have a .rsrc section, so possibly
strings, DLL names and API function names are encoded and stored in one of the available sections. Let’s go to
sub_B9D14C (the third one in the list of interesting methods above) and look inside it:

[Figure 26] Examining the suspicious sub_B9D14C subroutine


The call on line 30 is interesting because it refers to the dword_B814C8 global variable, which represents
an address. Checking this location, we find that it’s located within the .text section, as shown below:

[Figure 27] Address represented by dword_b814C8 global variable



That’s a good result because we’ve confirmed that some encrypted data related to a string, API name or
DLL name is stored there (we don’t know exactly what it is) and, additionally, there are many other data
cross-references (DATA XREF) around the given address. If we expand our search and look at the start of the .text
section by listing segments (SHIFT + F7 or CTRL + S) and double-clicking the .text section, we have the
content shown in the figure below:

[Figure 28] Possible encrypted data at start of the .text section


That’s great! We have just learned that there’s more encrypted data (byte representation) at the beginning
of the .text section, with associated data cross-references (DATA XREF) to each one of these bytes. According
to our previous analysis, one of these references comes from the sub_B84BB4 subroutine (line 30,
Figure 26), whose first instructions are the following:

[Figure 29] Possible decrypter (sub_B84BB4 subroutine) of referenced data


On line 19 there’s an interesting instruction involving an XOR operation, which is a good indication that we’re
handling the “decrypting” subroutine.

Before renaming variables and methods, we have the following context from line 18 onward:
▪ v3 seems to point to an array of bytes, accessed four bytes (one dword) at a time.
▪ On line 18, (char *)(v3 + 2) points 8 bytes ahead, and this value is assigned to v4. Additionally,
the cast to (char *) is a strong indication that v4 refers to the string data.
▪ On line 19, the first four bytes are XORed with the next four bytes (*v3 ^ v3[1]), and the result is stored into
v5.
▪ Notice that, on line 20, v15 is set with v3 content, so *v15 = *v3.
▪ If you examine the remainder of the subroutine below, v12 is set to v4’s content, so *v12 = *v4.
▪ On line 39, v15 (holding the v3 content) is XORed with v12 (holding the v4 content). Therefore, so far
v3 (first 4 bytes) seems to be the key and v12 (v4) seems to be the encrypted content.
▪ What’s the encrypted data’s length? Probably it’s v3[1], but the real value is hidden under an
XOR operation (line 19), so we have to execute this XOR operation to get the real length.

[Figure 30] Sub_B84BB4 subroutine of referenced data (second part)


Therefore, in the end, we have:
▪ The decrypted content is given by: *v3 ^ *(v3 + 2), where the size of *(v3 + 2) is given by (*v3 ^
v3[1]).
▪ Another good hint that the operation (*v3 ^ v3[1]) is probably the desired length is provided
by line 22 (v6 = v5 + 1) from Figure 29, where one is added to account for the
string terminator (‘\x00’).
▪ The data format is: [xor key] (4 bytes) + [xored string length] (4 bytes) + [encrypted string], where the
actual (plain text) string length is given by (*v3 ^ v3[1]).
Based on this interpretation, we can write a simple script in Python 3 that emulates exactly these steps.
Additionally, in the second part of this script, once we have the decrypted information (likely strings), we can
add comments to the IDA idb file using the results as their content. Summarizing our next
steps, it’s necessary to:
▪ Read the encrypted data from the file.
▪ Create a variable holding the first dword (key).
▪ Create a variable holding the second dword (xored string length).

▪ Perform an XOR operation between the key and the xored string length. The result is the plain text
string’s length.
▪ Use the key to decode the encrypted data from byte 8 onward.
▪ In a second step, alter the script to create comments next to the referring instructions.
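Before involving IDA, the decoding steps listed above can be prototyped in plain Python 3. The sketch below is not the script shown in the figures that follow; it assumes that both dwords are little-endian and that the 4-byte key is repeated over the encrypted bytes, which matches the interpretation given above but should be validated against the real data. The encrypt_blob helper exists only to build test input for the sketch.

```python
import struct


def decrypt_blob(blob: bytes) -> bytes:
    """Decode [xor key][xored length][encrypted string] into the plain string."""
    key, xored_length = struct.unpack_from("<II", blob, 0)  # first and second dword
    length = key ^ xored_length                              # plain text length
    encrypted = blob[8:8 + length]                           # data from byte 8 onward
    if len(encrypted) != length:
        raise ValueError("blob is shorter than the decoded length")
    key_bytes = struct.pack("<I", key)                       # assumption: repeating 4-byte key
    return bytes(b ^ key_bytes[i % 4] for i, b in enumerate(encrypted))


def encrypt_blob(plain: bytes, key: int) -> bytes:
    """Inverse operation, used here only to produce sample input."""
    key_bytes = struct.pack("<I", key)
    body = bytes(b ^ key_bytes[i % 4] for i, b in enumerate(plain))
    return struct.pack("<II", key, key ^ len(plain)) + body


# Round trip with a made-up key and string.
blob = encrypt_blob(b"kernel32.dll", 0xA1B2C3D4)
print(decrypt_blob(blob))
```

Inside IDA, the blob would be the bytes read at each referenced address at the start of the .text section instead of the output of encrypt_blob.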
Once again, readers can use any development environment to write their Python scripts, and
one of the available options is Jupyter notebook for drafting, because it
offers good debugging messages and support, which are useful mainly at the draft stage. To install and use it,
execute the following steps:
1. pip install jupyterlab
2. execute: jupyter-lab
3. Choose Python 3 Notebook (right side)
4. Rename the document (left side)
As I’m going to use IDC/IDA Python functions, I will be using IDA Pro’s own scripting environment,
available in File → Script Command (SHIFT+F2). The following script is well-commented, but I’ll leave some
additional comments after it:

[Figure 31] Script to decrypt strings (first part)

[Figure 32] Script to decrypt strings (second part)

[Figure 33] Script to decrypt strings (third and last part)


The content of IDA’s Output window is the following:

[Figure 34] Decrypted strings

An educational experiment can be done here. I commented out line 150 (fix_operand(xref.to, final_string))
of our script (Figure 33).
If readers run the script, they will get the following piece of code, with the strings used as comments
next to the instructions, and the result will be similar to the one shown below:

[Figure 35] Code commented using decrypted strings


As readers might notice, data references (dword_<address>) haven’t been renamed and only comments
were created next to the respective references, as expected. This is a welcome approach because readers are
able to see all decrypted strings in the Disassembly view without needing to change instruction operands
in the idb database. On the other hand, we don’t have the same comments in the pseudocode view,
which could be an issue for some demanding professionals.
Uncommenting line 150 (as in the original script in Figure 33), the result is a bit different in IDA View
and Pseudocode View, as shown below:

[Figure 36] Code commented with decrypted strings and data references renamed
As readers can notice, this time the data references are renamed using the decrypted
strings and, additionally, we kept all comments. Of course, we don’t need both, but I left them here
to show you the final effect. If you return to the .text section for any of these strings (for example,
urlmon.dll), you’ll see the following:

[Figure 37] Renamed data reference in .text section


Finally, and maybe most importantly, the pseudocode view presents the following instructions:

[Figure 38] Renamed data references in pseudocode view


The pseudocode above shows the expected result, which includes all strings as part of the instructions.
At the end of the script (Figure 33), I inserted a comment explaining that readers could comment out the
entire while loop block (lines 133 to 153) and uncomment line 157 to decrypt only one string for
testing purposes.
Now that readers have seen the entire script and the respective results, I’d like to make a few
considerations about the IDC/IDA Python script:
▪ I used IDC functions because they often make our work much easier, so both the idautils
and idc libraries were imported (lines 4 and 5).

▪ We could use byte arrays, but in this case I chose to keep everything as strings to keep
the code simple.

▪ If readers don’t know the repr( ) function used on line 22, which is a Python built-in function,
they can read about it at: https://docs.python.org/3/library/functions.html#repr

▪ The data extraction code (lines 27 to 32) is exactly the same as in the second article of this series, but
it was adapted to extract data from the .text section.

▪ On lines 85 and 87 the script uses struct.unpack( ). Python’s struct module is a powerful resource to
interpret bytes as packed binary data, and it’s able to do this interpretation according to the byte
order (little-endian, big-endian or even native). You can read a bit more about it and
learn from examples at https://docs.python.org/3/library/struct.html.

▪ On line 101, the decrypter( ) routine is called and, once the result is returned, all single quotes are
removed and, right after that, any “\r\n” sequence is converted to “\n”.

▪ On line 102, Unicode characters are removed because they aren’t meaningful to us and, for this
specific purpose, they won’t be useful.

▪ From line 115 to 121, we have the routine used to patch the binary and change the data
reference names to a name represented by the decrypted string. The sequence should be clear:
we convert the decrypted string to its byte representation, append “\x00” to the end of the
byte sequence to get a well-formed string, patch the provided address with each letter of
the decrypted string and, finally, create a new string using an IDC function named strlit(long
ea, long len ). Note that we could have specified the length of the string, but we chose to use a
string delimiter: https://hex-rays.com/products/ida/support/idadoc/207.shtml.

▪ On line 134, the idautils.XrefsTo( ) function is used to get all references to the address of the given
encrypted string, so we are able to get the addresses of all instructions referring to it:
https://hex-rays.com/products/ida/support/idapython_docs/idautils.html#idautils.XrefsTo

▪ On lines 138 and 139, it’s worth highlighting that xref.to provides the address of the encrypted
string and xref.frm provides the address of the instruction referring to the encrypted string.

▪ On line 146, the set_cmt( ) function is used to set an indented comment:
https://hex-rays.com/products/ida/support/idadoc/204.shtml

▪ On line 150 the script calls fix_operand( ), which is responsible for patching the idb database by
replacing the string reference with the string itself.

▪ Finally, on line 157, the script offers the option to decrypt only one given encrypted string, but it’s
necessary to comment out the whole while loop between lines 133 and 153.

▪ Readers can get both the start and end addresses of the encrypted strings by examining the .text
section (CTRL+S), as shown below:

[Figure 39] Getting start and end addresses of the encrypted strings
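To make the earlier remark about struct.unpack( ) and byte order concrete, the small sketch below interprets the same four bytes as a little-endian and as a big-endian dword. The byte values are arbitrary and only serve as an illustration; they aren’t taken from the sample.

```python
import struct

raw = bytes([0x78, 0x56, 0x34, 0x12])

# "<I": unsigned 32-bit integer, little-endian (the x86 convention).
little = struct.unpack("<I", raw)[0]
# ">I": the same bytes interpreted as big-endian.
big = struct.unpack(">I", raw)[0]

print(hex(little))  # 0x12345678
print(hex(big))     # 0x78563412
```

Since the sample is a 32-bit x86 binary, the little-endian form is the natural choice when reading the key and the xored length from the encrypted data.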

Now that we have the decrypted strings, let’s move forward. Return to the beginning of the malware, which
is the exported DllRegisterServer( ), and go to the sub_B91FD0( ) subroutine, which is effectively the first
one called within DllRegisterServer( ), as shown below:

[Figure 40] First function to be called in DllRegisterServer( ) (exported by malware)


Going inside the given routine (sub_B91FD0()), we have the following:

[Figure 41] Beginning of sub_B91FD0 subroutine


On line 53 we find a call to the sub_B9EBA2( ) subroutine, which has the content below:

[Figure 42] Sub_B9EBA2 subroutine


There’s nothing very attractive here, except for a few nonsense values. Thus, let’s carry on and go into the
first subroutine (sub_B9EAA3) and then into sub_B8645E( ), where we will find the following code:

[Figure 43] Sub_B9EAA3 subroutine

[Figure 44] Sub_B8645E subroutine


From Figure 42, we didn’t find anything relevant again, and we have two calls: sub_B8645E( ) and
sub_B91B22( ), both including some strange hexadecimal numbers. If we examine the content of
sub_B8645E( ), we find the content of Figure 44 and things start to get interesting, so we
can make some first considerations:
▪ Initially it seems we have a C++ function call, but right below it the cdecl calling convention is
used in the sub_B9BFF0( ) function call.

▪ The sub_B9BFF0( ) subroutine call (line 5) has several arguments, where the last one seems to be a
hash, which is usually expected when analyzing malware samples with obfuscation techniques.

▪ Returning a value/string to a local variable (v1) is an indication that there’s something related to
hash resolution (DLL or API hashing) and, as readers are going to see, in this case it’s API hash
name resolution.

▪ Finally, it seems that v1 contains the name of a function (API) because on line 6 v1’s content
is used as the name of the called function, which includes several (fake) arguments.

▪ Reading all 7 lines, the general idea is that the function on line 1 is a wrapper/proxy, where first an
API name is resolved for a given hash and, after being resolved, it’s called. As this wrapper
function on line 1 doesn’t have any useful arguments, the call on line 6 doesn’t have any
concrete argument either.

▪ Readers can easily confirm that the v1( ) call (line 6) doesn’t have arguments by checking the
assembly code and, as you’re able to see, between sub_B9BFF0( ) on line 5 and this one on line 6,
there’s only a stack adjustment:

[Figure 45] No arguments for sub_B8645E (call eax)

In any malware analysis case where there’s API hash resolution, the responsible routine is called many
times (once for each function hash), so the obvious step is to check how many times the sub_B9BFF0( )
subroutine is called (X hotkey):

[Figure 46] Cross-references to sub_B9BFF0( ) subroutine


There are 109 cross-references to it, so it seems to be a promising subroutine. Stepping into it, we have:

[Figure 47] sub_B9BFF0( ) subroutine


In this subroutine we have the following artifacts:

▪ a possible array (lines 6, 9 and 11), which seems to be used to hold API names.
▪ a call to sub_BA1AE9, whose v4 argument comes from the stack (ecx).
▪ a call to sub_B9B558 using the v5 local variable (returned from sub_BA1AE9) as an argument, along
with the a4 argument that, according to Figure 44 (sub_B8645E subroutine), is a hexadecimal value and
possibly an API hash.
Going into sub_BA1AE9( ) subroutine we see:

[Figure 47] sub_BA1AE9( ) subroutine


Wow! We found the subroutine responsible for DLL hash resolution, which, after finding the right DLL
name, returns its respective address. Some considerations follow:
▪ On line 6, the sub_B9AA52 subroutine gets the PEB (Process Environment Block), which is easy to
recognize because of the NtCurrentPeb( ) call (instruction mov eax, large fs:30h at address
0x00B9AA52).

▪ Also on line 6, the _PEB struct has a field named Ldr (offset 0xC), which is a pointer to the
PEB_LDR_DATA structure (https://www.nirsoft.net/kernel_struct/vista/PEB.html):

[Figure 48] _PEB structure

▪ You can see the same _PEB structure on IDA Pro by going to Structure tab (SHIFT+F9), pressing
INSERT key, clicking on Add Standard Structure, searching for _PEB structure and adding it:

[Figure 49] _PEB structure (from IDA Pro)

▪ If you want to learn a bit more about the PEB and navigate through its fields, a good reference follows:
https://processhacker.sourceforge.io/doc/struct___p_e_b.html.

▪ The _PEB_LDR_DATA structure, pointed to by the Ldr field of the _PEB structure, holds information
about the modules loaded in the process. According to IDA Pro, its internal composition has the
following content, which readers can access by repeating the same method: go to the
Structure tab (SHIFT+F9) → press Insert → go to Add Standard Structure and search for the
PEB_LDR_DATA structure (alternatively, you can check the same information, presented in a
different format, at: https://www.nirsoft.net/kernel_struct/vista/PEB_LDR_DATA.html):

[Figure 50] PEB_LDR_DATA structure (from IDA Pro)

▪ The InLoadOrderModuleList points to a _LIST_ENTRY structure, which represents a doubly linked
list, as shown below (extracted from IDA Pro, Structure tab). Of course,
InMemoryOrderModuleList and InInitializationOrderModuleList have the same representation:

[Figure 51] _LIST_ENTRY structure (from IDA Pro)

▪ Although readers probably already know the meaning of these fields, it’s worth remembering them here:
o InLoadOrderModuleList: it’s a doubly linked list that organizes all modules (DLLs) in the exact
order that they were loaded into the process memory.

o InMemoryOrderModuleList: it’s a doubly linked list that organizes all modules (DLLs) in the order
that they appear in the process’s memory.

o InInitializationOrderModuleList: it’s a doubly linked list that organizes all modules (DLLs) in
the order that they were initialized.

▪ All these fields from the _PEB_LDR_DATA structure are heads of lists of LDR_DATA_TABLE_ENTRY
structures (shown below), each of which represents a loaded DLL module:

[Figure 52] _PEB_LDR_DATA structure (from IDA Pro)

▪ There are several quite interesting fields, such as FullDllName (holds the full path of the DLL on
disk), BaseDllName (holds the DLL name), LoadCount (contains the number of times this DLL was
loaded using LoadLibrary( )) and, finally, DllBase (contains the base address of the DLL).

▪ Therefore, the basic idea is: the code walks all modules loaded in memory, gets the respective
DLL name, calculates the associated hash using the sub_B940AF subroutine, performs a XOR
operation with the given key (0x23FECA30) and compares each result with the provided hash. If
there’s a match, the DLL’s base address is returned.

▪ In fact, all hashing mechanisms have a similar modus operandi: basically, only the algorithm
changes (obviously) and, eventually, there is a further logical operation. In this case, an additional
step is done by performing a XOR instruction with an extra XOR key.

Now that readers have refreshed a few important concepts, here is the code after some minimal work
on it:

[Figure 53] _PEB_LDR_DATA structure (from IDA Pro)


Perhaps surprisingly, I made only a few changes to the code above when compared to Figure 47:
▪ In the Structure view (SHIFT+F9), I inserted the LDR_DATA_TABLE_ENTRY structure.
▪ I renamed (N hotkey) the sub_BA1AE9 subroutine to mw_dll_hashing.
▪ I renamed (N hotkey) the “i” local variable to ptr_dll_representation (you can give it whatever name
you want).
▪ (trick) I changed the type of ptr_dll_representation (using the Y hotkey) to LDR_DATA_TABLE_ENTRY*.
▪ I renamed (N hotkey) the sub_B940AF subroutine to mw_dll_hashing_algo.
▪ I pressed F5 to “recompile” the code and update the pseudocode.
▪ Observe that the DLL base address is returned on line 16.
As readers can notice, it wasn’t hard. The code confirms the previous explanation about how it works and,
even better, including field names certainly makes it easier to understand.
The sub_B940AF subroutine (renamed to mw_dll_hashing_algo), which is used for DLL hashing, is quite
simple, as readers can see below:

[Figure 54] sub_B940AF subroutine


The algorithm above performs the following operations:
▪ Receives a pointer to the DLL name.

▪ Parses each letter (indexed by k) of the given name and calculates the hash by summing up three
operations: hash << 0x10, hash << 6 and (ptr_provided_name[k] – hash).
▪ Checks whether a letter of the DLL name is in upper case and, if it is, changes it to lower case before
continuing the iteration over the remaining letters.
▪ Finally, it returns the calculated hash.
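The operations listed above can be sketched in Python as follows. This is an illustration under a few assumptions: the hash is truncated to 32 bits, the lower-case conversion happens before each letter is folded into the hash, and the XOR key is applied to the computed hash before the comparison. The names mw_dll_hash and find_module, as well as the module names and base addresses in the dictionary, are hypothetical stand-ins for the loader’s linked list.

```python
DLL_XOR_KEY = 0x23FECA30  # XOR key seen in the DLL hashing routine


def mw_dll_hash(name: str) -> int:
    """hash = (hash << 16) + (hash << 6) + (char - hash), kept to 32 bits."""
    h = 0
    for ch in name:
        c = ord(ch)
        if ord("A") <= c <= ord("Z"):
            c += 0x20  # upper case to lower case
        h = ((h << 16) + (h << 6) + (c - h)) & 0xFFFFFFFF
    return h


def find_module(modules: dict, target: int):
    """Return the base address of the module whose XORed hash matches."""
    for dll_name, base in modules.items():
        if mw_dll_hash(dll_name) ^ DLL_XOR_KEY == target:
            return base
    return None


if __name__ == "__main__":
    # Invented module list and base addresses, for illustration only.
    loaded = {"ntdll.dll": 0x77000000, "KERNEL32.DLL": 0x76000000}
    wanted = mw_dll_hash("kernel32.dll") ^ DLL_XOR_KEY
    print(hex(find_module(loaded, wanted)))
```

Because of the lower-case conversion, "KERNEL32.DLL" and "kernel32.dll" produce the same hash, which is why the lookup above succeeds regardless of how the loader stores the name.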
Let’s go back up two levels to the sub_B9BFF0 subroutine and move inside the sub_B9B558 subroutine,
which supposedly has 4 arguments (of course, it doesn’t):

[Figure 55] sub_B9B558 subroutine


No doubt, this routine is very similar to the previous one for DLL hashing, but it’s used for API hashing.
Clearly there’s a PE parsing operation happening in the routine and, if you pay attention,
there are a few relevant facts:
▪ A similar “hash comparison” operation, as we saw in Figure 53 for DLLs, happens on line
23, comparing the found hash against the provided one (a2), which comes from a4 in the parent
subroutine (sub_B9B558) and, ultimately, from a4 of the sub_B9BFF0 subroutine, which has the
hash 0x76FC34E6 as its fourth argument (Figure 44).

▪ The XOR key is 0x32C9DB43, which is different from the one used for DLL hashing.

▪ The subroutine that possibly handles the actual hashing operation is sub_B8B099.

Before proceeding with our analysis, we need to add the structures that will be necessary to improve our
reversing experience, so go to Structure view (SHIFT+F9) → Insert key → Add standard structure and add
the following ones:
▪ _IMAGE_DOS_HEADER
▪ _IMAGE_NT_HEADERS
▪ _IMAGE_EXPORT_DIRECTORY
▪ _IMAGE_FILE_HEADER (automatically loaded by the first three ones)
▪ _IMAGE_OPTIONAL_HEADERS32 (automatically loaded by the first three ones)
▪ _IMAGE_DATA_DIRECTORY (automatically loaded by the first three ones)
A good reference to Windows executable structure is available on:
https://github.com/corkami/pics/blob/master/binary/pe102/pe102.pdf.
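As a rough illustration of how these structures chain together in a 32-bit PE, the sketch below follows e_lfanew (offset 0x3C of _IMAGE_DOS_HEADER) to the NT headers and reads the first _IMAGE_DATA_DIRECTORY entry, which describes the export directory. The function names are hypothetical, and the input is a synthetic buffer containing only the fields being read, not a real executable.

```python
import struct

E_LFANEW_OFFSET = 0x3C          # _IMAGE_DOS_HEADER.e_lfanew
EXPORT_DIR_ENTRY_OFFSET = 0x78  # DataDirectory[0] in 32-bit _IMAGE_NT_HEADERS


def export_directory_entry(image: bytes):
    """Return (RVA, size) of the export directory of a 32-bit PE image."""
    if image[:2] != b"MZ":
        raise ValueError("missing MZ signature")
    (e_lfanew,) = struct.unpack_from("<I", image, E_LFANEW_OFFSET)
    if image[e_lfanew:e_lfanew + 4] != b"PE\x00\x00":
        raise ValueError("missing PE signature")
    # Signature (4) + _IMAGE_FILE_HEADER (20) + 96 bytes of optional header.
    return struct.unpack_from("<II", image, e_lfanew + EXPORT_DIR_ENTRY_OFFSET)


def build_fake_image(rva: int, size: int) -> bytes:
    """Hypothetical helper: synthetic buffer with only the fields used above."""
    image = bytearray(0x200)
    image[0:2] = b"MZ"
    struct.pack_into("<I", image, E_LFANEW_OFFSET, 0x80)
    image[0x80:0x84] = b"PE\x00\x00"
    struct.pack_into("<II", image, 0x80 + EXPORT_DIR_ENTRY_OFFSET, rva, size)
    return bytes(image)


if __name__ == "__main__":
    print(export_directory_entry(build_fake_image(0x1234, 0x56)))
```

The 0x78 offset is valid only for 32-bit images; a 64-bit optional header is larger, so the data directory starts further into the NT headers.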
First I’m going to show the final code and, afterwards, I’ll explain all the necessary steps so that readers
are able to get to the same result:

[Figure 56] sub_B9B558 subroutine after some reversing actions


The explanations follow (note that line numbers might not be the same):
▪ Click on v13 variable, press N hotkey and rename it to ptr_IMAGE_NT_HEADERS. Afterwards, press
Y hotkey and change its type to _IMAGE_NT_HEADERS* instead of int.
▪ On 0x3C (line 15), press T hotkey and choose _IMAGE_DOS_HEADER.

▪ If a4 argument changed its type, click on it, press Y hotkey and change it back to int a4.
▪ On the same a4 argument, rename it to dll_base_address.
▪ On line 16, press Y on v6 local variable and change its type to _IMAGE_EXPORT_DIRECTORY*.
▪ On line 16, rename v6 local variable to ptr_IMAGE_EXPORT_DIRECTORY.
▪ On line 19, rename v12 local variable to ptr_AddressOfFunctions.
▪ On line 20, rename v7 local variable to ptr_AddressOfNames.
▪ On line 21, rename v10 local variable to ptr_AddressOfNames (again).
▪ On line 22, rename v11 local variable to ptr_AddressOfNameOrdinals.
▪ On line 28, rename v5 local variable to counter.
▪ On line 29, as v4’s content is a pointer to the API name, rename it to ptr_api_name.
▪ On line 38, rename sub_B9B384 to mw_w_api_hash_resolving because, basically, it makes use of
the recently defined routines.
▪ On line 25 there’s the call to sub_B8B099, which is the actual hashing function that calculates the
hash value and is almost identical to the respective DLL hashing function. Thus, rename it to
mw_api_resolving_algo, which is shown below:

[Figure 57] Subroutine containing the API hashing algorithm

▪ As readers can notice, it’s the same algorithm used for DLL hashing, but without the lines that
convert eventual upper case letters to lower case.
▪ Finally, let’s return to the sub_B9B558 subroutine and rename it to mw_api_hash_resolving.
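The logic behind the renamed variables can be sketched in Python as follows: walk the names from AddressOfNames, hash each one, apply the XOR key, compare against the target, then use AddressOfNameOrdinals to index AddressOfFunctions. This is a sketch under assumptions: the export tables are plain Python lists with invented values, the function names are hypothetical, and the XOR key is assumed to be applied to the computed hash before the comparison.

```python
API_XOR_KEY = 0x32C9DB43  # XOR key seen in the API hashing routine


def mw_api_hash(name: str) -> int:
    """Same arithmetic as the DLL hash, without the lower-case conversion."""
    h = 0
    for ch in name:
        h = ((h << 16) + (h << 6) + (ord(ch) - h)) & 0xFFFFFFFF
    return h


def resolve_export(dll_base, names, name_ordinals, functions, target):
    """Return the address of the export whose XORed hash equals target."""
    for counter, api_name in enumerate(names):
        if mw_api_hash(api_name) ^ API_XOR_KEY == target:
            # The name index selects an ordinal, which selects the RVA.
            ordinal = name_ordinals[counter]
            return dll_base + functions[ordinal]
    return None


if __name__ == "__main__":
    # Invented export tables; real ones come from _IMAGE_EXPORT_DIRECTORY.
    names = ["GetProcessHeap", "HeapAlloc", "HeapFree"]
    name_ordinals = [2, 0, 1]
    functions = [0x2000, 0x3000, 0x1000]  # RVAs indexed by ordinal
    target = mw_api_hash("GetProcessHeap") ^ API_XOR_KEY
    print(hex(resolve_export(0x76000000, names, name_ordinals, functions, target)))
```

The same mw_api_hash function could also be used to precompute a hash-to-name dictionary for a list of candidate API names, which is the core of a standalone resolving script.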
We’ve done our quick analysis of the subroutines directly related to DLL and API hashing, but our analysis
so far covers only a very small piece of the puzzle because, for example, we don’t have any API name yet.
There are two ways to handle this issue:
▪ We can use a plugin like HashDB to help us and, no doubt, it will save time during our analysis.
▪ We could write our own script to handle all API hashes and find the associated names.
Using a plugin, mainly during working days, is the recommended approach. However, there are small side
effects that might not be suitable for you:
▪ The HashDB plugin needs an Internet connection to communicate with OALabs servers and,
in restricted environments, this access might not be available or allowed.

▪ HashDB might not have the desired algorithm for that particular sample / malware family, and you
would need to write a script to manage API hash resolving anyway.

In this article I’m going to continue using HashDB, but in future articles I will eventually show how to
write your own script to calculate the hashes and mark up the idb file from IDA Pro.
Returning to the sub_B8645E subroutine, we had the following:

[Figure 58] sub_B8645E subroutine


We already know that:
▪ The prototype on the first line is a stub/proxy used for passing useful arguments when required.
▪ The first argument of the subroutine is a sort of index used to locate the API in the “API table”.
▪ The last argument is the API hash.
▪ On line 7, the resolved API (from line 6) doesn’t have any real argument and all supposed
arguments are garbage (in this specific function in the image above).
▪ As explained earlier, the XOR key for decrypting API hashes is 0x32C9DB43.
Therefore, before proceeding with the decryption, we must set the XOR key in HashDB, so readers have two
options:
▪ Return to the sub_B9B558 subroutine (Figure 55), right-click the XOR value and choose “HashDB set
XOR key”.

▪ Go to Edit → Plugins → HashDB and set the XOR key (shown below).

[Figure 59] sub_B8645E subroutine


After having set up the XOR key (take care: a XOR key doesn’t always exist), you should do the following:

▪ Right-click the API hash and choose HashDB Hunting Algorithm.
▪ Probably one or two algorithms will be returned but, as we know that this is an Emotet sample, mark
“emotet”.
▪ Right-click the API hash again and choose HashDB Lookup.
▪ You’ll see the following window, where you should choose kernel32:

[Figure 60] HashDB Bulk Import

▪ Click on the Import button and wait a few seconds until the hash importing task finishes.
▪ If you go to the Enumerations view (SHIFT + F10), you’ll see something similar to the following image:

[Figure 61] Part of an enumeration created by HashDB

▪ Put the cursor on the sub_B9BFF0 subroutine, press the Y hotkey and change the type of the last
argument, which is the API hash, to hashdb_strings_emotet (the name of the enumeration as
shown in the figure above):

o int __cdecl sub_B9BFF0(int, int, int, hashdb_strings_emotet)

▪ Press F5.

You should see the following result:

[Figure 62] API hash resolved to GetProcessHeap_0


Finally:
▪ rename v1 to GetProcessHeap.
▪ rename sub_B8645E to mw_GetProcessHeap.
▪ rename the API hash resolving routine (sub_B9BFF0) to mw_api_resolving.
You will have something like:

[Figure 63] API hash resolved and renamed


That’s ok! The problem is that there are many other API hashes in the code and we need to apply the same
procedure to all of them. Keep the cursor on mw_api_resolving and press X to list all references:

[Figure 64] References to mw_api_resolving subroutine


Yes, there’re 109 references, unfortunately. After you have resolved all API hashes, you’ll have:

[Figure 65] All API hashes resolved and respective subroutines renamed

Certainly, the process of marking up the code, resolving all these API hashes and renaming functions
takes a huge amount of time, but it makes the analysis much easier. Additionally, I searched for all
necessary APIs on MSDN and renamed all arguments of each resolved API to the correct name:

[Figure 66] All API hashes resolved and respective subroutines renamed
Following the same steps we took to extract and decode strings, let’s examine the content of the .data
section by using the CTRL+S hotkey and jumping there:

[Figure 67] Content of .data section

It’s interesting to notice that the data blob ends with a double “\x00”, so we’re going to use this fact
later.
Following the data cross-reference at the start of the .data section, we have lines of code from this
subroutine (sub_BA225A) as shown below:

[Figure 68] Code from sub_BA225A referring to .data section’s bytes


If you examine the content of sub_B9ACFF( ), you’ll find the following code:

[Figure 69] First lines of sub_B9ACFF subroutine


That’s exactly the same decrypting routine used for decoding strings, but in this case bytes that are stored
within the .data section aren’t strings. Analyzing additional lines of code from previous subroutine
(sub_BA225A), we have:

[Figure 70] Further lines of code from sub_BA225A, which holds references to .data section
This is typical code for IP address formatting, and the string “%u.%u.%u.%u” on line 51 confirms
that we’re right. Therefore, all bytes stored from 0x00ba4000 to 0x00ba4208 in the .data section are
encrypted/encoded IP addresses.
I wrote a simple Python script, without using IDA Python or IDC instructions, to extract, decode and format
all IP addresses used as C2 by Emotet. This script assumes that the encrypted IP addresses are stored at the
start of the .data section and, in case that changes, it’s quite simple to adapt.

[Figure 71] Script to extract, decrypt and formatting C2 IP Addresses (first part)

[Figure 72] Script to extract, decrypt and formatting C2 IP Addresses (second and last part)

[Figure 73] Extracted C2 IP addresses: exactly equal to Triage’s output (Figure 03 – page 03)
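For readers who want to experiment with the formatting step alone, here is a minimal sketch that is not the script from Figures 71 and 72. It takes an already decrypted blob and relies on an assumed layout that should be checked against the sample: 8 bytes per entry, made of 4 IP octets in network order, a big-endian 16-bit port and 2 trailing bytes that the sketch ignores. The function name parse_c2_entries is hypothetical, and the addresses used are documentation-range addresses, not real C2 servers.

```python
import struct


def parse_c2_entries(decrypted: bytes):
    """Split a decrypted blob into (ip, port) pairs.

    Assumed layout per 8-byte entry: 4 IP octets, a big-endian 16-bit
    port and 2 trailing bytes that this sketch ignores.
    """
    entries = []
    for offset in range(0, len(decrypted) - 7, 8):
        a, b, c, d, port, _unused = struct.unpack_from(">4BHH", decrypted, offset)
        entries.append(("%u.%u.%u.%u" % (a, b, c, d), port))
    return entries


if __name__ == "__main__":
    # Documentation-range addresses, not real C2 servers.
    blob = bytes([192, 0, 2, 1]) + struct.pack(">HH", 8080, 1)
    blob += bytes([198, 51, 100, 7]) + struct.pack(">HH", 443, 1)
    for ip, port in parse_c2_entries(blob):
        print(f"{ip}:{port}")
```

If the octets come out reversed for a given sample, the byte order in the format string is the first thing to adjust.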

5. Conclusion
This article follows the same educational path as the first articles, and Emotet was chosen because
it offers interesting concepts and tasks such as extracting and decrypting strings and C2 IP addresses.
On the other hand, analyzing the entire malware can take significant time because of control flow
flattening obfuscation, but it isn’t hard. We’ll probably return to this topic in the future when analyzing
similar malware samples.
My goal continues to be offering a review of malware analysis so that reverse engineers
can learn something new and have a sort of guideline to follow and a source of research when and if it’s
necessary. Of course, this isn’t a course about malware analysis, but I think it can be helpful by offering
something really applied and practical, which tries to explain the decisions taken and how to proceed when
analyzing similar contexts.
I could have chosen a more complex malware sample, but that wasn’t the idea. If the final objective is
writing a series of articles explaining important concepts, strategies, techniques and approaches used during
malware analysis of different threats, then proposing hard samples wouldn’t help anyone and would be
useless, in my opinion.
This article will probably have errors, but that isn’t a big deal. As soon as I find them, I’ll release a new
revision of this document.

6. Acknowledgments
I’d like to publicly thank Ilfak Guilfanov (@ilfak) and Hex-Rays (@HexRaysSA) for supporting this project
by providing me with a personal license of IDA Pro.
My gratitude is endless because I certainly couldn’t keep writing this series without a personal license (not
depending on corporate licenses). Honestly, I don’t have enough words to say how happy I was on
JAN/06/2022 when he replied to my message and agreed to this project. As I promised him, I will keep
writing this series of articles for the next months and years.
Once again: thank you for everything, Ilfak.

Just in case you want to keep in touch:


▪ Twitter: @ale_sp_brazil
▪ Blog: https://exploitreversing.com

Keep reversing and see you next time!

Alexandre Borges
