MALWARE ANALYSIS
Dr. Faisal Bashir Hussain
MALWARE - DEFINITION
Malware is any software intentionally designed to:
• cause disruption to a computer, server, client, or computer network
• leak private information
• gain unauthorized access to information or systems
• deprive users of access to information
• interfere with the user's computer security and privacy without the user's knowledge.
MALWARE - TYPES
Type | Function | Example
Ransomware | Disables victim's access to data until ransom is paid | RYUK
Fileless Malware | Makes changes to files that are native to the OS | Astaroth
Spyware | Collects user activity data without their knowledge | DarkHotel
Adware | Serves unwanted advertisements | Fireball
Trojans | Disguises itself as desirable code | Emotet
Worms | Spreads through a network by replicating itself | Stuxnet
Rootkits | Gives hackers remote control of a victim's device | Zacinlo
Keyloggers | Monitors users' keystrokes | Olympic Vision
Bots | Launches a broad flood of attacks | Echobot
Mobile Malware | Infects mobile devices | Triada
Wiper Malware | Erases user data beyond recoverability | WhisperGate
IMPORTANCE OF MALWARE ANALYSIS
To assess damage.
To discover indicators of compromise.
To analyze the purpose of the intruder.
To assess its way of intrusion and spread.
To understand its functionality
MALWARE ANALYSIS
The malware landscape is diverse and constantly evolving
• Large botnets
• Diverse propagation vectors, exploits, C&C
• Capabilities – backdoors, keylogging, rootkits
• Logic bombs, time bombs
• Diverse targets: desktops, mobile platforms, SCADA systems (Stuxnet)
Manual reverse-engineering at this scale is close to impossible
• Need automated techniques to extract system logic, interactions and side-effects, derive intent, and devise mitigating strategies.
SIGNATURE BASED MALWARE ANALYSIS
A signature is a specific pattern that allows cybersecurity technologies to
recognize malicious threats, such as a byte sequence in network traffic or known
malicious instruction sequences used by families of malware.
Signatures are generated against known malware samples and are used later for
checking the maliciousness of a new file.
Signature types:
• String-based signatures
• Code-pattern-based signatures
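A minimal sketch of signature matching, assuming a signature is simply a byte pattern searched for in the file contents. The signature names and patterns below are invented for illustration; the code pattern is a common x86 function prologue and would match many benign files, so it is not a usable signature.

```python
# Illustrative signature database (hypothetical names and patterns).
SIGNATURES = {
    # String-based signature: a suspicious string embedded in the file.
    "demo_string_sig": b"evil-c2.example.com",
    # Code-pattern signature: push ebp; mov ebp, esp; sub esp, 0x10.
    "demo_code_sig": bytes.fromhex("558bec83ec10"),
}


def match_signatures(data: bytes, signatures=SIGNATURES) -> list[str]:
    """Return the names of all signatures whose byte pattern occurs in data."""
    return sorted(name for name, pattern in signatures.items() if pattern in data)
```

Real scanners use richer rule languages (wildcards, offsets, conditions), but the principle of matching patterns derived from known samples is the same.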
MALWARE ANALYSIS TECHNIQUES
Static Analysis
Dynamic Analysis
STATIC MALWARE ANALYSIS
Analysis of malware performed without actually executing the code
Analysis can be performed on any platform because the malware, which may be platform specific (e.g., a Win32 executable), is never run
Reverse-engineering with a disassembler
Complex, requires understanding of assembly code
STATIC MALWARE ANALYSIS – TRIVIAL STEPS
Scan with antivirus tools
• Malware can easily change its signature and fool the antivirus
• VirusTotal is convenient, but using it may alert attackers that they’ve been caught
Find and match the hash
• Label a malware file
• Share the hash with other analysts to identify malware
• Search the hash online to see if someone else has already identified the file
• A fuzzy hash can also be calculated and compared with a database of fuzzy hashes of known malware
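A minimal sketch of the hashing step using Python's standard hashlib module. The function name and the choice of MD5 plus SHA-256 are assumptions made for this example. Fuzzy hashing needs a separate tool such as ssdeep and is not shown.

```python
import hashlib


def file_hashes(path: str, chunk_size: int = 65536) -> dict[str, str]:
    """Compute MD5 and SHA-256 of a file, reading it in chunks."""
    md5 = hashlib.md5()
    sha256 = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large samples need not fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            md5.update(chunk)
            sha256.update(chunk)
    return {"md5": md5.hexdigest(), "sha256": sha256.hexdigest()}
```

The resulting hex digests can be used as a label for the sample, shared with other analysts, or searched online.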
STATIC MALWARE ANALYSIS – DETAILED INVESTIGATION
Analyze the malware file to collect evidence
An important question: what type of file is this?
• Windows executable (PE)
• Linux ELF
• Android APK
• Others
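The file type question can often be answered from the first few bytes of the file. A minimal sketch, assuming only the three formats listed above are of interest; the function name is hypothetical. An APK is a ZIP archive, so the magic bytes alone cannot tell it apart from other ZIP files.

```python
# Leading magic bytes of the formats discussed in these slides.
MAGIC = [
    (b"MZ", "Windows executable (PE)"),
    (b"\x7fELF", "Linux ELF"),
    (b"PK\x03\x04", "ZIP archive (possibly an Android APK)"),
]


def identify_file_type(header: bytes) -> str:
    """Guess the file type from the first bytes of a file."""
    for magic, label in MAGIC:
        if header.startswith(magic):
            return label
    return "Others"
```

In practice the first 16 bytes or so of the file are enough, for example `identify_file_type(open(path, "rb").read(16))`.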
STATIC MALWARE ANALYSIS – DETAILED INVESTIGATION (PE FILES)
Headers:
• IMAGE_DOS_HEADER (contains signature “MZ”)
• PE signature (“PE\0\0”) followed by IMAGE_FILE_HEADER
• IMAGE_OPTIONAL_HEADER
• IMAGE_SECTION_HEADER .text
• IMAGE_SECTION_HEADER .rdata
• IMAGE_SECTION_HEADER .data
• IMAGE_SECTION_HEADER .tls
• IMAGE_SECTION_HEADER .rsrc
• IMAGE_SECTION_HEADER .reloc
• IMAGE_SECTION_HEADER .debug
Sections:
• SECTION .text (holds program code)
• SECTION .rdata (holds IMPORTs and EXPORTs)
• SECTION .data (holds global variables)
• SECTION .tls (holds info about thread local storage)
• SECTION .rsrc (holds info about version, icon)
• SECTION .reloc (holds a table of base relocations to adjust for variables’ actual loaded places)
• SECTION .debug (holds debug info, e.g., about the source file)
“Peering Inside the PE: A Tour of the Win32 Portable Executable File Format,” http://msdn.microsoft.com/en-us/magazine/ms809762.aspx
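A minimal sketch of walking the PE headers with Python's struct module to list section names. The function name is hypothetical. It relies on the documented layout: the DOS header stores the offset of the PE signature at 0x3C, the 20-byte file header follows the signature, and each section header is 40 bytes with an 8-byte name. Dedicated libraries handle many more edge cases than this.

```python
import struct


def pe_section_names(data: bytes) -> list[str]:
    """Return the section names of a PE image held in memory."""
    if data[:2] != b"MZ":
        raise ValueError("missing MZ signature")
    # e_lfanew (offset 0x3C in the DOS header) points to the PE signature.
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    if data[pe_offset:pe_offset + 4] != b"PE\x00\x00":
        raise ValueError("missing PE signature")
    # IMAGE_FILE_HEADER: Machine, NumberOfSections, TimeDateStamp,
    # PointerToSymbolTable, NumberOfSymbols, SizeOfOptionalHeader,
    # Characteristics (20 bytes in total).
    file_header = struct.unpack_from("<HHIIIHH", data, pe_offset + 4)
    number_of_sections = file_header[1]
    size_of_optional_header = file_header[5]
    # The section table starts right after the optional header.
    table = pe_offset + 4 + 20 + size_of_optional_header
    names = []
    for index in range(number_of_sections):
        raw = data[table + 40 * index: table + 40 * index + 8]
        names.append(raw.rstrip(b"\x00").decode("ascii", errors="replace"))
    return names
```

Unusual or missing section names can be an early hint that a file was packed.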
STATIC MALWARE ANALYSIS – DETAILED INVESTIGATION (ELF FILES)
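Headers:
• ELF header (starts with the magic bytes 0x7F 'E' 'L' 'F')
• Program header table
• Section header table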
Sections:
• Text Section (.text): Contains the program's executable instructions.
• Data Section (.data): Stores initialized global and static variables.
• BSS Section (.bss): Reserves space for uninitialized global and static variables.
• Symbol Table Section: Contains symbols (functions, variables) and their associated information.
https://linuxhint.com/understanding_elf_file_format/
STATIC MALWARE ANALYSIS – DETAILED INVESTIGATION (APK FILES)
An Android application (APK) is a ZIP file containing many other files.
Important files for static analysis are:
• Manifest file: contains configuration details such as:
  • Permissions
  • Intents
  • App components
• classes.dex: contains the code of the application in DEX format.
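Since an APK is a ZIP archive, Python's standard zipfile module can list its contents. A minimal sketch, assuming the manifest is stored under its usual name AndroidManifest.xml; the function name and the in-memory demo archive are made up for this example.

```python
import io
import zipfile


def summarize_apk(apk) -> dict:
    """Report the files of interest in an APK (path or file-like object)."""
    with zipfile.ZipFile(apk) as archive:
        names = archive.namelist()
    return {
        "has_manifest": "AndroidManifest.xml" in names,
        # Large apps may ship several DEX files (classes.dex, classes2.dex, ...).
        "dex_files": sorted(n for n in names if n.endswith(".dex")),
    }


# Usage with a tiny in-memory archive standing in for a real APK.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as demo:
    demo.writestr("AndroidManifest.xml", b"")
    demo.writestr("classes.dex", b"")
    demo.writestr("res/layout/main.xml", b"")
summary = summarize_apk(buffer)
```

Note that the manifest inside an APK is stored in a binary XML format, so a decoding tool is needed before it can be read as text.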
STATIC MALWARE ANALYSIS – ARTIFACT EXTRACTION
Requires disassembling file using a disassembler
• Windows: IDA Pro, Ghidra
• Linux: IDA Pro, Radare2
• Android: apktool
Possible Artifacts:
• System calls
• Application Code
• Functions
• Shared Libraries
• Permissions
• Configurations
STATIC MALWARE ANALYSIS – ARTIFACT EXTRACTION
SYSTEM CALLS
System calls are requests for operating
system services, such as memory and
filesystem access.
Applicable to all platforms
Can be used in various ways
• Check Occurrence
• Check Frequency
• Check sequences
• Create Call Graphs
Figure: Example system calls
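The ways of using system calls listed above can be sketched as simple features over an extracted list of calls. This is a minimal sketch: the function name and the sample call names are illustrative, and the counts of consecutive call pairs are only a simplified stand-in for a real call graph.

```python
from collections import Counter


def call_features(calls: list[str]) -> dict:
    """Derive occurrence, frequency, and sequence features from a call list."""
    frequency = Counter(calls)                      # how often each call appears
    occurrence = set(frequency)                     # which calls appear at all
    transitions = Counter(zip(calls, calls[1:]))    # consecutive call pairs
    return {
        "occurrence": occurrence,
        "frequency": frequency,
        "transitions": transitions,
    }
```

The transition counts can be read as weighted edges between calls, which is one way to approximate a graph when only a flat list of calls is available.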
STATIC MALWARE ANALYSIS – ARTIFACT EXTRACTION
EXAMPLE SYSTEM CALLS EXTRACTION FROM APK FILE
API call Graph
STATIC MALWARE ANALYSIS – ARTIFACT EXTRACTION
APPLICATION CODE INSTRUCTIONS
Code instructions can be analyzed
Opcodes can be extracted from the code
Applicable to all platforms
Opcode sequences can be used in various ways
• 1-gram
• 2-gram
• ...
• n-gram
Figure: Example procedure for extraction of opcodes from an Android APK
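A minimal sketch of turning an opcode sequence into n-gram counts. The function name is hypothetical, and the Dalvik opcodes in the usage example are only sample input; the disassembly step that produces them is not shown.

```python
from collections import Counter


def opcode_ngrams(opcodes: list[str], n: int) -> Counter:
    """Count every run of n consecutive opcodes in the sequence."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return Counter(
        tuple(opcodes[i:i + n]) for i in range(len(opcodes) - n + 1)
    )


# Usage: 2-grams over a short, made-up opcode sequence.
sample_opcodes = [
    "invoke-virtual", "move-result",
    "invoke-virtual", "move-result",
    "return-void",
]
bigrams = opcode_ngrams(sample_opcodes, 2)
```

The resulting counts can serve directly as a feature vector, one dimension per distinct n-gram.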
STATIC MALWARE ANALYSIS – ARTIFACT EXTRACTION
STRINGS
Search for all Strings in the executable
Look for suspicious Strings
Be careful about drawing conclusions,
attackers can plant them to deceive the
analyst.
Applicable to all platforms
Strings utility is present by default in Linux
Other tools are IDA Pro and Hex Workshop
Figure: Using the strings utility to extract strings from an EXE
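A minimal sketch of what a strings-style tool does: find runs of printable ASCII characters of a minimum length. The function name is an assumption, and this version ignores UTF-16 strings, which are common in Windows executables.

```python
import re


def extract_strings(data: bytes, min_length: int = 4) -> list[str]:
    """Return printable ASCII runs of at least min_length characters."""
    # Bytes 0x20 to 0x7e are the printable ASCII characters.
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_length)
    return [match.decode("ascii") for match in pattern.findall(data)]
```

The output still needs human judgment, since planted strings look the same as genuine ones.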
STATIC MALWARE ANALYSIS – ARTIFACT EXTRACTION
SHARED LIBRARIES
A shared library or shared object is a file that is intended to be shared by
multiple programs.
Contains common operating system specific operations required by most of the
applications.
Provides insights about the operation of the executable
• Windows: PE files use Dynamic Link Libraries (DLLs).
• Linux: ELF files use shared objects (.so files).
• Android: APK files include Java Archive (JAR) files and, if applicable, native shared
libraries (SO files).
STATIC MALWARE ANALYSIS – ARTIFACT EXTRACTION
SHARED LIBRARIES
The PE header lists every library and
function that will be loaded
Their names can reveal what the program
does.
URLDownloadToFile indicates that the
program downloads something.
Figure: Example PE file DLLs
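A minimal sketch of reviewing imported function names against a watchlist. The watchlist and its descriptions are illustrative assumptions, not a complete or authoritative list; only URLDownloadToFile comes from the slide. The import names are assumed to have been extracted already by a PE parsing tool.

```python
# Illustrative watchlist of Windows API names and why they are of interest.
WATCHLIST = {
    "URLDownloadToFile": "downloads a file from a URL",
    "CreateRemoteThread": "starts a thread in another process",
    "WriteProcessMemory": "writes into another process's memory",
    "SetWindowsHookEx": "installs a hook, e.g. to monitor keystrokes",
}


def flag_imports(imports: list[str]) -> dict[str, str]:
    """Map each watched import to the reason it deserves a closer look."""
    findings = {}
    for name in imports:
        # Many Windows APIs exist in ANSI (A) and wide (W) variants.
        base = name[:-1] if name[-1:] in ("A", "W") and name[:-1] in WATCHLIST else name
        if base in WATCHLIST:
            findings[name] = WATCHLIST[base]
    return findings
```

A hit is only a hint: legitimate software uses these functions too, so the result should be weighed together with other evidence.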
STATIC MALWARE ANALYSIS – ARTIFACT EXTRACTION
PERMISSIONS
Specific to Android. An application needs permissions in order to access protected parts of the system or other apps.
Sensitive permission usage provides insights about malicious behavior.
Evidence of permissions can be combined with other evidence to reach concrete conclusions.
Figure: Permissions listed in the manifest file of an APK
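A minimal sketch of reading requested permissions from a manifest that has already been decoded to plain-text XML (the manifest inside an APK is binary and needs a decoding tool first). The function name and the small set of permissions treated as sensitive are assumptions for this example.

```python
import xml.etree.ElementTree as ET

ANDROID_NS = "http://schemas.android.com/apk/res/android"

# Illustrative subset of permissions an analyst might treat as sensitive.
SENSITIVE = {
    "android.permission.SEND_SMS",
    "android.permission.READ_CONTACTS",
    "android.permission.RECORD_AUDIO",
}


def requested_permissions(manifest_xml: str) -> list[str]:
    """List the permissions declared in uses-permission elements."""
    root = ET.fromstring(manifest_xml)
    names = (el.get(f"{{{ANDROID_NS}}}name") for el in root.iter("uses-permission"))
    return [name for name in names if name]


# Usage with a small, made-up manifest.
demo_manifest = """
<manifest xmlns:android="http://schemas.android.com/apk/res/android" package="com.example.demo">
    <uses-permission android:name="android.permission.INTERNET" />
    <uses-permission android:name="android.permission.SEND_SMS" />
</manifest>
"""
permissions = requested_permissions(demo_manifest)
sensitive_used = [p for p in permissions if p in SENSITIVE]
```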
LIMITATION OF STATIC ANALYSIS - OBFUSCATION
• Changing the structure of code so that the semantics of the code are hidden from analysts.
Obfuscation Category | Obfuscation Type
Code Obfuscation | Identifier renaming, class encryption, code reordering, reflection, junk code insertion, control flow modification, dynamic code loading
Preventive Techniques | Anti emulation
Figure 1- Obfuscation by Re-naming
Figure 2- Obfuscation by String Encryption
ARTIFACT EXTRACTION FROM MALICIOUS FILE
Ideally:
Source code → Compiler → Executable code → Disassembly & analysis → Assembly code → Decompilation → Legitimate C/C++
Reality:
Malware → Unpacking → Non-executable code → Disassembly & analysis → Obfuscated assembly
• Obfuscated assembly → Decompilation → A mess
• Obfuscated assembly → Undo obfuscation → Assembly code → Decompilation → Legitimate C/C++ that a compiler would generate
DYNAMIC MALWARE ANALYSIS
Dynamic analysis refers to analyzing an application by executing it in a sandbox.
It models the run-time behavior of applications more effectively than signature-based and static analysis schemes.
Run-time behavior extraction helps overcome code obfuscation techniques.
It is still susceptible to preventive obfuscation (e.g., anti-emulation).
Requires a controlled and safe working environment
DYNAMIC MALWARE ANALYSIS – CREATING A SAFE ENVIRONMENT
Do not run malware on your computer
Use virtualization
• VMware
Do not allow malware to touch the real network
Use the host-only networking feature of the virtualization platform
Establish real services (DNS, Web) on your host machine or other virtual machines
DYNAMIC MALWARE ANALYSIS – ARTIFACT EXTRACTION
Requires generation of events on the application
• Manual
• Automatic
• Monkey
Possible Artifacts:
• System calls
• Network Activity
• Resource Usage
• Memory Artifacts
DYNAMIC MALWARE ANALYSIS – ARTIFACT EXTRACTION
REGISTRY AND FILE CHANGES
Runtime changes in the registry
Runtime changes in the file system
Tools:
• Sysinternals Process Monitor (specific to Windows)
DYNAMIC MALWARE ANALYSIS – ARTIFACT EXTRACTION
NETWORK ACTIVITY
Capture the network packets
Analyze the packets to extract information
Helps in analyzing attacks launched from networks
Tools:
• Wireshark
Information extracted from Network Traffic
DYNAMIC MALWARE ANALYSIS – ARTIFACT EXTRACTION
API CALLS
Runtime API call extraction helps overcome the problem of runtime code coverage
It incorporates code that is loaded at runtime
Tools:
• ProcMon (Process Monitor, Windows)
• strace (Linux)
• Frida (Android)
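A minimal sketch of post-processing a saved strace log into per-call counts. It assumes the usual text format in which each line starts with the system call name followed by an opening parenthesis, optionally preceded by a process ID. The function name and the sample trace are made up for illustration.

```python
import re
from collections import Counter

# Optional "[pid 123] " or "123 " prefix, then the call name and "(".
SYSCALL_LINE = re.compile(r"^(?:\[pid\s+\d+\]\s+|\d+\s+)?(\w+)\(")


def count_syscalls(trace_text: str) -> Counter:
    """Count how often each system call appears in strace output."""
    counts = Counter()
    for line in trace_text.splitlines():
        match = SYSCALL_LINE.match(line)
        if match:  # skips signal and exit lines such as "+++ exited with 0 +++"
            counts[match.group(1)] += 1
    return counts


# Usage on a short, made-up trace.
SAMPLE_TRACE = """\
execve("/bin/ls", ["ls"], 0x7ffc0a1b2c30 /* 24 vars */) = 0
openat(AT_FDCWD, "/etc/passwd", O_RDONLY) = 3
read(3, "root:x:0:0"..., 4096) = 1024
read(3, "", 4096) = 0
close(3) = 0
+++ exited with 0 +++
"""
syscall_counts = count_syscalls(SAMPLE_TRACE)
```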
DYNAMIC MALWARE ANALYSIS – ARTIFACT EXTRACTION
VOLATILE MEMORY
Volatile memory provides a rich source of valuable evidence such as process
details, network information, operating system specific information and
application code
The final executable code is run through memory and leaves its traces in many
memory structures.
Volatile memory is a rich source of evidence as the extracted evidence is resilient
to almost all code obfuscation techniques.
DYNAMIC MALWARE ANALYSIS – ARTIFACT EXTRACTION
VOLATILE MEMORY
Many structures are present in volatile memory that represent evidence
• Kernel Task Structure
• Contains metadata about the running process.
• Heap
• Contains object structure of the application. Helpful in analyzing similarities
between samples of a malware family.
• Application Code
DYNAMIC MALWARE ANALYSIS – ARTIFACT EXTRACTION
VOLATILE MEMORY
Tool – Volatility (Memory Forensics Framework)
Available for all platforms
Many plugins available for extracting information.
Information | Plugins
Running processes | linux_pslist, linux_pstree, linux_psaux and linux_threads
Hidden processes | linux_psxview and linux_pidhashtable
Usage of bash commands | linux_bash
Network activity in terms of UDP, TCP connections and interfaces | linux_netstat, linux_list_raw and linux_ifconfig
Modifications to the file operations and sequence operations of UDP/TCP structures | linux_check_afinfo
Discrepancies in the ARP table | linux_arp
The types of accessed files and number of files in each type | linux_proc_maps, linux_enumerate_files and linux_lsof
Permissions against associated files of a process | linux_proc_maps
Number of elevated processes | linux_check_creds
USING MACHINE LEARNING FOR MALWARE PREDICTION
All extracted information (static and dynamic) can be converted to feature representations.
The features are fed to a machine learning algorithm for training.
The trained model can then be used to classify an unknown file.
Different machine learning models can be trained on the feature set, such as Random Forest, KNN, and SVM.
Figure: Machine learning approach for malware classification [1]
[1] Singh et al., A survey on machine learning-based malware detection in executable files (2021)
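A minimal sketch of the train-then-predict idea using a tiny k-nearest-neighbours (KNN) classifier written in plain Python. The binary features (for example "requests SEND_SMS" or "imports URLDownloadToFile") and all training data are invented for illustration. A real study would use a machine learning library and a proper dataset.

```python
from collections import Counter


def hamming(a, b) -> int:
    """Number of positions at which two feature vectors differ."""
    return sum(x != y for x, y in zip(a, b))


def knn_predict(training_set, sample, k: int = 3) -> str:
    """Label a sample by majority vote among its k nearest training vectors."""
    neighbours = sorted(training_set, key=lambda item: hamming(item[0], sample))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


# Made-up training data: (feature vector, label).
TRAINING_SET = [
    ((1, 1, 0), "malicious"),
    ((1, 1, 1), "malicious"),
    ((0, 0, 0), "benign"),
    ((0, 0, 1), "benign"),
    ((0, 1, 0), "benign"),
]
prediction = knn_predict(TRAINING_SET, (1, 1, 0))
```

KNN needs no separate training phase, which keeps the sketch short; Random Forest or SVM would replace `knn_predict` with a fitted model.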
USING MACHINE LEARNING FOR MALWARE PREDICTION
Overview of dynamic analysis based Malware Analysis for PE files [1]
Overview of static analysis based Malware Analysis for PE files [1]
USING MACHINE LEARNING FOR MALWARE PREDICTION
Improving the accuracy and effectiveness of machine learning based malware classification models is an ongoing research area.
Important questions to answer:
• Best features for application modeling
• Time and resource efficient evidence extraction
• Most suited machine learning models to use
Data sets Available
• The Zoo
• Contagio
• CICAndMal2020
USING DEEP LEARNING FOR MALWARE PREDICTION
Deep Learning (DL) has been proposed for malware detection to overcome the
obstacles of feature engineering, with the expectation of replicating DL’s success
in image classification, machine translation, and text classification.
Features | Deep Classifiers | Analysis Type
Raw file (exe, elf, classes.dex) | CNN | Static
Raw file in RGB (exe, elf, classes.dex) | CNN | Static
Volatile memory image | CNN | Dynamic
API call sequences | LSTM | Static
Runtime API call sequences | LSTM | Dynamic
Opcode sequences | LSTM | Static
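A minimal sketch of the "raw file as image" representation: each byte becomes one grayscale pixel value (0 to 255) and the bytes are laid out row by row. The function name, the fixed width, and the zero padding of the last row are assumptions for this example; the CNN itself is not shown.

```python
def bytes_to_grayscale(data: bytes, width: int = 16) -> list[list[int]]:
    """Arrange raw bytes as rows of pixel intensities, zero-padding the last row."""
    if width < 1:
        raise ValueError("width must be at least 1")
    padded = data + bytes(-len(data) % width)
    return [list(padded[i:i + width]) for i in range(0, len(padded), width)]
```

The resulting two-dimensional array can be saved as an image or passed to an image classifier after resizing to the input size the model expects.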
USING DEEP LEARNING FOR MALWARE PREDICTION
Figure: Deep learning for malware classification using images [2]
Figure: Example memory images of malware [1]
[1] Bozkir et al., Catch them alive: A malware detection approach through memory forensics, manifold learning and computer vision (2021)
[2] Naeem et al., A Malware Detection Scheme via Smart Memory Forensics for Windows Devices (2022)
EXPANDING MALWARE ANALYSIS
Detecting whether a file is malicious or benign is the first step of malware analysis
Further investigation involves:
• Finding the category of malware
  • Adware
  • Ransomware
  • Banking Trojan
  • SMS Trojan
  • Others
• Finding the family of a malware sample
THANK YOU