Hodur Recon2024
TAKAHIRO HARUYAMA
BINARLY
WHO AM I?
• Takahiro Haruyama (@cci_forensics)
• Principal Security Researcher at Binarly
• Previously Staff Threat Researcher at Carbon Black TAU
• Past Research
• Scalable RE automation (e.g., hunting vulnerable drivers)
• Anti-Forensics (e.g., firmware acquisition MitM attack)
• Malware Analysis (e.g., Internet-wide C2 scanning)
AGENDA
BACKGROUND
PEELING HODUR: DEFEATING COMPILER-LEVEL OBFUSCATIONS
HODUR PROTOCOL REVERSING
HODUR PROTOCOL EMULATION
WRAP-UP
BACKGROUND
WHY MALWARE C2 SCANNING?
• IP reputation is not effective for catching fresh C2s
• Internet-wide C2 scanning is beneficial from both
detection and threat intel perspectives
HOW MALWARE C2 SCANNING?
PEELING HODUR: DEFEATING COMPILER-LEVEL OBFUSCATIONS
CONTROL FLOW FLATTENING
DEFEATING COMPILER-LEVEL OBFUSCATIONS
WHAT’S CONTROL FLOW FLATTENING?
• Control flow flattening (CFF) transforms a program's
control flow to make it much harder to understand,
while preserving the original functionality
Figure: CFF layout with First Block(s), Control Flow Dispatcher(s), and Flattened Blocks
Source: http://tigress.cs.arizona.edu/transformPage/docs/flatten/index.html
HOW CFF WORKS
• Control flow dispatchers decide which block to
execute next based on a state variable
• The state variable is updated in first/flattened blocks
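As a rough illustration of the two bullets above, the sketch below hand-flattens a small loop in Python. The function names and the state constants are made up for this example and are not taken from any Hodur sample; real CFF is applied by the compiler to native code.

```python
def gcd_plain(a: int, b: int) -> int:
    """Original, readable control flow."""
    while b != 0:
        a, b = b, a % b
    return a


def gcd_flattened(a: int, b: int) -> int:
    """Same logic after a hand-made control flow flattening."""
    state = 0x1A2B                      # first block: initialise the state variable
    while True:                         # control flow dispatcher
        if state == 0x1A2B:             # flattened block: loop condition
            state = 0x77C1 if b != 0 else 0x5F03
        elif state == 0x77C1:           # flattened block: loop body
            a, b = b, a % b
            state = 0x1A2B
        elif state == 0x5F03:           # flattened block: exit
            return a
```

Every flattened block ends by writing the next state value, so the original edges are only visible through the dispatcher.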
CONTROL FLOW UNFLATTENING: BASIC STRATEGY
1. Identify control flow dispatchers and state variables
2. Trace back the state variable values from the end of
flattened blocks
3. Associate the values with the block IDs
4. Re-order the code flow based on the associations
• I use IDA Pro microcode for the unflattening task
• Intermediate representation used by Hex-Rays decompiler
• We can implement the algorithm in the optblock_t callback
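The four steps can be sketched on a toy control flow graph. This is plain Python over hypothetical dictionaries, not the Hex-Rays microcode API: `dispatch_table`, `state_writes` and the block names are assumptions chosen for illustration, and steps 1 and 2 are assumed to have already produced them.

```python
def unflatten(dispatch_table, state_writes, entry_state):
    """Rebuild direct edges between flattened blocks.

    dispatch_table: state value -> block id (steps 1 and 3)
    state_writes:   block id -> state values written at the end of
                    the block (step 2); two values mean a conditional
    entry_state:    state value assigned in the first block
    """
    # Step 4: replace every "jump to dispatcher" with the block that the
    # written state value would have selected.
    edges = {
        block: [dispatch_table[value] for value in values]
        for block, values in state_writes.items()
    }
    return dispatch_table[entry_state], edges


# Toy input: a loop with a test block, a body block and an exit block.
dispatch_table = {0x1A2B: "test", 0x77C1: "body", 0x5F03: "exit"}
state_writes = {"test": [0x77C1, 0x5F03], "body": [0x1A2B], "exit": []}

entry, edges = unflatten(dispatch_table, state_writes, 0x1A2B)
```

In a real optimizer the hard part is recovering `state_writes` reliably, which is where the issues on the following slides come from.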
CONTROL FLOW UNFLATTENING
• The dispatcher detection algorithm misses dispatchers whose predecessors are conditional jumps on the state variable
• The genmc plugin was useful for troubleshooting
ISSUE1: FIX
• I added another dispatcher detection algorithm
• The algorithm simply guesses a dispatcher block based on
the biggest number of predecessors
• The dispatcher will be validated based on the entropy
value of the state variable (only effective for OLLVM)
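A minimal sketch of this heuristic follows. The slides do not define the entropy measure, so the bit-level Shannon entropy and the helper names below are assumptions. The idea is that OLLVM picks random-looking 32-bit state constants, whereas ordinary switch variables tend to be small integers.

```python
import math


def guess_dispatcher(preds):
    """preds: block id -> set of predecessor block ids.
    Guess the dispatcher as the block with the most predecessors."""
    return max(preds, key=lambda block: len(preds[block]))


def bit_entropy(values, width=32):
    """Shannon entropy (0.0 to 1.0) of the 0/1 bit distribution over all
    state constants. This measure is an assumption for illustration."""
    mask = (1 << width) - 1
    ones = sum(bin(v & mask).count("1") for v in values)
    p = ones / (width * len(values))
    if p in (0.0, 1.0):
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


# Toy CFG: every flattened block jumps back to "disp".
preds = {
    "entry": set(),
    "disp": {"entry", "a", "b", "c"},
    "a": {"disp"},
    "b": {"disp"},
    "c": {"disp"},
}
candidate = guess_dispatcher(preds)

random_like = [0x1A2B3C4D, 0x9E3779B9, 0x7F4A7C15, 0xDEADBEEF]
small_cases = [0, 1, 2, 3]
```

Random-looking constants score close to 1.0 while small case values score low, so a threshold between the two (its exact value would need tuning) can reject a plain `switch` that happens to have many predecessors.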
ISSUE2: BLOCK STATE VARIABLE TRACKING FAILURE
• The state variable tracking fails if the value is assigned
in the first blocks
• D-810 only traces in the flattened blocks and doesn’t
recognize the dispatcher has been reached -> loop :(
MIXED BOOLEAN ARITHMETIC EXPRESSIONS
DEFEATING COMPILER-LEVEL OBFUSCATIONS
• Mixed Boolean Arithmetic (MBA) expressions transform a simple expression into a complex but semantically equivalent form
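Two textbook MBA identities illustrate the idea. They are generic examples, not expressions recovered from Hodur, and the function names are hypothetical. Arithmetic is masked to 32 bits to mimic machine registers.

```python
import random

MASK = 0xFFFFFFFF  # emulate 32-bit wrap-around


def add_plain(x: int, y: int) -> int:
    return (x + y) & MASK


def add_mba(x: int, y: int) -> int:
    # x + y == (x ^ y) + 2 * (x & y)
    return ((x ^ y) + 2 * (x & y)) & MASK


def xor_plain(x: int, y: int) -> int:
    return (x ^ y) & MASK


def xor_mba(x: int, y: int) -> int:
    # x ^ y == (x | y) - (x & y)
    return ((x | y) - (x & y)) & MASK


def check(samples: int = 1000) -> bool:
    """Compare the plain and MBA forms on random 32-bit inputs."""
    rng = random.Random(0)
    for _ in range(samples):
        x, y = rng.getrandbits(32), rng.getrandbits(32)
        if add_plain(x, y) != add_mba(x, y):
            return False
        if xor_plain(x, y) != xor_mba(x, y):
            return False
    return True
```

Obfuscators nest and combine many such rewrites, so simplification tools typically match known patterns or prove equivalence rather than test random inputs.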
POLYMORPHIC STACK STRINGS
DEFEATING COMPILER-LEVEL OBFUSCATIONS
STACK STRINGS
• All strings are constructed and decoded in the stack area
• After defeating CFF and MBA expressions, the decoding
algorithm was identified
• enc[i] ^= (i + Const) ^ Const
• The constant value is different per function
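The decoding formula above can be sketched in Python as follows. Truncating the key byte to 8 bits is an assumption (the slide does not state the operand width), and the sample string and constant are made up for the round-trip check.

```python
def decode_stack_string(enc: bytes, const: int) -> bytes:
    """Apply enc[i] ^= (i + Const) ^ Const to every byte.

    Assumption: the XOR key is truncated to one byte per position.
    """
    out = bytearray(enc)
    for i in range(len(out)):
        out[i] ^= ((i + const) ^ const) & 0xFF
    return bytes(out)


# XOR is its own inverse, so the same routine also encodes.
sample_const = 0x5A                      # hypothetical per-function constant
encoded = decode_stack_string(b"Sec-Dest", sample_const)
decoded = decode_stack_string(encoded, sample_const)
```

Because the constant changes per function, a bulk decoder has to recover both the length and the constant for each string before calling a routine like this.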
COPYING THE ENCODED STRING
• Detect the length and constant value used in the decoding algorithm
Figure labels: Combination of global variable and hard-coded bytes; Length and constant value
VARIOUS ACCESS PATTERNS
• Additional XORs before decoding
• Referencing another variable (enc is decoded)
HODUR PROTOCOL REVERSING
PROTOCOL OVERVIEW
• The latest Hodur samples only support HTTP/HTTPS
• Two header values (Sec-Dest/Sec-Site) are used to
authenticate clients
• GET request for the initial handshake
• An RC4 key is returned
• Periodic POST requests to receive C2 commands
after the handshake
• The request/response data are encrypted with the key
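For reference, a textbook RC4 implementation is short enough to write out. This is the standard algorithm only; how Hodur frames the key in the GET response and whether it modifies the cipher are not covered by this sketch.

```python
def rc4(key: bytes, data: bytes) -> bytes:
    """Standard RC4; the same call encrypts and decrypts."""
    # Key-scheduling algorithm (KSA)
    s = list(range(256))
    j = 0
    for i in range(256):
        j = (j + s[i] + key[i % len(key)]) & 0xFF
        s[i], s[j] = s[j], s[i]

    # Pseudo-random generation algorithm (PRGA)
    i = j = 0
    out = bytearray()
    for byte in data:
        i = (i + 1) & 0xFF
        j = (j + s[i]) & 0xFF
        s[i], s[j] = s[j], s[i]
        out.append(byte ^ s[(s[i] + s[j]) & 0xFF])
    return bytes(out)


# Widely published test vector: key "Key", plaintext "Plaintext".
ciphertext = rc4(b"Key", b"Plaintext")
```

A scanner or fake C2 would call `rc4(session_key, body)` on POST bodies once the handshake key is known.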
AUTHENTICATION HEADERS
• Sec-Dest: %2.2X%ws (e.g., “7BnqmmCg”)
• A random byte (0x64-0x99)
HODUR SCANNER DEVELOPMENT
FAKE C2 SERVER FOR VALIDATION
• Developed a fake C2 server to validate the request
data of the PoC scanner and other recent samples
• fakenet (IP diverter) + Python HTTPS server
o_displ
o_near
{ 55 8B EC 6A ?? 68 ?? ?? ?? ?? 64 A1 ?? ?? ?? ?? 50 81 EC ?? ?? ?? ??
53 56 57 A1 ?? ?? ?? ?? 33 C5 50 8D 45 ?? 64 A3 ?? ?? ?? ?? 89 65 ??
8B 45 ?? 50 8D 8D ?? ?? ?? ?? E8 }
HUNTING RECENT SAMPLES (CONT.)
• One of the rules hit the latest sample in Dec last year
• CFF was not applied to the sample
• The C2 included in the sample was active :)
• I could check the Content-Length and the format of the GET response
APPROACH BASED ON VALIDATION
• All recent samples had exactly the same C2 protocol
encryption and data format
• Every sample’s C2 protocol/port is HTTPS/443
• No need to send the POST request after handshake
• The C2 likely responds without content until operators
specify commands
• I started to implement a scanner just checking the
difference between GET requests with/without the
authentication headers
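The differential check could look roughly like the sketch below. It only models the comparison step on two already-collected responses. The `Response` type, the function name, and the choice of status code plus Content-Length as the compared fields are assumptions, and the actual scanner's criteria may differ.

```python
from typing import NamedTuple


class Response(NamedTuple):
    """Hypothetical summary of one GET response."""
    status: int
    content_length: int


def responds_differently(plain: Response, authed: Response) -> bool:
    """True when the server treats the request carrying the
    Sec-Dest/Sec-Site headers differently from the plain request."""
    return (plain.status, plain.content_length) != (
        authed.status,
        authed.content_length,
    )


# An ordinary web server ignores unknown headers and answers both
# requests identically; a candidate C2 changes its answer.
ordinary = responds_differently(Response(200, 612), Response(200, 612))
candidate = responds_differently(Response(404, 0), Response(200, 48))
```

Hosts flagged this way would still need a second validation step, such as checking the format of the handshake response, before being reported as C2s.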
TLS HANDSHAKE ISSUE
• OpenSSL caused an internal error during the TLS
handshake
* TLSv1.0 (OUT), TLS header, Certificate Status (22):
* TLSv1.3 (OUT), TLS handshake, Client hello (1):
* TLSv1.2 (IN), TLS header, Certificate Status (22):
* TLSv1.3 (IN), TLS handshake, Server hello (2):
* TLSv1.2 (IN), TLS handshake, Certificate (11):
* TLSv1.2 (IN), TLS handshake, Server key exchange (12):
* TLSv1.2 (IN), TLS handshake, Server finished (14):
* TLSv1.2 (OUT), TLS header, Unknown (21):
* TLSv1.2 (OUT), TLS alert, internal error (592):
* error:0800006A:elliptic curve routines::point at infinity
* Closing connection 0
curl: (35) error:0800006A:elliptic curve routines::point at infinity
TLS HANDSHAKE ISSUE (CONT.)
• I tested major open source TLS clients
• Only LibreSSL (pylibtls) worked for the TLS handshake
WRAP-UP
• Defeating compiler-level obfuscations is easier than
before
• 2-3 months for APT10 ANEL -> 3-4 weeks for Hodur
• We still need to improve or create tools when RE requires
de-obfuscating code precisely
• Code will be available online after the conference
• The developed scanner keeps tracking the malware
C2s on the Internet
• We can respond proactively using the intel