8 2 Data Encoding
8 2 Data Encoding
Chapter 13
Data Encoding
Goals
Defeat signature-detection by obfuscating malicious content. Disguise internal
working
Encrypt network communication
Hide command and control location
Hide staging file before transmission
Hide from “strings” analysis
Simple Ciphers
Low overhead, simple, less obvious, light-weight – prevent
basic analysis
Casesar Cipher
Shift/Rotate characters (e.g. shifting letters three characters to
the right)
XOR (e.g. XOR with 0x3C)
Bit-wise XOR of data with a fixed byte or generated byte stream
Brute-force XOR Encoding
For a fixed byte XOR, can brute force all 256 values to find a
header that makes sense. -> just try out (single-byte encoding)
MZ header 4d, 5a
Brute-Forcing Many Files
Know: PE file header contain a string: This program must be
running under Win32/This program cannot be run is DOS.
Enumerate through all possible keys to find a match
Easy to break for single-byte XOR cipher
Null-preserving XOR encoding
Easy to see by glance through the hex file for NULL – 0x12 (original) xor
0x12 (key) -> NULL
Some malware uses null-preserving XOR to make detection less obvious
Skip if original is NULL or key itself
Otherwise, XOR with key
Key is less obvious this way
Identify XOR Loops
Use Search->Text to find all the XOR
3 Cases XOR are used:
XOR of a register with itself
XOR of a register with a constant
XOR of one register with a different register
Encoding -> XOR with a constant inside a loop -> use IDA Pro to identify the
loop (graphical view)
Base 64
Base-64
From MIME standard
Represents binary data in an ASCII string format
Binary data converted into one of 64 primary characters
Every 3-bytes of binary data is encoded in 4-bytes of Base64
ATT (24 bits/3 bytes -> regroup into 4 groups (6 bits each)
Decode Base 64 (Padding)
Decoding is the same (watch out for padding)
DES-encryption
Hide command-
and-control
Normal Code
about 5.6 peak
Custom Encoding
Malware uses homegrown encoding – e.g. XOR + Base64
Trace execution to see suspicious activity in a tight loop
Example: pseudo-random number generation followed by
xor (Figure 13-14, 13-15, p. 287)
Reverse engineering to break custom encoding is more
difficult
Decoding
Self-decoding malware
Malware packaged with decoding routine
Indications : strings that don't appear in binary file on disk, but appear
in debugger
Decrypt by setting a breakpoint directly after decryption routine
finishes execution
Malware may not decrypt the info you want (uncontrollable)
Malware employing decoding functions
Can sometimes use standard libraries to decode
Python's base64.decodestring() or PyCrypto's functions
(see examples Listing 13-8 to Listing 13-10)
Programmatically use debugger to re-run malware’s decoding code
with chosen parameters (use the malware to decode/against itself)
ImmDbg (allow Python to program the debugger)
In Class Homeworks