
8 2 Data Encoding

This document discusses various techniques for encoding data in malware to avoid detection. It describes simple ciphers like Caesar and XOR encryption, as well as more advanced techniques like base64 encoding and cryptographic algorithms. It also covers identifying and decoding encoded content by looking for encryption routines, high entropy regions, and magic constants used in common algorithms. Tracing code execution and reversing custom encoding schemes developed by malware authors are presented as additional challenges.

Uploaded by

Jayesh Shinde

Data Encoding

Chapter 13
Data Encoding
 Goals
 Defeat signature-based detection by obfuscating malicious content and disguising
internal workings
 Encrypt network communication
 Hide command and control location
 Hide staging file before transmission
 Hide from “strings” analysis
Simple Ciphers
 Low-overhead, lightweight, and less obvious than complex ciphers – enough to
prevent basic analysis
 Caesar Cipher
 Shift/Rotate characters (e.g. shifting letters three characters to
the right)
 XOR (e.g. XOR with 0x3C)
 Bit-wise XOR of data with a fixed byte or generated byte stream
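The two simple ciphers above can be sketched in a few lines of Python. This is an illustrative sketch only: the function names are made up for this example, and the shift of 3 and the key 0x3C are the values mentioned on the slide.

```python
def caesar_shift(text: str, shift: int = 3) -> str:
    """Rotate ASCII letters by `shift` positions; leave other characters alone."""
    out = []
    for ch in text:
        if "a" <= ch <= "z":
            out.append(chr((ord(ch) - ord("a") + shift) % 26 + ord("a")))
        elif "A" <= ch <= "Z":
            out.append(chr((ord(ch) - ord("A") + shift) % 26 + ord("A")))
        else:
            out.append(ch)
    return "".join(out)


def xor_single_byte(data: bytes, key: int = 0x3C) -> bytes:
    """XOR every byte with the same one-byte key.

    XOR is its own inverse, so calling this twice restores the input.
    """
    return bytes(b ^ key for b in data)
```

Decoding a Caesar cipher is the same call with a negative shift; decoding the XOR is the same call with the same key.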
Brute-force XOR Encoding
 For a fixed-byte XOR, brute-force all 256 key values until the decoded header
makes sense -> just try each one (single-byte encoding)

Look for the MZ header: 0x4D 0x5A
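A minimal sketch of that brute force, assuming the encoded blob is a Windows executable whose first two bytes were originally "MZ". The function name is hypothetical.

```python
def find_mz_keys(blob: bytes) -> list[int]:
    """Return every one-byte XOR key that decodes the first two bytes to 'MZ'."""
    return [k for k in range(256) if bytes(b ^ k for b in blob[:2]) == b"MZ"]
```

For a real single-byte XOR there is at most one hit, because the first byte alone fixes the key (key = first byte xor 0x4D).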
Brute-Forcing Many Files
 Known plaintext: PE files contain a DOS stub string such as “This program must be
run under Win32” or “This program cannot be run in DOS mode.”
 Enumerate through all possible keys to find a match
 Easy to break for single-byte XOR cipher
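One way to scan many files, sketched under the assumption that the payload is a PE file encoded with a single-byte XOR: encode the known DOS stub strings with each candidate key and search for the result, rather than decoding the whole file 255 times. The names below are illustrative.

```python
DOS_STUB_MARKERS = (
    b"This program cannot be run in DOS mode",
    b"This program must be run under Win32",
)


def find_xor_encoded_pe(blob: bytes):
    """Try every non-zero one-byte key.

    Returns (key, offset) for the first encoded marker found, or None.
    """
    for key in range(1, 256):
        for marker in DOS_STUB_MARKERS:
            needle = bytes(b ^ key for b in marker)  # marker as it would look encoded
            offset = blob.find(needle)
            if offset != -1:
                return key, offset
    return None
```

Key 0 is skipped because it leaves the data unchanged, which an ordinary string search would already find.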
Null-preserving XOR encoding
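 With plain single-byte XOR, the key is easy to spot by glancing through the hex
dump: a NULL byte encodes to the key itself (0x00 xor 0x12 -> 0x12), and a byte
equal to the key encodes to NULL (0x12 xor 0x12 -> 0x00)
 Some malware uses null-preserving XOR to make detection less obvious
 Skip the byte if it is NULL or equal to the key
 Otherwise, XOR it with the key
 The key is less obvious this way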
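The null-preserving rule can be sketched as follows. The function name is made up for this example; the logic is only the two rules listed on the slide.

```python
def xor_null_preserving(data: bytes, key: int) -> bytes:
    """XOR each byte with `key`, but leave bytes equal to 0x00 or `key` unchanged."""
    return bytes(b if b in (0, key) else b ^ key for b in data)
```

The scheme is still its own inverse: a byte that gets XORed can never become 0x00 or the key, so a second pass with the same key restores the original.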
Identify XOR Loops
 In IDA Pro, use Search -> Text to find all XOR instructions
 XOR appears in three forms:
 XOR of a register with itself (clears the register; not encoding)
 XOR of a register with a constant
 XOR of one register with a different register
 Encoding usually shows up as XOR with a constant inside a loop -> use IDA Pro's
graph view to identify the loop
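The three cases can also be triaged outside IDA Pro. The sketch below is an assumption-heavy illustration: it classifies lines of exported disassembly text in Intel syntax, and the regular expressions and category names are invented for this example rather than taken from any tool.

```python
import re

# Matches "xor <dst>, <src>" with an optional trailing ';' comment.
XOR_RE = re.compile(r"\bxor\s+([^,;]+?)\s*,\s*([^;]+?)\s*(?:;|$)", re.IGNORECASE)
# Immediate operands written as 0x3C, 3Ch, or plain decimal digits.
CONST_RE = re.compile(r"^(?:0x[0-9a-f]+|[0-9][0-9a-f]*h?)$", re.IGNORECASE)


def classify_xor(line: str):
    """Return 'self', 'constant', 'other', or None if the line is not an XOR."""
    match = XOR_RE.search(line)
    if match is None:
        return None
    dst, src = match.group(1).lower(), match.group(2).lower()
    if dst == src:
        return "self"        # e.g. xor eax, eax: just zeroes the register
    if CONST_RE.match(src):
        return "constant"    # likely candidate for single-byte encoding
    return "other"           # register or memory operand; inspect manually
```

Lines classified as "constant" are the ones worth checking for an enclosing loop.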
Base 64
 Base-64
From the MIME standard
Represents binary data in an ASCII string format
Each 6-bit group of binary data maps to one of 64 printable characters
Every 3 bytes of binary data are encoded as 4 characters of Base64
Example: ATT (3 bytes = 24 bits) -> regrouped into 4 groups of 6 bits each
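The regrouping step can be made concrete with a small sketch that encodes exactly one 3-byte block by hand. The function name is illustrative, and the table is the standard Base64 index string.

```python
B64_TABLE = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"


def encode_block(three_bytes: bytes) -> str:
    """Encode exactly 3 bytes as 4 Base64 characters."""
    if len(three_bytes) != 3:
        raise ValueError("expected exactly 3 bytes")
    n = int.from_bytes(three_bytes, "big")  # 24 bits as one integer
    # Slice the 24 bits into four 6-bit indexes, most significant first.
    indexes = [(n >> shift) & 0x3F for shift in (18, 12, 6, 0)]
    return "".join(B64_TABLE[i] for i in indexes)
```

For the slide's example, `encode_block(b"ATT")` yields `QVRU`, which matches Python's `base64.b64encode(b"ATT")`.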
Decode Base 64 (Padding)
 Decoding reverses the process (watch out for padding)

Encoded length is 11, but it must be divisible by 4

Add one padding character (=)

 Bot54164 -> the attacker manages the bots through this ID


 Look for a string used as an index table
 ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/
 Try on-line conversion tools
 Caution: Malware can easily modify the index table to create a custom
substitution cipher (see book example)
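The padding fix from the previous slide can be sketched with Python's standard `base64` module. The helper name is hypothetical; `base64.b64decode` is the real library call.

```python
import base64


def b64decode_padded(text: str) -> bytes:
    """Append '=' until the length is a multiple of 4, then decode."""
    missing = -len(text) % 4
    return base64.b64decode(text + "=" * missing)
```

An 8-byte value such as Bot54164 encodes to 12 characters including one `=`, so a captured string of length 11 needs exactly one padding character. A length that leaves a remainder of 1 after dividing by 4 is never valid Base64 and will still fail to decode.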
Decode Base 64
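 Malware can implement its own substitution cipher by changing the index table.

Decoding with the standard table fails because the table is not standard.

 In the book example, “a” is moved to the front of the index table, so the output still appears to be standard Base64.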
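Once the modified index table has been recovered from the binary, one way to decode is to translate the custom alphabet back to the standard one and then use an ordinary Base64 decoder. The sketch below assumes the custom table is a 64-character permutation that does not contain `=`; the function name is illustrative.

```python
import base64

STANDARD_TABLE = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"


def decode_custom_b64(text: str, custom_table: str) -> bytes:
    """Decode Base64 text that was produced with a non-standard index table."""
    if len(custom_table) != 64 or len(set(custom_table)) != 64:
        raise ValueError("index table must contain 64 distinct characters")
    # Map each custom character to the standard character at the same index.
    to_standard = str.maketrans(custom_table, STANDARD_TABLE)
    return base64.b64decode(text.translate(to_standard))
```

A table with "a" moved to the front, as described above, would be passed as `custom_table`; padding characters are left untouched by the translation.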


Cryptographic Algorithm
 Simple ciphers offer no protection against brute force
 Drawbacks of standard crypto (for the malware author):
 Crypto libraries are large and easily detected
 The key must be hidden for symmetric encryption algorithms
 Reduced portability
 Recognizing cryptographic code
 Imports include well-known OpenSSL or Microsoft crypto functions
 Searching for cryptographic constants
 FindCrypt2 plugin in IDA Pro (searches the program for known crypto constants)
 Or
 Krypto ANALyzer plugin for PEiD
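The idea behind constant-searching plugins can be illustrated with a tiny signature scan. This is a sketch, not how FindCrypt2 or Krypto ANALyzer are implemented: the signature list is a small hand-picked sample, and the 32-bit words are assumed to be stored little-endian, as they would be in x86 code or data.

```python
import struct

CRYPTO_CONSTANTS = {
    "MD5/SHA-1 init word 0x67452301": struct.pack("<I", 0x67452301),
    "MD5/SHA-1 init word 0xEFCDAB89": struct.pack("<I", 0xEFCDAB89),
    "AES S-box (first 8 bytes)": bytes.fromhex("637c777bf26b6fc5"),
}


def find_crypto_constants(blob: bytes) -> dict[str, int]:
    """Return {signature name: first offset} for every signature found in blob."""
    hits = {}
    for name, pattern in CRYPTO_CONSTANTS.items():
        offset = blob.find(pattern)
        if offset != -1:
            hits[name] = offset
    return hits
```

Short patterns like these can match by coincidence, which is one source of the false positives that such tools report.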
Cryptographic Algorithm
 Most crypto algorithms employ some magic constant (a fixed string
of bits)
 Recognizing encrypted data
Some malware employs crypto algorithms that have no fixed constants (RC4 and IDEA
generate their tables at run-time) or does not rely on libraries
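RC4 shows why a constant search can come up empty. In the sketch below (a plain textbook RC4, written for illustration), the only table is the 256-byte state, and it is built at run time from the key, so nothing distinctive is stored in the binary.

```python
def rc4(key: bytes, data: bytes) -> bytes:
    """Encrypt or decrypt `data` with RC4 (the operation is symmetric)."""
    if not key:
        raise ValueError("key must not be empty")
    # Key scheduling: the state starts as 0..255 and is shuffled by the key.
    state = list(range(256))
    j = 0
    for i in range(256):
        j = (j + state[i] + key[i % len(key)]) & 0xFF
        state[i], state[j] = state[j], state[i]
    # Keystream generation, XORed into the data byte by byte.
    out = bytearray()
    i = j = 0
    for byte in data:
        i = (i + 1) & 0xFF
        j = (j + state[i]) & 0xFF
        state[i], state[j] = state[j], state[i]
        out.append(byte ^ state[(state[i] + state[j]) & 0xFF])
    return bytes(out)
```

In disassembly this tends to appear as two loops of 256 iterations followed by an XOR loop, which is a structural pattern to look for when no constants are found.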
Krypto ANALyzer
Identifies a wide range of constants (with some false positives)
High-Entropy Content
 In case magic constants are not found – search for high-entropy content
 Entropy – the expected information content of each symbol a source outputs (a
measure of randomness); encrypted or compressed data scores high
 IDA Entropy Plugin (graphical views)

[Figure: IDA Entropy Plugin graph. Annotations: DES-encrypted content (hides command-and-control); normal code peaks at about 5.6]
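The entropy measure can be computed directly. The sketch below uses Shannon entropy over byte values, giving a score from 0 to 8 bits per byte; the chunk size of 256 is an arbitrary choice for this example and is not taken from the IDA plugin.

```python
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Shannon entropy of `data` in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def entropy_by_chunk(data: bytes, size: int = 256) -> list[float]:
    """Entropy of each consecutive `size`-byte chunk, to locate high-entropy regions."""
    return [shannon_entropy(data[i:i + size]) for i in range(0, len(data), size)]
```

Chunks that score close to 8 are candidates for encrypted or compressed content and are worth inspecting first.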
Custom Encoding
 Malware often layers homegrown encodings – e.g. XOR followed by Base64
 Trace execution to see suspicious activity in a tight loop
 Example: pseudo-random number generation followed by
xor (Figure 13-14, 13-15, p. 287)
 Reverse engineering to break custom encoding is more
difficult
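A layered scheme of the XOR + Base64 kind might look like the following sketch. It is a hypothetical example, not a scheme taken from a specific sample: the function names and the key 0x3C are assumptions.

```python
import base64


def custom_encode(data: bytes, key: int = 0x3C) -> str:
    """XOR each byte with `key`, then Base64-encode the result."""
    xored = bytes(b ^ key for b in data)
    return base64.b64encode(xored).decode("ascii")


def custom_decode(text: str, key: int = 0x3C) -> bytes:
    """Undo the layers in reverse order: Base64-decode, then XOR."""
    return bytes(b ^ key for b in base64.b64decode(text))
```

Base64-decoding the output alone yields only XORed bytes, which is why an analyst has to identify every layer, and the order in which the layers are applied, before the content becomes readable.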
Decoding
 Self-decoding malware
 Malware packaged with decoding routine
 Indications: strings that don't appear in the binary file on disk but do appear
in the debugger
 Decrypt by setting a breakpoint directly after the decryption routine
finishes execution
 Limitation: you cannot control what the malware decrypts, so it may not
decrypt the info you want
 Malware employing decoding functions
 Can sometimes use standard libraries to decode
 Python's base64.decodestring() (removed in Python 3.9; use base64.b64decode()) or PyCrypto's functions
 (see examples Listing 13-8 to Listing 13-10)
 Programmatically use a debugger to re-run the malware's decoding code
with chosen parameters (use the malware's own decoder against itself)
 Immunity Debugger (ImmDbg) lets Python scripts drive the debugger
In Class Homeworks
