Lecture - 02a - Basic Static Analysis

This document provides an overview of basic static analysis techniques for malware analysis and reverse engineering, including: 1) Running antivirus scans to identify known malware, but recognizing limitations as malware can evade detection. 2) Calculating hashes to fingerprint malware and search for prior analyses. 3) Using string searching tools to find potentially meaningful strings like URLs, error messages, and function names that offer clues about malware behavior. 4) Noting that packed and obfuscated malware will limit static analysis by hiding code and requiring dynamic analysis.

Uploaded by

Elena Damon

IS 873: Malware Analysis and Reverse Engineering

Basic Static Analysis


Overview
• AV scanning
• Hashing
• Finding strings
• Packed and Obfuscated malware
• Linked libraries and functions
• Static analysis in practice
• PE file format
Anti-virus Scanning
• A useful first step: run the sample through multiple AV programs
• AVs might already have identified the malware
• AVs are certainly not perfect
– Rely mainly on a database of identifiable pieces of known
suspicious code (file signatures), as well as
– behavioral and pattern-matching analysis (heuristics) to identify
suspect files
– Malware writers can easily modify their code, thereby changing
their program’s signature and evading virus scanners.
– Also, rare malware often goes undetected by antivirus software
because it’s simply not in the database.
– Finally, heuristics, while often successful in identifying unknown
malicious code, can be bypassed by new and unique malware.
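The signature weakness described above can be illustrated with a minimal Python sketch. It assumes a signature is nothing more than a fixed byte pattern; the signature name, the pattern, and the sample bytes are all made up for illustration. Real AV engines use far richer signatures together with heuristics.

```python
# Hypothetical signature database: name -> byte pattern (made up for this sketch).
SIGNATURES = {
    "Demo.Trojan.A": bytes.fromhex("deadbeef1337"),
}


def scan(data: bytes) -> list:
    """Return the names of all signatures whose byte pattern occurs in data."""
    return [name for name, pattern in SIGNATURES.items() if pattern in data]


# A "known" sample that contains the pattern, and a variant with one byte changed.
known_sample = b"\x90\x90" + bytes.fromhex("deadbeef1337") + b"\x90\x90"
modified_sample = known_sample.replace(b"\x13\x37", b"\x13\x38")

print(scan(known_sample))     # the signature matches
print(scan(modified_sample))  # the one-byte change defeats the exact match
```

A single changed byte is enough to defeat an exact pattern match, which is why signature databases alone miss modified or rare samples.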
Anti-virus Scanning
• Because the various antivirus programs use different
signatures and heuristics, it’s useful to run several
different antivirus programs against the same piece of
suspected malware.
• Websites such as virustotal.com allow you to upload a
file for scanning by multiple antivirus engines.
• VirusTotal generates a report that provides the total
number of engines that marked the file as malicious, the
malware name, and, if available, additional information
about the malware.
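As a rough sketch of how such a report might be retrieved and summarized programmatically, the Python below assumes the VirusTotal API v3 layout: a `files/{hash}` endpoint, an `x-apikey` header, and a `last_analysis_results` mapping in the response. These details are assumptions to check against the current VirusTotal documentation, and the function names are hypothetical.

```python
import json
import urllib.request

# Assumed VirusTotal API v3 file-report endpoint; verify against the current docs.
VT_FILE_URL = "https://www.virustotal.com/api/v3/files/{file_hash}"


def build_report_request(file_hash: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a GET request for the report on a file hash."""
    return urllib.request.Request(
        VT_FILE_URL.format(file_hash=file_hash),
        headers={"x-apikey": api_key},
    )


def summarize_report(report: dict) -> dict:
    """Count the engines that flagged the file and collect the names they gave it."""
    results = report["data"]["attributes"]["last_analysis_results"]
    flagged = {
        engine: verdict.get("result")
        for engine, verdict in results.items()
        if verdict.get("category") == "malicious"
    }
    return {
        "engines_total": len(results),
        "engines_flagged": len(flagged),
        "names": flagged,
    }


def fetch_report(file_hash: str, api_key: str) -> dict:
    """Send the request and decode the JSON body (needs network access and a key)."""
    with urllib.request.urlopen(build_report_request(file_hash, api_key)) as resp:
        return json.load(resp)
```

Looking a sample up by hash, rather than uploading it, avoids sharing the file itself, which can matter during an active investigation.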
Hashing
• Hashing is a common method used to uniquely identify
malware; the hash serves as a fingerprint
• The malware file is run through a hashing program that
produces a hash which, for practical purposes, is unique to that file
• The MD5 hash function is the one most commonly used
for malware analysis, though the Secure Hash Algorithm
1 (SHA-1) is also popular.
• For example, the freely available WinMD5 program can
calculate the hash of the Notepad program that comes
with Windows.
Hashing
• Once you have a unique hash for a piece of malware,
you can use it as follows:
– Search for that hash online to see if the file has already been
identified.
– Share that hash with other analysts to help them to identify
malware.
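The fingerprinting step can also be sketched in Python with the standard `hashlib` module, as an alternative to a GUI tool such as WinMD5. The function names are hypothetical, and SHA-256 is included as an addition that the slides do not mention.

```python
import hashlib

ALGORITHMS = ("md5", "sha1", "sha256")


def hash_chunks(chunks) -> dict:
    """Feed an iterable of byte chunks to each hash and return the hex digests."""
    hashers = {name: hashlib.new(name) for name in ALGORITHMS}
    for chunk in chunks:
        for hasher in hashers.values():
            hasher.update(chunk)
    return {name: hasher.hexdigest() for name, hasher in hashers.items()}


def read_chunks(path: str, chunk_size: int = 65536):
    """Yield a file's contents in fixed-size chunks so large samples fit in memory."""
    with open(path, "rb") as handle:
        while True:
            chunk = handle.read(chunk_size)
            if not chunk:
                break
            yield chunk


def hash_file(path: str) -> dict:
    """Fingerprint the file at path with MD5, SHA-1, and SHA-256."""
    return hash_chunks(read_chunks(path))
```

For example, `hash_file(r"C:\Windows\notepad.exe")` (a path assumed here for illustration) returns a dictionary of hex digests that can be searched online or shared with other analysts.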
Finding Strings

• A string is a sequence of characters such as “MyFilename”
• A program contains strings if it prints a message, connects to a
URL, or copies a file to a specific location.
• Searching through the strings can be a simple way to get hints
about the functionality of a program.
• For example, if the program accesses a URL, then you will see the
URL accessed stored as a string in the program.
• You can use the Strings program (http://bit.ly/ic4plL) to search
an executable for strings, which are typically stored in either ASCII
or Unicode format.
Finding Strings
• WannaCry ransomware appeared in May 2017
• Its early version was neutralized using a “Kill Switch”: a hard-coded
domain that the malware contacted, halting if the connection succeeded
– http://www.iuqerfsodp9ifjaposdfjhgosurijfaewrwergwea.com/
Finding Strings
• Both ASCII and Unicode formats store characters in sequences
that end with a NULL terminator to indicate that the string is
complete.
• ASCII strings use 1 byte per character; Unicode strings, as Windows
programs typically store them (UTF-16, little-endian), use 2 bytes
per character.
• Consider the string “BAD” stored as ASCII.

• The ASCII string is stored as the bytes 0x42, 0x41, 0x44, and 0x00,
where 0x42 is the ASCII representation of a capital letter B, 0x41
represents the letter A, and so on.
• The 0x00 at the end is the NULL terminator.
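The byte layout above can be reproduced with a few lines of Python. This is a small sketch with hypothetical helper names; it also shows the 2-byte-per-character form for comparison, assuming the UTF-16 little-endian encoding that Windows uses for wide strings.

```python
def c_string_bytes(text: str, wide: bool = False) -> bytes:
    """Encode text as a NULL-terminated ASCII or UTF-16LE (wide) string."""
    if wide:
        # Each character takes 2 bytes, and so does the NULL terminator.
        return text.encode("utf-16-le") + b"\x00\x00"
    return text.encode("ascii") + b"\x00"


def hex_bytes(raw: bytes) -> str:
    """Format bytes the way the slides write them, e.g. 0x42 0x41."""
    return " ".join(f"0x{byte:02x}" for byte in raw)


print(hex_bytes(c_string_bytes("BAD")))             # 0x42 0x41 0x44 0x00
print(hex_bytes(c_string_bytes("BAD", wide=True)))  # 0x42 0x00 0x41 0x00 0x44 0x00 0x00 0x00
```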
Finding Strings
• Now consider the same string “BAD” stored as Unicode.

• The Unicode string is stored as the bytes 0x42, 0x00, 0x41 …, with
each character occupying 2 bytes.
• Strings searches for a sequence of three or more ASCII or Unicode
characters, followed by a string termination character.
• The Strings program ignores context and formatting, so it can
analyze any file type and detect strings across an entire file.
• This also means that it may report byte sequences as strings
when they are not.
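A minimal Python sketch of the same idea follows. It is not the Sysinternals tool: it simply looks for runs of printable ASCII bytes, and for the same characters in the 2-byte form, with a default minimum length of three, and it does not require a terminator. The function name and the returned tuple layout are assumptions made for this example.

```python
import re


def extract_strings(data: bytes, min_len: int = 3) -> list:
    """Return (offset, encoding, text) for printable runs of at least min_len chars."""
    count = str(min_len).encode("ascii")
    # Runs of printable ASCII bytes (space through tilde).
    ascii_re = re.compile(rb"[\x20-\x7e]{" + count + rb",}")
    # The same characters stored as 2 bytes each: printable byte, then 0x00.
    wide_re = re.compile(rb"(?:[\x20-\x7e]\x00){" + count + rb",}")

    found = []
    for match in ascii_re.finditer(data):
        found.append((match.start(), "ascii", match.group().decode("ascii")))
    for match in wide_re.finditer(data):
        found.append((match.start(), "utf-16-le", match.group().decode("utf-16-le")))
    return sorted(found)


if __name__ == "__main__":
    import sys

    # Usage: python extract_strings.py <file>
    with open(sys.argv[1], "rb") as handle:
        for offset, encoding, text in extract_strings(handle.read()):
            print(f"{offset:08x} {encoding:9} {text}")
```

Because the scan ignores file structure, it reports any printable run, including the meaningless ones, just as the Strings program does.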
Finding Strings
• Most invalid strings are obvious, because they do not represent
legitimate text.
• For example, the following excerpt shows the result of running
the Strings program against the file bp6.ex_:
Finding Strings
• If a string is short and doesn’t correspond to words,
it’s probably meaningless.
• On the other hand, the strings GetLayout and
SetLayout are Windows functions used by the
Windows graphics library.
• We can easily identify these as meaningful strings
because Windows function names normally begin
with a capital letter and subsequent words also begin
with a capital letter.
• GDI32.DLL is meaningful because it’s the name of a
common Windows dynamic link library (DLL) used by
graphics programs.
• DLL files contain executable code that is shared
among multiple applications.
Finding Strings
• 99.124.22.1 is an IP address—most likely one that the
malware will use in some fashion.
• The string “Mail system DLL is invalid.!Send Mail failed
to send message.” is an error message.
• Often, the most useful information obtained by running
Strings is found in error messages. This particular
message reveals two things:
– The subject malware sends messages (through email), and
– It depends on a mail system DLL.
• This information suggests that we should:
– check email logs for suspicious traffic, and
– look for another DLL (the mail system DLL) that might be
associated with this particular malware.
Finding Strings
• Note
– the missing DLL itself is not necessarily malicious
– malware often uses legitimate libraries and DLLs to further its goals.
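The triage described on the last few slides (IP addresses, DLL names, function names, error messages) can be roughly automated. The Python sketch below uses simple regular expressions as heuristics; the category names and patterns are assumptions for illustration, and they will misclassify some strings.

```python
import re

IPV4_RE = re.compile(r"^(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})$")
URL_RE = re.compile(r"^[a-z][a-z0-9+.-]*://\S+$", re.IGNORECASE)
DLL_RE = re.compile(r"^[\w.-]+\.dll$", re.IGNORECASE)
# Windows API style: capitalized words run together, optional A/W suffix.
API_RE = re.compile(r"^(?:[A-Z][a-z0-9]+){2,}[AW]?$")


def classify(text: str) -> str:
    """Guess what kind of clue an extracted string is."""
    ip_match = IPV4_RE.match(text)
    if ip_match and all(int(octet) <= 255 for octet in ip_match.groups()):
        return "ip"
    if URL_RE.match(text):
        return "url"
    if DLL_RE.match(text):
        return "dll"
    if API_RE.match(text):
        return "api"
    if len(text.split()) >= 3:
        return "message"  # several words: likely an error or status message
    return "unknown"


for candidate in ("99.124.22.1", "GDI32.DLL", "GetLayout",
                  "Mail system DLL is invalid.!Send Mail failed to send message."):
    print(f"{classify(candidate):8} {candidate}")
```

Anything left as "unknown" still deserves a quick look by eye, since short or unusual strings can be meaningful in context.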
Packed and Obfuscated Malware
• Malware authors often use packing or obfuscation to make their
files more difficult to detect or analyze.
• Obfuscated programs are ones whose execution the author has
attempted to hide.
• Obfuscation is the deliberate act of creating source or machine code
that is difficult for humans to understand, for example by using
needlessly roundabout expressions to compose statements.
• Techniques include simple keyword substitution, adding or removing
whitespace, and self-generating or heavily compressed programs.
• Packed programs are a subset of obfuscated programs in which
the malicious program is compressed and cannot be analyzed directly.
• Both of these techniques will severely limit your attempts to
statically analyze the malware.
Packed and Obfuscated Malware

• Obfuscators typically turn small fragments of readable source
code (JavaScript example):
for (i=0; i < M.length; i++){
// Adjust position of clock hands
var ML=(ns)?document.layers['nsMinutes'+i]:ieMinutes[i].style;
ML.top=y[i]+HandY+(i*HandHeight)*Math.sin(min)+scrll;
ML.left=x[i]+HandX+(i*HandWidth)*Math.cos(min);
}
• into this:
for(O79=0;O79<l6x.length;O79++){var O63=(l70)?document.layers["nsM\151\156u\164\145s"+O79]:ieMinutes[O79].style;O63.top=l61[O79]+O76+(O79*O75)*Math.sin(O51)+l73;O63.left=l75[O79]+l77+(O79*l76)*Math.cos(O51);}

Source: Semantic Designs http://www.semdesigns.com/Products/Obfuscators/


Packed and Obfuscated Malware
• Unlike packed or obfuscated malware, legitimate programs almost
always contain many strings.
• If you discover that a program has very few strings, it is probably
packed or obfuscated, which suggests that it may be malware.
• Packed and obfuscated code will often include the functions
LoadLibrary and GetProcAddress, which are used to load and gain
access to additional functions.
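These two hints can be checked mechanically. The Python sketch below counts printable ASCII runs and looks for the two loader function names; the threshold of 10 strings is an arbitrary value chosen for this example, and the function name is hypothetical. Treat the result as a prompt for further analysis, not a verdict.

```python
import re

ASCII_RUN = re.compile(rb"[\x20-\x7e]{3,}")
FEW_STRINGS_THRESHOLD = 10  # arbitrary cut-off chosen for this sketch
LOADER_APIS = (b"LoadLibrary", b"GetProcAddress")


def packing_hints(data: bytes) -> dict:
    """Report the string count and whether the loader functions are named."""
    strings = ASCII_RUN.findall(data)
    return {
        "string_count": len(strings),
        "few_strings": len(strings) < FEW_STRINGS_THRESHOLD,
        # Substring match, so LoadLibraryA and LoadLibraryW also count.
        "loader_apis": [
            api.decode("ascii")
            for api in LOADER_APIS
            if any(api in found for found in strings)
        ],
    }
```

A file with only a handful of strings, two of which are LoadLibrary and GetProcAddress, fits the pattern described above; a fixed count is crude, so a real check would scale the threshold with file size.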
Packed and Obfuscated Malware
• When a packed program is run, a small wrapper program runs first
to decompress the packed file and then runs the unpacked file.
• When a packed program is analyzed statically, only the small
wrapper program can be dissected.
• We will discuss this topic in detail in “Anti-reverse engineering”.
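The effect of packing on static analysis can be imitated in a few lines of Python. This is a toy sketch, not a real packer: `zlib` stands in for the packer's compression, and the `STUB` marker is a hypothetical placeholder for the wrapper code.

```python
import zlib

STUB = b"STUB"  # hypothetical marker standing in for the small wrapper program


def pack(payload: bytes) -> bytes:
    """Compress the payload and prepend the stand-in wrapper."""
    return STUB + zlib.compress(payload, 9)


def unpack(packed: bytes) -> bytes:
    """Do what the wrapper does at run time: decompress the original payload."""
    if not packed.startswith(STUB):
        raise ValueError("not produced by pack()")
    return zlib.decompress(packed[len(STUB):])


# Strings an analyst would normally find in the unpacked program.
payload = b"Mail system DLL is invalid.\x00GetLayout\x00GDI32.DLL\x00" * 20
packed = pack(payload)

print(b"GDI32.DLL" in payload)          # visible before packing
print(b"GDI32.DLL" in packed)           # hidden from a static string search
print(b"GDI32.DLL" in unpack(packed))   # visible again after unpacking
```

Only the wrapper marker survives as readable text in the packed bytes, which mirrors why a static string search on a packed sample turns up so little.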
References

• Michael Sikorski and Andrew Honig, Practical Malware Analysis: The
Hands-On Guide to Dissecting Malicious Software
• WinMD5 program from Edwin Olson:
http://www.blisstonia.com/software/WinMD5
• Strings program from Microsoft:
https://technet.microsoft.com/en-us/sysinternals/bb897439
