IDAPython Book


The Beginner’s Guide to IDAPython

by Alexander Hanel

Introduction

Hello!

This is a book about IDAPython.

I originally wrote it as a reference for myself. I wanted a place to go where I could find
examples of functions that I commonly use (and forget) in IDAPython. Since I started this
book I have used it many times as a quick reference to understand syntax or see an example
of some code. If you follow my blog you may notice a few familiar faces; many of the scripts
that I cover here are the result of sophomoric experiments that I documented online.

Over the years I have received numerous emails asking what the best guide for learning
IDAPython is. Usually I point them to Ero Carrera’s Introduction to IDAPython or the
example scripts in IDAPython’s public repo. They are excellent sources for learning but
they don’t cover some common issues that I have come across. I wanted to create a book
that covers these issues. I feel this book will be of value for anyone learning IDAPython or
wanting a quick reference for examples and snippets. Being an e-book it will not be a static
document and I plan on updating it in the future on a regular basis.

If you come across any issues, typos or have questions please send me an email at
alexander< dot >hanel< at >gmail< dot > com.

Updates

Version 1.0 - Published

Intended Audience & Disclaimer


This book is not intended for beginner reverse engineers. It is also not meant to serve as an
introduction to IDA. If you are new to IDA, I would recommend purchasing Chris Eagle’s The
IDA Pro Book. It is an excellent book and is worth every penny.

There are a couple of prerequisites for purchasers of this book. You should be
comfortable with reading assembly, have a background in reverse engineering and know your
way around IDA. If you have hit a point where you have asked yourself “How can I automate
this task using IDAPython?” then this book might be for you. If you already have a handful
of IDAPython scripts under your belt then odds are this book is not for you. This
book is for beginners of IDAPython. It will serve as a handy reference to find examples of
commonly used functions but odds are you already have your own references of one-off
scripts.

It should be stated that my background is in reverse engineering of malware. This book
does not cover compiler concepts such as basic blocks or other academic concepts used in
static analysis. The reason is that I rarely use these concepts when reverse engineering
malware. Occasionally I have used them for de-obfuscating code but not often enough
that I feel they would be of value for a beginner. After reading this book the reader should feel
comfortable digging into the IDAPython documentation on their own. One last
disclaimer: functions for IDA’s debugger are not covered.

Conventions

IDA’s Output Window (command line interface) was used for the examples and output. For
brevity some examples do not contain the assignment of the current address to a variable,
usually represented as ea = here(). All of the code can be cut and pasted into the
command line or IDA’s script command option shift-F2. Reading from beginning to end
is the recommended approach for this book. There are a number of examples that are not
explained line by line because it is assumed the reader understands the code from previous
examples. Different authors will call IDAPython’s functions in different ways. Sometimes the
code will be called as idc.SegName(ea) or SegName(ea). In this book we will be using the first
style. I have found this convention to be easier to read and debug. Sometimes when using
this convention an error will be thrown as shown below.

Python>DataRefsTo(here())
<generator object refs at 0x05247828>
Python>idautils.DataRefsTo(here())
Traceback (most recent call last):
File "<string>", line 1, in <module>
NameError: name 'idautils' is not defined
Python>import idautils # manual importing of module
Python>idautils.DataRefsTo(here())
<generator object refs at 0x06A398C8>

If this happens the module will need to be manually imported as shown above.

IDAPython Background

IDAPython was created in 2004. It was a joint effort by Gergely Erdelyi and Ero Carrera. Their
goal was to combine the power of Python with the analysis automation of IDA’s IDC C-like
scripting language. IDAPython consists of three separate modules. The first is idc. It is a
compatibility module for wrapping IDA’s IDC functions. The second module is idautils. It
provides high level utility functions for IDA. The third module is idaapi. It allows access to
more low level data, such as the classes used by IDA.

Basics

Before we dig too deep we should define some keywords and go over the structure of IDA’s
disassembly output. We can use the following line of code as an example.

.text:00012529 mov esi, [esp+4+arg_0]

The .text is the section name and the address is 00012529. The displayed address is in a
hexadecimal format. The instruction mov is referred to as a mnemonic. After the
mnemonic is the first operand esi and the second operand is [esp+4+arg_0]. When
working with IDAPython functions the most commonly passed variable is the address. In the
IDAPython documentation the address is referenced as ea. The address can be accessed
manually by a couple of different functions. The most commonly used functions are
idc.ScreenEA() or here(). They return an integer value. If we want to get the
minimum address that is present in an IDB we can use MinEA(), and to get the maximum we can
use MaxEA().

Python>ea = idc.ScreenEA()
Python>print "0x%x %s" % (ea, ea)
0x12529 75049
Python>ea = here()
Python>print "0x%x %s" % (ea, ea)
0x12529 75049
Python>hex(MinEA())
0x401000
Python>hex(MaxEA())
0x437000
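Because an address is just an integer, ordinary Python formatting applies to it. The sketch below runs in plain Python outside of IDA. It uses the address from the example line as a stand-in for the value idc.ScreenEA() would return, and the helper name fmt_ea is made up for illustration.

```python
# Stand-in for the value idc.ScreenEA() returned in the example above.
ea = 0x12529


def fmt_ea(ea):
    """Format an address as 8 upper-case hex digits, like the listing column."""
    return "%08X" % ea


print(fmt_ea(ea))        # the digits shown after ".text:" in the listing
print(int("12529", 16))  # parse a hex string back into an integer
```

Keeping addresses as integers and only formatting them when printing makes arithmetic and comparisons on them straightforward.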

Each described element in the disassembly output can be accessed by a function in
IDAPython. Below is an example of how to access each element. Please recall that we
previously stored the address in ea.

Python>idc.SegName(ea) # get segment name
.text
Python>idc.GetDisasm(ea) # get disassembly
mov esi, [esp+4+arg_0]
Python>idc.GetMnem(ea) # get mnemonic
mov
Python>idc.GetOpnd(ea,0) # get first operand
esi
Python>idc.GetOpnd(ea,1) # get second operand
[esp+4+arg_0]

To get a string representation of the segment’s name we would use idc.SegName(ea)
with ea being an address within the segment. Printing a string of the disassembly can be
done with idc.GetDisasm(ea). It’s worth noting the spelling of the function. To get
the mnemonic or the instruction name we would call idc.GetMnem(ea). To get the
operands of the mnemonic we would call idc.GetOpnd(ea, long n). The first
argument is the address and the second, long n, is the operand index. The first operand is
0 and the second is 1.

In some situations it will be important to verify an address exists. idaapi.BADADDR or
BADADDR can be used to check for valid addresses.

Python>idaapi.BADADDR
4294967295
Python>hex(idaapi.BADADDR)
0xffffffffL
Python>if BADADDR != here(): print "valid address"
valid address

Segments

Printing a single line is not very useful. The power of IDAPython comes from iterating
through all instructions, cross-referencing addresses and searching for code or data. The
last two will be described in more detail later. Iterating through all segments will be a good
place to start.

Python>for seg in idautils.Segments():
    print idc.SegName(seg), idc.SegStart(seg), idc.SegEnd(seg)
HEADER 65536 66208
.idata 66208 66636
.text 66636 212000
.data 212000 217088
.edata 217088 217184
INIT 217184 219872
.reloc 219872 225696
GAP 225696 229376

idautils.Segments() returns an iterator type object. We can loop through the object by
using a for loop. Each item is a segment’s start address. The address can be used
to get the name if we pass it as an argument to idc.SegName(ea). The start and end of
the segments can be found by calling idc.SegStart(ea) or idc.SegEnd(ea). The
address or ea needs to be within the range of the start and end of the segment. If we didn’t
want to iterate through all segments but wanted to find the next segment we could use
idc.NextSeg(ea). The address can be any address within the range of the segment that
precedes the one we want to find. If by chance we wanted to get a segment’s
start address by name we could use idc.SegByName(segname).
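To make the idea of an address being "within the range" of a segment concrete, here is a minimal sketch that runs in plain Python outside of IDA. The table is copied from the output above; the helper segment_for is a made-up name and only mimics what idc.SegName(ea) does for us inside IDA. It assumes the end address is exclusive, which matches the output where each segment ends where the next one starts.

```python
# (name, start, end) rows copied from the idautils.Segments() output above.
SEGMENTS = [
    ("HEADER", 65536, 66208),
    (".idata", 66208, 66636),
    (".text", 66636, 212000),
    (".data", 212000, 217088),
    (".edata", 217088, 217184),
    ("INIT", 217184, 219872),
    (".reloc", 219872, 225696),
    ("GAP", 225696, 229376),
]


def segment_for(ea, segments=SEGMENTS):
    """Return the name of the segment that contains ea, or None."""
    for name, start, end in segments:
        # Assumption: end is exclusive, it is the start of the next segment.
        if start <= ea < end:
            return name
    return None


print(segment_for(0x12529))
```

The address 0x12529 is the one used in the Basics section, so the lookup lands in the same segment that idc.SegName(ea) reported there.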

Functions

Now that we know how to iterate through all segments we should go over how to iterate
through all known functions.

Python>for func in idautils.Functions():
    print hex(func), idc.GetFunctionName(func)
Python>
0x401000 ?DefWindowProcA@CWnd@@MAEJIIJ@Z
0x401006 ?LoadFrame@CFrameWnd@@UAEHIKPAVCWnd@@PAUCCreateContext@@@Z
0x40100c ??2@YAPAXI@Z
0x401020 save_xored
0x401030 sub_401030
....
0x45c7b9 sub_45C7B9
0x45c7c3 sub_45C7C3
0x45c7cd SEH_44A590
0x45c7e0 unknown_libname_14
0x45c7ea SEH_43EE30

idautils.Functions() will return a list of known functions. The list will contain the start
address of each function. idautils.Functions() can be passed arguments to search
within a range. If we wanted to do this we would pass the start and end address:
idautils.Functions(start_addr, end_addr). To get a function’s name we use
idc.GetFunctionName(ea). ea can be any address within the function boundaries.
IDAPython contains a large set of APIs for working with functions. Let’s start with a simple
function. The semantics of this function are not important but we should make a mental
note of the addresses.

.text:0045C7C3 sub_45C7C3 proc near
.text:0045C7C3 mov eax, [ebp-60h]
.text:0045C7C6 push eax ; void *
.text:0045C7C7 call w_delete
.text:0045C7CC retn
.text:0045C7CC sub_45C7C3 endp

To get the boundaries we can use idaapi.get_func(ea).

Python>func = idaapi.get_func(ea)
Python>type(func)
<class 'idaapi.func_t'>
Python>print "Start: 0x%x, End: 0x%x" % (func.startEA, func.endEA)
Start: 0x45c7c3, End: 0x45c7cd

idaapi.get_func(ea) returns a class of idaapi.func_t. It is not always
obvious how to use a class returned by a function call. A useful command to explore classes
in Python is the dir(class) function.

Python>dir(func)
['__class__', '__del__', '__delattr__', '__dict__', '__doc__',
'__eq__', '__format__', '__getattribute__', '__gt__',
'__hash__', '__init__', '__lt__', '__module__', '__ne__',
'__new__', '__reduce__', '__reduce_ex__', '__repr__',
'__setattr__', '__sizeof__', '__str__', '__subclasshook__',
'__swig_destroy__', '__weakref__', '_print', 'analyzed_sp',
'argsize', 'clear', 'color', 'compare', 'contains', 'does_return',
'empty', 'endEA', 'extend', 'flags', 'fpd', 'frame', 'frregs',
'frsize', 'intersect', 'is_far', 'llabelqty', 'llabels',
'overlaps', 'owner', 'pntqty', 'points', 'referers', 'refqty',
'regargqty', 'regargs', 'regvarqty', 'regvars', 'size', 'startEA',
'tailqty', 'tails', 'this', 'thisown']

From the output we can see the attributes startEA and endEA. These are used to access the
start and end of the function. These attributes are only applicable to the current function. If we
wanted to access surrounding functions we could use idc.NextFunction(ea) and
idc.PrevFunction(ea). The value of ea only needs to be an address within the
boundaries of the analyzed function. A caveat with enumerating functions is that it only
works if IDA has identified the block of code as a function. Until the block of code is marked
as a function it will be skipped during the function enumeration process. Code that is not
marked as a function will be labeled red in the legend (colored bar at the top). These can be
manually fixed or automated.

IDAPython has a lot of different ways to access the same data. A common approach for
accessing the boundaries within a function is using idc.GetFunctionAttr(ea,
FUNCATTR_START) and idc.GetFunctionAttr(ea, FUNCATTR_END).

Python>ea = here()
Python>start = idc.GetFunctionAttr(ea, FUNCATTR_START)
Python>end = idc.GetFunctionAttr(ea, FUNCATTR_END)
Python>cur_addr = start
Python>while cur_addr <= end:
    print hex(cur_addr), idc.GetDisasm(cur_addr)
    cur_addr = idc.NextHead(cur_addr, end)
Python>
0x45c7c3 mov eax, [ebp-60h]
0x45c7c6 push eax ; void *
0x45c7c7 call w_delete
0x45c7cc retn

idc.GetFunctionAttr(ea, attr) is used to get the start and end of the function. We
then print the current address and the disassembly by using idc.GetDisasm(ea). We use
idc.NextHead(ea) to get the start of the next instruction and continue until we reach
the end of this function. A flaw in this approach is that it relies on the instructions being
contained within the boundaries of the start and end of the function. If there was a jump to
an address higher than the end of the function the loop would prematurely exit. These
types of jumps are quite common in obfuscation techniques such as code transformation.
Since boundaries can be unreliable it is best practice to call idautils.FuncItems(ea) to
loop through addresses in a function. We will go into more detail about this approach in
the following section.
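The difference between the two approaches can be simulated in plain Python outside of IDA. Everything in this sketch is hypothetical: the addresses, the layout and the helper names are invented, and it assumes, as the text suggests, that idautils.FuncItems(ea) follows the items that belong to the function rather than a single address range.

```python
# Hypothetical layout: the function body is [0x1000, 0x1010) and a jmp
# leads to a tail chunk at 0x2000 that still belongs to the function.
FUNC_START, FUNC_END = 0x1000, 0x1010
ALL_HEADS = [0x1000, 0x1003, 0x1008, 0x100D, 0x1010, 0x1015, 0x2000, 0x2004]
FUNC_ITEMS = {0x1000, 0x1003, 0x1008, 0x100D, 0x2000, 0x2004}


def walk_by_bounds(heads, start, end):
    """Mimic the NextHead loop: only visit heads inside [start, end)."""
    return [ea for ea in heads if start <= ea < end]


def walk_by_items(heads, items):
    """Mimic a FuncItems-style walk: visit every head the function owns."""
    return [ea for ea in heads if ea in items]


by_bounds = walk_by_bounds(ALL_HEADS, FUNC_START, FUNC_END)
by_items = walk_by_items(ALL_HEADS, FUNC_ITEMS)
missed = sorted(set(by_items) - set(by_bounds))
print([hex(ea) for ea in missed])
```

The boundary walk never sees the two instructions in the tail chunk, which is the premature exit described above.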

Similar to idc.GetFunctionAttr(ea, attr), another useful function for gathering
information about functions is idc.GetFunctionFlags(ea). It can be used to retrieve
information about a function such as if it’s library code or if the function doesn’t
return. There are nine possible flags for a function. If we wanted to enumerate all the flags
for all the functions we could use the following code.

Python>import idautils
Python>for func in idautils.Functions():
    flags = idc.GetFunctionFlags(func)
    if flags & FUNC_NORET:
        print hex(func), "FUNC_NORET"
    if flags & FUNC_FAR:
        print hex(func), "FUNC_FAR"
    if flags & FUNC_LIB:
        print hex(func), "FUNC_LIB"
    if flags & FUNC_STATIC:
        print hex(func), "FUNC_STATIC"
    if flags & FUNC_FRAME:
        print hex(func), "FUNC_FRAME"
    if flags & FUNC_USERFAR:
        print hex(func), "FUNC_USERFAR"
    if flags & FUNC_HIDDEN:
        print hex(func), "FUNC_HIDDEN"
    if flags & FUNC_THUNK:
        print hex(func), "FUNC_THUNK"
    if flags & FUNC_BOTTOMBP:
        print hex(func), "FUNC_BOTTOMBP"

We use idautils.Functions() to get a list of all known function addresses and then
we use idc.GetFunctionFlags(ea) to get the flags. We check the value by using a
bitwise & on the returned value. For example to check if the function does not
return we would use the following comparison: if flags & FUNC_NORET. Now let’s
go over all the flags. Some of these flags are very common while others are rare.

FUNC_NORET

This flag is used to identify a function that does not execute a return instruction. It’s
internally represented as equal to 1. An example of a function that does not return
can be seen below.

CODE:004028F8 sub_4028F8 proc near
CODE:004028F8
CODE:004028F8 and eax, 7Fh
CODE:004028FB mov edx, [esp+0]
CODE:004028FE jmp sub_4028AC
CODE:004028FE sub_4028F8 endp

Notice how ret or leave is not the last instruction.

FUNC_FAR

This flag is rarely seen unless reversing software that uses segmented memory. It is
internally represented as an integer of 2.

FUNC_USERFAR

This flag is rarely seen and has very little documentation. HexRays describes the flag as
“user has specified far-ness of the function”. It has an internal value of 32.

FUNC_LIB

This flag is used to find library code. Identifying library code is very useful because it is code
that typically can be ignored when doing analysis. It’s internally represented as an integer
value of 4. Below is an example of its usage and the functions it has identified.

Python>for func in idautils.Functions():
    flags = idc.GetFunctionFlags(func)
    if flags & FUNC_LIB:
        print hex(func), "FUNC_LIB", idc.GetFunctionName(func)
Python>
0x1a711160 FUNC_LIB _strcpy
0x1a711170 FUNC_LIB _strcat
0x1a711260 FUNC_LIB _memcmp
0x1a711320 FUNC_LIB _memcpy
0x1a711662 FUNC_LIB __onexit
...
0x1a711915 FUNC_LIB _exit
0x1a711926 FUNC_LIB __exit
0x1a711937 FUNC_LIB __cexit
0x1a711946 FUNC_LIB __c_exit
0x1a711955 FUNC_LIB _puts
0x1a7119c0 FUNC_LIB _strcmp

FUNC_STATIC

This flag is used to identify functions that were compiled as a static function. In C, functions
are global by default. If the author defines a function as static it can only be accessed by
other functions within that file. In a limited way this could be used to aid in understanding
how the source code was structured.

FUNC_FRAME

This flag indicates the function uses a frame pointer ebp. Functions that use frame
pointers will typically start with the standard function prologue for setting up the stack
frame.

.text:1A716697 push ebp
.text:1A716698 mov ebp, esp
.text:1A71669A sub esp, 5Ch

FUNC_BOTTOMBP

Similar to FUNC_FRAME this flag is used to track the frame pointer. It identifies functions
whose frame pointer is equal to the stack pointer.

FUNC_HIDDEN

Functions with the FUNC_HIDDEN flag are hidden and need to be expanded
to view. If we were to go to an address of a function that is marked as hidden it would
automatically be expanded.

FUNC_THUNK

This flag identifies functions that are thunk functions. They are simple functions that jump
to another function.

.text:1A710606 Process32Next proc near
.text:1A710606 jmp ds:__imp_Process32Next
.text:1A710606 Process32Next endp

It should be noted that a function can have multiple flags set.

0x1a716697 FUNC_LIB
0x1a716697 FUNC_FRAME
0x1a716697 FUNC_HIDDEN
0x1a716697 FUNC_BOTTOMBP
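Since the flags are single bits, the long chain of if statements can be replaced with a table. The sketch below runs in plain Python outside of IDA. The values 1, 2, 4 and 32 come from the text above; the remaining values are my understanding of the definitions in the IDA SDK and should be treated as an assumption, so inside IDA it is safer to use the named constants. The helper name flag_names is made up for illustration.

```python
# (bit, name) pairs. 0x1, 0x2, 0x4 and 0x20 are stated in the text above;
# the other values are assumed from the IDA SDK definitions.
FUNC_FLAGS = [
    (0x001, "FUNC_NORET"),
    (0x002, "FUNC_FAR"),
    (0x004, "FUNC_LIB"),
    (0x008, "FUNC_STATIC"),
    (0x010, "FUNC_FRAME"),
    (0x020, "FUNC_USERFAR"),
    (0x040, "FUNC_HIDDEN"),
    (0x080, "FUNC_THUNK"),
    (0x100, "FUNC_BOTTOMBP"),
]


def flag_names(flags):
    """Return the names of all flag bits that are set in flags."""
    return [name for bit, name in FUNC_FLAGS if flags & bit]


# The same combination as the function at 0x1a716697 above.
flags = 0x004 | 0x010 | 0x040 | 0x100
print(flag_names(flags))
```

A table keeps each name next to its bit, which avoids the kind of copy and paste slip that is easy to make in a long if chain.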

Instructions

Since we know how to work with functions let’s go over how to access their instructions. If we
have the address of a function we can use idautils.FuncItems(ea) to get a list of all
the addresses.

Python>dism_addr = list(idautils.FuncItems(here()))
Python>type(dism_addr)
<type 'list'>
Python>print dism_addr
[4573123, 4573126, 4573127, 4573132]
Python>for line in dism_addr: print hex(line), idc.GetDisasm(line)
0x45c7c3 mov eax, [ebp-60h]
0x45c7c6 push eax ; void *
0x45c7c7 call w_delete
0x45c7cc retn

idautils.FuncItems(ea) actually returns an iterator type but is cast to a list. The list
will contain the start address of each instruction in consecutive order. Now that we have a
good knowledge base for looping through segments, functions and instructions let’s show a
useful example. Sometimes when reversing packed code it is useful to only know where
dynamic calls happen. A dynamic call would be a call or jump to an operand that is a register
such as call eax or jmp edi.

Python>
for func in idautils.Functions():
    flags = idc.GetFunctionFlags(func)
    if flags & FUNC_LIB or flags & FUNC_THUNK:
        continue
    dism_addr = list(idautils.FuncItems(func))
    for line in dism_addr:
        m = idc.GetMnem(line)
        if m == 'call' or m == 'jmp':
            op = idc.GetOpType(line, 0)
            if op == o_reg:
                print "0x%x %s" % (line, idc.GetDisasm(line))
Python>
0x43ebde call eax ; VirtualProtect

We call idautils.Functions() to get a list of all known functions. For each function we
retrieve the function’s flags by calling idc.GetFunctionFlags(ea). If the function is
library code or a thunk function it is skipped. Next we call
idautils.FuncItems(ea) to get all the addresses within the function. We loop through
the list using a for loop. Since we are only interested in call and jmp instructions we
need to get the mnemonic by calling idc.GetMnem(ea). We then use a simple string
comparison to check the mnemonic. If the mnemonic is a jump or call we get the operand
type by calling idc.GetOpType(ea, n). This function returns an integer that is
internally called op_t.type. This value can be used to determine if the operand is a
register, memory reference, etc. We then check if the op_t.type is a register. If so, we
print the line. Casting the return of idautils.FuncItems(ea) into a list is useful
because generators do not support len(). By casting it to a list we can easily
get the number of lines or instructions in a function.

Python>ea = here()
Python>len(idautils.FuncItems(ea))
Traceback (most recent call last):
File "<string>", line 1, in <module>
TypeError: object of type 'generator' has no len()
Python>len(list(idautils.FuncItems(ea)))
39
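The same behaviour can be reproduced in plain Python outside of IDA. In this sketch func_items is a made-up stand-in generator that yields the four instruction addresses from the earlier sub_45C7C3 listing; it is not the real idautils.FuncItems.

```python
def func_items():
    """Stand-in generator yielding the addresses from the sub_45C7C3 listing."""
    for ea in (0x45C7C3, 0x45C7C6, 0x45C7C7, 0x45C7CC):
        yield ea


# A generator has no length until it has been consumed.
try:
    len(func_items())
except TypeError as err:
    print("generator:", err)

# Casting to a list gives us len() and indexing.
items = list(func_items())
print(len(items), hex(items[-1]))

# If only the count is needed, sum() avoids building the list.
count = sum(1 for _ in func_items())
print(count)
```

Keep in mind that a generator can only be walked once, so a list is also the simplest choice when the addresses are needed more than one time.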

In the previous example we used a list that contained all addresses within a function. We
looped over each entry to access the next instruction. What if we only had an address and
wanted to get the next instruction? To move to the next instruction address we can use
idc.NextHead(ea) and to get the previous instruction address we use
idc.PrevHead(ea). These functions will get the start of the next instruction but not the
next address. To get the next address we use idc.NextAddr(ea) and to get the previous
address we use idc.PrevAddr(ea).

Python>ea = here()
Python>print hex(ea), idc.GetDisasm(ea)
0x10004f24 call sub_10004F32
Python>next_instr = idc.NextHead(ea)
Python>print hex(next_instr), idc.GetDisasm(next_instr)
0x10004f29 mov [esi], eax
Python>prev_instr = idc.PrevHead(ea)
Python>print hex(prev_instr), idc.GetDisasm(prev_instr)
0x10004f1e mov [esi+98h], eax
Python>print hex(idc.NextAddr(ea))
0x10004f25
Python>print hex(idc.PrevAddr(ea))
0x10004f23

Operands

Operand types are commonly used so it will be beneficial to go over all the types. As
previously stated we can use idc.GetOpType(ea,n) to get the operand type. ea is the
address and n is the index. There are eight different operand types.

o_void

If an instruction does not have any operands it will return 0.

Python>print hex(ea), idc.GetDisasm(ea)
0xa09166 retn
Python>print idc.GetOpType(ea,0)
0

o_reg

If an operand is a general register it will return this type. This value is internally represented
as 1.

Python>print hex(ea), idc.GetDisasm(ea)
0xa09163 pop edi
Python>print idc.GetOpType(ea,0)
1

o_mem

If an operand is a direct memory reference it will return this type. This value is internally
represented as 2. This type is useful for finding references to DATA.

Python>print hex(ea), idc.GetDisasm(ea)
0xa05d86 cmp ds:dword_A152B8, 0
Python>print idc.GetOpType(ea,0)
2

o_phrase

This operand is returned if the operand consists of a base register and/or an index register.
This value is internally represented as 3.

Python>print hex(ea), idc.GetDisasm(ea)
0x1000b8c2 mov [edi+ecx], eax
Python>print idc.GetOpType(ea,0)
3

o_displ

This operand is returned if the operand consists of registers and a displacement value. The
displacement is an integer value such as 0x18. It is commonly seen when an instruction
accesses values in a structure. Internally it is represented as a value of 4.

Python>print hex(ea), idc.GetDisasm(ea)
0xa05dc1 mov eax, [edi+18h]
Python>print idc.GetOpType(ea,1)
4

o_imm

Operands that are an immediate value, such as the integer 0xC, are of this type. Internally it is
represented as 5.

Python>print hex(ea), idc.GetDisasm(ea)
0xa05da1 add esp, 0Ch
Python>print idc.GetOpType(ea,1)
5

o_far

This operand is not very common when reversing x86 or x86_64. It is used to find operands
that are accessing immediate far addresses. It is represented internally as 6.

o_near

This operand is used to find operands that are accessing immediate near addresses, such as
the target of a direct call or jmp. It is represented internally as 7.
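When printing operand types it is easier to read names than numbers. The following sketch runs in plain Python outside of IDA and simply maps the eight values described above to their names. The helper op_type_name is a made-up name, and any value above 7 is reported as unknown rather than guessed.

```python
# Values and names as described in the sections above.
OPERAND_TYPES = {
    0: "o_void",
    1: "o_reg",
    2: "o_mem",
    3: "o_phrase",
    4: "o_displ",
    5: "o_imm",
    6: "o_far",
    7: "o_near",
}


def op_type_name(op_type):
    """Translate the integer returned by idc.GetOpType into a readable name."""
    return OPERAND_TYPES.get(op_type, "unknown (%d)" % op_type)


# The types seen in the examples: retn, pop edi, [edi+18h], 0Ch
for op_type in (0, 1, 4, 5):
    print(op_type, op_type_name(op_type))
```

Inside IDA the value passed in would come from idc.GetOpType(ea, n).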

Example

While reversing an executable we might notice that the code keeps referencing recurring
displacement values. This is a likely indicator that the code is passing a structure to
different functions. Let’s go over an example that creates a Python dictionary with all the
displacements as keys, where each key has a list of the addresses. In the code below there
will be a new function that has yet to be described. The function is similar to
idc.GetOpType(ea, n).

import idautils
import idaapi
import idc

displace = {}

# for each known function
for func in idautils.Functions():
    flags = idc.GetFunctionFlags(func)
    # skip library & thunk functions
    if flags & FUNC_LIB or flags & FUNC_THUNK:
        continue
    dism_addr = list(idautils.FuncItems(func))
    for curr_addr in dism_addr:
        op = None
        index = None
        # same as idc.GetOpType, just a different way of accessing the types
        idaapi.decode_insn(curr_addr)
        if idaapi.cmd.Op1.type == idaapi.o_displ:
            op = 1
        if idaapi.cmd.Op2.type == idaapi.o_displ:
            op = 2
        if op is None:
            continue
        if "bp" in idaapi.tag_remove(idaapi.ua_outop2(curr_addr, 0)) or \
           "bp" in idaapi.tag_remove(idaapi.ua_outop2(curr_addr, 1)):
            # ebp will return a negative number
            if op == 1:
                index = (~(int(idaapi.cmd.Op1.addr) - 1) & 0xFFFFFFFF)
            else:
                index = (~(int(idaapi.cmd.Op2.addr) - 1) & 0xFFFFFFFF)
        else:
            if op == 1:
                index = int(idaapi.cmd.Op1.addr)
            else:
                index = int(idaapi.cmd.Op2.addr)
        # create key for each unique displacement value
        if index:
            if index not in displace:
                displace[index] = []
            displace[index].append(curr_addr)

The start of the code should already look familiar. We use a combination of
idautils.Functions() and idc.GetFunctionFlags(ea) to get all applicable functions
while ignoring libraries and thunks. We get each instruction in a function by calling
idautils.FuncItems(ea). This is where our new function
idaapi.decode_insn(ea) is called. This function takes the address of the instruction we
want decoded. Once it is decoded we can access different properties of the instruction
via idaapi.cmd.

Python>dir(idaapi.cmd)
['Op1', 'Op2', 'Op3', 'Op4', 'Op5', 'Op6', 'Operands', .....,
'assign', 'auxpref', 'clink', 'clink_ptr', 'copy', 'cs', 'ea',
'flags', 'get_canon_feature', 'get_canon_mnem', 'insnpref', 'ip',
'is_canon_insn', 'is_macro', 'itype', 'segpref', 'size']

As we can see from the dir() command idaapi.cmd has a good number of attributes.
Now back to our example. The operand type is accessed by using idaapi.cmd.Op1.type.
Please note that the operand index starts at 1 rather than 0, which is different
from idc.GetOpType(ea,n). We then check if operand one or operand two is of
o_displ type. We use idaapi.tag_remove(idaapi.ua_outop2(ea, n)) to get a
string representation of the operand. It would be shorter and easier to read if we called
idc.GetOpnd(ea, n). For example purposes this is a good way to show that there is
more than one function to access attributes using IDAPython. If we were to look at the
IDAPython source code for idc.GetOpnd(ea, n) we would see the lower level approach.

def GetOpnd(ea, n):
    """
    Get operand of an instruction

    @param ea: linear address of instruction
    @param n: number of operand:
        0 - the first operand
        1 - the second operand

    @return: the current text representation of operand
    """
    res = idaapi.ua_outop2(ea, n)

    if not res:
        return ""
    else:
        return idaapi.tag_remove(res)

Now back to our example. Since we have the string we need to check if the operand
contains the string "bp". This is a quick way to determine if the register bp, ebp or rbp
is present in the operand. We check for “bp” because we need to determine if the
displacement value is negative or not. To access the displacement value we use
idaapi.cmd.Op1.addr. A negative displacement such as [ebp-60h] comes back as a large
unsigned value. We convert the value to an integer, make it positive if needed, and then add
it to our dictionary named displace. If there is a displacement value that we wanted to search
for we could access it using the following for loop.

Python>for x in displace[0x130]: print hex(x), idc.GetDisasm(x)
0x10004f12 mov [esi+130h], eax
0x10004f68 mov [esi+130h], eax
0x10004fda push dword ptr [esi+130h] ; hObject
0x10005260 push dword ptr [esi+130h] ; hObject
0x10005293 push dword ptr [eax+130h] ; hHandle
0x100056be push dword ptr [esi+130h] ; hEvent
0x10005ac7 push dword ptr [esi+130h] ; hEvent

0x130 is the displacement value we are interested in. This can be modified to print other
displacements.
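The two less obvious steps in the script, making an ebp-based displacement positive and grouping addresses by displacement, can be tried in plain Python outside of IDA. This is a minimal sketch that assumes a 32-bit database where a negative displacement is stored as an unsigned 32-bit value. The helper name to_positive is made up, and the address pairs are taken from the listings in this chapter.

```python
def to_positive(disp, bits=32):
    """Undo two's complement: return the magnitude of a negative displacement."""
    mask = (1 << bits) - 1
    # Same expression as in the script: ~(x - 1) & 0xFFFFFFFF
    return (~(disp - 1)) & mask


# Assumption: [ebp-60h] is stored as the unsigned 32-bit value 0xFFFFFFA0.
print(hex(to_positive(0xFFFFFFA0)))

# Group addresses by displacement; setdefault replaces the has_key check.
displace = {}
pairs = [(0x130, 0x10004F12), (0x130, 0x10004F68), (0x98, 0x10004F1E)]
for disp, ea in pairs:
    displace.setdefault(disp, []).append(ea)

print([hex(ea) for ea in displace[0x130]])
```

dict.setdefault creates the empty list the first time a displacement is seen, which keeps the grouping to a single line.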

Example

Sometimes when reversing a memory dump of an executable the operands are not
recognized as an offset.

seg000:00BC1388 push 0Ch
seg000:00BC138A push 0BC10B8h
seg000:00BC138F push [esp+10h+arg_0]
seg000:00BC1393 call ds:_strnicmp

The second value being pushed is a memory offset. If we were to right click on it and
change it to a data type, we would see the offset to a string. This is okay to do once or
twice but after that we might as well automate the process.

min_ea = idc.MinEA()
max_ea = idc.MaxEA()
# for each known function
for func in idautils.Functions():
    flags = idc.GetFunctionFlags(func)
    # skip library & thunk functions
    if flags & FUNC_LIB or flags & FUNC_THUNK:
        continue
    dism_addr = list(idautils.FuncItems(func))
    for curr_addr in dism_addr:
        # 5 is o_imm; convert immediates that fall inside the IDB to offsets
        if idc.GetOpType(curr_addr, 0) == 5 and \
           (min_ea < idc.GetOperandValue(curr_addr, 0) < max_ea):
            idc.OpOff(curr_addr, 0, 0)
        if idc.GetOpType(curr_addr, 1) == 5 and \
           (min_ea < idc.GetOperandValue(curr_addr, 1) < max_ea):
            idc.OpOff(curr_addr, 1, 0)

After running the above code we would now see the string.

seg000:00BC1388 push 0Ch
seg000:00BC138A push offset aNtoskrnl_exe ; "ntoskrnl.exe"
seg000:00BC138F push [esp+10h+arg_0]
seg000:00BC1393 call ds:_strnicmp

At the start we get the minimum and maximum address by calling MinEA() and MaxEA().
We loop through all functions and instructions. For each instruction we check if the operand
type is o_imm, which is represented internally as the number 5. o_imm types are values
such as an integer or an offset. Once a value is found we read the value by calling
idc.GetOperandValue(ea,n). The value is then checked to see if it is in the range of the
minimum and maximum addresses. If so, we use idc.OpOff(ea, n, base) to convert
the operand to an offset. The first argument ea is the address, n is the operand index
and base is the base address. Our example only needs to have a base of zero.
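The range test is what keeps small constants such as 0Ch from being turned into offsets. Here is that check on its own as a sketch that runs in plain Python outside of IDA. The bounds are hypothetical stand-ins for what MinEA() and MaxEA() would return for the memory dump, and the helper name looks_like_offset is made up.

```python
def looks_like_offset(value, min_ea, max_ea):
    """Return True if an immediate value falls inside the address range."""
    return min_ea < value < max_ea


# Hypothetical bounds standing in for MinEA() and MaxEA() of the dump.
MIN_EA, MAX_EA = 0xBC0000, 0xBD0000

# The two immediates pushed in the listing above.
candidates = [0x0C, 0xBC10B8]
offsets = [v for v in candidates if looks_like_offset(v, MIN_EA, MAX_EA)]
print([hex(v) for v in offsets])
```

This is only a heuristic: an ordinary constant that happens to fall inside the address range would also be converted, so the results are worth a quick review.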

Xrefs

Being able to locate cross-references, aka xrefs, to data or code is very important. Xrefs are
important because they provide locations of where certain data is being used or where a
function is being called from. For example, what if we wanted to locate the addresses
where WriteFile was called from? Using xrefs all we would need to do is locate the
address of WriteFile in the import table and then find all xrefs to it.

Python>wf_addr = idc.LocByName("WriteFile")
Python>print hex(wf_addr), idc.GetDisasm(wf_addr)
0x1000e1b8 extrn WriteFile:dword
Python>for addr in idautils.CodeRefsTo(wf_addr, 0):\
    print hex(addr), idc.GetDisasm(addr)
0x10004932 call ds:WriteFile
0x10005c38 call ds:WriteFile
0x10007458 call ds:WriteFile

In the first line we get the address of the API WriteFile by using idc.LocByName(str).
This function will return the address of the API. We print out the address of WriteFile
and its string representation. Then we loop through all code cross references by calling
idautils.CodeRefsTo(ea, flow). It will return an iterator that can be looped through.
ea is the address that we would like to have cross-referenced to. The argument flow is a
bool. It is used to specify whether to follow normal code flow or not. Each cross reference to the
address is then displayed. A quick note about the use of idc.LocByName(str). All
renamed functions and APIs in an IDB can be accessed by calling idautils.Names(). This
function returns an iterator object which can be looped through to print or access the
names. Each named item is a tuple of (ea, str_name).

Python>[x for x in idautils.Names()]
[(268439552, 'SetEventCreateThread'), (268439615, 'StartAddress'),
(268441102, 'SetSleepClose'),....
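Because each item is an (ea, str_name) tuple, the result converts directly into lookup tables. The sketch below runs in plain Python outside of IDA and uses the three tuples shown above as a stand-in for the output of idautils.Names(); the variable names are made up for illustration.

```python
# Stand-in for list(idautils.Names()), copied from the output above.
NAMES = [
    (268439552, "SetEventCreateThread"),
    (268439615, "StartAddress"),
    (268441102, "SetSleepClose"),
]

# Look up an address by name, similar in spirit to idc.LocByName(str).
by_name = {name: ea for ea, name in NAMES}
# Look up a name by address.
by_ea = dict(NAMES)

print(hex(by_name["StartAddress"]))
print(by_ea[268439552])
```

Building the dictionaries once is handy when a script needs to resolve many names in a row.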

If we wanted to get where code was referenced from we would use
idautils.CodeRefsFrom(ea, flow). For example let’s get the address that is
referenced from 0x10004932.

Python>ea = 0x10004932
Python>print hex(ea), idc.GetDisasm(ea)
0x10004932 call ds:WriteFile
Python>for addr in idautils.CodeRefsFrom(ea, 0):\
    print hex(addr), idc.GetDisasm(addr)
Python>
0x1000e1b8 extrn WriteFile:dword

If we review the idautils.CodeRefsTo(ea, flow) example we will see that the address
0x10004932 is one of the addresses that references WriteFile. idautils.CodeRefsTo(ea, flow) and
idautils.CodeRefsFrom(ea, flow) are used to search for cross references to and
from code. A limitation of using idautils.CodeRefsTo(ea, flow) is that APIs that are
imported dynamically and then manually renamed will not show up as code
cross-references. Say we manually rename a dword address to "RtlCompareMemory" using
idc.MakeName(ea, name).

Python>hex(ea)
0xa26c78
Python>idc.MakeName(ea, "RtlCompareMemory")
True
Python>for addr in idautils.CodeRefsTo(ea, 0):\
    print hex(addr), idc.GetDisasm(addr)

IDA will not label these APIs as code cross references. A little later we will describe a
generic technique to get all cross references. If we wanted to search for cross references
to and from data we could use idautils.DataRefsTo(ea) or idautils.DataRefsFrom(ea).

Python>print hex(ea), idc.GetDisasm(ea)
0x1000e3ec db 'vnc32',0
Python>for addr in idautils.DataRefsTo(ea): print hex(addr), idc.GetDisasm(addr)
0x100038ac push offset aVnc32 ; "vnc32"

idautils.DataRefsTo(ea) takes the address as an argument and returns an iterator of
all the addresses that cross reference the data.

Python>print hex(ea), idc.GetDisasm(ea)
0x100038ac push offset aVnc32 ; "vnc32"
Python>for addr in idautils.DataRefsFrom(ea): print hex(addr), idc.GetDisasm(addr)
0x1000e3ec db 'vnc32',0

To do the reverse and show the data referenced from an address we call
idautils.DataRefsFrom(ea) and pass the address as an argument. It returns an iterator of
all the data addresses that the address references. The different usage of code and data
can be a little confusing. As previously mentioned, let's describe a more generic technique.
This approach can be used to get all cross references to an address by calling a single
function. We can get all cross references to an address using
idautils.XrefsTo(ea, flags=0) and get all cross references from an address by calling
idautils.XrefsFrom(ea, flags=0).

Python>print hex(ea), idc.GetDisasm(ea)
0x1000eee0 unicode 0, <Path>,0
Python>for xref in idautils.XrefsTo(ea, 1):
    print xref.type, idautils.XrefTypeName(xref.type), \
        hex(xref.frm), hex(xref.to), xref.iscode
Python>
1 Data_Offset 0x1000ac0d 0x1000eee0 0
Python>print hex(xref.frm), idc.GetDisasm(xref.frm)
0x1000ac0d push offset KeyName ; "Path"

The first line displays our address and a string named <Path>. We use
idautils.XrefsTo(ea, 1) to get all cross references to the string. We then use
xref.type to print the xref's type value. idautils.XrefTypeName(xref.type) is used
to print the string representation of this type. There are twelve different documented
reference type values. Each value is listed below on the left with its corresponding name
on the right.

0 = 'Data_Unknown'
1 = 'Data_Offset'
2 = 'Data_Write'
3 = 'Data_Read'
4 = 'Data_Text'
5 = 'Data_Informational'
16 = 'Code_Far_Call'
17 = 'Code_Near_Call'
18 = 'Code_Far_Jump'
19 = 'Code_Near_Jump'
20 = 'Code_User'
21 = 'Ordinary_Flow'
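The list above can be kept in a small lookup so that a script can label or filter reference types on its own. This is a minimal sketch in plain Python that runs outside IDA. The names XREF_TYPE_NAMES, describe and is_code_type are illustrative and not part of IDAPython, and the rule that code types start at 16 is only an observation from the values listed above.

```python
# Reference type values and names copied from the list above.
XREF_TYPE_NAMES = {
    0: 'Data_Unknown',
    1: 'Data_Offset',
    2: 'Data_Write',
    3: 'Data_Read',
    4: 'Data_Text',
    5: 'Data_Informational',
    16: 'Code_Far_Call',
    17: 'Code_Near_Call',
    18: 'Code_Far_Jump',
    19: 'Code_Near_Jump',
    20: 'Code_User',
    21: 'Ordinary_Flow',
}

def describe(xref_type):
    """Return the name for a type value, or a placeholder if it is not listed."""
    return XREF_TYPE_NAMES.get(xref_type, 'Unknown_%d' % xref_type)

def is_code_type(xref_type):
    """Assumption: in the list above every code-related type is 16 or higher."""
    return xref_type >= 16

print(describe(19))
print(is_code_type(3))
```

Inside IDA, idautils.XrefTypeName(xref.type) already returns these names, so a lookup like this is mainly useful when post-processing saved results.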

The xref.frm prints out the from address and xref.to prints out the to address.
xref.iscode prints whether the xref is a code cross reference. In the previous example we
had the flag of idautils.XrefsTo(ea, 1) set to the value 1. If the flag is zero any cross
reference will be displayed. Say we have the below block of assembly.

.text:1000AAF6 jnb short loc_1000AB02 ; XREF
.text:1000AAF8 mov eax, [ebx+0Ch]
.text:1000AAFB mov ecx, [esi]
.text:1000AAFD sub eax, edi
.text:1000AAFF mov [edi+ecx], eax
.text:1000AB02
.text:1000AB02 loc_1000AB02: ; ea is here()
.text:1000AB02 mov byte ptr [ebx], 1

We have the cursor at 1000AB02. This address has a cross reference from 1000AAF6 but
it also has a second cross reference.

Python>print hex(ea), idc.GetDisasm(ea)
0x1000ab02 mov byte ptr [ebx], 1
Python>for xref in idautils.XrefsTo(ea, 1):
    print xref.type, idautils.XrefTypeName(xref.type), \
        hex(xref.frm), hex(xref.to), xref.iscode
Python>
19 Code_Near_Jump 0x1000aaf6 0x1000ab02 1
Python>for xref in idautils.XrefsTo(ea, 0):
    print xref.type, idautils.XrefTypeName(xref.type), \
        hex(xref.frm), hex(xref.to), xref.iscode
Python>
21 Ordinary_Flow 0x1000aaff 0x1000ab02 1
19 Code_Near_Jump 0x1000aaf6 0x1000ab02 1

The second cross reference is from 1000AAFF to 1000AB02. Cross references do not
have to be caused by branch instructions. They can also be caused by normal ordinary code
flow. If we set the flag to 1, Ordinary_Flow reference types will not be added. Let's go
back to our RtlCompareMemory example from earlier. We can use
idautils.XrefsTo(ea, flags) to get all cross references.

Python>hex(ea)
0xa26c78
Python>idc.MakeName(ea, "RtlCompareMemory")
True
Python>for xref in idautils.XrefsTo(ea, 1):
    print xref.type, idautils.XrefTypeName(xref.type), \
        hex(xref.frm), hex(xref.to), xref.iscode
Python>
3 Data_Read 0xa142a3 0xa26c78 0
3 Data_Read 0xa143e8 0xa26c78 0
3 Data_Read 0xa162da 0xa26c78 0

Getting all cross references can be a little verbose sometimes.

Python>print hex(ea), idc.GetDisasm(ea)
0xa21138 extrn GetProcessHeap:dword
Python>for xref in idautils.XrefsTo(ea, 1):
    print xref.type, idautils.XrefTypeName(xref.type), \
        hex(xref.frm), hex(xref.to), xref.iscode
Python>
17 Code_Near_Call 0xa143b0 0xa21138 1
17 Code_Near_Call 0xa1bb1b 0xa21138 1
3 Data_Read 0xa143b0 0xa21138 0
3 Data_Read 0xa1bb1b 0xa21138 0
Python>print idc.GetDisasm(0xa143b0)
call ds:GetProcessHeap

The verboseness comes from the Data_Read and the Code_Near_Call references both being
added to the xrefs. Collecting the addresses in a set is a useful way to slim down the
list to unique addresses.

def get_to_xrefs(ea):
    xref_set = set([])
    for xref in idautils.XrefsTo(ea, 1):
        xref_set.add(xref.frm)
    return xref_set

def get_frm_xrefs(ea):
    xref_set = set([])
    for xref in idautils.XrefsFrom(ea, 1):
        xref_set.add(xref.to)
    return xref_set

Example of the slim down functions on our GetProcessHeap example.

Python>print hex(ea), idc.GetDisasm(ea)
0xa21138 extrn GetProcessHeap:dword
Python>get_to_xrefs(ea)
set([10568624, 10599195])
Python>[hex(x) for x in get_to_xrefs(ea)]
['0xa143b0', '0xa1bb1b']
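To see why the set removes the duplication, the sketch below replays the GetProcessHeap output in plain Python, outside IDA. The tuples stand in for xref objects and the helper name unique_from_addresses is hypothetical. Only the rows themselves come from the output shown earlier.

```python
# (type, frm, to, iscode) rows copied from the GetProcessHeap output above.
xrefs = [
    (17, 0xa143b0, 0xa21138, 1),
    (17, 0xa1bb1b, 0xa21138, 1),
    (3, 0xa143b0, 0xa21138, 0),
    (3, 0xa1bb1b, 0xa21138, 0),
]

def unique_from_addresses(rows):
    """Collect the 'from' address of every row; a set drops the repeats."""
    return set(frm for _type, frm, _to, _iscode in rows)

unique = unique_from_addresses(xrefs)
print(sorted(hex(x) for x in unique))
```

Each call instruction appears twice in the raw list, once as Code_Near_Call and once as Data_Read, so four rows collapse to two addresses.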

Searching

We have already gone over some basic searches by iterating over all known functions or
instructions. This is useful but sometimes we need to search for specific bytes such as
0x55 0x8B 0xEC. This byte pattern is the classic function prologue push ebp, mov
ebp, esp. To search for byte or binary patterns we can use idc.FindBinary(ea,
flag, searchstr, radix=16). ea is the address that we would like to search from and
flag is the direction or condition. There are a number of different types of flags. The
names and values can be seen below.

SEARCH_UP = 0
SEARCH_DOWN = 1
SEARCH_NEXT = 2
SEARCH_CASE = 4
SEARCH_REGEX = 8
SEARCH_NOBRK = 16
SEARCH_NOSHOW = 32
SEARCH_UNICODE = 64 **
SEARCH_IDENT = 128 **
SEARCH_BRK = 256 **
** Older versions of IDAPython do not support these

Not all of these flags are worth going over, but let's touch upon the most commonly used
flags.

SEARCH_UP and SEARCH_DOWN are used to select the direction we would like our
search to follow.
SEARCH_NEXT is used to get the next found object.
SEARCH_CASE is used to specify case sensitivity.
SEARCH_NOSHOW will not show the search progress.
SEARCH_UNICODE is used to treat all search strings as Unicode.
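Because the flag values are powers of two, several flags can be combined into one argument with a bitwise OR. The sketch below illustrates this in plain Python outside IDA. The constants are redefined locally from the table above, and the helper flag_names is a hypothetical name used only for this illustration. Treating SEARCH_UP as "the SEARCH_DOWN bit is not set" is an inference from its value of 0.

```python
# Values copied from the flag table above.
SEARCH_UP = 0
SEARCH_DOWN = 1
SEARCH_NEXT = 2
SEARCH_CASE = 4
SEARCH_NOSHOW = 32

FLAG_BITS = [
    (SEARCH_DOWN, 'SEARCH_DOWN'),
    (SEARCH_NEXT, 'SEARCH_NEXT'),
    (SEARCH_CASE, 'SEARCH_CASE'),
    (SEARCH_NOSHOW, 'SEARCH_NOSHOW'),
]

def flag_names(flag):
    """List the names of the flags that are set in a combined value."""
    names = [name for bit, name in FLAG_BITS if flag & bit]
    # SEARCH_UP has the value 0, so it is implied when SEARCH_DOWN is absent.
    if not flag & SEARCH_DOWN:
        names.insert(0, 'SEARCH_UP')
    return names

combined = SEARCH_DOWN | SEARCH_NEXT
print(combined, flag_names(combined))
```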

searchstr is the pattern we are searching for. The radix is used when writing processor
modules. This topic is outside of the scope of this book. I would recommend reading
Chapter 19 of Chris Eagle's The IDA Pro Book. For now the radix field can be left blank.
Let's go over a quick walk through on finding the function prologue byte pattern mentioned
earlier.

Python>pattern = '55 8B EC'
addr = MinEA()
for x in range(0,5):
    addr = idc.FindBinary(addr, SEARCH_DOWN, pattern);
    if addr != idc.BADADDR:
        print hex(addr), idc.GetDisasm(addr)
Python>
0x401000 push ebp
0x401000 push ebp
0x401000 push ebp
0x401000 push ebp
0x401000 push ebp

In the first line we define our search pattern. The search pattern can be in the format of
hexadecimal starting with 0x as in 0x55 0x8B 0xEC or as bytes appear in IDA's hex view
55 8B EC. The format \x55\x8B\xEC cannot be used unless we were using
idc.FindText(ea, flag, y, x, searchstr). MinEA() is used to get the first
address in the executable. We then assign the return of idc.FindBinary(ea, flag,
searchstr, radix=16) to a variable called addr.

When searching it is important to verify that the search did find the pattern. This is tested
by comparing addr with idc.BADADDR. We then print the address and disassembly.
Notice how the address did not increment? This is because we did not pass the
SEARCH_NEXT flag. If this flag is not passed the current address is used to search for the
pattern. If the last address contained our byte pattern the search will never increment
past it. Below is the corrected version.

Python>pattern = '55 8B EC'
addr = MinEA()
for x in range(0,5):
    addr = idc.FindBinary(addr, SEARCH_DOWN|SEARCH_NEXT, pattern);
    if addr != idc.BADADDR:
        print hex(addr), idc.GetDisasm(addr)
Python>
0x401040 push ebp
0x401070 push ebp
0x4010e0 push ebp
0x401150 push ebp
0x4011b0 push ebp
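The difference between the two loops can be reproduced outside IDA with a toy search over a byte string. This is only a sketch of the idea: parse_pattern and find_down are hypothetical helpers, the sample bytes are made up, and advancing by one byte is a simplification that does not model how IDA itself moves to the next item.

```python
def parse_pattern(pattern):
    """Turn a hex-view style pattern such as '55 8B EC' into raw bytes."""
    return bytes(bytearray(int(token, 16) for token in pattern.split()))

def find_down(data, start, pattern, search_next=False):
    """Toy downward search: return the offset of the match, or -1 if none."""
    if search_next:
        # Skip the current position, which is the role SEARCH_NEXT plays above.
        start += 1
    return data.find(parse_pattern(pattern), start)

# Two made-up function prologues separated by filler bytes.
data = b'\x55\x8b\xec\x90\x90\x55\x8b\xec\xc3'

stuck = find_down(data, 0, '55 8B EC')           # matches at the start offset
again = find_down(data, stuck, '55 8B EC')       # same offset, never advances
moved = find_down(data, stuck, '55 8B EC', True) # skips ahead to the next match
print(stuck, again, moved)
```

Searching again from the returned offset without skipping it finds the same match every time, which is the behaviour seen in the first loop.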

Searching for byte patterns is useful but sometimes we might want to search for strings
such as "chrome.dll". We could convert the string to hex bytes using [hex(y) for y
in bytearray("chrome.dll")] but this is a little ugly. Also, if the string is Unicode we
would have to account for that format. The simplest approach is using idc.FindText(ea,
flag, y, x, searchstr). Most of these fields should look familiar because they are the
same as idc.FindBinary. ea is the start address and flag is the direction and types to
search for. y is the number of lines at ea to search from and x is the coordinate in the
line. These fields are typically assigned as 0. Now let's search for occurrences of the
string "Accept". Any string from the strings window (shift+F12) can be used for this example.

Python>cur_addr = MinEA()
end = MaxEA()
while cur_addr < end:
    cur_addr = idc.FindText(cur_addr, SEARCH_DOWN, 0, 0, "Accept")
    if cur_addr == idc.BADADDR:
        break
    else:
        print hex(cur_addr), idc.GetDisasm(cur_addr)
    cur_addr = idc.NextHead(cur_addr)
Python>
0x40da72 push offset aAcceptEncoding; "Accept-Encoding:\n"
0x40face push offset aHttp1_1Accept; " HTTP/1.1\r\nAccept: */*
\r\n "
0x40fadf push offset aAcceptLanguage; "Accept-Language: ru
\r\n"
...
0x423c00 db 'Accept',0
0x423c14 db 'Accept-Language',0
0x423c24 db 'Accept-Encoding',0
0x423ca4 db 'Accept-Ranges',0

We use MinEA() to get the minimum address and assign that to a variable named
cur_addr. This is similarly done again for the maximum address by calling MaxEA() and
assigning the return to a variable named end. Since we do not know how many
occurrences of the string will be present, we need to check that the search continues down
and is less than the maximum address. We then assign the return of idc.FindText to the
current address. Since we will be manually incrementing the address by calling
idc.NextHead(ea) we do not need the SEARCH_NEXT flag. The reason why we manually
increment the current address to the following line is because a string can occur multiple
times on a single line. This can make it tricky to get the address of the next string.
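For cases where a byte search is still preferred, the string-to-hex conversion mentioned earlier can be wrapped in a small helper. This is a plain Python sketch that runs outside IDA. The function name to_hex_pattern is hypothetical, and using UTF-16LE for "Unicode" strings is an assumption that matches how wide strings are commonly stored in Windows binaries.

```python
def to_hex_pattern(text, encoding='ascii'):
    """Encode text and format it like IDA's hex view, e.g. '63 68 72'."""
    return ' '.join('%02X' % b for b in bytearray(text.encode(encoding)))

print(to_hex_pattern("chrome.dll"))
# Assumption: wide strings in the sample are stored as UTF-16LE.
print(to_hex_pattern("chrome.dll", "utf-16-le"))
```

The resulting string has the same space-separated format as the '55 8B EC' pattern used with idc.FindBinary above.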

Along with the pattern searching previously described, there are a couple of functions that
can be used to find other types. The naming conventions of the find APIs make it easy to
infer their overall functionality. Before we discuss finding the different types we will first
go over identifying types by their address. There is a subset of APIs that start with is that
can be used to determine an address' type. The APIs return a Boolean value of True or False.

idc.isCode(f)

Returns True if IDA has marked the address as code.

idc.isData(f)

Returns True if IDA has marked the address as data.

idc.isTail(f)

Returns True if IDA has marked the address as tail.

idc.isUnknown(f)

Returns True if IDA has marked the address as unknown. This type is used when IDA has
not identified if the address is code or data.

idc.isHead(f)

Returns True if IDA has marked the address as head.

The f is new to us. Rather than passing an address we first need to get the internal flags
representation and then pass it to our idc.is set of functions. To get the internal flags
we use idc.GetFlags(ea). Now that we have the basics of how the functions can be used
and the different types, let's do a quick example.

Python>print hex(ea), idc.GetDisasm(ea)
0x10001000 push ebp
Python>idc.isCode(idc.GetFlags(ea))
True

idc.FindCode(ea, flag)

It is used to find the next address that is marked as code. This can be useful if we want to
find the end of a block of data. If ea is an address that is already marked as code it will
return the next address. The flag is used as previously described in idc.FindText.

Python>print hex(ea), idc.GetDisasm(ea)
0x4140e8 dd offset dword_4140EC
Python>addr = idc.FindCode(ea, SEARCH_DOWN|SEARCH_NEXT)
Python>print hex(addr), idc.GetDisasm(addr)
0x41410c push ebx

As we can see ea is the address 0x4140e8 of some data. We assign the return of
idc.FindCode(ea, SEARCH_DOWN|SEARCH_NEXT) to addr. Then we print addr and
its disassembly. By calling this single function we skipped 36 bytes of data to get the start
of a section marked as code.

idc.FindData(ea, flag)

It is used exactly as idc.FindCode except it will return the start of the next address that
is marked as a block of data. Let's reverse the previous scenario: start from the address
of code and search up to find the start of the data.

Python>print hex(ea), idc.GetDisasm(ea)
0x41410c push ebx
Python>addr = idc.FindData(ea, SEARCH_UP|SEARCH_NEXT)
Python>print hex(addr), idc.GetDisasm(addr)
0x4140ec dd 49540E0Eh, 746E6564h, 4570614Dh, 7972746Eh, 8, 1, 4010BCh

The only thing that is slightly different from the previous example is the direction of
SEARCH_UP|SEARCH_NEXT and searching for data.

idc.FindUnexplored(ea, flag)

This function is used to find the address of bytes that IDA did not identify as code or data.
The unknown type will require further manual analysis either visually or through scripting.

Python>print hex(ea), idc.GetDisasm(ea)
0x406a05 jge short loc_406A3A
Python>addr = idc.FindUnexplored(ea, SEARCH_DOWN)
Python>print hex(addr), idc.GetDisasm(addr)
0x41b004 db 0DFh ; ?

idc.FindExplored(ea, flag)

It is used to find an address that IDA identified as code or data.

Python>print hex(ea), idc.GetDisasm(ea)
0x41b900 db ? ;
Python>addr = idc.FindExplored(ea, SEARCH_UP)
Python>print hex(addr), idc.GetDisasm(addr)
0x41b5f4 dd ?

This might not seem of any real value but if we were to print the cross references of addr
we would see it is being used.

Python>for xref in idautils.XrefsTo(addr, 1):
    print hex(xref.frm), idc.GetDisasm(xref.frm)
Python>
0x4069c3 mov eax, dword_41B5F4[ecx*4]

idc.FindImmediate(ea, flag, value)

Rather than searching for a type we might want to search for a specific value. Say for
example that we have a feeling that the code calls rand to generate a random number but
we can't find the code. If we knew that rand uses the value 0x343FD as a multiplier we
could search for that number.

Python>addr = idc.FindImmediate(MinEA(), SEARCH_DOWN, 0x343FD )
Python>addr
[268453092, 0]
Python>print "0x%x %s %x" % (addr[0], idc.GetDisasm(addr[0]), addr[1] )
0x100044e4 imul eax, 343FDh 0

In the first line we pass the minimum address via MinEA(), search down and then search
for the value 0x343FD. Rather than returning an address as shown in the previous Find APIs,
idc.FindImmediate returns a tuple. The first item in the tuple will be the address and the
second will be the operand. Similar to the return of idc.GetOpnd the first operand starts
at zero. When we print the address and disassembly we can see the value is the second
operand. If we wanted to search for all uses of an immediate value we could do the
following.

Python>addr = MinEA()
while True:
    addr, operand = idc.FindImmediate(addr, SEARCH_DOWN|SEARCH_NEXT, 0x7a )
    if addr != BADADDR:
        print hex(addr), idc.GetDisasm(addr), "Operand ", operand
    else:
        break
Python>
0x402434 dd 9, 0FF0Bh, 0Ch, 0FF0Dh, 0Dh, 0FF13h, 13h, 0FF1Bh, 1Bh Operand 0
0x40acee cmp eax, 7Ah Operand 1
0x40b943 push 7Ah Operand 0
0x424a91 cmp eax, 7Ah Operand 1
0x424b3d cmp eax, 7Ah Operand 1
0x425507 cmp eax, 7Ah Operand 1

Most of the code should look familiar but since we are searching for multiple occurrences
we will be using a while loop and the SEARCH_DOWN|SEARCH_NEXT flags.
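To show why the constant 0x343FD from the earlier rand example is such a recognisable immediate, here is a sketch of the generator it usually belongs to. It assumes the sample was built with the Microsoft C runtime, whose rand is a linear congruential generator with multiplier 0x343FD (214013) and increment 0x269EC3 (2531011). The function name msvc_rand_sequence is hypothetical, and other runtimes use different constants.

```python
def msvc_rand_sequence(seed, count):
    """Sketch of an MSVC-style rand(): 32-bit LCG, returning bits 16..30."""
    state = seed & 0xFFFFFFFF
    values = []
    for _ in range(count):
        # imul eax, 343FDh / add eax, 269EC3h in the disassembly
        state = (state * 0x343FD + 0x269EC3) & 0xFFFFFFFF
        values.append((state >> 16) & 0x7FFF)
    return values

print(msvc_rand_sequence(1, 3))
```

Seeing imul eax, 343FDh in a function, as in the search result above, is therefore a strong hint that the function is rand or an inlined copy of it.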

Selecting Data

We will not always want to write code that automatically searches for code or data. In
some instances we already know the location of the code or data but we want to select it
for analysis. In situations like this we might just want to highlight the code and start
working with it in IDAPython. To get the boundaries of selected data we can use
idc.SelStart() to get the start and idc.SelEnd() to get the end. Say we have the
below code selected.

.text:00408E46 push ebp
.text:00408E47 mov ebp, esp
.text:00408E49 mov al, byte ptr dword_42A508
.text:00408E4E sub esp, 78h
.text:00408E51 test al, 10h
.text:00408E53 jz short loc_408E78
.text:00408E55 lea eax, [ebp+Data]

We can use the following code to print out the addresses.

Python>start = idc.SelStart()
Python>hex(start)
0x408e46
Python>end = idc.SelEnd()
Python>hex(end)
0x408e58

We assign the return of idc.SelStart() to start. This will be the address of the first
selected address. We then use the return of idc.SelEnd() and assign it to end. One
thing to note is that end is not the last selected address but the start of the next address.
If we preferred to make only one API call we could use idaapi.read_selection(). It
returns a tuple with the first value being a bool if the selection was read, the second being
the start address and the last being the end address.

Python>Worked, start, end = idaapi.read_selection()
Python>print Worked, hex(start), hex(end)
True 0x408e46 0x408e58

Be cautious when working with 64 bit samples. The base address is not always correct
because the selected start address can cause an integer overflow, which leaves the leading
digit incorrect.

Comments & Renaming

A personal belief of mine is that if I'm not writing I'm not reversing. Adding comments,
renaming functions and interacting with the assembly is one of the best ways to
understand what the code is doing. Over time some of the interaction becomes redundant.
In situations like this it is useful to automate the process.

Before we go over some examples we should first discuss the basics of comments and
renaming. There are two types of comments. The first one is a regular comment and the
second is a repeatable comment. A regular comment appears at address 0041136B as the
text regular comment. A repeatable comment can be seen at addresses 00411372,
00411386 and 00411392. Only the last comment is a comment that was manually
entered. The other comments appear when an instruction references an address (such as a
branch condition) that contains a repeatable comment.

00411365 mov [ebp+var_214], eax
0041136B cmp [ebp+var_214], 0 ; regular comment
00411372 jnz short loc_411392 ; repeatable comment
00411374 push offset sub_4110E0
00411379 call sub_40D060
0041137E add esp, 4
00411381 movzx edx, al
00411384 test edx, edx
00411386 jz short loc_411392 ; repeatable comment
00411388 mov dword_436B80, 1
00411392
00411392 loc_411392:
00411392
00411392 mov dword_436B88, 1 ; repeatable comment
0041139C push offset sub_4112C0

To add comments we use idc.MakeComm(ea, comment) and for repeatable comments
we use idc.MakeRptCmt(ea, comment). ea is the address and comment is a string we
would like added. The below code adds a comment every time an instruction zeroes out a
register or value with XOR.

for func in idautils.Functions():
    flags = idc.GetFunctionFlags(func)
    # skip library & thunk functions
    if flags & FUNC_LIB or flags & FUNC_THUNK:
        continue
    dism_addr = list(idautils.FuncItems(func))
    for ea in dism_addr:
        if idc.GetMnem(ea) == "xor":
            if idc.GetOpnd(ea, 0) == idc.GetOpnd(ea, 1):
                comment = "%s = 0" % (idc.GetOpnd(ea,0))
                idc.MakeComm(ea, comment)

As previously described we loop through all functions by calling idautils.Functions()
and loop through all the instructions by calling list(idautils.FuncItems(func)). We
read the mnemonic using idc.GetMnem(ea) and check that it is equal to xor. If so, we
verify the operands are equal with idc.GetOpnd(ea, n). If equal, we create a string with
the operand and then add a non-repeatable comment.
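The decision made inside the loop can be isolated into a small function that is easy to check without IDA. This is a sketch only: zeroing_comment is a hypothetical helper, and its arguments stand in for the strings that idc.GetMnem(ea) and idc.GetOpnd(ea, n) return in the script above.

```python
def zeroing_comment(mnemonic, op0, op1):
    """Return the comment text for 'xor reg, reg', or None for anything else."""
    if mnemonic == "xor" and op0 == op1:
        return "%s = 0" % op0
    return None

print(zeroing_comment("xor", "al", "al"))
print(zeroing_comment("xor", "eax", "ebx"))
```

Keeping the check separate from the idc calls makes it simple to extend, for example to other zeroing idioms, before wiring the result into idc.MakeComm.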

0040B0F7 xor al, al ; al = 0
0040B0F9 jmp short loc_40B163

To add a repeatable comment we would replace idc.MakeComm(ea, comment) with
idc.MakeRptCmt(ea, comment). This might be a little more useful because we would see
references to branches that zero out a value and likely return 0. To get a comment we
simply use idc.GetCommentEx(ea, repeatable). ea is the address that contains the
comment and repeatable is a bool of True or False. To get the above comments we
would use the following code snippet.

Python>print hex(ea), idc.GetDisasm(ea)
0x40b0f7 xor al, al ; al = 0
Python>idc.GetCommentEx(ea, False)
al = 0

If the comment was repeatable we would replace idc.GetCommentEx(ea, False) with
idc.GetCommentEx(ea, True). Instructions are not the only field that can have
comments added. Functions can also have comments added. To add a function comment we
use idc.SetFunctionCmt(ea, cmt, repeatable) and to get a function comment we
call idc.GetFunctionCmt(ea, repeatable). ea can be any address that is within the
boundaries of the start and end of the function. cmt is the string comment we would like
to add and repeatable is a boolean value that states whether we want the comment to be
repeatable or not. This can be represented as either 0 or False for the comment not being
repeatable or 1 or True for the comment to be repeatable. Making the function comment
repeatable will display the comment wherever the function is called.

Python>print hex(ea), idc.GetDisasm(ea)
0x401040 push ebp
Python>idc.GetFunctionName(ea)
sub_401040
Python>idc.SetFunctionCmt(ea, "check out later", 1)
True

We print the address, disassembly and function name in the first couple of lines. We then
use idc.SetFunctionCmt(ea, comment, repeatable) to set a repeatable comment of
"check out later". If we look at the start of the function we will see our comment.

00401040 ; check out later
00401040 ; Attributes: bp-based frame
00401040
00401040 sub_401040 proc near
00401040 .
00401040 var_4 = dword ptr -4
00401040 arg_0 = dword ptr 8
00401040
00401040 push ebp
00401041 mov ebp, esp
00401043 push ecx
00401044 push 723EB0D5h

Since the comment is repeatable, when there is a cross-reference to the function we will
see the comment. This is a great place to add reminders or notes about a function.

00401C07 push ecx
00401C08 call sub_401040 ; check out later
00401C0D add esp, 4

Renaming functions and addresses is a commonly automated task, especially when dealing
with position independent code (PIC), packers or wrapper functions. The reason why this is
common in PIC or unpacked code is because the import table might not be present in the
dump. In the case of wrapper functions the whole function simply calls an API.

10005B3E sub_10005B3E proc near
10005B3E
10005B3E dwBytes = dword ptr 8
10005B3E
10005B3E push ebp
10005B3F mov ebp, esp
10005B41 push [ebp+dwBytes] ; dwBytes
10005B44 push 8 ; dwFlags
10005B46 push hHeap ; hHeap
10005B4C call ds:HeapAlloc
10005B52 pop ebp
10005B53 retn
10005B53 sub_10005B3E endp

In the above code the function could be called w_HeapAlloc. The w_ is short for wrapper.
To rename an address we can use the function idc.MakeName(ea, name). ea is the
address and name is the string name such as "w_HeapAlloc". To rename a function ea
needs to be the first address of the function. To rename the function of our HeapAlloc
wrapper we would use the following code.

Python>print hex(ea), idc.GetDisasm(ea)
0x10005b3e push ebp
Python>idc.MakeName(ea, "w_HeapAlloc")
True

ea is the first address in the function and name is "w_HeapAlloc".

10005B3E w_HeapAlloc proc near
10005B3E
10005B3E dwBytes = dword ptr 8
10005B3E
10005B3E push ebp
10005B3F mov ebp, esp
10005B41 push [ebp+dwBytes] ; dwBytes
10005B44 push 8 ; dwFlags
10005B46 push hHeap ; hHeap
10005B4C call ds:HeapAlloc
10005B52 pop ebp
10005B53 retn
10005B53 w_HeapAlloc endp

Above we can see the function has been renamed. To confirm it has been renamed we can
use idc.GetFunctionName(ea) to print the function's new name.

Python>idc.GetFunctionName(ea)
w_HeapAlloc

Now that we have a good basis of knowledge, let's show an example of how we can use what
we have learned so far to automate the naming of wrapper functions. Please see the inline
comments to get an idea about the logic.

import idautils
import idc

def rename_wrapper(name, func_addr):
    if idc.MakeNameEx(func_addr, name, SN_NOWARN):
        print "Function at 0x%x renamed %s" % (func_addr, idc.GetFunctionName(func_addr))
    else:
        print "Rename at 0x%x failed. Function %s is being used." % (func_addr, name)
    return

def check_for_wrapper(func):
    flags = idc.GetFunctionFlags(func)
    # skip library & thunk functions
    if flags & FUNC_LIB or flags & FUNC_THUNK:
        return
    dism_addr = list(idautils.FuncItems(func))
    # get length of the function
    func_length = len(dism_addr)
    # if over 32 lines of instruction return
    if func_length > 0x20:
        return
    func_call = 0
    instr_cmp = 0
    op = None
    op_addr = None
    op_type = None
    # for each instruction in the function
    for ea in dism_addr:
        m = idc.GetMnem(ea)
        if m == 'call' or m == 'jmp':
            if m == 'jmp':
                temp = idc.GetOperandValue(ea,0)
                # ignore jump conditions within the function boundaries
                if temp in dism_addr:
                    continue
            func_call += 1
            # wrappers should not contain multiple function calls
            if func_call == 2:
                return
            op_addr = idc.GetOperandValue(ea , 0)
            op_type = idc.GetOpType(ea,0)
        elif m == 'cmp' or m == 'test':
            # wrapper functions should not contain much logic.
            instr_cmp += 1
            if instr_cmp == 3:
                return
        else:
            continue
    # all instructions in the function have been analyzed
    if op_addr == None:
        return
    name = idc.Name(op_addr)
    # skip mangled function names
    if "[" in name or "$" in name or "?" in name or "@" in name or name == "":
        return
    name = "w_" + name
    if op_type == 7:
        if idc.GetFunctionFlags(op_addr) & FUNC_THUNK:
            rename_wrapper(name, func)
            return
    if op_type == 2 or op_type == 6:
        rename_wrapper(name, func)
        return

for func in idautils.Functions():
    check_for_wrapper(func)

Example Output

Function at 0xa14040 renamed w_HeapFree
Function at 0xa14060 renamed w_HeapAlloc
Function at 0xa14300 renamed w_HeapReAlloc
Rename at 0xa14330 failed. Function w_HeapAlloc is being used.
Rename at 0xa14360 failed. Function w_HeapFree is being used.
Function at 0xa1b040 renamed w_RtlZeroMemory

Most of the code should be familiar. One notable difference is the use of
idc.MakeNameEx(ea, name, flag) from rename_wrapper. We use this function
because idc.MakeName will throw a warning dialogue if the function name is already in use.
By passing a flag value of SN_NOWARN or 256 we avoid the dialogue box. We could apply
some logic to rename the function to w_HeapFree_1 but for brevity we will leave that out.
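One possible shape for that left-out logic is sketched below in plain Python. The helper unique_name and the idea of tracking names in a set are assumptions made for illustration and are not part of the original script. Inside IDA the set could be built from the names returned by idautils.Names().

```python
def unique_name(base, used_names):
    """Return base, or base_1, base_2, ... if the name is already taken."""
    if base not in used_names:
        return base
    suffix = 1
    while "%s_%d" % (base, suffix) in used_names:
        suffix += 1
    return "%s_%d" % (base, suffix)

used = set(["w_HeapFree", "w_HeapAlloc"])
name = unique_name("w_HeapFree", used)
used.add(name)  # remember the new name so the next wrapper gets another suffix
print(name)
```

With this approach the two failed renames in the example output would receive suffixed names instead of being skipped.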

Accessing Raw Data

Being able to access raw data is essential when reverse engineering. Raw data is the binary
representation of the code or data. We can see the raw data or bytes of the instructions on
the left side following the address.

00A14380 8B 0D 0C 6D A2 00 mov ecx, hHeap
00A14386 50 push eax
00A14387 6A 08 push 8
00A14389 51 push ecx
00A1438A FF 15 30 11 A2 00 call ds:HeapAlloc
00A14390 C3 retn

To access the data we first need to decide on the unit size. The APIs used to access data
are named after the unit size. To access a byte we would call idc.Byte(ea) or
to access a word we would call idc.Word(ea), etc.

idc.Byte(ea)
idc.Word(ea)
idc.Dword(ea)
idc.Qword(ea)
idc.GetFloat(ea)
idc.GetDouble(ea)

If the cursor was at 00A14380 in the assembly from above we would have the following
output.

Python>print hex(ea), idc.GetDisasm(ea)
0xa14380 mov ecx, hHeap
Python>hex( idc.Byte(ea) )
0x8b
Python>hex( idc.Word(ea) )
0xd8b
Python>hex( idc.Dword(ea) )
0x6d0c0d8b
Python>hex( idc.Qword(ea) )
0x6a5000a26d0c0d8bL
Python>idc.GetFloat(ea) # Example not a float value
2.70901711372e+27
Python>idc.GetDouble(ea)
1.25430839165e+204
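The integer values above follow from reading the same bytes as little-endian units of increasing size. The sketch below reproduces them outside IDA with the standard struct module. The variable names are illustrative and the eight bytes are copied from the listing at 00A14380 through 00A14387.

```python
import struct

# First eight bytes of the listing: 8B 0D 0C 6D A2 00 | 50 | 6A ...
raw = bytes(bytearray([0x8B, 0x0D, 0x0C, 0x6D, 0xA2, 0x00, 0x50, 0x6A]))

byte_val = struct.unpack_from('<B', raw)[0]   # 1 byte
word_val = struct.unpack_from('<H', raw)[0]   # 2 bytes, little-endian
dword_val = struct.unpack_from('<I', raw)[0]  # 4 bytes, little-endian
qword_val = struct.unpack_from('<Q', raw)[0]  # 8 bytes, little-endian

print(hex(byte_val), hex(word_val), hex(dword_val), hex(qword_val))
```

The least significant byte comes first in memory, which is why the word reads as 0xd8b rather than 0x8b0d.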

When writing decoders it is not always useful to get a single byte or read a dword; often
we want to read a block of raw data. To read a specified number of bytes at an address we
can use idc.GetManyBytes(ea, size, use_dbg=False). The last argument is optional and is
only needed if we want to read the debugger's memory.

Python>for byte in idc.GetManyBytes(ea, 6):
    print "0x%X" % ord(byte),
0x8B 0xD 0xC 0x6D 0xA2 0x0

It should be noted that idc.GetManyBytes(ea, size) returns the char representation
of the byte(s). This is different from idc.Word(ea) or idc.Qword(ea), which return an
integer.

Patching

Sometimes when reversing malware the sample will have strings that are encoded. This is
done to slow down the analysis process and to thwart using a strings viewer to recover
indicators. In situations like this patching the IDB is useful. We could rename the address
but renaming is limited by the naming convention restrictions. To patch an address
with a value we can use the following functions.

idc.PatchByte(ea, value)
idc.PatchWord(ea, value)
idc.PatchDword(ea, value)

ea is the address and value is the integer value that we would like to patch the IDB with.
The size of the value needs to match the size specified by the function name we choose.
Say for example that we found the following encoded strings.

.data:1001ED3C aGcquEUdg_bUfuD db 'gcqu^E]~UDG_B[uFU^DC',0
.data:1001ED51 align 8
.data:1001ED58 aGcqs_cuufuD db 'gcqs\_CUuFU^D',0
.data:1001ED66 align 4
.data:1001ED68 aWud@uubQU db 'WUD@UUB^Q]U',0
.data:1001ED74 align 8

During our analysis we were able to identify the decoder function.

100012A0 push esi
100012A1 mov esi, [esp+4+_size]
100012A5 xor eax, eax
100012A7 test esi, esi
100012A9 jle short _ret
100012AB mov dl, [esp+4+_key] ; assign key
100012AF mov ecx, [esp+4+_string]
100012B3 push ebx
100012B4
100012B4 _loop: ;
100012B4 mov bl, [eax+ecx]
100012B7 xor bl, dl ; data ^ key
100012B9 mov [eax+ecx], bl ; save off byte
100012BC inc eax ; index/count
100012BD cmp eax, esi
100012BF jl short _loop
100012C1 pop ebx
100012C2
100012C2 _ret: ;
100012C2 pop esi
100012C3 retn

The function is a standard XOR decoder function with arguments of size, key and the
buffer to decode.

Python>start = idc.SelStart()
Python>end = idc.SelEnd()
Python>print hex(start)
0x1001ed3c
Python>print hex(end)
0x1001ed50
Python>def xor(size, key, buff):
    for index in range(0,size):
        cur_addr = buff + index
        temp = idc.Byte( cur_addr ) ^ key
        idc.PatchByte(cur_addr, temp)
Python>
Python>xor(end - start, 0x30, start)
Python>idc.GetString(start)
WSAEnumNetworkEvents

We get the start and end addresses of the highlighted data using idc.SelStart() and
idc.SelEnd(). Then we have a function that reads each byte by calling idc.Byte(ea),
XORs the byte with the key passed to the function and then patches the byte by calling
idc.PatchByte(ea, value).
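Before patching an IDB it can help to check the key on a copy of the bytes. The sketch below applies the same single-byte XOR in plain Python outside IDA. The helper name xor_decode is hypothetical, while the encoded strings and the key 0x30 are taken from the listing and the console session above.

```python
def xor_decode(data, key):
    """XOR every byte of data with a single-byte key and return new bytes."""
    return bytes(bytearray(b ^ key for b in bytearray(data)))

first = xor_decode(b'gcqu^E]~UDG_B[uFU^DC', 0x30)
third = xor_decode(b'WUD@UUB^Q]U', 0x30)
print(first, third)
```

Because XOR is its own inverse, running xor_decode on the result with the same key returns the original encoded bytes, which is a convenient sanity check.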

Input and Output

Importing and exporting files in IDAPython can be useful when we do not know the file
path or when we do not know where the user wants to save their data. To import or save a
file by name we use AskFile(forsave, mask, prompt). forsave is 0 if we want to open
the open-file dialog box or 1 if we want to open the save dialog box. mask is the
file extension or pattern; to open only .dll files we would use a mask of
"*.dll". prompt is the title of the window. A good example of input and output and
selecting data is the following IO_DATA class.

import sys
import idc
import idaapi

class IO_DATA():

    def __init__(self):
        self.start = idc.SelStart()
        self.end = idc.SelEnd()
        self.buffer = ''
        self.ogLen = None
        self.status = True
        self.run()

    def checkBounds(self):
        # compare by value; BADADDR means nothing is selected
        if self.start == idc.BADADDR or self.end == idc.BADADDR:
            self.status = False

    def getData(self):
        '''get data between start and end put them into
        object.buffer'''
        self.ogLen = self.end - self.start
        self.buffer = ''
        try:
            for byte in idc.GetManyBytes(self.start, self.ogLen):
                self.buffer = self.buffer + byte
        except:
            self.status = False
        return

    def run(self):
        '''basically main'''
        self.checkBounds()
        if self.status == False:
            sys.stdout.write('ERROR: Please select valid data\n')
            return
        self.getData()

    def patch(self, temp = None):
        '''patch idb with data in object.buffer'''
        if temp != None:
            self.buffer = temp
        for index, byte in enumerate(self.buffer):
            idc.PatchByte(self.start+index, ord(byte))

    def importb(self):
        '''import file to save to buffer'''
        fileName = idc.AskFile(0, "*.*", 'Import File')
        try:
            self.buffer = open(fileName, 'rb').read()
        except:
            sys.stdout.write('ERROR: Cannot access file\n')

    def export(self):
        '''save the selected buffer to a file'''
        exportFile = idc.AskFile(1, "*.*", 'Export Buffer')
        f = open(exportFile, 'wb')
        f.write(self.buffer)
        f.close()

    def stats(self):
        print "start: %s" % hex(self.start)
        print "end: %s" % hex(self.end)
        print "len: %s" % hex(len(self.buffer))

With this class data can be selected, saved to a buffer and then stored to a file. This is useful
for encoded or encrypted data in an IDB. We can use IO_DATA to select the data, decode
the buffer in Python and then patch the IDB. The following is an example of how to use the
IO_DATA class.

Python>f = IO_DATA()
Python>f.stats()
start: 0x401528
end: 0x401549
len: 0x21

Rather than explaining each line of the code, it is more useful for the reader to go over the
functions one by one and see how they work. The bullet points below explain each variable
and what each function does. obj is whatever variable we assign the class to; f is the obj
in f = IO_DATA().

obj.start
contains the address of the start of the selected offset.

obj.end
contains the address of the end of the selected offset.

obj.buffer
contains the binary data.

obj.ogLen
contains the size of the buffer.

obj.getData()
copies the binary data between obj.start and obj.end to obj.buffer.

obj.run()
the selected data is copied to the buffer in a binary format.

obj.patch()
patch the IDB at obj.start with the data in obj.buffer.

obj.patch(d)
patch the IDB at obj.start with the argument data.

obj.importb()
opens a file and saves the data in obj.buffer.

obj.export()
exports the data in obj.buffer to a save as file.

obj.stats()
print hex of obj.start, obj.end and obj.buffer length.
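
The export, import and patch steps can also be tried without an IDB. The sketch below is plain Python and does not use IDAPython. export_buffer, import_buffer and patch_plan are hypothetical stand-ins for obj.export(), obj.importb() and obj.patch(); a temporary file replaces the path that AskFile would return, and the start address and buffer contents are made-up sample values.

```python
import os
import tempfile


def export_buffer(path, data):
    """Write raw bytes to path, as export() does with obj.buffer."""
    with open(path, 'wb') as handle:
        handle.write(data)


def import_buffer(path):
    """Read raw bytes back from path, as importb() does."""
    with open(path, 'rb') as handle:
        return handle.read()


def patch_plan(start, data):
    """List the (address, value) pairs that patch() would write."""
    return [(start + index, value)
            for index, value in enumerate(bytearray(data))]


# Assumed sample values: a made-up start address and a short buffer.
path = os.path.join(tempfile.mkdtemp(), 'buffer.bin')
export_buffer(path, b'\x00\xffMZ')
restored = import_buffer(path)
plan = patch_plan(0x401528, restored)
```

Opening the file in binary mode ('wb' and 'rb'), as the class does, keeps the bytes from being altered by newline translation on Windows.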

Intel Pin Logger

Pin is a dynamic binary instrumentation framework for IA-32 and x86-64. Combining the
dynamic analysis results of Pin with the static analysis of IDA makes for a powerful mix. A
hurdle for combining IDA and Pin is the initial setup and running of Pin. The steps below are
the 30 second (minus downloads) guide to installing Pin and executing a Pintool that traces an
executable and adds the executed addresses to an IDB.

Notes about steps
  * Pre-install Visual Studio 2010 (vc10) or 2012 (vc11)
  * If executing malware do steps 1,2,6,7,8,9,10 & 11 in an
    analysis machine
1. Download PIN
  * https://software.intel.com/en-us/articles/pintool-downloads
  * Compiler Kit is for the version of Visual Studio you are
    using.
2. Unzip pin to the root dir and rename the folder to "pin"
  * example path C:\pin\
  * There is a known bug that Pin will not always parse the
    arguments correctly if there is spacing in the file path
3. Open the following file in Visual Studio
  * C:\pin\source\tools\MyPinTool\MyPinTool.sln
    - This file contains all the needed settings for Visual
      Studio.
    - Useful to back up and reuse the directory when starting
      new pintools.
4. Open the below file, then cut and paste the code into
   MyPinTool.cpp (currently opened in Visual Studio)
  * C:\pin\source\tools\ManualExamples\itrace.cpp
    - This directory along with ../SimpleExamples is very
      useful for example code.
5. Build Solution (F7)
6. Copy traceme.exe to C:\pin
7. Copy compiled MyPinTool.dll to C:\pin
  * path C:\pin\source\tools\MyPinTool\Debug\MyPinTool.dll
8. Open a command line and set the working dir to C:\pin
9. Execute the following command
  * pin -t MyPinTool.dll -- traceme.exe
    - "-t MyPinTool.dll" = specifies the pintool/dll that pin
      is to use
    - "-- traceme.exe" = name of the file to be analyzed
10. While pin is executing open traceme.exe in IDA.
11. Once pin has completed (command line will have returned)
    execute the following in IDAPython
  * The pin output (itrace.out) must be in the working dir of
    the IDB.

itrace.cpp is a pintool that prints the EIP of every instruction executed to
itrace.out. The data will look like the following output.

00401500
00401506
00401520
00401526
00401549
0040154F
0040155E
00401564
0040156A

After the pintool has executed we can run the following IDAPython code to add comments
to all the executed addresses. The output file itrace.out will need to be in the working
directory of the IDB.

import idc

f = open('itrace.out', 'r')
lines = f.readlines()

for y in lines:
    try:
        y = int(y, 16)
    except ValueError:
        # skip lines that are not a hexadecimal address
        continue
    idc.SetColor(y, idc.CIC_ITEM, 0xfffff)
    com = idc.GetCommentEx(y, 0)
    if com == None or 'count' not in com:
        idc.MakeComm(y, "count:1")
    else:
        try:
            count = int(com.split(':')[1], 16)
        except:
            # comment is not in the expected "count:N" form
            print hex(y)
            continue
        tmp = "count:0x%x" % (count + 1)
        idc.MakeComm(y, tmp)
f.close()

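We first open up itrace.out and read all lines into a list. We then iterate over each line in
the list. Since the address in the output file is in hexadecimal string format we need to
convert it into an integer. For each address we set a color on the item and read its
existing comment. If there is no count comment yet we add "count:1"; otherwise we parse
the current count, add one and write the comment back. The annotated disassembly looks
like the following.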

.text:00401500 loc_401500: ; CODE XREF: sub_4013E0+106↑j
.text:00401500 cmp ebx, 457F4C6Ah ; count:0x16
.text:00401506 ja short loc_401520 ; count:0x16
.text:00401508 cmp ebx, 1857B5C5h ; count:1
.text:0040150E jnz short loc_4014E0 ; count:1
.text:00401510 mov ebx, 80012FB8h ; count:1
.text:00401515 jmp short loc_4014E0 ; count:1
.text:00401515 ; ---------------------------------------------------------
.text:00401517 align 10h
.text:00401520
.text:00401520 loc_401520: ; CODE XREF: sub_4013E0+126↑j
.text:00401520 cmp ebx, 4CC5E06Fh ; count:0x15
.text:00401526 ja short loc_401549 ; count:0x15

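
The counting part of the script can be sketched without IDA. The example below is plain Python. count_addresses is a hypothetical helper, the sample lines are made up in the format shown for itrace.out, and lines that do not parse as hexadecimal are skipped on the assumption that a trace may contain non-address lines.

```python
from collections import Counter


def count_addresses(lines):
    """Count how many times each hexadecimal address appears in lines."""
    counts = Counter()
    for line in lines:
        try:
            counts[int(line.strip(), 16)] += 1
        except ValueError:
            # Not an address (blank line or marker text); ignore it.
            continue
    return counts


# Made-up sample in the same format as the trace output.
sample = ['00401500\n', '00401506\n', '00401500\n', '\n', 'not-an-address\n']
counts = count_addresses(sample)

# One comment string per address, ready to be written in a single pass.
comments = {ea: 'count:0x%x' % n for ea, n in counts.items()}
```

Counting first and writing one comment per address avoids reading and re-parsing the existing comment for every line of the trace, which matters when a loop executes thousands of times.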
Batch File Generation

Sometimes it can be useful to create IDBs or ASMs for all the files in a directory. This can
help save time when analyzing a set of samples that are part of the same family of
malware. Batch file generation is much easier than doing it manually on a large set.
To do batch analysis we need to pass the -B argument to the text-mode version of IDA,
idaw.exe. The code below can be copied to the directory that contains all the files we
would like to generate files for.

import os
import subprocess
import glob

paths = glob.glob("*")
ida_path = os.path.join(os.environ['PROGRAMFILES'], "IDA",
                        "idaw.exe")

for file_path in paths:
    if file_path.endswith(".py"):
        continue
    subprocess.call([ida_path, "-B", file_path])

We use glob.glob("*") to get a list of all files in the directory. The argument can be
modified if we want to select only a certain wildcard pattern or file type. If we
wanted to only get files with a .exe extension we would use glob.glob("*.exe").
os.path.join(os.environ['PROGRAMFILES'], "IDA", "idaw.exe") is used to
get the path to idaw.exe. Some versions of IDA have a folder name with the version
number present. If this is the case the argument "IDA" will need to be modified to the
folder name. Also, the whole command might have to be modified if we choose to use a
non-standard install location for IDA. For now let's assume the install path for IDA is
C:\Program Files\IDA. After we have found the path we loop through all the files in the
directory that do not have a .py extension and pass them to IDA. For an individual
file the command would look like C:\Program Files\IDA\idaw.exe -B bad_file.exe. Once run
it generates an ASM and an IDB for the file. All files are written to the working directory.
An example output can be seen below.

C:\injected>dir

0?/**/____ 09:30 AM <DIR> .
0?/**/____ 09:30 AM <DIR> ..
0?/**/____ 10:48 AM 167,936 bad_file.exe
0?/**/____ 09:29 AM 270 batch_analysis.py
0?/**/____ 06:55 PM 104,889 injected.dll

C:\injected>python batch_analysis.py

Thank you for using IDA. Have a nice day!

C:\injected>dir

0?/**/____ 09:30 AM <DIR> .
0?/**/____ 09:30 AM <DIR> ..
0?/**/____ 09:30 AM 506,142 bad_file.asm
0?/**/____ 10:48 AM 167,936 bad_file.exe
0?/**/____ 09:30 AM 1,884,601 bad_file.idb
0?/**/____ 09:29 AM 270 batch_analysis.py
0?/**/____ 09:30 AM 682,602 injected.asm
0?/**/____ 06:55 PM 104,889 injected.dll
0?/**/____ 09:30 AM 1,384,765 injected.idb

bad_file.asm, bad_file.idb, injected.asm and injected.idb are the generated files.
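
The file selection in the loop can be checked on its own before launching IDA. The following sketch is plain Python; build_commands is a hypothetical helper and the IDA path is only an assumed example. It builds the argument lists that would be handed to subprocess.call without running anything.

```python
def build_commands(ida_path, file_names,
                   skip_extensions=('.py', '.idb', '.asm')):
    """Return one [ida_path, '-B', name] argument list per file to analyze."""
    commands = []
    for name in file_names:
        # Skip scripts and files produced by an earlier batch run.
        if name.lower().endswith(skip_extensions):
            continue
        commands.append([ida_path, '-B', name])
    return commands


# Assumed install path; adjust it to the local IDA folder.
ida_path = r'C:\Program Files\IDA\idaw.exe'
names = ['bad_file.exe', 'batch_analysis.py', 'injected.dll', 'bad_file.idb']
commands = build_commands(ida_path, names)
```

Skipping .idb and .asm files is an addition that is not in the original script. It keeps a second run in the same directory from feeding the generated files back to IDA.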

Executing Scripts

IDAPython scripts can be executed from the command line. We can use the following code
to count each instruction in the IDB and then write it to a file named instru_count.txt.

import idc
import idaapi
import idautils

idaapi.autoWait()

count = 0
for func in idautils.Functions():
    # Ignore Library Code
    flags = idc.GetFunctionFlags(func)
    if flags & idc.FUNC_LIB:
        continue
    for instru in idautils.FuncItems(func):
        count += 1

f = open("instru_count.txt", 'w')
print_me = "Instruction Count is %d" % (count)
f.write(print_me)
f.close()

idc.Exit(0)

From a command line perspective the two most important functions are
idaapi.autoWait() and idc.Exit(0). When IDA opens a file it is important to wait for
the analysis to complete. This allows IDA to populate all functions, structures and other
values that are based on IDA's analysis engine. To wait for the analysis to complete we call
idaapi.autoWait(). It blocks until IDA has completed its analysis and then returns
control back to the script. It is important to call it at the beginning of the script, before
we call any IDAPython functions that rely on the analysis being complete. Once our script
has executed we need to call idc.Exit(0). This stops execution of our script, closes the
database and returns to the caller of the script. If we do not call it, our IDB will not be
closed properly.

If we wanted to execute the IDAPython script to count all instructions in an IDB we would
execute the following command line.

C:\Cridix\idbs>"C:\Program Files (x86)\IDA 6.3\idaw.exe" -A -Scount.py cur-analysis.idb

-A is for Autonomous mode and -S tells IDA to run a script on the IDB once it has
opened. In the working directory we would see a file named instru_count.txt that
contains a count of all instructions.
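
When the same script has to run against many databases, the command line can be assembled in Python. This is a sketch under assumptions: script_command is a hypothetical helper, the install path is copied from the example above, and only the -A and -S switches described in the text are used.

```python
def script_command(ida_path, script, database):
    """Build the argument list for running an IDAPython script on one IDB."""
    # -S and the script name form a single argument with no space between them.
    return [ida_path, '-A', '-S' + script, database]


ida_path = r'C:\Program Files (x86)\IDA 6.3\idaw.exe'
command = script_command(ida_path, 'count.py', 'cur-analysis.idb')

# The list can be passed to subprocess.call(command) on a machine with IDA.
print(' '.join(command))
```

Passing the arguments as a list rather than one string leaves the quoting of the path with spaces to the subprocess module.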
