Assembler On x86 Platform - New
References
https://www.intel.com/content/www/us/en/developer/articles/technical/intel-sdm.html
https://www.intel.com/content/www/us/en/developer/overview.html#gs.kemo1i
IA-32 and 64-bit ISA
IA-32 is the Instruction Set Architecture (ISA) of Intel 32-bit
PROCESSOR REGISTERS (INTEL x86 32-bit)
8 general-purpose registers – 32 bits.
6 segment registers – 16 bits.
1 EFLAGS register – 32 bits.
1 EIP, Instruction Pointer register – 32 bits.
General-Purpose Registers
Register Name (Size in bits): Purpose
AL, AH / AX / EAX (8, 8 / 16 / 32): Main register used in arithmetic calculations. Also known as the accumulator, as it holds results of arithmetic operations and function return values.
BL, BH / BX / EBX (8, 8 / 16 / 32): The Base register. Pointer to data in the DS segment. Used to store the base address of the program.
CL, CH / CX / ECX (8, 8 / 16 / 32): The Counter register. Often used to hold a value representing the number of times a process is to be repeated. Used for loop and string operations.
DL, DH / DX / EDX (8, 8 / 16 / 32): A general-purpose register. Also used for I/O operations. Paired with EAX (as EDX:EAX) to hold 64-bit values.
SI / ESI (16 / 32): Source Index register. Pointer to data in the segment pointed to by the DS register. Used as an offset address in string and array operations. It holds the address from which to read data.
DI / EDI (16 / 32): Destination Index register. Pointer to data (or destination) in the segment pointed to by the ES register. Used as an offset address in string and array operations. It holds the implied write address of all string operations.
BP / EBP (16 / 32): Base Pointer. Pointer to data on the stack (in the SS segment). It points to the bottom of the current stack frame. It is used to reference local variables.
SP / ESP (16 / 32): Stack Pointer (in the SS segment). It points to the top of the current stack frame. It is used to reference local variables.
THE SEGMENT REGISTERS
Segment Register (Size in bits): Purpose
MEMORY ORGANIZATION
Registers, Physical memory: Random Access Memory (RAM)
The processor has an address space that is used for addressing the RAM
The physical address space of the processor ranges from zero to the maximum
Operating systems that employ the processor's memory management facilities do
not directly address physical memory. Instead, they access memory using one of
three memory models: flat, segmented, or real-address mode.
FLAT MEMORY MODEL
Memory appears to the program as a single, continuous address space,
SEGMENTED MEMORY MODEL
Memory appears to a program as a group of independent address spaces called
segments.
To address a byte in a segment, a program must issue a logical address (often referred
All the segments that are defined for a system are mapped into the processor’s linear
address space.
To access a memory location, the processor translates each logical address into a linear
Segmented addressing uses two components to specify a memory location:
REAL ADDRESS MODE MEMORY MODEL
THE EFLAGS
The 32-bit EFLAGS register contains a group of status flags, a control flag, and a
x86-64 Architecture Diagram
Registers
Application programmers generally use only the general-purpose
registers, the floating-point registers, and the XMM and YMM
registers.
General Purpose Registers
The registers are called R0...R15
Used for integer arithmetic and logic, and to hold both data and pointers to
memory
Can access the lower order 32-bits of each register using the names
R0D...R15D. The “D” stands for “doubleword”
Can access the lower order 16-bits of each register using the names
R0W...R15W
Can access the lower order 8-bits of each register using the names R0B...R15B
R0...R7 have aliases RAX, RCX, RDX, RBX, RSP, RBP, RSI, RDI, respectively.
R0D...R7D have aliases EAX, ECX, EDX, EBX, ESP, EBP, ESI, EDI,
respectively.
R0W...R7W have aliases AX, CX, DX, BX, SP, BP, SI, DI, respectively.
R0B...R7B have aliases AL, CL, DL, BL, SPL, BPL, SIL, DIL, respectively.
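As a rough illustration of how the sub-register names relate to the full 64-bit registers, the following sketch (assuming 64-bit Linux, NASM, and the hypothetical file name subregs.asm) loads a 64-bit value, reads its low byte through the byte-sized name, and returns that byte as the process exit status.

```nasm
; subregs.asm - a sketch; the file name is hypothetical
; build: nasm -f elf64 subregs.asm -o subregs.o && ld subregs.o -o subregs
        global  _start
        section .text
_start:
        mov     rax, 0x1122334455667788 ; fill all 64 bits of RAX
        mov     r8, rax                 ; R8 now holds the same 64-bit value
        movzx   edi, r8b                ; EDI = low byte of R8 (0x88); writing EDI clears the upper half of RDI
        mov     eax, 60                 ; sys_exit; writing EAX zero-extends into RAX
        syscall                         ; exit status is 0x88 = 136
```

Running ./subregs and then echo $? should print 136, the low byte of the original value.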
Segment Registers
These are CS, DS, SS, ES, FS, and GS
XMM Registers
These are 128-bits wide. They are named XMM0...XMM15.
Use them for floating-point and integer arithmetic.
Can do operations on 128-bit integers, but can also take
advantage of their ability to do operations in parallel:
Two 64-bit integer operations in parallel
Four 32-bit integer operations in parallel
Eight 16-bit integer operations in parallel
Sixteen 8-bit integer operations in parallel
Two 64-bit floating-point operations in parallel
Four 32-bit floating-point operations in parallel
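The "four 32-bit integer operations in parallel" case can be sketched as follows, assuming 64-bit Linux and NASM. The labels a and b and the file name are hypothetical. A single PADDD adds four pairs of 32-bit integers, and one of the four sums is returned as the exit status so the result can be observed.

```nasm
; simd_add.asm - a sketch; the file name is hypothetical
; build: nasm -f elf64 simd_add.asm -o simd_add.o && ld simd_add.o -o simd_add
        global  _start
        section .data
a:      dd      1, 2, 3, 4              ; four 32-bit integers
b:      dd      10, 20, 30, 40          ; four more
        section .text
_start:
        movdqu  xmm0, [rel a]           ; load 128 bits (unaligned load is always safe)
        movdqu  xmm1, [rel b]
        paddd   xmm0, xmm1              ; four additions at once: (11, 22, 33, 44)
        pshufd  xmm0, xmm0, 0xFF        ; copy lane 3 into every lane
        movd    edi, xmm0               ; EDI = 44
        mov     eax, 60                 ; sys_exit
        syscall                         ; exit status is 44
```

Only SSE2 instructions are used here, which every x86-64 processor supports.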
YMM Registers
These are 256-bits wide. They are named YMM0...YMM15.
Use them for floating-point arithmetic (and, with AVX2, integer arithmetic).
The lower 128 bits of each YMM register are the corresponding XMM register.
Take advantage of their ability to do operations in parallel:
Four 64-bit floating-point operations in parallel
Eight 32-bit floating-point operations in parallel
FPU Registers
There are eight registers used for computing with 80-bit
floating point values.
The registers are not addressed by fixed names because they are used in a
stack-like fashion; they are referenced relative to the top of the stack
as ST(0)...ST(7).
Other Registers
The processor control registers: CR0, CR2, CR3 and CR4 (CR1 is
reserved, and CR8 is available in 64-bit mode). The lower 16 bits of
CR0 are called the Machine Status Word (MSW).
The 4 table registers: GDTR, IDTR, LDTR and TR.
The 8 debug registers: DR0, DR1, DR2, DR3, DR4,
DR5, DR6 and DR7 (DR4 and DR5 are reserved).
The 5 test registers: TR3, TR4, TR5, TR6 and TR7 (found only on
older processors such as the 80386 and 80486).
The memory type range registers
The machine specific registers
The machine check registers
Addressing Memory
In protected mode, applications can choose a flat or segmented
memory model; in real mode only a 16-bit segmented model is
available.
Most programmers will only use protected mode and a flat-
memory model:
A memory reference has four parts and is often written as
[SELECTOR : BASE + INDEX * SCALE + OFFSET]
The selector is one of the six segment registers; the base is one of the eight
GPRs; the index is any of the GPRs except ESP; the scale is 1, 2, 4, or 8; and
the offset is any 32-bit number.
Example: [fs:ecx+esi*8+93221]
The minimal reference consists of only a base register or only an offset; a
scale can only appear if there is an index present
Sometimes the memory reference is written like this (AT&T syntax):
selector:offset(base,index,scale)
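A minimal sketch of the BASE + INDEX * SCALE + OFFSET form, assuming 64-bit Linux and NASM (so the registers are the 64-bit ones rather than the 32-bit ones in the example above). The label table and the file name are hypothetical.

```nasm
; addr.asm - a sketch; the file name is hypothetical
; build: nasm -f elf64 addr.asm -o addr.o && ld addr.o -o addr
        global  _start
        section .data
table:  dd      10, 20, 30, 40, 50      ; five 32-bit elements
        section .text
_start:
        lea     rbx, [rel table]        ; base   = address of table
        mov     esi, 3                  ; index  = 3 (upper half of RSI is cleared)
        mov     edi, [rbx + rsi*4 + 4]  ; table + 3*4 + 4 = byte offset 16 = element 4
        mov     eax, 60                 ; sys_exit
        syscall                         ; exit status is 50
```

The scale of 4 matches the element size, and the offset of 4 skips one extra element, so the load reads the fifth entry.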
Data Types
Little Endianness
The IA-32 is little endian, meaning the least significant bytes come first in memory.
For example:
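One way to observe the byte order, sketched for 64-bit Linux with NASM (the label value and the file name are hypothetical): store a 32-bit constant and read back only the byte at the lowest address.

```nasm
; endian.asm - a sketch; the file name is hypothetical
; build: nasm -f elf64 endian.asm -o endian.o && ld endian.o -o endian
        global  _start
        section .data
value:  dd      0x11223344              ; stored in memory as 44 33 22 11
        section .text
_start:
        movzx   edi, byte [rel value]   ; read the byte at the lowest address: 0x44
        mov     eax, 60                 ; sys_exit
        syscall                         ; exit status is 0x44 = 68
```

The exit status is 68 (0x44), the least significant byte of the constant, because that byte comes first in memory.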
Flags Register
INSTRUCTION POINTER REGISTER –
EIP(32bit)/RIP(64bit)
EIP/RIP register contains the offset in the current code segment for the next
instruction to be executed
In 32-bit mode, software cannot read EIP directly. The only way to read it
is to execute a CALL instruction and then read the value of the return
instruction pointer from the stack. (In 64-bit mode, RIP can also be obtained
with RIP-relative addressing, for example with LEA.)
When a CALL instruction is executed, the address of the instruction
immediately after the CALL is saved on the stack as the return address
of the function. EIP/RIP can then be loaded indirectly by modifying the
value of the return instruction pointer on the stack and executing a
return instruction (RET or IRET).
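The CALL technique can be sketched like this, assuming 64-bit Linux and NASM (the label next and the file name are hypothetical). The value popped from the stack is compared against the address of the same label computed with RIP-relative LEA, and the exit status reports whether the two agree.

```nasm
; readip.asm - a sketch; the file name is hypothetical
; build: nasm -f elf64 readip.asm -o readip.o && ld readip.o -o readip
        global  _start
        section .text
_start:
        call    next                    ; pushes the address of the next instruction
next:
        pop     rax                     ; RAX = saved return address = address of "next"
        lea     rbx, [rel next]         ; the same address, computed RIP-relative
        xor     edi, edi                ; EDI = 0
        cmp     rax, rbx
        sete    dil                     ; DIL = 1 if both methods agree
        mov     eax, 60                 ; sys_exit
        syscall                         ; exit status is 1
```

In 32-bit code only the call/pop half is available, since RIP-relative addressing exists only in 64-bit mode.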
ASSEMBLER on x86 PLATFORM
Machine Language
Here is the machine language for a chunk of code (for an IA-32
processor) that takes (from the stack) a single 32-bit integer
argument — let's call it n — and returns through eax the value 3n+1
if n is even and 4n-3 if n is odd.
1000101101001100001001000000010010001011110000011001100100110011
1100001000101011110000101000001111100000000000010011001111000010
0010101111000010100011010100010001001001000000010111010000000111
1000110100000100100011011111110111111111111111111111111111000011
Hexadecimal:
8b 4c 24 04 8b c1 99 33 c2 2b c2 83 e0 01 33 c2
2b c2 8d 44 49 01 74 07 8d 04 8d fd ff ff ff c3
Assembly
What is 8B 4C 24 04 ?
mov ecx, [esp+4]
This human-friendly recoding of the machine language is
called assembly language
Disassembly
Going from machine language to assembly language is
called "disassembling".
The disassembled code (using NASM syntax):
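Decoding the 32 hexadecimal bytes from the Machine Language slide by hand gives approximately the listing below. The labels f and done are hypothetical names added for readability. An assembler may pick a different but equivalent encoding for the register-to-register instructions, so reassembling this is not guaranteed to reproduce the bytes exactly.

```nasm
; f.asm - a hand-decoded sketch; assemble with: nasm -f elf32 f.asm -o f.o
        bits    32
        global  f
        section .text
f:
        mov     ecx, [esp+4]            ; ecx = n (first stack argument)
        mov     eax, ecx
        cdq                             ; edx = 0 if n >= 0, -1 if n < 0
        xor     eax, edx
        sub     eax, edx                ; eax = |n|
        and     eax, 1                  ; eax = |n| mod 2
        xor     eax, edx
        sub     eax, edx                ; reapply the sign; ZF is set when n is even
        lea     eax, [ecx+ecx*2+1]      ; eax = 3n+1 (LEA does not change the flags)
        jz      done                    ; n even: return 3n+1
        lea     eax, [ecx*4-3]          ; n odd: eax = 4n-3
done:   ret
```

The first instruction corresponds to the bytes 8B 4C 24 04 discussed on the Assembly slide.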
Introduction of x86 assembly language
programming
The topic of x86 assembly language programming is messy because:
There are many different assemblers out there: MASM, NASM, gas, as86,
TASM, a86, Terse, etc. All use radically different assembly languages.
There are differences in the way you have to code for Linux, OS X, Windows,
etc.
Many different object file formats exist: ELF, COFF, Win32, OMF, a.out for
Linux, a.out for FreeBSD, rdf, IEEE-695, as86, etc.
You will be calling functions residing in the operating system or other libraries,
so you have to know some technical details about how libraries are linked, and not all
linkers work the same way.
Modern x86 processors run in either 32-bit or 64-bit mode; there are quite a few
differences between these.
The programming process
Common Assemblers
MASM, the Microsoft Assembler. It outputs OMF files (but Microsoft’s
linker can convert them to win32 format). It supports a massive and
clunky assembly language. Memory addressing is not intuitive. The
directives required to set up a program make programming unpleasant.
GAS, the GNU assembler. This uses the rather ugly AT&T-style syntax so
many people do not like it; however, you can configure it to use and
understand the Intel-style. It was designed to be part of the back end of
the GNU compiler collection (gcc).
NASM, the "Netwide Assembler." It is free, small, and best of all it can
output zillions of different types of object files. The language is much
more sensible than MASM in many respects.
Object file formats
OMF: used in DOS but has 32-bit extensions for Windows. Old.
AOUT: used in early Linux and BSD variants
COFF: "Common object file format"
Win, Win32: Microsoft’s version of COFF, not exactly the same!
Replaces OMF.
Win64: Microsoft’s format for Win64.
ELF, ELF32: Used in modern 32-bit Linux and elsewhere
ELF64: Used in 64-bit Linux and elsewhere
macho32: NeXTstep/OpenStep/Rhapsody/Darwin/OS X 32-bit
macho64: NeXTstep/OpenStep/Rhapsody/Darwin/OS X 64-bit
Linker
You need a linker that (1) understands the object file
formats your assembler produces, and (2) can write executables
for the operating systems you want to run code on.
Some linkers out there include:
LINK.EXE, for Microsoft operating systems.
ld, which exists on all Unix systems; Windows programmers
get this in any gcc distribution.
Assembly Basic Syntax
An assembly program can be divided into three sections:
The data section
The bss section
The text section
The data Section
The data section is used for declaring initialized data or
constants
section .data
The bss Section
bss (Block Started by Symbol)
The bss section is used for declaring uninitialized variables. The syntax for
declaring the bss section is:
section .bss
The text section
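The text section is used for keeping the actual code. The entry symbol must be
declared with global so that the linker can see it and program execution can begin
there. The examples here use main, the symbol called by the C startup code when
linking with gcc; when linking directly with ld, the default entry symbol is _start.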
section .text
global main
main:
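The three sections can be seen working together in the following sketch, which assumes 64-bit Linux, NASM, and linking directly with ld (hence the entry symbol _start rather than main). The names answer and copy and the file name are hypothetical.

```nasm
; sections.asm - a sketch; the file name is hypothetical
; build: nasm -f elf64 sections.asm -o sections.o && ld sections.o -o sections
        global  _start

        section .data                   ; initialized data
answer: dd      42

        section .bss                    ; uninitialized (zero-filled) storage
copy:   resd    1                       ; reserve one 32-bit slot

        section .text                   ; code
_start:
        mov     eax, [rel answer]       ; read the initialized value
        mov     [rel copy], eax         ; store it into the bss variable
        mov     edi, [rel copy]         ; read it back as the exit status
        mov     eax, 60                 ; sys_exit
        syscall                         ; exit status is 42
```

Data in .data occupies space in the executable file, while .bss only records how much zero-filled memory to reserve at load time.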
Comments
An assembly language comment begins with a semicolon (;)
Assembly Language Statements
Assembly language programs consist of three types of statements:
Executable instructions
Assembler directives
Macros
Executable instruction
Executable instructions tell the processor what to do
Assembler directives
The assembler directives or pseudo-ops tell the assembler about various
aspects of the assembly process. They are non-executable and do not
generate machine-language instructions.
Macros
Macros are basically a text substitution mechanism
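A small sketch of text substitution with NASM's %define and %macro, assuming 64-bit Linux. The macro name exit_with, the constant SYS_EXIT and the file name are hypothetical.

```nasm
; macro.asm - a sketch; the file name is hypothetical
; build: nasm -f elf64 macro.asm -o macro.o && ld macro.o -o macro
%define SYS_EXIT 60                     ; single-line macro: a named constant

%macro  exit_with 1                     ; multi-line macro taking one parameter
        mov     edi, %1                 ; %1 is replaced by the first argument
        mov     eax, SYS_EXIT
        syscall
%endmacro

        global  _start
        section .text
_start:
        exit_with 7                     ; expands to the three instructions above; exit status is 7
```

The assembler expands exit_with 7 before assembling, so the object file contains exactly the same bytes as if the three instructions had been written out by hand.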
Syntax of Assembly Language Statements
Examples of typical assembly
language statements
Programming Using System Calls
64-bit Linux installations use the processor's SYSCALL
instruction to invoke operating system services. Put the
system call number in RAX,
then the arguments, if any, in RDI, RSI, RDX, R10, R8, and
R9, respectively.
Structure of a NASM Program
NASM is line-based. Most programs consist of directives followed by one or
more sections. Lines can have an optional label. Most lines have
an instruction followed by zero or more operands.
Example Hello World program x86-32bit
section .text
    global main                  ;entry symbol; must be visible to the linker
main:                            ;program execution starts here
    mov eax,4                    ;system call number (sys_write)
    mov ebx,1                    ;file descriptor (stdout)
    mov ecx,msg                  ;address of the message to write
    mov edx,len                  ;message length
    int 0x80                     ;call kernel
    mov eax,1                    ;system call number (sys_exit)
    xor ebx,ebx                  ;exit status 0
    int 0x80                     ;call kernel; the program ends here
section .data
msg db 'Hello, world!', 0xa      ;the string, followed by a newline
len equ $ - msg                  ;length of the string
System Calls
The operating system is responsible for
Process Management (starting, running, stopping processes)
File Management(creating, opening, closing, reading, writing,
renaming files)
Memory Management (allocating, deallocating memory)
Other stuff (timing, scheduling, network management)
An application program makes a system call to get the
operating system to perform a service for it, like reading
from a file.
One nice thing about syscalls is that we don't have to link
with a C library, so our executables can be much smaller.
System Calls in 32-bit Linux
Some of the system calls (32 bit)
All the syscalls are listed in /usr/include/asm/unistd.h
system call sys_write
system call sys_exit
System Calls in 64-bit Linux
To make a system call in 64-bit Linux, place the system call
number in rax, then its arguments, in order, in rdi, rsi, rdx,
r10, r8, and r9, then invoke syscall.
Some system calls return information, usually in rax. A value
in the range between -4095 and -1 indicates an error; it is
-errno.
The system call destroys rcx and r11, but other registers are
saved across the system call.
Example Hello World program x86-64bit
Use system calls for writing to a file (call number 1) and
exiting a process (call number 60). Here it is in the NASM
assembly language:
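A minimal sketch of such a program, assuming NASM on 64-bit Linux and linking with ld (so the entry symbol is _start). The label names message and len are hypothetical.

```nasm
; hello.asm - writes a message to stdout and exits, using only system calls
        global  _start

        section .text
_start:
        mov     rax, 1                  ; system call 1: write
        mov     rdi, 1                  ; first argument: file descriptor 1 (stdout)
        lea     rsi, [rel message]      ; second argument: address of the string
        mov     rdx, len                ; third argument: number of bytes
        syscall                         ; invoke the kernel
        mov     rax, 60                 ; system call 60: exit
        xor     rdi, rdi                ; first argument: exit status 0
        syscall                         ; does not return

        section .data
message: db     "Hello, World", 10      ; the text followed by a newline
len:    equ     $ - message             ; length computed by the assembler
```

Because write returns its result in rax, the system call number has to be loaded again before the second syscall.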
Compile and link on 64-bit Linux
Save the program above to a file, for example hello.asm
$ nasm -f elf64 hello.asm -o hello.o
$ ld hello.o -o hello
Run the program
$ ./hello
Example: using gcc compiler
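One possible shape for a gcc-linked program is sketched below, assuming 64-bit Linux, NASM and the C library function puts. The file name hello_gcc.asm is hypothetical, and the -no-pie flag is an assumption for systems where gcc builds position-independent executables by default.

```nasm
; hello_gcc.asm - a sketch; the file name is hypothetical
; build: nasm -f elf64 hello_gcc.asm -o hello_gcc.o
;        gcc -no-pie hello_gcc.o -o hello_gcc
        global  main                    ; the C startup code calls main
        extern  puts                    ; provided by the C library

        section .text
main:
        sub     rsp, 8                  ; keep the stack 16-byte aligned for the call
        lea     rdi, [rel message]      ; first argument: address of the string
        call    puts                    ; prints the string and a newline
        add     rsp, 8                  ; restore the stack pointer
        xor     eax, eax                ; return 0 from main
        ret

        section .data
message: db     "Hello, World", 0       ; C strings are terminated by a zero byte
```

Here gcc acts as the linker driver: it adds the C startup code and the C library, so the program returns from main instead of calling the exit system call itself.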
The End
Homework
Create an Ubuntu VM
Installing NASM assembler
Testing all examples in the lesson
Write a program in assembly language that reads a string
from the keyboard and displays it on the screen?
64