Assembler On x86 Platform - New

The document provides an overview of x86 architecture, detailing the IA-32 and Intel 64 ISAs, as well as the structure and purpose of various registers, including general-purpose, segment, and special registers. It explains memory organization, different memory models (flat, segmented, real address), and the use of EFLAGS and instruction pointers. Additionally, it discusses assembly language programming, common assemblers, object file formats, and the assembly syntax used in x86 programming.

The Basics of x86 Architecture

References
• https://www.intel.com/content/www/us/en/developer/articles/technical/intel-sdm.html
• https://www.intel.com/content/www/us/en/developer/overview.html#gs.kemo1i

IA-32 and Intel 64 ISA
• IA-32 is the Instruction Set Architecture (ISA) of Intel's 32-bit processors, and the Intel 64 ISA is its extension for 64-bit processors.
• The Intel 64 ISA is also referred to as x86-64.

PROCESSOR REGISTERS (INTEL x86, 32-BIT)
• 8 general-purpose registers, 32 bits each.
• 6 segment registers, 16 bits each.
• 1 EFLAGS register, 32 bits.
• 1 EIP (instruction pointer) register, 32 bits.

General-Purpose Registers

Register name | Size (bits) | Purpose
AL, AH / AX / EAX | 8, 8 / 16 / 32 | The accumulator. Main register used in arithmetic calculations; it holds the results of arithmetic operations and function return values.
BL, BH / BX / EBX | 8, 8 / 16 / 32 | The base register. Pointer to data in the DS segment. Used to hold a base address.
CL, CH / CX / ECX | 8, 8 / 16 / 32 | The counter register. Often holds the number of times an operation is to be repeated. Used for loop and string operations.
DL, DH / DX / EDX | 8, 8 / 16 / 32 | A general-purpose register. Also used for I/O operations. Extends EAX to 64 bits (the EDX:EAX pair) in multiplication and division.
SI / ESI | 16 / 32 | Source index register. Pointer to data in the segment pointed to by the DS register. Used as an offset address in string and array operations. It holds the address to read data from.
DI / EDI | 16 / 32 | Destination index register. Pointer to data (the destination) in the segment pointed to by the ES register. Used as an offset address in string and array operations. It holds the implied write address of all string operations.
BP / EBP | 16 / 32 | Base pointer. Pointer to data on the stack (in the SS segment). It points to the base (bottom) of the current stack frame and is used to reference local variables.
SP / ESP | 16 / 32 | Stack pointer (in the SS segment). It points to the top of the stack and is updated implicitly by PUSH, POP, CALL, and RET.

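The sub-register names in the table refer to parts of the same physical register, so writing AL, AH, or AX changes EAX. A minimal sketch for 32-bit Linux, assuming NASM and ld are installed; the file name regs.asm and the constants are made up for illustration:

```nasm
; Assumed build: nasm -f elf32 regs.asm -o regs.o && ld -m elf_i386 regs.o -o regs
section .text
global _start
_start:
    mov   eax, 0x11223344   ; EAX = 0x11223344
    mov   ax, 0xAABB        ; low 16 bits replaced:  EAX = 0x1122AABB
    mov   ah, 0xCC          ; bits 8..15 replaced:   EAX = 0x1122CCBB
    mov   al, 0x07          ; bits 0..7 replaced:    EAX = 0x1122CC07
    movzx ebx, al           ; EBX = 7, used as the exit status
    mov   eax, 1            ; sys_exit
    int   0x80
```

Running ./regs and then echo $? should print 7, the final value of AL.
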
THE SEGMENT REGISTERS

Segment register | Size (bits) | Purpose
CS | 16 | Code segment register. Base location of the code section (.text section). Used for fetching instructions.
DS | 16 | Data segment register. Default location for variables (.data section). Used for data accesses.
ES | 16 | Extra segment register. Used during string operations.
SS | 16 | Stack segment register. Base location of the stack segment. Used implicitly with SP or ESP, and when addressing through BP or EBP.
FS | 16 | Extra segment register.
GS | 16 | Extra segment register.

These registers are used to break a program up into parts. As the program executes, the segment registers are assigned the base values of each segment. From there, offset values are used to access each location in the program.

MEMORY ORGANIZATION
• Storage consists of registers and physical memory: Random Access Memory (RAM).
• The processor has an address space that is used for addressing the RAM.
• Logically, the address space is organized as a sequence of 8-bit bytes.
• Each byte is assigned a unique address, called a physical address.
• The physical address space of the processor ranges from zero to a maximum of 2^32 - 1 (4 GB), or 2^36 - 1 (64 GB) if the physical address extension (PAE) mechanism is used.

• Operating systems that run on the processor use the processor's memory management facilities (the Memory Management Unit, MMU) to access the actual memory.
• These facilities provide features such as segmentation and paging.
• When the processor's memory management facilities are in use, programs do not address physical memory directly. Instead, they access memory using one of three memory models: flat, segmented, or real-address mode.

FLAT MEMORY MODEL
• Memory appears to the program as a single, continuous address space, called a linear address space.

SEGMENTED MEMORY MODEL
• Memory appears to a program as a group of independent address spaces called segments.
• Code, data, and stacks are typically contained in separate segments.
• To address a byte in a segment, a program must issue a logical address (often referred to as a far pointer), which consists of a segment selector and an offset.
• All the segments that are defined for a system are mapped into the processor's linear address space.
• To access a memory location, the processor translates each logical address into a linear address. This translation is transparent to the application program.

Segmented addressing uses two components to specify a memory location:
• A segment value
• An offset within that segment

REAL ADDRESS MODE MEMORY MODEL
• This is the memory model of the Intel 8086 processor.
• It is supported in the IA-32 architecture for compatibility with existing programs written to run on the Intel 8086 processor.

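In real-address mode the 8086 forms a 20-bit physical address as segment * 16 + offset. The sketch below only reproduces that arithmetic inside an ordinary 32-bit Linux program (it does not itself run in real mode); the segment 0x1234, the offset 0x0010, and the file name are arbitrary illustrative choices:

```nasm
; Assumed build: nasm -f elf32 realaddr.asm -o realaddr.o && ld -m elf_i386 realaddr.o -o realaddr
section .text
global _start
_start:
    mov  eax, 0x1234        ; hypothetical segment value
    shl  eax, 4             ; segment * 16          = 0x12340
    add  eax, 0x0010        ; + hypothetical offset = 0x12350
    xor  ebx, ebx           ; exit status 0 means the result matched
    cmp  eax, 0x12350
    je   .done
    mov  ebx, 1             ; exit status 1 means a mismatch
.done:
    mov  eax, 1             ; sys_exit
    int  0x80
```
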
THE EFLAGS
• The 32-bit EFLAGS register contains a group of status flags, a control flag, and a group of system flags.

x86-64 Architecture Diagram

Registers
• Application programmers generally use only the general-purpose registers, the floating-point registers, and the XMM and YMM registers.

General Purpose Registers
• The registers are called R0...R15.
• They are used for integer arithmetic and logic, and to hold both data and pointers to memory.
• The low-order 32 bits of each register are accessible as R0D...R15D. The "D" stands for "doubleword".
• The low-order 16 bits of each register are accessible as R0W...R15W.
• The low-order 8 bits of each register are accessible as R0B...R15B.
• R0...R7 have aliases RAX, RCX, RDX, RBX, RSP, RBP, RSI, RDI, respectively.
• R0D...R7D have aliases EAX, ECX, EDX, EBX, ESP, EBP, ESI, EDI, respectively.
• R0W...R7W have aliases AX, CX, DX, BX, SP, BP, SI, DI, respectively.
• R0B...R7B have aliases AL, CL, DL, BL, SPL, BPL, SIL, DIL, respectively.

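The sub-register names behave differently depending on their width: 8-bit and 16-bit writes leave the rest of the register alone, while a 32-bit write clears the upper 32 bits. A small sketch for 64-bit Linux, assuming NASM and ld; the file name and constants are illustrative only:

```nasm
; Assumed build: nasm -f elf64 gpr64.asm -o gpr64.o && ld gpr64.o -o gpr64
section .text
global _start
_start:
    mov  r8, 0x1122334455667788   ; full 64-bit register
    mov  r8b, 0x2A                ; low 8 bits only:   R8 = 0x112233445566772A
    mov  r8w, 0x0102              ; low 16 bits only:  R8 = 0x1122334455660102
    mov  r8d, 5                   ; 32-bit write zero-extends: R8 = 5
    mov  rdi, r8                  ; exit status = 5
    mov  eax, 60                  ; sys_exit on 64-bit Linux
    syscall
```

Running ./gpr64 and then echo $? should print 5.
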
Segment Registers
• These are CS, DS, SS, ES, FS, and GS.

XMM Registers
• These are 128 bits wide. They are named XMM0...XMM15. Use them for floating-point and integer arithmetic.
• Some instructions treat a register as a single 128-bit value (bitwise logic, for example), but the main advantage is the ability to do several operations in parallel:
  - Two 64-bit integer operations in parallel
  - Four 32-bit integer operations in parallel
  - Eight 16-bit integer operations in parallel
  - Sixteen 8-bit integer operations in parallel
  - Two 64-bit floating-point operations in parallel
  - Four 32-bit floating-point operations in parallel

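As an illustration of "four 32-bit integer operations in parallel", the following sketch adds two four-element vectors with a single PADDD instruction. It assumes 64-bit Linux with NASM and ld; the labels a and b, the data values, and the file name are invented for the example:

```nasm
; Assumed build: nasm -f elf64 xmm.asm -o xmm.o && ld xmm.o -o xmm
default rel

section .data
a:  dd 1, 2, 3, 4                 ; four packed 32-bit integers
b:  dd 10, 20, 30, 40

section .text
global _start
_start:
    movdqu xmm0, [a]              ; unaligned load of a[0..3]
    movdqu xmm1, [b]              ; unaligned load of b[0..3]
    paddd  xmm0, xmm1             ; four additions at once: 11, 22, 33, 44
    movd   edi, xmm0              ; lowest lane (11) becomes the exit status
    mov    eax, 60                ; sys_exit
    syscall
```

Running ./xmm and then echo $? should print 11, the sum in the lowest lane.
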
YMM Registers
• These are 256 bits wide. They are named YMM0...YMM15. Use them for floating-point arithmetic.
• They can do several floating-point operations in parallel:
  - Four 64-bit floating-point operations in parallel
  - Eight 32-bit floating-point operations in parallel

FPU Registers
• There are eight registers used for computing with 80-bit floating-point values.
• The registers do not have fixed names because they are used in a stack-like fashion; instructions refer to them relative to the top of the stack, as ST0 through ST7.

Other Registers
• The 8 32-bit processor control registers: CR0, CR1, CR2, CR3, CR4, CR5, CR6, CR7. The lower 16 bits of CR0 are called the Machine Status Word (MSW).
• The 4 16-bit table registers: GDTR, IDTR, LDTR, and TR.
• The 8 32-bit debug registers: DR0, DR1, DR2, DR3, DR4, DR5, DR6, and DR7.
• The 5 test registers: TR3, TR4, TR5, TR6, and TR7.
• The memory type range registers
• The machine specific registers
• The machine check registers

Addressing Memory
• In protected mode, applications can choose a flat or segmented memory model; in real mode only a 16-bit segmented model is available.
• Most programmers will only use protected mode and a flat memory model:
  - A memory reference has four parts and is often written as
    [SELECTOR : BASE + INDEX * SCALE + OFFSET]
    The selector is one of the six segment registers; the base is one of the eight GPRs; the index is any of the GPRs except ESP; the scale is 1, 2, 4, or 8; and the offset is any 32-bit number.
  - Example: [fs:ecx+esi*8+93221]
  - The minimal reference consists of only a base register or only an offset; a scale can only appear if an index is present.
  - Sometimes the memory reference is written like this:
    selector:offset(base,index,scale)

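The base + index * scale + offset form can be tried on a small array. This is only a sketch for 32-bit Linux, assuming NASM and ld; the label table, its contents, and the file name are hypothetical:

```nasm
; Assumed build: nasm -f elf32 addr.asm -o addr.o && ld -m elf_i386 addr.o -o addr
section .data
table: dd 10, 20, 30, 40, 50      ; five 32-bit elements

section .text
global _start
_start:
    mov  ebx, table               ; base  = address of the array
    mov  esi, 2                   ; index = 2
    ; effective address = table + 2*4 + 4 = table + 12, i.e. the element 40
    mov  eax, [ebx + esi*4 + 4]   ; the same operand in AT&T form: 4(%ebx,%esi,4)
    mov  ebx, eax                 ; exit status = 40
    mov  eax, 1                   ; sys_exit
    int  0x80
```

Running ./addr and then echo $? should print 40.
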
Data Types

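Little Endianness
The IA-32 is little endian, meaning the least significant bytes come first in memory.
For example, the 32-bit value 0x12345678 stored at address A occupies four bytes: 0x78 at A, 0x56 at A+1, 0x34 at A+2, and 0x12 at A+3.
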
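Byte order can be observed directly by storing a doubleword and reading back only its first byte. A minimal sketch for 32-bit Linux, assuming NASM and ld; the label value, the constant, and the file name are chosen for illustration:

```nasm
; Assumed build: nasm -f elf32 endian.asm -o endian.o && ld -m elf_i386 endian.o -o endian
section .data
value: dd 0x12345678              ; stored in memory as 78 56 34 12

section .text
global _start
_start:
    movzx ebx, byte [value]       ; byte at the lowest address = 0x78
    mov   eax, 1                  ; sys_exit, status = 0x78 = 120
    int   0x80
```

Running ./endian and then echo $? should print 120 (0x78), the least significant byte.
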
Flags Register
• Many instructions cause the flags register to be updated.

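To see an instruction updating the flags, the sketch below adds 1 to 0xFF in an 8-bit register, which wraps to zero and sets both the carry flag (CF) and the zero flag (ZF). It assumes 32-bit Linux with NASM and ld; encoding the two flags into the exit status is just a convenient way to observe them, and the file name is made up:

```nasm
; Assumed build: nasm -f elf32 flags.asm -o flags.o && ld -m elf_i386 flags.o -o flags
section .text
global _start
_start:
    xor  ebx, ebx                 ; clear EBX and ECX before the test
    xor  ecx, ecx
    mov  al, 0xFF                 ; MOV does not change the flags
    add  al, 1                    ; 0xFF + 1 wraps to 0: CF = 1, ZF = 1
    setc bl                       ; BL = CF (SETcc does not change the flags)
    setz cl                       ; CL = ZF
    lea  ebx, [ebx + ecx*2]       ; exit status = CF + 2*ZF = 3
    mov  eax, 1                   ; sys_exit
    int  0x80
```

Running ./flags and then echo $? should print 3.
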
INSTRUCTION POINTER REGISTER: EIP (32-bit) / RIP (64-bit)
• The EIP/RIP register contains the offset in the current code segment of the next instruction to be executed.
• EIP/RIP cannot be accessed directly by software. It is controlled implicitly by control-transfer instructions (such as JMP, Jcc, CALL, RET, and IRET), interrupts, and exceptions.

• In 32-bit code, the only way to read EIP is to execute a CALL instruction and then read the return instruction pointer from the stack. In 64-bit code, RIP can also be read with RIP-relative addressing, for example lea rax, [rel label].
• When a CALL instruction executes, the address of the instruction immediately after the CALL is saved on the stack as the return address. EIP/RIP can then be loaded indirectly by modifying the return instruction pointer on the stack and executing a return instruction (RET or IRET).

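The CALL-based technique can be sketched as follows for 32-bit Linux (NASM and ld assumed; the label names and file name are illustrative). The CALL pushes the address of the next instruction, and POP retrieves it:

```nasm
; Assumed build: nasm -f elf32 geteip.asm -o geteip.o && ld -m elf_i386 geteip.o -o geteip
section .text
global _start
_start:
    call next                     ; pushes the address of 'next' as return address
next:
    pop  eax                      ; EAX = value EIP had at 'next'
    xor  ebx, ebx                 ; exit status 0 if the value is as expected
    cmp  eax, next                ; compare with the link-time address of the label
    je   .exit
    mov  ebx, 1                   ; exit status 1 otherwise
.exit:
    mov  eax, 1                   ; sys_exit
    int  0x80
```

The comparison against the label assumes a statically linked, non-relocated executable, which is what the ld command above produces.
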
ASSEMBLER on x86 PLATFORM

Machine Language
• Here is the machine language for a chunk of code (for an IA-32 processor) that takes (from the stack) a single 32-bit integer argument, call it n, and returns through eax the value 3n+1 if n is even and 4n-3 if n is odd.
1000101101001100001001000000010010001011110000011001100100110011
1100001000101011110000101000001111100000000000010011001111000010
0010101111000010100011010100010001001001000000010111010000000111
1000110100000100100011011111110111111111111111111111111111000011

In hexadecimal:
8b 4c 24 04 8b c1 99 33 c2 2b c2 83 e0 01 33 c2
2b c2 8d 44 49 01 74 07 8d 04 8d fd ff ff ff c3

Assembly
• What is 8B 4C 24 04?
mov ecx, [esp+4]
• This human-friendly recoding of the machine language is called assembly language.

Disassembly
• Going from machine language to assembly language is called "disassembling".
• The disassembled code (using NASM syntax):

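A hand decoding of the 32 bytes from the hex dump gives the function below. The label names f and done and the _start harness around it are assumptions added so the code can be assembled and run on 32-bit Linux; NASM may also pick different but equivalent encodings for some instructions, so the reassembled bytes need not match the dump exactly.

```nasm
; Assumed build: nasm -f elf32 f.asm -o f.o && ld -m elf_i386 f.o -o f
section .text
global _start

f:                                ; int f(int n)
    mov  ecx, [esp+4]             ; 8b 4c 24 04   ecx = n
    mov  eax, ecx                 ; 8b c1
    cdq                           ; 99            edx = sign of n (0 or -1)
    xor  eax, edx                 ; 33 c2
    sub  eax, edx                 ; 2b c2         eax = |n|
    and  eax, 1                   ; 83 e0 01      eax = |n| mod 2
    xor  eax, edx                 ; 33 c2
    sub  eax, edx                 ; 2b c2         restore sign; ZF = 1 if n is even
    lea  eax, [ecx+ecx*2+1]       ; 8d 44 49 01   eax = 3n+1 (LEA leaves flags alone)
    jz   done                     ; 74 07         even: keep 3n+1
    lea  eax, [ecx*4-3]           ; 8d 04 8d fd ff ff ff   odd: eax = 4n-3
done:
    ret                           ; c3

_start:                           ; hypothetical test harness
    push dword 5                  ; n = 5 (odd)
    call f                        ; returns 4*5 - 3 = 17
    mov  ebx, eax                 ; exit status = 17
    mov  eax, 1                   ; sys_exit
    int  0x80
```

Running ./f and then echo $? should print 17.
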
Introduction to x86 assembly language programming
The topic of x86 assembly language programming is messy because:
• There are many different assemblers out there: MASM, NASM, gas, as86, TASM, a86, Terse, etc. All use radically different assembly languages.
• There are differences in the way you have to code for Linux, OS X, Windows, etc.
• Many different object file formats exist: ELF, COFF, Win32, OMF, a.out for Linux, a.out for FreeBSD, rdf, IEEE-695, as86, etc.
• When calling functions that reside in the operating system or in other libraries, you have to know some technical details about how libraries are linked, and not all linkers work the same way.
• Modern x86 processors run in either 32-bit or 64-bit mode; there are quite a few differences between these.

The programming process

Common Assemblers
• MASM, the Microsoft Assembler. It outputs OMF files (but Microsoft's linker can convert them to win32 format). It supports a massive and clunky assembly language. Memory addressing is not intuitive. The directives required to set up a program make programming unpleasant.
• GAS, the GNU assembler. This uses the rather ugly AT&T-style syntax, so many people do not like it; however, you can configure it to use and understand the Intel-style syntax. It was designed to be part of the back end of the GNU compiler collection (gcc).
• NASM, the "Netwide Assembler." It is free, small, and best of all it can output zillions of different types of object files. The language is much more sensible than MASM in many respects.

Object file formats
• OMF: used in DOS but has 32-bit extensions for Windows. Old.
• AOUT: used in early Linux and BSD variants.
• COFF: "Common object file format".
• Win, Win32: Microsoft's version of COFF, not exactly the same! Replaces OMF.
• Win64: Microsoft's format for Win64.
• ELF, ELF32: used in modern 32-bit Linux and elsewhere.
• ELF64: used in 64-bit Linux and elsewhere.
• macho32: NeXTstep/OpenStep/Rhapsody/Darwin/OS X 32-bit.
• macho64: NeXTstep/OpenStep/Rhapsody/Darwin/OS X 64-bit.

Linker
• You need a linker that (1) understands the object file formats your assembler produces, and (2) can write executables for the operating systems you want to run code on.
• Some linkers out there include:
  - LINK.EXE, for Microsoft operating systems.
  - ld, which exists on all Unix systems; Windows programmers get this in any gcc distribution.

Assembly Basic Syntax
• An assembly program can be divided into three sections:
  - The data section
  - The bss section
  - The text section

The data Section
• The data section is used for declaring initialized data or constants. The syntax for declaring the data section is:

section .data

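The bss Section
• bss stands for "block starting symbol".
• The bss section is used for declaring uninitialized variables: storage is reserved for them, but no initial values are stored in the file. The syntax for declaring the bss section is:

section .bss
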
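A short sketch showing both sections in use, for 32-bit Linux with NASM and ld assumed. All labels (count, letters, nletters, buffer, total) and the file name are invented for this example:

```nasm
; Assumed build: nasm -f elf32 sections.asm -o sections.o && ld -m elf_i386 sections.o -o sections
section .data
count:    dd 3                    ; initialized 32-bit variable
letters:  db 'abc'                ; three initialized bytes
nletters  equ $ - letters         ; assemble-time constant = 3

section .bss
buffer:   resb 16                 ; reserve 16 uninitialized bytes
total:    resd 1                  ; reserve one uninitialized doubleword

section .text
global _start
_start:
    mov  eax, [count]             ; EAX = 3
    add  eax, nletters            ; EAX = 3 + 3 = 6
    mov  [total], eax             ; store the result in the bss variable
    mov  ebx, [total]             ; exit status = 6
    mov  eax, 1                   ; sys_exit
    int  0x80
```

Running ./sections and then echo $? should print 6.
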
The text section
• The text section holds the actual code. It must export the program's entry symbol with a global declaration so that the linker can find where execution begins: _start when linking directly with ld, or main when linking through gcc (whose C startup code calls main).

section .text
global _start
_start:

Comments
• An assembly language comment begins with a semicolon (;) and runs to the end of the line.

Assembly Language Statements
Assembly language programs consist of three types of statements:
• Executable instructions (or simply instructions)
• Assembler directives
• Macros

Executable instruction
• Executable instructions tell the processor what to do.
• Each instruction consists of an operation code (opcode). Each executable instruction generates one machine language instruction.

Assembler directives
• The assembler directives, or pseudo-ops, tell the assembler about the various aspects of the assembly process.
• They are non-executable and do not generate machine language instructions.

Macros
• Macros are basically a text substitution mechanism.

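As a sketch of text substitution, the two NASM macros below wrap the 32-bit Linux write and exit system calls. The macro names write_stdout and sys_exit, the message, and the file name are made up for this example; each macro invocation is expanded in place by the assembler before the instructions are encoded.

```nasm
; Assumed build: nasm -f elf32 macros.asm -o macros.o && ld -m elf_i386 macros.o -o macros
%macro write_stdout 2             ; %1 = address of text, %2 = length
    mov  eax, 4                   ; sys_write
    mov  ebx, 1                   ; file descriptor 1 (stdout)
    mov  ecx, %1
    mov  edx, %2
    int  0x80
%endmacro

%macro sys_exit 1                 ; %1 = exit status
    mov  ebx, %1
    mov  eax, 1                   ; sys_exit
    int  0x80
%endmacro

section .data
msg: db 'macro demo', 0xa
len  equ $ - msg

section .text
global _start
_start:
    write_stdout msg, len         ; expands to the five instructions above
    sys_exit 0
```
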
Syntax of Assembly Language Statements
• Assembly language statements are entered one statement per line. Each statement has the following format:

[label] mnemonic [operands] [;comment]

• mnemonic is the name of the instruction.

Examples of typical assembly language statements (NASM syntax)

inc dword [count]       ; Increment the memory variable count
mov dword [total], 48   ; Store the value 48 in the memory variable total
add ah, bh              ; Add the content of the BH register to the AH register
and byte [mask1], 128   ; AND the memory variable mask1 with 128
add dword [marks], 10   ; Add 10 to the memory variable marks
mov al, 10              ; Load the value 10 into the AL register

Programming Using System Calls
• 64-bit Linux installations use the processor's SYSCALL instruction to transfer control to the kernel, where operating system services are implemented.
• To use SYSCALL, first put the system call number in RAX, then the arguments, if any, in RDI, RSI, RDX, R10, R8, and R9, respectively.

Structure of a NASM Program
• NASM is line-based. Most programs consist of directives followed by one or more sections. Lines can have an optional label. Most lines have an instruction followed by zero or more operands.

Example Hello World program x86-32bit
section .text
    global _start               ;must be declared for linker (ld)
_start:                         ;tells linker entry point
    mov eax,4                   ;system call number (sys_write)
    mov ebx,1                   ;file descriptor (stdout)
    mov ecx,msg                 ;message to write
    mov edx,len                 ;message length
    int 0x80                    ;call kernel

    mov eax,1                   ;system call number (sys_exit)
    mov ebx,0                   ;exit status 0
    int 0x80                    ;call kernel

section .data
msg db 'Hello, world!', 0xa     ;our dear string
len equ $ - msg                 ;length of our dear string

System Calls
• The operating system is responsible for:
  - Process management (starting, running, stopping processes)
  - File management (creating, opening, closing, reading, writing, renaming files)
  - Memory management (allocating, deallocating memory)
  - Other services (timing, scheduling, network management)
• An application program makes a system call to get the operating system to perform a service for it, like reading from a file.
• One nice thing about syscalls is that we don't have to link with a C library, so our executables can be much smaller.

System Calls in 32-bit Linux
• To make a system call in 32-bit Linux, place the system call number in eax, then its arguments, in order, in ebx, ecx, edx, esi, edi, and ebp, then invoke int 0x80.
• Some system calls return information, usually in eax.
• All other registers are preserved across the system call.

Some of the system calls (32-bit)
• All the syscalls are listed in /usr/include/asm/unistd.h

system call sys_write

mov edx,4       ; message length
mov ecx,msg     ; message to write
mov ebx,1       ; file descriptor (stdout)
mov eax,4       ; system call number (sys_write)
int 0x80        ; call kernel

system call sys_exit

mov eax,1       ; system call number (sys_exit)
mov ebx,0       ; exit status
int 0x80        ; call kernel

System Calls in 64-bit Linux
• To make a system call in 64-bit Linux, place the system call number in rax, then its arguments, in order, in rdi, rsi, rdx, r10, r8, and r9, then invoke syscall.
• Some system calls return information, usually in rax. A value in the range between -4095 and -1 indicates an error; it is -errno.
• The system call destroys rcx and r11, but the other registers are saved across the system call.

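The -errno convention can be checked by deliberately making a call fail. The sketch below asks write to use file descriptor -1, which is never valid, so the kernel is expected to return -9 (EBADF, "bad file descriptor"). It assumes 64-bit Linux with NASM and ld; the labels and file name are illustrative:

```nasm
; Assumed build: nasm -f elf64 errno.asm -o errno.o && ld errno.o -o errno
default rel

section .data
msg: db 'x'

section .text
global _start
_start:
    mov  eax, 1                   ; sys_write
    mov  rdi, -1                  ; deliberately invalid file descriptor
    lea  rsi, [msg]               ; buffer
    mov  edx, 1                   ; length
    syscall                       ; clobbers rcx and r11; result in rax

    cmp  rax, -4095               ; unsigned test for the range -4095..-1
    jae  .error
    xor  edi, edi                 ; no error: exit status 0
    jmp  .exit
.error:
    neg  rax                      ; rax = errno
    mov  edi, eax                 ; exit status = errno
.exit:
    mov  eax, 60                  ; sys_exit
    syscall
```

Running ./errno and then echo $? should print 9, the value of EBADF on Linux.
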
Example Hello World program x86-64bit
• Use system calls for writing to a file (call number 1) and exiting a process (call number 60). Here it is in the NASM assembly language:

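A minimal version of such a program, written from the system call numbers and register conventions given in these slides. The label names message and length are assumptions, as is the choice of _start as the entry symbol for linking with ld:

```nasm
; hello.asm - 64-bit Linux, no C library
default rel

section .data
message: db 'Hello, world!', 0xa  ; text followed by a newline
length   equ $ - message          ; number of bytes to write

section .text
global _start
_start:
    mov  eax, 1                   ; system call 1 = write
    mov  edi, 1                   ; file descriptor 1 = stdout
    lea  rsi, [message]           ; address of the text
    mov  edx, length              ; number of bytes
    syscall

    mov  eax, 60                  ; system call 60 = exit
    xor  edi, edi                 ; exit status 0
    syscall
```
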
Compile and link on 64-bit Linux
• Save the program under a name, for example hello.asm, then assemble and link it:
$ nasm -f elf64 hello.asm -o hello.o
$ ld hello.o -o hello
• Run the program:
$ ./hello

Example: using the gcc compiler

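When gcc does the linking, the C startup code provides _start and calls main, and C library functions become available. The sketch below is one way this might look on 64-bit Linux; the file name hello_c.asm, the message, and the choice of puts are assumptions for illustration, not taken from the original slide.

```nasm
; Assumed build: nasm -f elf64 hello_c.asm -o hello_c.o && gcc hello_c.o -o hello_c
default rel
extern puts                       ; C library function
global main                       ; entry point called by the C startup code

section .data
msg: db 'Hello from main', 0      ; NUL-terminated string for puts

section .text
main:
    sub  rsp, 8                   ; keep the stack 16-byte aligned at the call
    lea  rdi, [msg]               ; first argument: pointer to the string
    call puts wrt ..plt           ; call through the PLT (works in PIE builds too)
    add  rsp, 8
    xor  eax, eax                 ; return 0 from main
    ret

section .note.GNU-stack noalloc noexec nowrite progbits   ; mark the stack non-executable
```
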
The End

Homework
• Create an Ubuntu VM.
• Install the NASM assembler.
• Test all examples in the lesson.
• Write a program in assembly language that reads a string from the keyboard and displays it on the screen.