Week 7
3. Instruction sets
—Instruction Sets: Characteristics and Functions
—Instruction Sets: Addressing Modes and Formats
—Assembly Language and Related Topics
3.3 Assembly Language and Related Topics
3.3 Outline
• Assembly Language Concepts
• Motivation for Assembly Language Programming
• Assembly Language Elements
• Examples
• Types of Assemblers
• Assemblers
• Loading and Linking
Key Terms For This Week
Assembler
A program that translates assembly language into machine code.
Assembly Language
A symbolic representation of the machine language of a specific processor, augmented by additional
types of statements that facilitate program writing and that provide instructions to the assembler.
Compiler
A program that converts another program from some source language (or programming language)
to machine language (object code). Some compilers output assembly language which is then
converted to machine language by a separate assembler. A compiler is distinguished from an assembler
by the fact that each input statement does not, in general, correspond to a single machine instruction
or fixed sequence of instructions. A compiler may support such features as automatic allocation
of variables, arbitrary arithmetic expressions, control structures such as FOR and WHILE loops,
variable scope, input/output operations, higher-order functions and portability of source code.
Executable Code
The machine code generated by a source code language processor such as an assembler or
compiler. This is software in a form that can be run in the computer.
Instruction Set
The collection of all possible instructions for a particular computer; that is, the collection of
machine language instructions that a particular processor understands.
Linker
A utility program that combines one or more files containing object code from separately compiled
program modules into a single file containing loadable or executable code.
Loader
A program routine that copies an executable program into memory for execution.
Machine Language, or Machine Code
The binary representation of a computer program which is actually read and interpreted by the
computer. A program in machine code consists of a sequence of machine instructions (possibly
interspersed with data). Instructions are binary strings which may be either all the same size (e.g.,
one 32-bit word for many modern RISC microprocessors) or of different sizes.
Object Code
The machine language representation of programming source code. Object code is created by a
compiler or assembler and is then turned into executable code by the linker.
Programming the Statement n = i + j + k
(a) Binary program                        (b) Hexadecimal program
Address   Opcode   Operand                Address   Contents
101       0010     0010 1100 1001         101       22C9
102       0001     0010 1100 1010         102       12CA
103       0001     0010 1100 1011         103       12CB
104       0011     0010 1100 1100         104       32CC
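The relationship between the binary and hexadecimal listings can be checked with a short sketch. It assumes that each 16-bit word is a 4-bit opcode followed by a 12-bit operand, which is how the binary listing is grouped; the names `PROGRAM` and `split_word` are hypothetical and used only for illustration.

```python
# Binary words from the listing, keyed by their address labels.
PROGRAM = {
    "101": "0010 0010 1100 1001",
    "102": "0001 0010 1100 1010",
    "103": "0001 0010 1100 1011",
    "104": "0011 0010 1100 1100",
}

def split_word(bits):
    """Return (opcode, operand, hex_text) for one 16-bit word.

    Assumption: the top 4 bits are the opcode and the low 12 bits
    are the operand address.
    """
    value = int(bits.replace(" ", ""), 2)
    opcode = value >> 12        # top 4 bits
    operand = value & 0xFFF     # low 12 bits
    return opcode, operand, f"{value:04X}"

DECODED = {addr: split_word(bits) for addr, bits in PROGRAM.items()}

for addr, (opcode, operand, hex_text) in DECODED.items():
    print(addr, f"opcode={opcode:04b}", f"operand={operand:03X}", hex_text)
```

Each hexadecimal digit stands for one group of four bits, so the hexadecimal program is only a more compact notation for the same machine words.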
Disadvantages
• The disadvantages of using an assembly language
rather than an HLL include:
– Development time
– Reliability and security
– Debugging and verifying
– Maintainability
– Portability
– System code can use intrinsic functions instead of assembly
– Application code can use intrinsic functions or vector classes
instead of assembly
– Compilers have been improved a lot in recent years
Assembly Language Programming (2 of 2)
Advantages
• Advantages to the occasional use of assembly
language include:
– Debugging and verifying
– Making compilers
– Embedded systems
– Hardware drivers and system code
– Accessing instructions that are not accessible from high-level
language
– Self-modifying code
– Optimizing code for size
– Optimizing code for speed
– Function libraries
– Making function libraries compatible with multiple compilers and
operating systems
Assembly Language vs. Machine Language
• The terms assembly language and machine language are
sometimes, erroneously, used synonymously
• Machine language:
▪ Consists of instructions directly executable by the processor
▪ Each machine language instruction is a binary string containing an opcode,
operand references, and perhaps other bits related to execution, such as
flags
▪ For convenience, instead of writing an instruction as a bit string, it can be
written symbolically, with names for opcodes and registers
• Assembly language:
▪ Makes much greater use of symbolic names, including assigning names to
specific main memory locations and specific instruction locations
▪ Also includes statements that are not directly executable but serve as
instructions to the assembler that produces machine code from an assembly
language program
Assembly-Language Statement Structure
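[Figure: x86 general-purpose registers, with the 3-bit register code in parentheses]
8-bit high   8-bit low   16-bit   32-bit (code)
AH           AL          AX       EAX (000)
BH           BL          BX       EBX (011)
CH           CL          CX       ECX (001)
DH           DL          DX       EDX (010)
                                  ESI (110)
                                  EDI (111)
                                  EBP (101)
                                  ESP (100)
Segment Registers (16 bits each, bits 15 to 0)
CS, DS, SS, ES, FS, GS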
Statements (1 of 2)
Comment
• All assembly languages allow the placement of comments in
the program
• In NASM, single-line macros are defined using the %DEFINE directive
• Multiline macros are defined using the mnemonic %MACRO
Directives
• EXTERN
– Used to declare a symbol which is not defined anywhere in the module being assembled, but is assumed to be defined in some other module and needs to be referred to by this one
• DEFAULT
– Can change some assembler defaults, such as whether to use relative or absolute addressing
• FLOAT
– Allows the programmer to change some of the default settings to options other than those used in IEEE 754
• [WARNING]
– Used to enable or disable classes of warnings
• SECTION or SEGMENT
– Changes which section of the output file the source code will be assembled into
System Calls
• The assembler makes use of the x86 INT instruction to make
system calls
• Six registers store the arguments of the system call being made:
• EBX
• ECX
• EDX
• ESI
• EDI
• EBP
segment .bss
Limit resd 1 ; find primes up to this limit
Guess resd 1 ; the current guess for prime
segment .text
global _asm_main
_asm_main:
enter 0,0 ; setup routine
pusha
; printf("3\n");
while_factor:
mov eax,ebx
mul eax ; edx:eax = eax*eax
jo end_while_factor ; if answer won't fit in eax alone
cmp eax, [Guess]
popa
mov eax, 0 ; return back to C
leave
ret
x86 String Instructions
Operation Name   Description
MOVSB            Moves the string byte addressed by the ESI register to the location addressed by the EDI register.
CMPSB            Subtracts the destination string byte from the source string byte and updates the status flags in the EFLAGS register according to the results.
SCASB            Subtracts the destination string byte from the contents of the AL register and updates the status flags according to the results.
LODSB            Loads the source string byte identified by the ESI register into the AL register.
STOSB            Stores the source string byte from the AL register into the memory location identified with the EDI register.
REP              Repeat while the ECX register is not zero.
REPE/REPZ        Repeat while the ECX register is not zero and the ZF flag is set.
REPNE/REPNZ      Repeat while the ECX register is not zero and the ZF flag is clear.
Assembly Program for Moving a String
section .text
global main ;must be declared for using gcc
main: ;tell linker entry point
mov ecx, len
mov esi, s1
mov edi, s2
cld
rep movsb
mov edx,20 ;message length
mov ecx,s2 ;message to write
mov ebx,1 ;file descriptor (stdout)
mov eax,4 ;system call number (sys_write)
int 0x80 ;call kernel
mov eax,1 ;system call number (sys_exit)
int 0x80 ;call kernel
section .data
s1 db 'Hello, world!',0 ;string 1
len equ $-s1
section .bss
s2 resb 20 ;destination
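The copy step in the program above (`cld` followed by `rep movsb`) can be modeled in a few lines. This is only a sketch of the register behavior described in the string-instruction table, not of the processor itself; the function `rep_movsb` and the flat `memory` layout are hypothetical.

```python
def rep_movsb(memory, esi, edi, ecx, df=0):
    """Model REP MOVSB: copy ECX bytes from [ESI] to [EDI].

    df=0 (the state after CLD) moves both pointers forward;
    df=1 moves them backward. Returns the final (esi, edi, ecx).
    """
    step = -1 if df else 1
    while ecx != 0:                  # REP: repeat while ECX is not zero
        memory[edi] = memory[esi]    # MOVSB: copy one byte
        esi += step
        edi += step
        ecx -= 1
    return esi, edi, ecx

S1 = b"Hello, world!\x00"                 # mirrors s1, including the 0 terminator
memory = bytearray(S1) + bytearray(20)    # s1 followed by the 20-byte s2 buffer
s1_addr, s2_addr = 0, len(S1)             # hypothetical flat addresses

esi, edi, ecx = rep_movsb(memory, s1_addr, s2_addr, len(S1))
print(bytes(memory[s2_addr:s2_addr + len(S1)]))
```

After the loop, ECX is zero and both pointers have advanced by the number of bytes copied, which is why the program loads ECX, ESI and EDI before issuing the instruction.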
ADDS r3, r3, #19 (data processing instruction, immediate format)
Field       Bits    Value      Meaning
cond        31-28   1110       always (condition code)
format      27-25   001        data processing, immediate format
opcode      24-21   0100
S           20      1          update condition flags
Rn          19-16   0011       r3
Rd          15-12   0011       r3
rotate      11-8    0000       zero rotation
immediate   7-0     00010011   19
Figure 15.9 Translating an ARM Assembly Instruction into a Binary Machine Instruction
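The field layout in Figure 15.9 can be reproduced by shifting each field to its bit position and combining the results. The sketch below follows the bit numbering shown in the figure; the function name `encode_dp_immediate` is hypothetical.

```python
def encode_dp_immediate(cond, opcode, s, rn, rd, rotate, imm8):
    """Pack the fields of an ARM data-processing immediate instruction
    into a 32-bit word, using the bit positions from Figure 15.9."""
    return (
        (cond << 28)        # bits 31-28: condition code
        | (0b001 << 25)     # bits 27-25: format (immediate form)
        | (opcode << 21)    # bits 24-21: operation
        | (s << 20)         # bit 20: update condition flags
        | (rn << 16)        # bits 19-16: first operand register
        | (rd << 12)        # bits 15-12: destination register
        | (rotate << 8)     # bits 11-8: rotation applied to the immediate
        | imm8              # bits 7-0: 8-bit immediate value
    )

# ADDS r3, r3, #19
ADDS_R3_R3_19 = encode_dp_immediate(
    cond=0b1110, opcode=0b0100, s=1, rn=3, rd=3, rotate=0, imm8=19
)
print(f"{ADDS_R3_R3_19:032b}")
print(f"{ADDS_R3_R3_19:08X}")
```

The printed bit string matches the 32 bits listed in the figure, read from bit 31 down to bit 0.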
One-Pass Assembler
• It is possible to implement an assembler that makes only a single pass
through the source code
• The main difficulty in trying to assemble a program in one pass involves
forward references to labels
• Instruction operands may be symbols that have not yet been defined in the
source program
– Therefore, the assembler does not know what relative address to insert in the
translated instruction
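One common way to handle forward references in a single pass is to leave the operand empty and patch it once the label is defined. The sketch below illustrates that idea on a toy instruction format in which every statement occupies one word; the tuple layout, the mnemonics, and the function `assemble_one_pass` are assumptions made for illustration and do not correspond to any particular assembler.

```python
def assemble_one_pass(lines):
    """Assemble (label, mnemonic, operand) tuples in a single pass.

    Returns (code, symbols), where code is a list of
    [mnemonic, operand_address] entries.
    """
    symbols = {}   # label -> address
    pending = {}   # undefined symbol -> addresses of instructions to patch
    code = []
    for address, (label, mnemonic, operand) in enumerate(lines):
        if label is not None:
            symbols[label] = address
            # Patch every earlier instruction that referenced this label.
            for fixup in pending.pop(label, []):
                code[fixup][1] = address
        if operand is None:
            code.append([mnemonic, None])
        elif operand in symbols:
            code.append([mnemonic, symbols[operand]])   # backward reference
        else:
            code.append([mnemonic, None])               # forward reference
            pending.setdefault(operand, []).append(address)
    if pending:
        raise ValueError(f"undefined symbols: {sorted(pending)}")
    return code, symbols

SOURCE = [
    (None, "JMP", "DONE"),     # DONE is not defined yet
    ("LOOP", "ADD", "LOOP"),   # label already known when the operand is read
    (None, "JMP", "DONE"),     # second forward reference to the same label
    ("DONE", "HALT", None),
]
CODE, SYMBOLS = assemble_one_pass(SOURCE)
print(CODE)
print(SYMBOLS)
```

The `pending` table plays the role of the list of unresolved references: each entry records which translated instructions still need an address once the symbol appears.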
[Figure: object code (program, data) and the process image in main memory (program, data, stack)]
A Linking and Loading Scenario
[Figure: linking and loading scenario. Labels: static library, dynamic library, Module 1, Module 2, linker, load module, loader, run-time linker/loader]
[Figure: process addressing. Labels: increasing address values, reference to data, data, current top of stack, stack]
Address Binding
(a) Loader
Binding Time               Function
Programming time           All actual physical addresses are directly specified by the programmer in the program itself.
Compile or assembly time   The program contains symbolic address references, and these are converted to actual physical addresses by the compiler or assembler.
Load time                  The compiler or assembler produces relative addresses. The loader translates these to absolute addresses at the time of program loading.
Run time                   The loaded program retains relative addresses. These are converted dynamically to absolute addresses by processor hardware.
(b) Linker
Linkage Time Function
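Load-time binding, as described in the loader table, amounts to adding the load address to every relative address in the module. The sketch below shows this on a toy module; the tuple layout, the mnemonics, and the function `relocate` are hypothetical.

```python
def relocate(load_module, base):
    """Convert relative operand addresses to absolute ones at load time.

    load_module: list of (mnemonic, operand, is_relative) tuples.
    base: the address at which the module is loaded.
    """
    absolute = []
    for mnemonic, operand, is_relative in load_module:
        if is_relative:
            operand += base          # relative address -> absolute address
        absolute.append((mnemonic, operand))
    return absolute

MODULE = [
    ("LOAD", 3, True),     # refers to the word at relative address 3
    ("ADD", 1, False),     # constant operand, must not be relocated
    ("JUMP", 0, True),     # jumps to the start of the module
    ("DATA", 42, False),   # data word
]
LOADED = relocate(MODULE, base=1000)
print(LOADED)
```

The flag that marks an operand as relative is the extra information a relocatable module has to carry. With run-time binding the same addition is performed by hardware on every reference, and the stored addresses stay relative.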
[Figure: (a) object module, (b) absolute load module, (c) relative load module, (d) relative load module loaded into main memory starting at location x]
[Figure: the linking function. Module A (length L) starts at relative address 0 and contains an external reference to Module B, CALL B, which becomes JSR "L" in the load module. Module B ends with Return at L+M-1. Module C (length N) starts at L+M and ends with Return at L+M+N-1.]
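The address arithmetic in the linking figure (Module A at 0, Module B at L, Module C at L+M) can be sketched as follows. The module descriptions, the example lengths, and the function `link` are assumptions chosen for illustration; a real linker also rewrites the instructions themselves.

```python
def link(modules):
    """Place modules one after another and resolve external references.

    modules: list of (name, length, external_names) tuples.
    Returns (base, resolved, total_length).
    """
    base = {}
    next_free = 0
    for name, length, _ in modules:
        base[name] = next_free       # module starts where the previous one ended
        next_free += length
    resolved = {}
    for name, _, externals in modules:
        # Each external symbol is replaced by the start address of its module.
        resolved[name] = {symbol: base[symbol] for symbol in externals}
    return base, resolved, next_free

L, M, N = 100, 50, 30                # example lengths, chosen arbitrarily
BASE, RESOLVED, TOTAL = link([
    ("A", L, ["B"]),                 # Module A contains CALL B
    ("B", M, []),
    ("C", N, []),
])
print(BASE)
print(RESOLVED["A"])
print(TOTAL)
```

With these lengths, the reference to B in Module A resolves to address L, and the last word of the load module sits at L+M+N-1.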