7 Week

The document discusses assembly language and its role in computer architecture, outlining its characteristics, advantages, and disadvantages compared to high-level languages. It explains the function of assemblers, compilers, and the structure of assembly language statements, including labels, mnemonics, and operands. Additionally, it covers directives, macros, and system calls in assembly programming, providing examples and comparisons with machine language.

Uploaded by

Aiman Al Arab
(Advanced) Computer Architecture

Prof. Dr. Hasan Hüseyin BALIK


(7th Week)
Outline

3. Instruction sets
—Instruction Sets: Characteristics and Functions
—Instruction Sets: Addressing Modes and Formats
—Assembly Language and Related Topics
3.3 Assembly Language and Related
Topics
3.3 Outline
• Assembly Language Concepts
• Motivation for Assembly Language
Programming
• Assembly Language Elements
• Examples
• Types of Assemblers
• Assemblers
• Loading and Linking
Key Terms For This Week
Assembler
A program that translates assembly language into machine code.
Assembly Language
A symbolic representation of the machine language of a specific processor, augmented by additional
types of statements that facilitate program writing and that provide instructions to the assembler.
Compiler
A program that converts another program from some source language (or programming language)
to machine language (object code). Some compilers output assembly language which is then
converted to machine language by a separate assembler. A compiler is distinguished from an assembler
by the fact that each input statement does not, in general, correspond to a single machine instruction
or fixed sequence of instructions. A compiler may support such features as automatic allocation
of variables, arbitrary arithmetic expressions, control structures such as FOR and WHILE loops,
variable scope, input/output operations, higher-order functions and portability of source code.
Executable Code
The machine code generated by a source code language processor such as an assembler or
compiler. This is software in a form that can be run in the computer.
Instruction Set
The collection of all possible instructions for a particular computer; that is, the collection of
machine language instructions that a particular processor understands.
Linker
A utility program that combines one or more files containing object code from separately compiled
program modules into a single file containing loadable or executable code.
Loader
A program routine that copies an executable program into memory for execution.
Machine Language, or Machine Code
The binary representation of a computer program which is actually read and interpreted by the
computer. A program in machine code consists of a sequence of machine instructions (possibly
interspersed with data). Instructions are binary strings which may be either all the same size (e.g.,
one 32-bit word for many modern RISC microprocessors) or of different sizes.
Object Code
The machine language representation of programming source code. Object code is created by a
compiler or assembler and is then turned into executable code by the linker.
Programming the Statement n = i + j + k

(a) Binary program
Address   Contents (opcode followed by operand)
101       0010 0010 1100 1001
102       0001 0010 1100 1010
103       0001 0010 1100 1011
104       0011 0010 1100 1100
201       0000 0000 0000 0010
202       0000 0000 0000 0011
203       0000 0000 0000 0100
204       0000 0000 0000 0000

(b) Hexadecimal program
Address   Contents
101       22C9
102       12CA
103       12CB
104       32CC
201       0002
202       0003
203       0004
204       0000

(c) Symbolic program
Address   Instruction
101       LDA 201
102       ADD 202
103       ADD 203
104       STA 204
201       DAT 0002
202       DAT 0003
203       DAT 0004
204       DAT 0000

(d) Assembly program
Label     Operation   Operand
FORMUL    LDA         I
          ADD         J
          ADD         K
          STA         N
I         DATA        2
J         DATA        3
K         DATA        4
N         DATA        0

Figure 15.1 Programming the Statement n = i + j + k
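Panels (a) and (b) of the figure hold the same words in two notations. A minimal sketch that checks this, with the binary contents copied from panel (a); the helper name to_hex is only illustrative:

```python
# Word contents copied from panel (a) of Figure 15.1, keyed by address.
BINARY_PROGRAM = {
    101: "0010 0010 1100 1001",
    102: "0001 0010 1100 1010",
    103: "0001 0010 1100 1011",
    104: "0011 0010 1100 1100",
    201: "0000 0000 0000 0010",
    202: "0000 0000 0000 0011",
    203: "0000 0000 0000 0100",
    204: "0000 0000 0000 0000",
}


def to_hex(word_bits):
    """Convert a 16-bit word written as spaced binary digits to 4 hex digits."""
    value = int(word_bits.replace(" ", ""), 2)
    return format(value, "04X")


for address, bits in BINARY_PROGRAM.items():
    print(address, to_hex(bits))
```

Each group of four bits maps to one hexadecimal digit, which is why the hexadecimal listing is a quarter as long per word and easier to read.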


Motivation for Assembly Language
Programming
• Assembly language is a programming language that is one
step away from machine language
• Typically each assembly language instruction is translated into
one machine instruction by the assembler
• Assembly language is hardware dependent, with a different
assembly language for each type of processor
• Assembly language instructions can make reference to
specific registers in the processor, include all of the opcodes
of the processor, and reflect the bit length of the various
registers of the processor and operands of the machine
language
– Therefore, an assembly language programmer must understand the
computer’s architecture
Assembly Language Programming (1 of 2)

Disadvantages
• The disadvantages of using an assembly language
rather than an HLL include:
– Development time
– Reliability and security
– Debugging and verifying
– Maintainability
– Portability
– System code can use intrinsic functions instead of assembly
– Application code can use intrinsic functions or vector classes
instead of assembly
– Compilers have been improved a lot in recent years
Assembly Language Programming (2 of 2)

Advantages
• Advantages to the occasional use of assembly
language include:
– Debugging and verifying
– Making compilers
– Embedded systems
– Hardware drivers and system code
– Accessing instructions that are not accessible from high-level
language
– Self-modifying code
– Optimizing code for size
– Optimizing code for speed
– Function libraries
– Making function libraries compatible with multiple compilers and
operating systems
Assembly Language vs. Machine Language
• The terms assembly language and machine language are
sometimes, erroneously, used synonymously
• Machine language:
▪ Consists of instructions directly executable by the processor
▪ Each machine language instruction is a binary string containing an opcode,
operand references, and perhaps other bits related to execution, such as
flags
▪ For convenience, instead of writing an instruction as a bit string, it can be
written symbolically, with names for opcodes and registers

• Assembly language:
▪ Makes much greater use of symbolic names, including assigning names to
specific main memory locations and specific instruction locations
▪ Also includes statements that are not directly executable but serve as
instructions to the assembler that produces machine code from an assembly
language program
Assembly-Language Statement Structure

label:       mnemonic                        operand(s)       ;comment
(optional)   (opcode name, directive name,   (zero or more)   (optional)
             or macro name)

Figure 15.2 Assembly-Language Statement Structure
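The four fields of a statement can be separated mechanically. The following is a rough sketch rather than a real assembler front end: the function name parse_statement is hypothetical, labels are assumed to end with a colon, and a semicolon inside a string literal is not handled.

```python
def parse_statement(line):
    """Split one statement into (label, mnemonic, operands, comment)."""
    # Everything after the first semicolon is the comment.
    code, _, comment = line.partition(";")
    comment = comment.strip() or None
    tokens = code.split()
    # An optional label comes first and is assumed to end with a colon.
    label = None
    if tokens and tokens[0].endswith(":"):
        label = tokens.pop(0)[:-1]
    # The next token names an opcode, a directive, or a macro.
    mnemonic = tokens.pop(0) if tokens else None
    # Whatever remains is a comma-separated list of zero or more operands.
    operands = [op.strip() for op in " ".join(tokens).split(",") if op.strip()]
    return label, mnemonic, operands, comment


print(parse_statement("L1: test eax,eax ; is eax zero?"))
print(parse_statement("ret"))
```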


Statements (1 of 3)
Label
• If a label is present, the assembler defines the label as
equivalent to the address into which the first byte of the object
code generated for that instruction will be loaded
• The programmer may subsequently use the label as an
address or as data in another instruction’s address field
• The assembler replaces the label with the assigned value
when creating an object program
• Labels are most frequently used in branch instructions
• Reasons for using a label:
– Makes a program location easier to find and remember
– Can easily be moved to correct a program
– Programmer does not have to calculate relative or absolute memory
addresses, but just uses labels as needed
Statements (2 of 3)
Mnemonic
• The mnemonic is the name of the operation or function of the
assembly language statement
• In the case of a machine instruction, a mnemonic is the
symbolic name associated with a particular opcode
Statements (3 of 3)
Operands
• An assembly language statement includes zero or more
operands
• Each operand identifies an immediate value, a register value,
or a memory location
• Typically the assembly language provides conventions for
distinguishing among the three types of operand references,
as well as conventions for indicating addressing mode
Intel x86 Program Execution Registers
General-Purpose Registers (bits 31 to 0)
8-bit high   8-bit low   16-bit   32-bit (register code)
AH           AL          AX       EAX (000)
BH           BL          BX       EBX (011)
CH           CL          CX       ECX (001)
DH           DL          DX       EDX (010)
                                  ESI (110)
                                  EDI (111)
                                  EBP (101)
                                  ESP (100)
Segment Registers (bits 15 to 0)
CS, DS, SS, ES, FS, GS
Statements (1 of 2)
Comment
• All assembly languages allow the placement of comments in
the program
• A comment can either occur at the right-hand end of an
assembly statement or can occupy an entire text line
• The comment begins with a special character that signals to
the assembler that the rest of the line is a comment and is to
be ignored by the assembler
• Typically, assembly languages for the x86 architecture use a
semicolon (;) for the special character
Statements (2 of 2)
Pseudo-instructions
• Pseudo-instructions are statements which, though not real x86
machine instructions, are used in the instruction field anyway
because that’s the most convenient place to put them
• Pseudo-instructions are not directly translated into machine
language instructions
• Instead, they act as directives: instructions to the assembler to perform
specified actions during the assembly process
• Examples include:
– Define constants
– Designate areas of memory for data storage
– Initialize areas of memory
– Place tables or other fixed data in memory
– Allow references to other programs
Assembly-Language Directives
(a) Letters for RESx and Dx Directives
Unit                    Letter
byte                    B
word (2 bytes)          W
double word (4 bytes)   D
quad word (8 bytes)     Q
ten bytes               T
(b) Directives
Name | Description | Example
DB, DW, DD, DQ, DT | Initialize locations | L6 DD 1A92H ;doubleword at L6 initialized to 1A92H
RESB, RESW, RESD, RESQ, REST | Reserve uninitialized locations | BUFFER RESB 64 ;reserve 64 bytes starting at BUFFER
INCBIN | Include binary file in output | INCBIN "file.dat" ;include this file
EQU | Define a symbol to a given constant value | MSGLEN EQU 25 ;the constant MSGLEN equals decimal 25
TIMES | Repeat instruction multiple times | ZEROBUF TIMES 64 DB 0 ;initialize 64-byte buffer to all zeros
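To make the effect of these directives concrete, the sketch below computes the bytes that the examples in the table would place in the output. It models the directives rather than calling NASM; the function names are hypothetical, the ten-byte unit is left out of the initializing case, and x86 little-endian byte order is assumed.

```python
import struct

# struct format codes for the unit letters B, W, D, Q ("<" means little-endian).
UNIT_FORMATS = {"B": "<B", "W": "<H", "D": "<I", "Q": "<Q"}
# Size in bytes of each unit letter, including T (ten bytes).
UNIT_SIZES = {"B": 1, "W": 2, "D": 4, "Q": 8, "T": 10}


def define(unit, value):
    """Bytes emitted by a Dx directive, e.g. define('D', 0x1A92) for DD 1A92H."""
    return struct.pack(UNIT_FORMATS[unit], value)


def reserve_size(unit, count):
    """Number of uninitialized bytes set aside by a RESx directive."""
    return UNIT_SIZES[unit] * count


def times(count, data):
    """Bytes emitted by TIMES count <data directive>."""
    return data * count


print(define("D", 0x1A92).hex())          # L6 DD 1A92H
print(reserve_size("B", 64))              # BUFFER RESB 64
print(len(times(64, define("B", 0))))     # ZEROBUF TIMES 64 DB 0
```

The doubleword 1A92H is stored low-order byte first, so the four bytes in memory are 92, 1A, 00, 00.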
Macro Definitions (1 of 2)
• A macro definition is similar to a subroutine in several ways
– A subroutine is a section of a program that is written once, and can be used
multiple times by calling the subroutine from any point in the program
– When a program is compiled or assembled, the subroutine is loaded only once
– A call to the subroutine transfers control to the subroutine and a return instruction
in the subroutine returns control to the point of the call

• Similarly, a macro definition is a section of code that the programmer
writes once, and then can use many times
– The main difference is that when the assembler encounters a macro call, it
replaces the macro call with the macro itself
– This process is called macro expansion
• Macros are handled by the assembler at assembly time
• Macros provide the same advantage as subroutines in terms of
modular programming, but without the runtime overhead of a
subroutine call and return
– The tradeoff is that the macro approach uses more space in the object code
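Macro expansion is textual substitution, which a few lines of code can imitate. This is a sketch only: the SWAP macro, the expand_macros function, and the table layout are hypothetical, and the %1, %2 placeholders simply follow the parameter notation NASM uses in multi-line macros.

```python
def expand_macros(source, macros):
    """Replace each macro call with the macro body, substituting arguments."""
    expanded = []
    for line in source:
        tokens = line.split()
        name = tokens[0] if tokens else None
        if name in macros:
            params, body = macros[name]
            args = [a.strip() for a in " ".join(tokens[1:]).split(",")]
            for body_line in body:
                # Substitute each formal parameter with the actual argument.
                for param, arg in zip(params, args):
                    body_line = body_line.replace(param, arg)
                expanded.append(body_line)
        else:
            expanded.append(line)
    return expanded


# Hypothetical macro: swap two registers through the stack.
MACROS = {"SWAP": (["%1", "%2"], ["push %1", "push %2", "pop %1", "pop %2"])}
SOURCE = ["mov eax, 1", "SWAP eax, ebx", "SWAP ecx, edx", "ret"]

for line in expand_macros(SOURCE, MACROS):
    print(line)
```

Four source lines become ten after expansion: each call site receives its own copy of the body, which is the space cost mentioned above, while no call or return instruction is executed at run time.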
Macro Definitions (2 of 2)

• In NASM and many other assemblers, a distinction is made between a
single-line macro and a multi-line macro
• In NASM, single-line macros are defined using the %DEFINE directive
• Multiline macros are defined using the mnemonic %MACRO
Directives
• A directive is a command embedded in the assembly source code that is
recognized and acted upon by the assembler
• NASM includes the following directives:
• BITS
– Specifies whether NASM should generate code designed to run on a processor
operating in 16-bit mode, 32-bit mode, or 64-bit mode
• DEFAULT
– Can change some assembler defaults, such as whether to use relative or absolute
addressing
• SECTION or SEGMENT
– Changes which section of the output file the source code will be assembled into
• EXTERN
– Used to declare a symbol which is not defined anywhere in the module being
assembled, but is assumed to be defined in some other module and needs to be
referred to by this one
• GLOBAL
– Is the other end of EXTERN: if one module declares a symbol as EXTERN and refers to
it, then in order to prevent linker errors, some other module must actually define the
symbol and declare it as GLOBAL
• COMMON
– Used to declare common variables
• CPU
– Restricts assembly to those instructions that are available on the specified CPU
• FLOAT
– Allows the programmer to change some of the default settings to options other than
those used in IEEE 754
• [WARNING]
– Used to enable or disable classes of warnings
System Calls
• On 32-bit Linux, an assembly program uses the x86 INT instruction (int 0x80) to make
system calls, with the system call number placed in the EAX register
• There are six registers that store the arguments of the system call
used
• EBX
• ECX
• EDX
• ESI
• EDI
• EBP
• These registers take the consecutive arguments, starting with the
EBX register
• If there are more than six arguments, then the memory location of
the first argument is stored in the EBX register
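The register assignment can be written down as a small table lookup. The sketch below assumes the 32-bit Linux int 0x80 convention, in which EAX carries the system call number; the function name syscall_registers is hypothetical, and the case of more than six arguments is not modeled.

```python
# Registers that receive the first through sixth system call arguments.
ARGUMENT_REGISTERS = ["ebx", "ecx", "edx", "esi", "edi", "ebp"]


def syscall_registers(number, *args):
    """Return the register contents to set up before executing int 0x80."""
    if len(args) > len(ARGUMENT_REGISTERS):
        raise ValueError("more than six arguments are passed through memory")
    registers = {"eax": number}              # system call number
    registers.update(zip(ARGUMENT_REGISTERS, args))
    return registers


# sys_write (number 4): file descriptor 1 (stdout), buffer s2, length 20.
print(syscall_registers(4, 1, "s2", 20))
```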
Assembly Programs for Greatest Common Divisor

(a) Compiled program
gcd:  mov ebx,eax
      mov eax,edx
      test ebx,ebx
      jne L1
      test edx,edx
      jne L1
      mov eax,1
      ret
L1:   test eax,eax
      jne L2
      mov eax,ebx
      ret
L2:   test ebx,ebx
      je L5
L3:   cmp ebx,eax
      je L5
      jae L4
      sub eax,ebx
      jmp L3
L4:   sub ebx,eax
      jmp L3
L5:   ret

(b) Written directly in assembly language
gcd:  neg eax
      je L3
L1:   neg eax
      xchg eax,edx
L2:   sub eax,edx
      jg L2
      jne L1
L3:   add eax,edx
      jne L4
      inc eax
L4:   ret


C Program for Generating Prime Numbers

#include <stdio.h>

int main(void)
{
    unsigned guess;   /* current guess for prime */
    unsigned factor;  /* possible factor of guess */
    unsigned limit;   /* find primes up to this value */

    printf("Find primes up to: ");
    scanf("%u", &limit);
    printf("2\n");    /* treat first two primes as */
    printf("3\n");    /* special case */
    guess = 5;        /* initial guess */
    while (guess <= limit) {  /* look for a factor of guess */
        factor = 3;
        while (factor * factor < guess && guess % factor != 0)
            factor += 2;
        if (guess % factor != 0)
            printf("%u\n", guess);  /* guess is unsigned, so use %u */
        guess += 2;   /* only look at odd numbers */
    }
    return 0;
}

Figure 15.5 C Program for Generating Prime Numbers


Assembly Program for Generating Prime Numbers

%include "asm_io.inc"
segment .data
Message db "Find primes up to: ", 0

segment .bss
Limit resd 1                    ; find primes up to this limit
Guess resd 1                    ; the current guess for prime

segment .text
        global _asm_main
_asm_main:
        enter 0,0               ; setup routine
        pusha

        mov eax, Message
        call print_string
        call read_int           ; scanf("%u", & limit );
        mov [Limit], eax

        mov eax, 2              ; printf("2\n");
        call print_int
        call print_nl
        mov eax, 3              ; printf("3\n");
        call print_int
        call print_nl

        mov dword [Guess], 5    ; Guess = 5;
while_limit:                    ; while ( Guess <= Limit )
        mov eax,[Guess]
        cmp eax, [Limit]
        jnbe end_while_limit    ; use jnbe since numbers are unsigned

        mov ebx, 3              ; ebx is factor = 3;
while_factor:
        mov eax,ebx
        mul eax                 ; edx:eax = eax*eax
        jo end_while_factor     ; if answer won't fit in eax alone
        cmp eax, [Guess]
        jnb end_while_factor    ; if !(factor*factor < guess)
        mov eax,[Guess]
        mov edx,0
        div ebx                 ; edx = edx:eax % ebx
        cmp edx, 0
        je end_while_factor     ; if !(guess % factor != 0)

        add ebx,2               ; factor += 2;
        jmp while_factor
end_while_factor:
        je end_if               ; if !(guess % factor != 0)
        mov eax,[Guess]         ; printf("%u\n")
        call print_int
        call print_nl
end_if:
        add dword [Guess], 2    ; guess += 2
        jmp while_limit
end_while_limit:

        popa
        mov eax, 0              ; return back to C
        leave
        ret
x86 String Instructions
Operation Name | Description
MOVSB | Moves the string byte addressed by the ESI register to the location addressed by the EDI register.
CMPSB | Subtracts the destination string byte from the source string element and updates the status flags in the EFLAGS register according to the results.
SCASB | Subtracts the destination string byte from the contents of the AL register and updates the status flags according to the results.
LODSB | Loads the source string byte identified by the ESI register into the AL register.
STOSB | Stores the source string byte from the AL register into the memory location identified with the EDI register.
REP | Repeat while the ECX register is not zero.
REPE/REPZ | Repeat while the ECX register is not zero and the ZF flag is set.
REPNE/REPNZ | Repeat while the ECX register is not zero and the ZF flag is clear.
Assembly Program for Moving a String

section .text
global main ;must be declared for using gcc
main: ;tell linker entry point
mov ecx, len
mov esi, s1
mov edi, s2
cld
rep movsb
mov edx,20 ;message length
mov ecx,s2 ;message to write
mov ebx,1 ;file descriptor (stdout)
mov eax,4 ;system call number (sys_write)
int 0x80 ;call kernel
mov eax,1 ;system call number (sys_exit)
int 0x80 ;call kernel
section .data
s1 db 'Hello, world!',0 ;string 1
len equ $-s1
section .bss
s2 resb 20 ;destination

Figure 15.7 Assembly Program for Moving a String
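The effect of the cld and rep movsb pair in this program can be traced with a short simulation. It is a sketch under simplifying assumptions: memory is a flat bytearray, the register names are plain Python variables, and only the direction-flag-clear case is modeled.

```python
def rep_movsb(memory, esi, edi, ecx):
    """Simulate REP MOVSB with the direction flag clear (after CLD)."""
    while ecx != 0:
        memory[edi] = memory[esi]   # MOVSB: copy one byte from [ESI] to [EDI]
        esi += 1                    # DF = 0, so both pointers move upward
        edi += 1
        ecx -= 1                    # REP: count down, repeat while ECX != 0
    return esi, edi, ecx


S1 = b"Hello, world!\x00"                  # source string s1, including the 0 byte
memory = bytearray(S1) + bytearray(20)     # s1 followed by the 20-byte buffer s2
s1, s2 = 0, len(S1)                        # addresses of s1 and s2 in this toy memory

rep_movsb(memory, s1, s2, len(S1))         # mov ecx,len / mov esi,s1 / mov edi,s2
print(bytes(memory[s2:s2 + len(S1)]))
```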


TYPES OF ASSEMBLERS
• An assembler is software that translates assembly language into
machine language
• Although all assemblers perform the same tasks, their
implementations vary
• Some of the common terms that describe types of assemblers:
– Cross-assembler
– Resident assembler
– Macroassembler
– Microassembler
– Meta-assembler
– One-pass assembler
– Two-pass assembler
Flowchart of Two-Pass Assembler
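The two passes can be sketched on a toy language: the first pass assigns an address to every statement and records label definitions in a symbol table, and the second pass replaces symbolic operands with the recorded addresses. The assumptions are loud ones: every statement occupies one word, the program starts at address 101, labels end with a colon, and the mnemonics are the ones used in the n = i + j + k example. No real encoding is produced.

```python
def assemble_two_pass(lines, origin=101):
    """Toy two-pass assembler: every statement occupies exactly one word."""
    # Pass 1: assign addresses and build the symbol table.
    symbols = {}
    parsed = []
    location = origin
    for line in lines:
        fields = line.split(";")[0].split()     # drop the comment, then tokenize
        if not fields:
            continue
        if fields[0].endswith(":"):             # label definition
            symbols[fields.pop(0)[:-1]] = location
        if fields:
            operand = fields[1] if len(fields) > 1 else None
            parsed.append((location, fields[0], operand))
            location += 1
    # Pass 2: every label is now known, so operands can be resolved.
    output = []
    for address, mnemonic, operand in parsed:
        if operand is None:
            value = None
        elif operand in symbols:
            value = symbols[operand]            # symbolic operand -> address
        else:
            value = int(operand)                # numeric literal
        output.append((address, mnemonic, value))
    return symbols, output


PROGRAM = [
    "FORMUL: LDA I   ; n = i + j + k",
    "ADD J",
    "ADD K",
    "STA N",
    "I: DATA 2",
    "J: DATA 3",
    "K: DATA 4",
    "N: DATA 0",
]

symbols, output = assemble_two_pass(PROGRAM)
for entry in output:
    print(entry)
```

Because the data labels are defined after they are used, none of the first four statements could be completed during the first pass; this is the forward-reference problem that the second pass exists to solve.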
Translating an ARM Assembly Instruction
into a Binary Machine Instruction

ADDS r3, r3, #19

Bits     Field       Value      Meaning
31-28    cond        1110       condition code: always
27-25    format      001        data processing instruction, immediate format
24-21    opcode      0100       ADD
20       S           1          update condition flags
19-16    Rn          0011       r3
15-12    Rd          0011       r3
11-8     rotate      0000       zero rotation
7-0      immediate   00010011   19

Figure 15.9 Translating an ARM Assembly Instruction into a Binary Machine Instruction
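The translation amounts to shifting each field into its bit position and combining the results. A minimal sketch for the data-processing immediate format only; the function name is illustrative, and the field values are those for ADDS r3, r3, #19.

```python
def encode_data_processing_immediate(cond, opcode, s, rn, rd, rotate, imm8):
    """Pack the fields of an ARM data-processing instruction with an immediate operand."""
    return (
        (cond << 28)        # bits 31-28: condition code
        | (0b001 << 25)     # bits 27-25: data processing, immediate format
        | (opcode << 21)    # bits 24-21: operation
        | (s << 20)         # bit 20: update condition flags
        | (rn << 16)        # bits 19-16: first operand register
        | (rd << 12)        # bits 15-12: destination register
        | (rotate << 8)     # bits 11-8: rotation applied to the immediate
        | imm8              # bits 7-0: 8-bit immediate
    )


# ADDS r3, r3, #19: always, ADD (0100), S set, Rn = r3, Rd = r3, no rotation.
word = encode_data_processing_immediate(0b1110, 0b0100, 1, 3, 3, 0, 19)
print(format(word, "032b"))
print(format(word, "08X"))
```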
One-Pass Assembler
• It is possible to implement an assembler that makes only a single pass
through the source code
• The main difficulty in trying to assemble a program in one pass involves
forward references to labels
• Instruction operands may be symbols that have not yet been defined in the
source program
– Therefore, the assembler does not know which relative address to insert in the
translated instruction
• When the assembler encounters an instruction operand that is a symbol that
is not yet defined, the assembler does the following:
– It leaves the instruction operand field empty in the assembled binary instruction
– The symbol used as an operand is entered in the symbol table and the table entry
is flagged to indicate that the symbol is undefined
– The address of the operand field in the instruction that refers to the undefined
symbol is added to a list of forward references associated with the symbol table
entry
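The three steps above can be sketched as follows. When the label is finally defined, the assembler walks the list of forward references and fills in the empty operand fields. As before this is a toy: one word per statement, a start address of 101, colon-terminated labels, and no real instruction encoding; the names are hypothetical.

```python
def assemble_one_pass(lines, origin=101):
    """Toy one-pass assembler that backpatches forward references."""
    symbols = {}     # defined labels -> address
    pending = {}     # undefined labels -> indexes of output entries awaiting them
    output = []      # [address, mnemonic, operand value] per statement
    location = origin
    for line in lines:
        fields = line.split(";")[0].split()
        if not fields:
            continue
        if fields[0].endswith(":"):
            label = fields.pop(0)[:-1]
            symbols[label] = location
            # Label is now defined: fill in every operand field that was left empty.
            for index in pending.pop(label, []):
                output[index][2] = location
        if not fields:
            continue
        mnemonic = fields[0]
        operand = fields[1] if len(fields) > 1 else None
        if operand is None:
            value = None
        elif operand.isdigit():
            value = int(operand)
        elif operand in symbols:
            value = symbols[operand]
        else:
            value = None                                        # leave field empty
            pending.setdefault(operand, []).append(len(output)) # record the reference
        output.append([location, mnemonic, value])
        location += 1
    if pending:
        raise ValueError("undefined symbols: " + ", ".join(sorted(pending)))
    return symbols, output


PROGRAM = [
    "FORMUL: LDA I",
    "ADD J",
    "ADD K",
    "STA N",
    "I: DATA 2",
    "J: DATA 3",
    "K: DATA 4",
    "N: DATA 0",
]

symbols, output = assemble_one_pass(PROGRAM)
for entry in output:
    print(entry)
```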
The Loading Function
Figure: object code (program and data) is loaded to form a process image in main memory
consisting of a process control block, program, data, and stack.
A Linking and Loading Scenario
Figure: modules 1 to n, together with a static library and dynamic libraries, pass through the
linker to form a load module, which the loader places in main memory; a run-time linker/loader
brings in dynamic libraries during execution.

Addressing Requirements for a Process
Figure: process image laid out in order of increasing address values:
– Process control block (process control information)
– Program (entry point to program, branch instruction, reference to data)
– Data
– Stack (current top of stack)
Address Binding
(a) Loader
Binding Time | Function
Programming time | All actual physical addresses are directly specified by the programmer in the program itself.
Compile or assembly time | The program contains symbolic address references, and these are converted to actual physical addresses by the compiler or assembler.
Load time | The compiler or assembler produces relative addresses. The loader translates these to absolute addresses at the time of program loading.
Run time | The loaded program retains relative addresses. These are converted dynamically to absolute addresses by processor hardware.
(b) Linker
Linkage Time | Function
Programming time | No external program or data references are allowed. The programmer must place into the program the source code for all subprograms that are referenced.
Compile or assembly time | The assembler must fetch the source code of every subroutine that is referenced and assemble them as a unit.
Load module creation | All object modules have been assembled using relative addresses. These modules are linked together, and all references are restated relative to the origin of the final load module.
Load time | External references are not resolved until the load module is to be loaded into main memory. At that time, referenced dynamic link modules are appended to the load module, and the entire package is loaded into main or virtual memory.
Run time | External references are not resolved until the external call is executed by the processor. At that time, the process is interrupted and the desired module is linked to the calling program.
Absolute and Relocatable Load Modules

Panels:
(a) Object module (symbolic addresses)
(b) Absolute load module (absolute addresses)
(c) Relative load module (relative addresses)
(d) Relative load module loaded into main memory starting at location x (main memory addresses)

Item                   (a)        (b)         (c)         (d)
Start of PROGRAM       PROGRAM    1024        0           x
Jump instruction       JUMP X     JUMP 1424   JUMP 400    JUMP 400
Jump target            X          1424        400         400 + x
Load instruction       LOAD Y     LOAD 2224   LOAD 1200   LOAD 1200
Data item in DATA      Y          2224        1200        1200 + x

Figure 15.13 Absolute and Relocatable Load Modules
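The difference between load-time and run-time binding in the figure is where the addition of the start address happens. A minimal sketch with the figure's numbers; the function names are hypothetical, and the use of a single base register for run-time translation is an assumption about how the processor hardware performs the conversion.

```python
# Instructions of the relative load module in panel (c): operands are offsets
# from the start of the module.
INSTRUCTIONS = [("JUMP", 400), ("LOAD", 1200)]


def bind_at_load_time(instructions, base):
    """Load-time binding: the loader rewrites each operand as an absolute address."""
    return [(operation, offset + base) for operation, offset in instructions]


def translate_at_run_time(offset, base_register):
    """Run-time binding: the instruction keeps its relative operand and the
    hardware adds the base on every memory reference (assumed mechanism)."""
    return base_register + offset


# Loading at 1024 reproduces the absolute load module of panel (b).
print(bind_at_load_time(INSTRUCTIONS, 1024))
# Panel (d): operands stay 400 and 1200; the effective addresses are 400 + x, 1200 + x.
print(translate_at_run_time(400, 1024), translate_at_run_time(1200, 1024))
```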


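The Linking Function

(a) Object modules
Module A (length L): contains an external reference to Module B (CALL B;) and ends with Return
Module B (length M): contains CALL C; and ends with Return
Module C (length N): ends with Return

(b) Load module
Relative address   Contents
0                  Module A begins; CALL B has become JSR "L"
L – 1              Return (end of Module A)
L                  Module B begins; CALL C has become JSR "L + M"
L + M – 1          Return (end of Module B)
L + M              Module C begins
L + M + N – 1      Return (end of Module C)

Figure 15.14 The Linking Function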
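The linker's job in this figure reduces to two steps: place the modules one after another, then replace each symbolic call with the start address of its target. A sketch with made-up lengths (L = 100, M = 50, N = 30); the data layout and the function name link are hypothetical.

```python
def link(modules):
    """modules: list of (name, length, called module names) in load order.

    Returns each module's start address in the load module, the resolved
    call targets per module, and the total load module length.
    """
    bases = {}
    origin = 0
    for name, length, _ in modules:
        bases[name] = origin         # module starts where the previous one ended
        origin += length
    resolved = {
        name: [bases[target] for target in calls]   # CALL target -> JSR address
        for name, _, calls in modules
    }
    return bases, resolved, origin


# Assumed lengths: L = 100, M = 50, N = 30.
MODULES = [("A", 100, ["B"]), ("B", 50, ["C"]), ("C", 30, [])]

bases, resolved, total = link(MODULES)
print(bases)      # A at 0, B at L, C at L + M
print(resolved)   # A: JSR L, B: JSR L + M
print(total)      # L + M + N
```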


Load-Time Dynamic Linking
• Dynamic linking refers to the practice of deferring
the linkage of some external modules until after the load
module has been created
• For load-time dynamic linking, the following steps occur:
– The load module to be loaded is read into memory
– Any reference to an external module causes the loader to find the target module, load
it, and alter the reference to a relative address in memory from the beginning of the
application module
• Advantages of this approach over what might be called static
linking:
– It becomes easier to incorporate changed or upgraded versions of the target module
– Having target code in a dynamic link file paves the way for automatic code sharing
– It becomes easier for independent software developers to extend the functionality of a
widely-used operating system such as Linux
Run-Time Dynamic Linking
• With run-time dynamic linking some of the linking is postponed until
execution time
▪ External references to target modules remain in the loaded program
▪ When a call is made to the absent module, the operating system locates the
module, loads it, and links it to the calling module
▪ Such modules are typically shareable
▪ In the Windows environment these are called dynamic-link libraries (DLLs)
▪ If one process is already making use of a dynamically linked shared module, then
that module is in main memory and a new process can simply link to the already-
loaded module

• The use of DLLs can lead to a problem commonly referred to as DLL
hell
▪ DLL hell occurs if two or more processes are sharing a DLL module, but expect
different versions of the module
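Run-time dynamic linking can be imitated with a table of modules that is consulted only when a call is actually made. This is an analogy in ordinary Python, not an operating system interface: the class RunTimeLinker, its library dictionary standing in for modules on disk, and the load counter are all hypothetical.

```python
class RunTimeLinker:
    """Defers locating and loading a module until its first call."""

    def __init__(self, library):
        self.library = library   # module name -> callable; stands in for files on disk
        self.loaded = {}         # modules already in main memory, shared by all callers
        self.load_count = 0      # how many times a module had to be loaded

    def call(self, name, *args):
        if name not in self.loaded:
            # First call to an absent module: locate it, load it, link it.
            self.loaded[name] = self.library[name]
            self.load_count += 1
        # Later calls, from any caller, reuse the already-loaded module.
        return self.loaded[name](*args)


linker = RunTimeLinker({"square": lambda x: x * x})
print(linker.call("square", 3))   # triggers the load
print(linker.call("square", 4))   # links to the already-loaded module
print(linker.load_count)
```

A version conflict of the kind described as DLL hell would correspond to two callers expecting different behavior from the single shared entry in loaded.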
