Introduction To Assembly Language
Assembly Language
2nd Semester SY 2009-2010
Benjie A. Pabroa
What is Assembly Language?
"High"-level languages such as BASIC,
FORTRAN, Pascal, Lisp, APL, etc. are
designed to ease the strain of
programming by providing the user with a
set of somewhat sophisticated operations
that are easily accessed.
Assembly as a Low-Level Language
A very low-level language can be very flexible and efficient (in terms of speed and memory use), but it can be very difficult to program in, since it provides no sophisticated operations and the programmer must understand the operation of the computer in detail.
Assembly language is essentially the lowest-level language a programmer normally writes: each assembly instruction corresponds to a single machine instruction.
Built-in Features
◦ The ability to read the values stored at various "memory locations"
◦ The ability to write a new value into a memory location
◦ The ability to do integer arithmetic of limited precision (add, subtract, multiply, divide)
◦ The ability to do logical operations (or, and, not, xor)
◦ The ability to "jump" to programs stored at various locations in the computer's memory
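"Limited precision" means every result must fit in a register. The hypothetical Python model below (not how any CPU or assembler is implemented) assumes 16-bit registers. Results are cut down to 16 bits, ADD reports a carry flag, and an 8086-style 16-bit MUL returns its 32-bit product as two 16-bit halves, DX and AX. The function names are made up for illustration.

```python
MASK16 = 0xFFFF  # a 16-bit register holds 0..65535


def add16(a, b):
    """Model of a 16-bit ADD: returns (result, carry flag)."""
    total = a + b
    return total & MASK16, 1 if total > MASK16 else 0


def mul16(a, b):
    """Model of a 16-bit unsigned MUL: the product split into (dx, ax)."""
    product = a * b
    return (product >> 16) & MASK16, product & MASK16


def and16(a, b):
    return a & b


def or16(a, b):
    return a | b


def xor16(a, b):
    return a ^ b


def not16(a):
    """NOT flips all 16 bits; masking keeps Python's result in range."""
    return ~a & MASK16


print(add16(0xFFFF, 1))   # (0, 1): the sum wraps to 0 and sets the carry
print(mul16(300, 300))    # (1, 24464): 90000 does not fit in 16 bits
print(hex(not16(0x00FF)))  # 0xff00
```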
Features not included
◦ The ability to perform graphics
◦ The ability to access files
◦ The ability to directly perform floating-point arithmetic
Assembly vs. High-Level Languages
FORTRAN code to average together the N numbers stored in the array X(I):
      INTEGER*2 I,X(N)
      INTEGER*4 AVG
      .
      .
      .
      AVG=0
      DO 10 I=1,N
   10 AVG=AVG+X(I)
      AVG=AVG/N
      .
      .
      .
The same computation in 8086 assembly language:
mov cx,n ; cx is used as the loop
; counter. It starts at N and
; counts down to zero.
mov dx,0 ; the dx register stores the
; two most significant bytes of
; the running sum
mov ax,0 ; use ax to store the two least
; significant bytes
mov si,offset x ; use the si register to point
; to the currently accessed
; element X(I), starting with
; the first element, X(1)
addloop:
add ax,word ptr [si] ; add X(I) to the two least
; significant bytes of AVG
adc dx,0 ; add the "carry" into the two
; most significant bytes of AVG
add si,2 ; move si to point to X(I+1)
loop addloop ; decrement cx and loop again
; if not zero
div n ; divide DX:AX by N (unsigned):
; quotient in AX, remainder in DX
mov avg,ax ; save the quotient as AVG
Writing the assembly version required intimate knowledge of how the variables x, n, and avg are stored in memory.
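The key trick in the assembly loop is that the running sum is wider than one register. AX holds the low 16 bits, and every carry out of AX is added into DX by "adc dx,0". The hypothetical Python model below mimics that loop (the name average_u16 is made up for illustration). As in the assembly code, it assumes the X(I) values are treated as unsigned 16-bit numbers and that N is at least 1.

```python
def average_u16(values):
    """Model of the addloop/div sequence using two 16-bit 'registers'.

    Assumes len(values) >= 1 and every value is in 0..65535.
    Returns (quotient, remainder), i.e. AX and DX after `div n`.
    """
    n = len(values)
    ax, dx = 0, 0                    # mov ax,0 / mov dx,0
    for v in values:                 # loop addloop runs n times (cx = n)
        total = ax + v               # add ax,word ptr [si]
        carry = 1 if total > 0xFFFF else 0
        ax = total & 0xFFFF          # AX keeps only the low 16 bits
        dx = (dx + carry) & 0xFFFF   # adc dx,0
    dividend = (dx << 16) | ax       # DX:AX viewed as one 32-bit number
    return dividend // n, dividend % n   # div n


print(average_u16([60000, 60000, 30000]))  # (50000, 0)
```

The sum 150000 needs more than 16 bits, so DX ends the loop holding 2. Without the adc the high part of the sum would simply be lost.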
PC System Architecture
Microprocessor
◦ Reads instructions from memory and executes them
◦ Accesses memory
◦ Does arithmetic and logical operations
◦ Performs other services as well
1971:
◦ Intel's 4004 was the first microprocessor: a 4-bit CPU (like the one from CS231) that fit all on one chip.
1978:
◦ The 8086 was one of the earliest 16-bit processors.
1981:
◦ IBM uses the 8088 in their little PC project.
1989:
◦ The 80486 includes a floating-point unit in the same chip as the main
processor, and uses RISC-based implementation ideas like pipelining
for greatly increased performance.
1997:
◦ The Pentium II is superscalar, supports multiprocessing, and includes
special instructions for multimedia applications.
2002:
◦ The Pentium 4 runs at insane clock rates (3.06 GHz), implements
extended multimedia instructions and has a large on-chip cache.
Memory
◦ Stores instructions (programs) or data
◦ Appears as a sequence of locations (or addresses)
Each address stores one byte
◦ Types:
ROM
Stored bytes may only be read by the CPU
They cannot be changed
RAM
Stored bytes may be both read and written (changed)
Volatile: all data is lost after shutdown
Both types are random access
The Process of Assembly
Assembly language is a compiled language
◦ Source code must first be created with a text-editor program
◦ Then the source code is compiled
◦ Compilers for assembly language are called assemblers
Auxiliary Programs
◦ First: text editor (source code editor)
◦ Second: assembler
Assembles the source code, generating object code in the process.
◦ Third: linker
Combines the object code modules created by the assembler.
◦ Fourth: loader
Built into the operating system and never explicitly executed.
Takes the "relocatable" code created by the linker, "loads" it into memory at the lowest available location, then runs it.
◦ Fifth: debugger
Environment for running and testing assembly language programs.
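One way to picture what the loader does with "relocatable" code is as patching segment numbers. In the hypothetical Python sketch below, the linker has left a table of positions (byte offsets into the program image) that hold segment numbers relative to the start of the program. The loader copies the image and adds the segment at which it actually placed the program to each of those words. This is a simplification (real DOS .EXE relocation entries are segment:offset pairs), and the values 0002h and 0F77h are illustrative only.

```python
import struct


def load_image(image, relocations, load_segment):
    """Copy `image` and add `load_segment` to each 16-bit word listed in
    `relocations` (byte offsets into the image). Words are little-endian."""
    memory = bytearray(image)
    for offset in relocations:
        (word,) = struct.unpack_from("<H", memory, offset)
        struct.pack_into("<H", memory, offset, (word + load_segment) & 0xFFFF)
    return bytes(memory)


# Hypothetical image: "mov ax,seg message" (opcode B8h) with a segment
# number of 0002h relative to the start of the program.
image = bytes.fromhex("B80200")
loaded = load_image(image, relocations=[1], load_segment=0x0F77)
print(loaded.hex().upper())   # B8790F
```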
[Figure: the assembly process, showing RAM, the assembler, and the CPU (ALU, control unit CU, flag register, instruction pointer), with steps numbered 1 to 4]
CPU Registers
Assembly language
◦ Much of the thought goes into the use of the computer memory and the CPU registers
Register
◦ Like a memory location in that it can store a byte (or word) value
◦ Has no memory address: it is not part of the computer memory but is built into the CPU
Importance of Registers in Assembly Programming
◦ Instructions that use registers are preferred over instructions that operate on values stored at memory locations
◦ Register instructions tend to be shorter (they take less room in memory)
◦ Register-oriented instructions operate faster than memory-oriented instructions, since the computer hardware can access a register much faster than a memory location
CPU Registers (8086 family)
AX The accumulator
BX The pointer register
CX The loop counter
DX Used for multiplication and division
SI The "source" string index register
DI The "destination" string index register
BP Used for passing arguments on the stack
SP The stack pointer
IP The instruction pointer
CS The "code segment" register
DS The "data segment" register
SS The "stack segment" register
ES The "extra segment" register
FLAG The flag register
Segment Registers
CS (Code Segment): a 16-bit number that points to the active code segment
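My First Program
.model small
.stack
.data
message db "Hello world, I'm learning Assembly !!!", "$"
.code
main proc
mov ax,seg message ; AX = segment number of message
mov ds,ax ; DS = that segment (DS cannot be loaded with a constant directly)
mov ah,09 ; DOS function 09h: write a "$"-terminated string
lea dx,message ; DX = offset of message within the data segment
int 21h ; call DOS
mov ax,4c00h ; DOS function 4Ch (in AH): exit, exit code 00h (in AL)
int 21h ; call DOS
main endp
end main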
Names
Identifiers
◦ An identifier is a name you apply to items in your program. The two types of identifiers are "name", which refers to the address of a data item, and "label", which refers to the address of an instruction. The same rules apply to names and labels.
Statements
◦ A program is made of a set of statements. There are two types of statements: "instructions", such as MOV and LEA, and "directives", which tell the assembler to perform a specific action, like ".model small" or ".code".
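Here's the general format of a statement (square brackets mark optional parts):
[identifier] operation [operand(s)] [;comment]
For example, in "mov ax,seg message" the operation is MOV and the operands are AX and seg message, while in "main proc" the identifier main is followed by the directive PROC.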
Dissecting Code
.model small
◦ Lines that start with a "." provide the assembler with information.
◦ The word(s) after the "." say what kind of information.
In this case it just tells the assembler that the program is small and doesn't need a lot of memory. I'll get back to this later.
.stack
◦ This one tells the assembler that the "stack" segment starts here.
The stack is used to store temporary data.
.data
◦ Indicates that the data segment starts here and that the stack segment ends here.
Dissecting Code..
.code
◦ Indicates that the code segment starts here and the data segment ends here.
main proc
◦ Code is placed in procedures, much like functions in C.
◦ This line indicates that a procedure called main starts here.
◦ main endp states that the procedure is finished.
◦ end main tells the assembler that the program is finished.
◦ It also tells the assembler where to start the program: at the procedure called main, in this case.
message db "xxxx"
◦ DB means Define Byte, and that is what it does.
◦ In the data segment it defines a series of bytes.
◦ These bytes contain the characters between the quotation marks; the trailing "$" marks the end of the string for DOS.
◦ "message" is a name that identifies this byte string.
◦ It's called an "identifier".
Memory space for variables
◦ DB (byte, 8 bits)
◦ DW (word, 16 bits)
◦ DD (doubleword, 32 bits)
◦ Example:
foo db 27 ; by default all numbers are decimal
bar dw 3e1h ; appending an "h" means hexadecimal
real_fat_rat dd ? ; "?" means "don't care about the value"
◦ A variable name:
Its address can't be changed
Its value can be changed
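To make the sizes concrete, the hypothetical Python sketch below lays out the three example variables as raw bytes. It assumes the assembler places them one after another with no padding, and it stores "?" as zero, although the assembler is free to leave any value there. On the 8086 a multi-byte value is stored low byte first (little-endian), so bar dw 3e1h occupies two bytes, E1h then 03h. Writing a new value changes the bytes at an address, but the address the name stands for stays the same.

```python
import struct

# foo db 27 / bar dw 3e1h / real_fat_rat dd ?   ("?" modelled as 0 here)
data = bytearray()
offsets = {}                                   # name -> address (offset)
for name, fmt, value in [("foo", "<B", 27),
                         ("bar", "<H", 0x3E1),
                         ("real_fat_rat", "<I", 0)]:
    offsets[name] = len(data)                  # the name is just this address
    data += struct.pack(fmt, value)            # 1, 2 or 4 bytes, low byte first

print(offsets)     # {'foo': 0, 'bar': 1, 'real_fat_rat': 3}
print(data.hex())  # 1be10300000000

# Like "mov bar,5": the value at bar's address changes, the address does not.
struct.pack_into("<H", data, offsets["bar"], 5)
print(data[offsets["bar"]:offsets["bar"] + 2].hex())   # 0500
```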
Dissecting Code..
mov ax, seg message
◦ AX is a register.
You use registers all the time, which is why you needed to know about them first.
◦ MOV is an instruction that moves data.
It takes two "operands": a destination and a source.
Here the operands are AX and seg message.
◦ seg message can be seen as a number.
It's the number of the segment that message is in (the data segment).
We need this number so that we can load the DS register with it.
Otherwise we can't get to the byte string in memory: we need to know WHERE it is located.
◦ The number is loaded into the AX register.
MOV always moves data from the operand to the right of the comma into the operand to the left of the comma.
The MOV Instruction
Syntax: mov destination,source
◦ Notice the size of the source and destination (they must match in reg-reg, mem-reg, and reg-mem transfers):
mov ax,bar ; load the word-size register ax with
; the word value stored at location bar.
mov dl,foo ; load the byte-size register dl with
; the byte value stored at location foo.
mov bx,ax ; load the word-size register bx with
; the word value in ax.
mov bl,ch ; load the byte-size register bl with
; the byte value in ch.
mov bar,si ; store the value in the word-size
; register si at the memory location
; labelled "bar".
mov foo,dh ; store the byte value in the register
; dh at memory location foo.
◦ A constant must be consistent with the size of the destination:
mov ax,5 ; store the word 5 in the ax register.
mov al,5 ; store the byte 5 in the al register.
mov bar,5 ; store the word 5 at location bar.
mov foo,5 ; store the byte 5 at location foo.
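Illegal Move Statements
◦ MOV AL, 3172 ; 3172 does not fit in the byte-size register AL (a byte holds at most 255)
◦ MOV foo, 3172 ; foo is a byte variable (DB), so 3172 does not fit there either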
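The two MOV rules (matching operand sizes, and constants that fit the destination) can be checked mechanically. The hypothetical Python sketch below takes foo as DB and bar as DW, following the earlier examples. As an assumption about assembler behaviour, it accepts a byte constant in the range -128..255 and a word constant in -32768..65535, so a constant may be read as either signed or unsigned.

```python
# Operand sizes in bits. foo (DB) and bar (DW) follow the earlier
# examples; everything else is an 8086 register.
SIZE_BITS = {
    "al": 8, "ah": 8, "bl": 8, "bh": 8, "cl": 8, "ch": 8, "dl": 8, "dh": 8,
    "ax": 16, "bx": 16, "cx": 16, "dx": 16,
    "si": 16, "di": 16, "bp": 16, "sp": 16,
    "foo": 8, "bar": 16,
}


def sizes_match(dest, src):
    """reg-reg, mem-reg and reg-mem moves need operands of the same size."""
    return SIZE_BITS[dest.lower()] == SIZE_BITS[src.lower()]


def immediate_fits(dest, value):
    """Assumed rule: an n-bit destination accepts -2**(n-1) .. 2**n - 1."""
    bits = SIZE_BITS[dest.lower()]
    return -(1 << (bits - 1)) <= value <= (1 << bits) - 1


print(sizes_match("bx", "ax"), sizes_match("dl", "foo"))        # True True
print(immediate_fits("AL", 3172), immediate_fits("foo", 3172))  # False False
print(immediate_fits("ax", 3172))                               # True
```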
Dissecting Code..
mov ds,ax
◦ Here it moves the number in the AX register (the number of the data segment) into the DS register.
◦ We have to load the DS register this way (with two instructions).
◦ Just typing "mov ds,seg message" isn't possible, because the 8086 cannot move a constant directly into a segment register.
mov ah,09
◦ MOV again. This time it loads the AH register with the constant value nine.
Dissecting Code..
int 21h
◦ This instruction causes an interrupt.
◦ The processor calls a routine somewhere in memory.
◦ 21h tells the processor which routine to call, in this case a DOS routine.
◦ For now, assume that INT just calls a procedure from DOS.
◦ The procedure looks at the AH register to find out what it has to do.
◦ In this example, the value 9 in the AH register tells the procedure to write the string at DS:DX to the screen, up to (but not including) the "$" that ends it.
int 21h
◦ This time the AH register contains the value 4Ch (AX=4C00h), which tells the DOS procedure to "exit the program".
◦ The value of AL is used as an exit code; 00h means "no error".
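The two DOS services used by the program can be summarised in a small hypothetical Python model. It is not how DOS works internally; it only shows which registers each service reads. AH (the high byte of AX) selects the service, function 09h prints the bytes at DS:DX up to the terminating "$", and function 4Ch ends the program with AL as the exit code. Here data stands in for the data segment and dx for the offset of the string within it.

```python
def split_ax(ax):
    """AH is the high byte of AX and AL is the low byte."""
    return (ax >> 8) & 0xFF, ax & 0xFF


def int21h(ax, data, dx):
    """Model of the two INT 21h services used above."""
    ah, al = split_ax(ax)
    if ah == 0x09:                       # write string at DS:DX
        end = data.index(b"$", dx)       # stop at "$"; it is not printed
        return ("print", data[dx:end].decode("ascii"))
    if ah == 0x4C:                       # exit program
        return ("exit", al)              # AL is the exit code
    raise NotImplementedError(f"INT 21h function {ah:02X}h is not modelled")


data = b"Hello world, I'm learning Assembly !!!$"
print(int21h(0x0900, data, 0))   # after mov ah,09 (AL does not matter here)
print(int21h(0x4C00, data, 0))   # after mov ax,4c00h: ('exit', 0)
```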
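After assembling and linking the program:
◦ Go to DOS and type "debug FIRST.exe" to load the program into the DEBUG program.
◦ Type d to display (dump) the contents of some addresses.
◦ Type u to unassemble the program; you will see something like: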
0F77:0000 B8790F MOV AX,0F79
0F77:0003 8ED8 MOV DS,AX
0F77:0005 B409 MOV AH,09
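Each DEBUG line shows an address as segment:offset, then the instruction's machine-code bytes, then the instruction itself. The hypothetical Python sketch below decodes only the three instruction forms in this listing (it is nowhere near a full 8086 disassembler). It shows that the constant 0F79h is stored low byte first (79h, 0Fh). It also computes a physical address with the standard 8086 rule, segment * 16 + offset.

```python
REG16 = ["AX", "CX", "DX", "BX", "SP", "BP", "SI", "DI"]
REG8 = ["AL", "CL", "DL", "BL", "AH", "CH", "DH", "BH"]
SREG = ["ES", "CS", "SS", "DS"]   # the 8086's four segment registers


def decode(code, offset):
    """Decode one instruction; only the three forms in the listing are handled."""
    op = code[offset]
    if 0xB8 <= op <= 0xBF:                                   # MOV reg16,imm16
        imm = code[offset + 1] | (code[offset + 2] << 8)     # low byte first
        return f"MOV {REG16[op - 0xB8]},{imm:04X}", 3
    if 0xB0 <= op <= 0xB7:                                   # MOV reg8,imm8
        return f"MOV {REG8[op - 0xB0]},{code[offset + 1]:02X}", 2
    if op == 0x8E and code[offset + 1] >> 6 == 0b11:          # MOV sreg,reg16
        modrm = code[offset + 1]
        return f"MOV {SREG[(modrm >> 3) & 0b11]},{REG16[modrm & 0b111]}", 2
    raise NotImplementedError(f"opcode {op:02X}h is not modelled")


def unassemble(code, segment):
    """Produce DEBUG-style 'segment:offset bytes instruction' lines."""
    lines, offset = [], 0
    while offset < len(code):
        text, size = decode(code, offset)
        raw = code[offset:offset + size].hex().upper()
        lines.append(f"{segment:04X}:{offset:04X} {raw} {text}")
        offset += size
    return lines


def physical_address(segment, offset):
    """8086 rule: segment * 16 + offset, within the 1 MB (20-bit) space."""
    return ((segment << 4) + offset) & 0xFFFFF


for line in unassemble(bytes.fromhex("B8790F8ED8B409"), 0x0F77):
    print(line)
print(hex(physical_address(0x0F77, 0x0003)))   # 0xf773
```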
Segment Number & Offset