It153l Introduction To Assembly Language Revised
Julius Cansino
Auxiliary Programs
Fourth: Loader
Built into the operating system and never explicitly executed. Takes the relocatable code created by the linker, loads it into memory at the lowest available location, then runs it.
Fifth: Debugger
Environment for running and testing assembly language programs.
Source Code -> Assembler -> Object Code -> Linker -> Relocatable Code -> Loader -> RAM
DOS
Provides the environment in which programs run.
Provides a set of helpful utility functions.
Must be understood in order to create programs in DOS.
To write the source code, you can use the edit command in DOS or just use Notepad.
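[Figure: CPU register layout. General-purpose registers AX, BX, CX, DX, each split into a high byte (AH, BH, CH, DH) and a low byte (AL, BL, CL, DL); segment registers CS, DS, SS, ES; pointer registers SP, BP; index registers SI, DI; control unit (CU); flag register; instruction pointer.]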
Assembly language
Writing assembly language requires careful thought about how the computer's memory and the CPU registers are used.
Register
Like a memory location in that it can store a byte (or word) value. It has no address in memory, because it is not part of the computer's memory (it is built into the CPU).
AX - The Accumulator
BX - The Pointer Register
CX - The Loop Counter
DX - Used for multiplication and division
SI - The source string index register
Other registers: DI, BP, SP, IP, CS, DS, SS, ES, FLAG
CS - Code Segment
DS - Data Segment
SS - Stack Segment
ES - Extra Segment
IP - Instruction Pointer: 16-bit number that points to the offset of the next instruction
SP - Stack Pointer: 16-bit number that points to the offset that the stack is using
BP - Base Pointer: used to pass data to and from the stack
AX - Accumulator Register: mostly used for calculations and for input/output
BX - Base Register: the only one of these four general-purpose registers that can be used as an index register
CX - Count Register: used for the loop instruction
DX - Data Register: used for input/output and by multiply and divide
SI - Source Index
DI - Destination Index
Abr.  Name             bit n  Description
OF    Overflow Flag    11     indicates an overflow when set
DF    Direction Flag   10     used for string operations to check direction
IF    Interrupt Flag   9      if set, interrupts are enabled, else disabled
TF    Trap Flag        8      if set, CPU can work in single step mode
SF    Sign Flag        7      if set, resulting number of calculation is negative
ZF    Zero Flag        6      if set, resulting number of calculation is zero
AF    Auxiliary Carry  4      some sort of second carry flag
PF    Parity Flag      2      indicates even or odd parity
CF    Carry Flag       0      contains the bit carried out of the leftmost bit after calculations
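To make the idea of "one bit per flag" concrete, here is a minimal Python sketch that decodes a 16-bit flag register value into the names of the flags that are set. It is only a model, not assembly: the name decode_flags is hypothetical, and the bit positions are those of the standard 8086 flag register.

```python
# Bit positions of the flags inside the 16-bit flag register
# (standard 8086 layout; this dictionary is an illustration only).
FLAG_BITS = {
    "OF": 11, "DF": 10, "IF": 9, "TF": 8, "SF": 7,
    "ZF": 6, "AF": 4, "PF": 2, "CF": 0,
}

def decode_flags(value):
    """Return the names of the flags that are set in a flag register value."""
    return [name for name, bit in FLAG_BITS.items() if (value >> bit) & 1]

# 0041h has bits 6 and 0 set, so the zero flag and the carry flag are set.
print(decode_flags(0x0041))
```

A flag is "set" when its bit is 1 and "cleared" when its bit is 0, which is all the shift-and-mask in decode_flags checks.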
If the processor stores 1 word (16 bits), it stores the bytes in reverse order in memory (low byte first):
1234h (word) ---> memory: 34h (byte) 12h (byte)
Reading a word back works the same way in reverse:
Memory values: 78h 56h ---> derived value: 5678h
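The byte order can be checked with a small Python model of memory. This is a sketch, not assembly; store_word and load_word are hypothetical helper names chosen for the illustration.

```python
def store_word(memory, address, value):
    """Store a 16-bit word the way the processor does: low byte first."""
    memory[address] = value & 0xFF             # low byte at the lower address
    memory[address + 1] = (value >> 8) & 0xFF  # high byte at the next address

def load_word(memory, address):
    """Read two bytes back and combine them into one 16-bit word."""
    return memory[address] | (memory[address + 1] << 8)

memory = bytearray(4)
store_word(memory, 0, 0x1234)          # memory[0..1] becomes 34h 12h
memory[2], memory[3] = 0x78, 0x56      # the bytes 78h 56h from the second example
print(memory[:2].hex())                # 3412
print(hex(load_word(memory, 2)))       # 0x5678
```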
Segments overlap
The address 0000:0010 = 0001:0000. Therefore, segments start at paragraph boundaries.
A paragraph = 16 bytes, so a segment starts at an address divisible by 16.
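The overlap follows from how a segment:offset pair is turned into a memory address: the segment is multiplied by 16 (one paragraph) and the offset is added. A minimal Python sketch of that calculation follows; physical_address is a hypothetical name, and the 20-bit wrap-around assumes an 8086-style address bus.

```python
def physical_address(segment, offset):
    """Combine a segment and an offset into a 20-bit memory address."""
    # Multiplying by 16 is the same as shifting left by 4 bits.
    return ((segment << 4) + offset) & 0xFFFFF

# Two different segment:offset pairs that name the same byte in memory.
print(hex(physical_address(0x0000, 0x0010)))  # 0x10
print(hex(physical_address(0x0001, 0x0000)))  # 0x10
```

Because the segment is always multiplied by 16, offset 0 of any segment lands on an address divisible by 16, which is the paragraph boundary mentioned above.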
My First Program
.model small
.stack
.data
message db "Hello world, I'm learning Assembly !!!", "$"
.code
main proc
    mov ax,seg message
    mov ds,ax
    mov ah,09
    lea dx,message
    int 21h
    mov ax,4c00h
    int 21h
main endp
end main
Identifiers
An identifier is a name you apply to items in your program. The two types of identifiers are "name", which refers to the address of a data item, and "label", which refers to the address of an instruction. The same rules apply to names and labels.
Statements
A program is made of a set of statements. There are two types of statements: "instructions" such as MOV and LEA, and "directives", which tell the assembler to perform a specific action, like ".model small" or ".code".
A statement is made up of an identifier, an operation, operands, and a comment. The identifier is the name as explained above. The operation is an instruction like MOV. The operands provide information for the operation to act on, like MOV (operation) AX,BX (operands). The comment is a line of text you can add as a comment; everything the assembler sees after a ";" is ignored.
Example
MOV AX,BX ;this is a MOV instruction
The source code must be assembled by an assembler and then linked by the linker before it can run.
Assemblers: A86, MASM, TASM (we will use this one)
Install TASM
How to Assemble
To assemble, type the following at the command prompt:
cd c:\tasm\bin
tasm <filename/path of the source code>
For example:
tasm c:\first.asm
.code
main proc
    mov ax,seg message
    mov ds,ax
    mov ah,09
    lea dx,message
    int 21h
    mov ax,4c00h
    int 21h
main endp
end main
.model small
Lines that start with a "." are directives that provide the assembler with information. The word(s) after the "." say what kind of information.
In this case it tells the assembler to use the small memory model: the program is small and doesn't need a lot of memory. I'll get back to this later.
.stack
This one tells the assembler that the "stack" segment starts here.
The stack is used to store temporary data.
.data
Indicates that the data segment starts here and that the stack segment ends here.
.code
Indicates that the code segment starts there and the data segment ends there.
main proc
Code is organised into procedures, just like in C or any other language. This indicates that a procedure called main starts here.
main endp
States that the procedure is finished.
end main
Tells the assembler that the program is finished, and that execution starts at the procedure called main in this case.
message db "xxxx"
DB means Define Byte, and so it does: in the data segment it defines a couple of bytes. These bytes contain the information between the quotes.
"message" is the variable name, a name to identify this byte string. It is called an "identifier".
Syntax:
MOV destination, source
The destination can be either a register or a memory location. The source can be a register, a memory location, or a numeric value.
Code we wrote earlier
; by default all numbers are decimal
; appending an "h" means hexadecimal
; "?" means "don't care about the value"
Notice the size of the source and destination (they must match in reg-reg, mem-reg, and reg-mem transfers). In the examples, foo is a byte-sized variable and bar is a word-sized variable.
mov ax,bar    ; load the word-size register ax with the word value stored at location bar.
mov dl,foo    ; load the byte-size register dl with the byte value stored at location foo.
mov bx,ax     ; load the word-size register bx with the word value in ax.
mov bl,ch     ; load the byte-size register bl with the byte value in ch.
mov bar,si    ; store the value in the word-size register si at the memory location labelled "bar".
mov foo,dh    ; store the byte value in the register dh at memory location foo.
mov ax,5      ; store the word 5 in the ax register.
mov al,5      ; store the byte 5 in the al register.
mov bar,5     ; store the word 5 at location bar.
mov foo,5     ; store the byte 5 at location foo.
mov ax,seg message
MOV loads the AX register with the segment number of message, that is, the number of the data segment.
mov ds,ax
Here it moves the number in the AX register (the number of the data segment) into the DS register. We have to load the DS register this way (with two instructions). Just typing "mov ds,seg message" isn't possible.
mov ah, 09
MOV again. This time it loads the AH register with the constant value nine.
lea dx,message
LEA - Load Effective Address. This instruction stores the offset within the data segment of the byte string message into the DX register. This offset is the second thing we need to know when we want to know where "message" is in memory. So now we have DS:DX.
int 21h
This instruction causes an interrupt. The processor calls a routine somewhere in memory. 21h tells the processor what kind of routine, in this case a DOS routine. For now assume that INT just calls a procedure from DOS. The procedure looks at the AH register to find out what it has to do. In this example the value 9 in the AH register indicates that the procedure should write the string at DS:DX to the screen.
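The "$" at the end of message matters here: DOS function 9 prints characters starting at DS:DX until it reaches a "$", which is not printed itself. The Python sketch below models only that behaviour; dos_function_09 and data_segment are hypothetical names, and the data segment is reduced to a plain byte string.

```python
def dos_function_09(memory, offset):
    """Model of DOS function 9: return the text from offset up to the '$'."""
    end = memory.index(b"$", offset)   # position of the terminating '$'
    return memory[offset:end].decode("ascii")

# The bytes that "message db ..." places in the data segment.
data_segment = b"Hello world, I'm learning Assembly !!!$"

# DX would hold the offset of message within the data segment (0 here).
print(dos_function_09(data_segment, 0))
```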
mov ax,4c00h
Loads the AX register with the constant value 4c00h. This time the AH register contains the value 4ch (AX=4c00h), and to the DOS procedure that means "exit program". The value of AL is used as an "exit code"; 00h means "no error".
int 21h
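Loading AX with 4c00h sets both halves of the register at once, because AH is the high byte of AX and AL is the low byte. A small Python sketch of the split (split_ax is a hypothetical helper name):

```python
def split_ax(ax):
    """Split a 16-bit AX value into its high byte (AH) and low byte (AL)."""
    return (ax >> 8) & 0xFF, ax & 0xFF

ah, al = split_ax(0x4C00)
print(hex(ah), hex(al))  # 0x4c 0x0
```

So a single "mov ax,4c00h" has the same effect as loading AH with 4ch and AL with 00h separately.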
After running:
Go to DOS and type debug FIRST.exe to debug.
Type d to display some addresses.
Type u to unassemble; you will see the instructions of the program.
The stack is a place where data is temporarily stored. The SS and SP registers point to that place like this: SS:SP. So the SS register is the segment and the SP register contains the offset.
Setting up a stack is easily done with the .stack directive, which creates a stack of 1024 bytes. The stack uses a LIFO system (Last In First Out).
First the value 1234h was pushed; after that the value 5678h was pushed to the stack. According to LIFO, 5678h comes off first, so AX will pop that value and BX will pop the next. What are the values of AX and BX?
The stack "grows" downwards in memory. When you push a word (2 bytes), for example, SP is decreased by two and the word is stored at SS:SP. So in the beginning SP points to the top of the stack, and (if you don't pay attention) the stack can grow so far downwards in memory that it overwrites the program's code. A major system crash is the result.
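The push and pop behaviour can be traced with a small Python model of a 1024-byte stack. This is a sketch under simplifying assumptions: the class name Stack is hypothetical, the stack segment is a plain bytearray, and SP starts just past the last byte.

```python
class Stack:
    """Tiny model of a stack that grows downwards in memory."""

    def __init__(self, size=1024):
        self.memory = bytearray(size)
        self.sp = size                      # SP starts at the top of the stack

    def push(self, word):
        self.sp -= 2                        # SP is decreased by two first
        self.memory[self.sp] = word & 0xFF  # then the word is stored at SS:SP
        self.memory[self.sp + 1] = (word >> 8) & 0xFF

    def pop(self):
        word = self.memory[self.sp] | (self.memory[self.sp + 1] << 8)
        self.sp += 2                        # SP moves back up by two
        return word

stack = Stack()
stack.push(0x1234)
stack.push(0x5678)
ax = stack.pop()   # last in, first out
bx = stack.pop()
print(hex(ax), hex(bx))  # 0x5678 0x1234
```

This answers the question in the text: AX receives 5678h and BX receives 1234h, and after both pops SP is back where it started.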
If you fully understand this stuff (registers, flags, segments, stack, names, etc.) you may, from now on, call yourself a
"Level 0 Assembly Coder"
Suppose that we have 4 word-sized values stored in the variables MY, NAME, IS, NOBODY (initial values 4, 5, 6, and 32) and that we want to move these values to the variables PLAY, MISTY, FOR, ME.
Fortran Program
      INTEGER*2 MY,NAME,IS,NOBODY,PLAY,MISTY,FOR,ME
      DATA MY,NAME,IS,NOBODY/4,5,6,32/
      ....
      PLAY=MY
      MISTY=NAME
      FOR=IS
      ME=NOBODY
      ....
Assembly Version
; destination variables
play    db ?
misty   db ?
for     db ?
me      db ?
; source variables
my      db 4
name    db 5
is      db 6
nobody  db 32
.....
mov al,my       ; PLAY=MY
mov play,al
mov al,name     ; MISTY=NAME
mov misty,al
mov al,is       ; FOR=IS
mov for,al
mov al,nobody   ; ME=NOBODY
mov me,al
DEBUG
System debugger
Has its own built-in editor and primitive assembler
Its code does not need to be linked
Also has facilities for modifying and examining memory locations
Cannot be used to conveniently develop larger programs: one must literally know the memory addresses of all data items.
An (immediate) value is distinguished from the value stored at an address in that an address is enclosed in square brackets.
MOV AX,200      ; load ax with the value 200
MOV AX,[200]    ; load ax with the value at address 200
In DEBUG all numbers are hexadecimal, so 200 means 200h, or 512 in decimal.
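The difference between the two forms can be modelled in Python with a bytearray standing in for memory. This is only an illustration: the memory contents (04h and 05h at addresses 200h and 201h) are assumed values chosen for the example.

```python
# DEBUG reads every number as hexadecimal, so "200" is 200h.
print(int("200", 16))              # 512

memory = bytearray(0x300)
memory[0x200] = 0x04               # assumed byte at address 200h
memory[0x201] = 0x05               # assumed byte at address 201h

# MOV AX,200 : AX receives the number itself (an immediate value).
ax_immediate = 0x200

# MOV AX,[200] : AX receives the word stored at address 200h (low byte first).
ax_from_memory = memory[0x200] | (memory[0x201] << 8)

print(ax_immediate)                # 512
print(hex(ax_from_memory))         # 0x504
```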
The program may be entered with the "A" or "assemble" command followed by the address (Annnn). Our program looks like this:
-a100
48EE:0100 mov ax,[200]
48EE:0103 mov [204],ax
48EE:0106 mov ax,[201]
48EE:0109 mov [205],ax
48EE:010C mov ax,[202]
48EE:010F mov [206],ax
48EE:0112 mov ax,[203]
48EE:0115 mov [207],ax
48EE:0118
We can check that the program is actually in the computer at address 100 with the "U" or "unassemble" command.
You may also type U100,118 to specify the ending address to view.
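[Figure: in DEBUG, the assembler translates MOV AX,[200] into the bytes A1 00 02 in RAM, and the U command (unassembler) translates them back. For an executable program, the loader places the bytes A1 00 02 in RAM.]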
Next, initialize the variables MY, NAME, IS, and NOBODY (which is to say, the values stored at memory locations 200 through 203).
This can be done with the "E" or "enter" command (Ennnn):
<space> moves the cursor to the next address
<enter> terminates the enter command
It is also possible to use DB and DW with the A command:
db 4
db 5
db 6
db 20
(20 is hexadecimal here, which is 32 in decimal.)
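Memory can be examined with the "D" or "display" command:
dnnnn - display from address nnnn
dnnnn,mmmm - display from nnnn to mmmm
The program is run with the "G" or "go" command. Afterwards, display addresses 200 to 207 to check the result.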
Terminating Debugger
Q or Quit Command
Modifying Registers
Rrn, where rn is the name of the register (AX, BX, ...)
Ex. to store 4567 (hex) in the CX register, type RCX and then enter 4567 at the prompt.
Instructions
ADD - Addition
SUB - Subtraction
Syntax
ADD destination,source
SUB destination,source
Things to remember:
the sizes of the source and destination operands must match
The ADD (SUB) instruction adds (subtracts) the value of the source operand to (from) the value of the destination operand, and stores the result in the destination.
Ex:
sub si,ax    ; subtract the value in ax from si
DEC
DEC destination
Subtracts one from the destination.
It can happen in integer addition that the result of an addition is too big for the destination to hold.
The carry flag is used to store both carries and borrows in integer addition and subtraction. Ex:
MOV AL,200
MOV BL,195
MOV CL,25
ADD AL,BL
The sum, 395, does not fit in 8 bits, so the carry flag would be "set" to one, and the result would be truncated to 8 bits: i.e., AL would contain 139.
If the last instruction were ADD AL,CL instead, the result, 225 (<256), is byte-sized, so we would find that AL contains 225 and the carry flag is "cleared" to zero.
MOV AL,200
MOV BL,195
MOV CL,25
SUB AL,BL
Here we are subtracting a smaller number from a bigger number, so the AL register contains the result, 5, and the carry flag (which stores the "borrow") is cleared.
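These three results can be reproduced with a small Python model of 8-bit arithmetic. It is a sketch, not assembly: add8 and sub8 are hypothetical helper names, and each returns the truncated 8-bit result together with the carry flag.

```python
def add8(a, b):
    """8-bit ADD: return (result truncated to 8 bits, carry flag)."""
    total = a + b
    return total & 0xFF, int(total > 0xFF)

def sub8(a, b):
    """8-bit SUB: return (result truncated to 8 bits, carry flag as borrow)."""
    diff = a - b
    return diff & 0xFF, int(diff < 0)

print(add8(200, 195))  # (139, 1)  sum 395 is too big: carry set
print(add8(200, 25))   # (225, 0)  sum fits in a byte: carry cleared
print(sub8(200, 195))  # (5, 0)    no borrow needed: carry cleared
```

Subtracting the other way round, sub8(195, 200), would need a borrow, so the carry flag would be set.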
ADC
ADC destination,source
"add with carry": ADC automatically adds in the carry left over from previous operations.
SBB
SBB destination,source
"subtract with borrow": SBB automatically subtracts the borrow left over from previous operations.
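A typical use of ADC is adding numbers that are wider than one register, one piece at a time. The Python sketch below models adding two 16-bit values using only 8-bit additions; adc8 and add16_bytewise are hypothetical names chosen for the illustration.

```python
def adc8(a, b, carry_in):
    """8-bit add with carry: return (result truncated to 8 bits, carry out)."""
    total = a + b + carry_in
    return total & 0xFF, int(total > 0xFF)

def add16_bytewise(x, y):
    """Add two 16-bit values one byte at a time, as ADD followed by ADC would."""
    low, carry = adc8(x & 0xFF, y & 0xFF, 0)    # plain ADD on the low bytes
    high, carry = adc8(x >> 8, y >> 8, carry)   # ADC on the high bytes
    return (high << 8) | low, carry

result, carry = add16_bytewise(0x12FF, 0x0001)
print(hex(result), carry)  # 0x1300 0
```

The carry out of the low bytes (FFh + 01h) is what turns the high byte from 12h into 13h. SBB plays the same role for subtraction, passing the borrow along instead.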
The instructions we have considered so far are limited in that they allow only linear code. However, for real programming we need a way of transferring control from one program location to another.
We need to be able to choose which part of the computer's memory contains the program to be executed.
A jump instruction consists of a mnemonic followed by an address. The mnemonic here can be a number of different things, but for the moment, we will assume that it is "JMP". A JMP instruction "jumps" from the present location in memory (as indicated by the instruction pointer register IP) to the specified address in memory. In essence, JMP simply stores the given address in the IP register.
In DEBUG, the address operand is, of course, simply a number. For example, if we executed the instruction
JMP 121
then the very next instruction executed would be the instruction located at address 121h.
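The idea that JMP simply stores an address in IP can be traced with a toy Python model. Everything here is hypothetical and made up for the illustration: the program layout, the addresses of the instructions, and the run function. It is not how DEBUG works internally.

```python
# Hypothetical program: address -> (mnemonic, operand, address of the next instruction)
program = {
    0x100: ("JMP", 0x121, 0x102),
    0x102: ("INC", None, 0x103),   # skipped because of the jump
    0x121: ("INC", None, 0x122),
}

def run(program, ip, ax=0):
    """Execute until IP leaves the program; return AX and the visited addresses."""
    trace = []
    while ip in program:
        trace.append(ip)
        mnemonic, operand, next_ip = program[ip]
        ip = next_ip               # normally IP advances to the next instruction
        if mnemonic == "JMP":
            ip = operand           # JMP overwrites IP with the given address
        elif mnemonic == "INC":
            ax += 1
    return ax, trace

ax, trace = run(program, 0x100)
print(ax, [hex(address) for address in trace])  # 1 ['0x100', '0x121']
```

Only the INC at address 121h is executed, so AX ends up as 1; the instruction at 102h is never reached.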
In the TASM assembler, the address is given as a label:
. . .
JMP FOOBAR
ADD AX,21
FOOBAR: INC AX
. . .
"FOOBAR" is a label.
JMP is an unconditional jump: it always goes to the specified address, regardless of any special conditions that may obtain.
There are also a series of conditional jump instructions which perform a jump only if some special condition is met.
These instructions all have the general syntax given above, but their