Assembly Language
Each personal computer has a microprocessor that manages the computer's arithmetical, logical and
control activities. Each family of processors has its own set of instructions for handling various
operations, like getting input from the keyboard, displaying information on the screen and performing
various other jobs. This set of instructions is called 'machine language instructions'.
Assembly language is the closest you can get to the hardware short of writing machine code. In other
words, while Java, C#, JavaScript and Python are all abstracted away from the hardware, assembly sits at
the bottom, which lets you work with the hardware directly and efficiently. An assembler translates it
into machine language, the instructions your CPU can actually understand and execute. It's also very
structured: each line is a single instruction, so you can't do something like object.get().modify().set();.
In C++, for example, you might write “int foo = 5;”. But in assembly nothing is abstracted, so the code
looks like this: “move #5, foo”.
The processor understands only machine language instructions, which are strings of 1s and 0s. However,
machine language is too obscure and complex to use for software development. So low-level assembly
language is designed for a specific family of processors and represents its instructions in symbolic
code, a more understandable form.
ADVANTAGES OF ASSEMBLY LANGUAGE
It allows complex jobs to run in a simpler, more direct way, without layers of abstraction in between.
It is memory efficient, as hand-written programs can be kept very small.
It can be faster, as you control exactly which instructions execute.
It is hardware-oriented, giving direct access to registers and memory.
It can need fewer machine instructions than compiled code to get the same result.
It is used for time-critical jobs.
It is well suited to writing interrupt service routines and other memory-resident
programs.
High Level vs. Assembly
High Level Languages
More programmer friendly
More ISA independent
Each high-level statement translates to several instructions in the ISA of the computer
Assembly Languages
– Lower level, closer to ISA
– Very ISA-dependent
– Each instruction specifies a single ISA instruction
– Makes low-level programming more user friendly than raw machine code
– Can produce more efficient code
I will assume that you have installed Easy68K and are ready to go. If you haven’t, go and install it. Now then,
writing “Hello World” in assembly isn’t exactly a “1-line program”. In Java you might write
System.out.println("Hello World");. Here’s Hello World in assembly. BTW: you don’t need to write
assembly code in all caps if you don’t want to.
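; PUT ANY ASSIGNMENTS HERE

        ORG     $1000
START:
        LEA     MESSAGE,A1      ; A1 = address of the string
        MOVE.B  #14,D0          ; I/O task 14: display a null-terminated string
        TRAP    #15             ; ask the simulator to perform the task in D0
        SIMHALT                 ; stop the simulator

* PUT VARIABLES AND CONSTANTS HERE
MESSAGE DC.B    'HELLO WORLD',0

        END     START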
Let’s break all of this down, line by line, piece by piece.
1. ORG $1000: this tells the assembler that this program will be located and organized in memory starting at
address $1000.
2. START: a label marking the first instruction of the program. Because the last line is END START, execution
begins here. Think of it as “int main()”.
3. LEA MESSAGE, A1: load the address of the message variable into address register 1 (A1). We’ll cover this later.
4. MOVE.B #14, D0: move the byte value 14 into data register D0. The 14 selects the I/O task we want:
display a null-terminated string.
5. TRAP #15: a software trap (an exception raised by the program itself, not a hardware interrupt). In Easy68K it
tells the simulator to perform the I/O task whose number is in D0, here displaying the string A1 points to.
6. SIMHALT: halt the simulator, which terminates the program.
7. MESSAGE DC.B 'HELLO WORLD',0: this is our string variable, called MESSAGE, stored by the directive
DC.B (define constant, byte) and ending with a null terminator (0).
8. END START: the last line of the program. END tells the assembler that the source ends here, and START
tells it where execution begins.
Assembler Syntax
Each assembly line begins with either a label, a blank (tab or space), an asterisk, or a semicolon; a line that
begins with an asterisk or a semicolon is a comment. Each line has up to four fields:
{label[:]} mnemonic {operand list} {;comment}
Symbols
– Symbols are used as labels, constants, and substitution values
– Symbols are stored in a symbol table
– A symbol name is
a string of up to 200 alphanumeric characters (A-Z, a-z, 0-9, $, and _)
cannot contain embedded blanks
first character cannot be a number
case sensitive
– Symbols used as labels become symbolic addresses that are associated with locations in the
program
Label Field
– Labels are symbols
– Labels must begin in column 1.
– A label can optionally be followed by a colon
– The value of a label is the current value of the Location Counter (address within program)
– A label on a line by itself is a valid statement
– Labels used locally within a file must be unique
Mnemonic Field
– The mnemonic field follows the label field.
– The mnemonic field cannot start in column 1; if it does, it is interpreted as a label.
– The mnemonic field contains one of the following items:
68K instruction mnemonic (e.g. ADD, MOVE, JMP)
Assembler directive (e.g. ORG, EQU, DC.B, END)
Macro directive (e.g. MACRO, ENDM)
Macro call
Operand Field
– The operand field follows the mnemonic field and contains one or more operands.
– The operand field is not required for all instructions or directives.
Figure 1: Anatomy of a typical 68K assembly language instruction
Figure 1 shows the structure of a typical 68K instruction. Instructions with two operands are always
written in the form source, destination, where source is where the operand comes from and
destination is where the result goes.
– An operand may consist of:
Symbols
Constants
Expressions (combination of constants and symbols)
– Operands are separated with commas
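The field rules above can be illustrated with a small Python model. This is a hypothetical sketch, not Easy68K's real parser. It assumes that a label starts in column 1, that everything else is indented, that a '*' in column 1 makes the whole line a comment, and that ';' starts a trailing comment. It ignores corner cases such as ';' or ',' inside quoted strings, and indexed operands like (A0,D0).

```python
from typing import List, NamedTuple, Optional


class AsmLine(NamedTuple):
    label: Optional[str]
    mnemonic: Optional[str]
    operands: List[str]
    comment: Optional[str]


def split_fields(line: str) -> AsmLine:
    """Hypothetical helper: split one source line into its four fields (simplified)."""
    # A '*' in column 1 turns the whole line into a comment.
    if line.startswith("*"):
        return AsmLine(None, None, [], line[1:].strip())

    # A ';' starts a comment; everything after it is kept as the comment text.
    comment = None
    if ";" in line:
        line, comment = line.split(";", 1)
        comment = comment.strip()

    # Anything that is not blank in column 1 is read as a label (colon optional).
    label = None
    if line and not line[0].isspace():
        first, *rest = line.split(None, 1)
        label = first.rstrip(":")
        line = rest[0] if rest else ""

    # The next word is the mnemonic; the remainder is a comma-separated operand list.
    parts = line.split(None, 1)
    mnemonic = parts[0] if parts else None
    operands = [op.strip() for op in parts[1].split(",")] if len(parts) > 1 else []
    return AsmLine(label, mnemonic, operands, comment)


print(split_fields("START:"))
print(split_fields("        LEA     MESSAGE,A1      ; A1 = address of the string"))
print(split_fields("LEA MESSAGE,A1"))  # mnemonic in column 1 is read as a label
```

The last call shows why the mnemonic field cannot start in column 1: the model, like the assembler, takes "LEA" to be a label.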
ASSEMBLY PROGRAM SECTIONS
Assemblers such as NASM (for x86 under Linux) divide an assembly program into three sections:
The data section
The bss section
The text section
The data section
The data section is used for declaring initialized data or constants. You can declare various constant
values, file names, buffer sizes etc. in this section. The syntax for declaring the data section is:
section .data
The bss section
The bss section is used for declaring uninitialized variables; the space is reserved and zero-filled when
the program is loaded. The syntax for declaring the bss section is:
section .bss
The text section
The text section is used for keeping the actual code. This section must begin with the
declaration global main, which makes the label main visible to the linker so it knows where program
execution begins. The syntax for declaring the text section is:
section .text
global main
main:
Easy68K does not use this syntax: it places code and data with ORG and, optionally, its own SECTION
directive, as in the macro example later in this tutorial.
THE 68000 MOTOROLA MICROPROCESSOR ARCHITECTURE/STRUCTURE
The 68K is a CPU made by Motorola. It’s one of the most successful microprocessors and is still in use
today. Embedded systems, computers, even game consoles have used it. It's a 32-bit CISC CPU (32-bit
registers and operations; the original 68000 has a 16-bit external data bus) with its own instruction set.
The 68K has 16 general-purpose 32-bit user-accessible registers. D0 to D7 are data registers and A0
to A7 are address registers. An address register holds a pointer and is used in address register
indirect addressing. Note: The only instructions that can be applied to the contents of address
registers are add, subtract, move, and compare. Operations on the contents of an address register
always yield a 32-bit value, whereas operations on data registers can be 8, 16, or 32 bits. Most 68K
instructions are register to register, register to memory, or memory to register. The following
defines some 68K instructions.
Figure 2: The structure of an assembly language program
ASSEMBLER DIRECTIVES
Assembler directives are used to specify:
– Starting addresses for programs
– Starting values for memory locations
– The end of the program text
Assembly language statements are divided into executable instructions and assembler directives. An
executable instruction is translated into the machine code of the target microprocessor and
executed when the program runs. In the example in Fig. 2, the executable
instructions are:
MOVE P,D0        Copy contents of P to D0
ADD Q,D0         Add contents of Q to D0
MOVE D0,R        Store contents of D0 in memory location R
STOP #$2700      Stop executing instructions
We've already encountered the first three instructions. The last instruction, STOP #$2700,
terminates the program by halting further instruction execution. This instruction also loads the 68K's
status register with the value $2700, which puts the 68K in supervisor mode with the interrupt mask set
to 7. We use this STOP instruction to terminate programs running on the simulator. An assembler directive
tells the assembler something it needs to know about the program; for example, the assembler directive ORG
means origin and tells the assembler where instructions or data are to be loaded in memory. The
expression ORG $1000 tells the assembler to load instructions in memory starting at address $1000.
Remember that in the instruction STOP #$2700 the operand is #$2700. The '#' indicates a literal operand and
the '$' indicates hexadecimal. The literal operand %0010011100000000 (binary) is loaded into the 68K's status
register as it stops. The 68K remains stopped until it receives an interrupt.
We've used the value $1000 because the 68K reserves memory locations $0 to $3FF for its exception vector
table, and $1000 is an easy number to remember. The second origin assembler directive, ORG
$2000, is located after the code and defines the starting point of the data area. We don't need this
assembler directive; without it data would immediately follow the code. We have used it because it's
easy to remember that the data starts at memory location $2000. Note: An important role of assembler
directives is in reserving memory space for variables, presetting variables to initial values, and binding variables.
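To make the $2700 bit pattern concrete, here is a minimal Python sketch. It is an illustration, not Easy68K code, and decode_sr is a hypothetical helper. It splits a status register value into fields using the 68000 layout: trace bit T at bit 15, supervisor bit S at bit 13, interrupt mask in bits 10-8, and condition codes X N Z V C in bits 4-0.

```python
def decode_sr(sr: int) -> dict:
    """Hypothetical helper: split a 16-bit 68000 status register value into fields."""
    return {
        "T": (sr >> 15) & 1,          # trace bit
        "S": (sr >> 13) & 1,          # supervisor bit
        "mask": (sr >> 8) & 0b111,    # interrupt mask, bits 10-8
        "X": (sr >> 4) & 1,           # condition codes (low byte of the SR)
        "N": (sr >> 3) & 1,
        "Z": (sr >> 2) & 1,
        "V": (sr >> 1) & 1,
        "C": sr & 1,
    }


print(format(0x2700, "016b"))  # the literal operand written in binary
print(decode_sr(0x2700))
```

For $2700, S is 1 and the mask is 7, so only a level-7 interrupt can wake the stopped processor. All condition codes start at 0.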
Decimal, Binary, and Hex…
Working with numbers in 68K means having to deal with more than just decimals. Binary and Hexadecimal
are also used.
In 68K, a decimal number is specified like so: #2 <-- (the '#' marks an immediate value; with no other
prefix, it's decimal)
A hex: #$24 <-- (the dollar symbol tells the assembler that this number is hex)
Binary: #%10101010 <-- (the percent symbol says that this number is binary)
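The three notations can be checked with a short Python sketch. parse_immediate is a hypothetical helper, not part of Easy68K, and it covers only the three prefixes listed above.

```python
def parse_immediate(text: str) -> int:
    """Hypothetical helper: value of an immediate operand written as #n, #$hex or #%binary."""
    if not text.startswith("#"):
        raise ValueError("immediate operands start with '#'")
    body = text[1:]
    if body.startswith("$"):
        return int(body[1:], 16)   # hexadecimal
    if body.startswith("%"):
        return int(body[1:], 2)    # binary
    return int(body, 10)           # no prefix: decimal


print(parse_immediate("#2"), parse_immediate("#$24"), parse_immediate("#%10101010"))
```

Note that #%10101010 and #$AA are the same value, 170, written two ways.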
Registers
68K has 8 data registers (D0-D7) and 8 address registers (A0-A7).
Data register? A data register holds data. In the context of 68K, this data is just numbers. If you put a
decimal number like 10 in either a data or address register, the simulator will display it in hex, as
0000000A. Yep, hex is king here.
Address register? An address register also holds numbers, BUT the values in here are treated as
addresses. In C, you can define a pointer like so: int *val = (int *)0x3000;. Address registers are just that,
POINTERS to an address in memory. If you look at your A0 register and inside it says “00003000”, that means
A0 is pointing to memory address $3000.
There is a special register, A7. Why is it special? Because it's the stack pointer. Yes, you have a stack at
your disposal; you can push and pop stuff from it. We’ll get to that (see Moving and Loading things).
In Easy68K, if you run your program you will get a screen that displays these registers. Get comfortable
with it. Play around.
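The register window shows each 32-bit register as 8 hex digits, which is why decimal 10 appears as 0000000A. The one-function Python sketch below mimics that display. show_register is a hypothetical name. The 32-bit mask also shows how a negative number looks in two's complement.

```python
def show_register(value: int) -> str:
    """Hypothetical helper: format a value as the register display does (8 hex digits)."""
    # Masking to 32 bits keeps only what a register can hold; negatives become two's complement.
    return format(value & 0xFFFFFFFF, "08X")


print(show_register(10), show_register(0x3000), show_register(-1))
```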
What's Addressing?
In 68K, two related ideas matter here: the size of the data an instruction works on, and the addressing
mode, which is how the instruction finds its operands in registers and memory. In C++, for example, an
integer is not the same thing as a double, and a String is not a character.
Notice how in “Hello world”, there’s the instruction “MOVE.B”. MOVE is the instruction, and “.B” says that we
are going to move a byte of data into a register. In 68K, there are 3 sizes to consider:
.B (byte) 8 bits in a byte, example: #$FF
.W (word) 16 bits in a word, example: #$1234
.L (long-word) 32 bits in a long-word, example: #$00809070
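The size suffix also decides how much of a data register changes. The Python sketch below is a hypothetical model named move_to_data_register. It shows that MOVE.B and MOVE.W replace only the low byte or low word of a data register and leave the upper bits alone. This applies only to data registers: a word moved into an address register is sign-extended to all 32 bits.

```python
SIZE_BITS = {"B": 8, "W": 16, "L": 32}


def move_to_data_register(old: int, value: int, size: str) -> int:
    """Hypothetical model: Dn after MOVE.<size> #value,Dn when Dn previously held 'old'."""
    mask = (1 << SIZE_BITS[size]) - 1
    # Only the low byte (.B) or low word (.W) is replaced; the remaining upper bits are kept.
    return (old & ~mask & 0xFFFFFFFF) | (value & mask)


d0 = 0x12345678
print(hex(move_to_data_register(d0, 0xFF, "B")))
print(hex(move_to_data_register(d0, 0xABCD, "W")))
print(hex(move_to_data_register(d0, 0x00809070, "L")))
```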
Addressing in 68K also means how we decide to deal with memory and registers.
Data register to data register: instructions that involve only data registers. Example: MOVE.B
D1, D2
Data register to address register: MOVE.L D5, A2 <-- we’re moving what’s in D5 into A2, so
now A2 points to something else (a move into an address register is really MOVEA)
Immediate data to register: ADD.W #1000, D4
Immediate data to address register indirect: MOVE.B #$AC, (A0) <-- we’re moving a value into
whatever memory location A0 is pointing to (Note: immediate data means data that appears in the
instruction exactly as it is to be processed)
Immediate data to absolute address: MOVE.W #2000, $9000 <-- we’re moving a value into
memory address $9000 (Note: an absolute address is written directly in the instruction, not taken from a register)
Postincrement: MOVE.B #$9F, (A5)+ <-- move a value into what A5 is pointing to, then increment
the address in A5
Predecrement: MOVE.L #$00102340, -(A1) <-- decrement the address in A1, then move a value
into the address A1 is now pointing to
More on increment/decrement: when you increment or decrement, the address changes by the size of
the data you are moving: 1 for a byte, 2 for a word, 4 for a long-word. Each example below starts again
from the same original address:
A0 = 000037BC <-- original address
MOVE.B #$AC, (A0)+ -> A0 = 000037BD <-- byte increments by 1
MOVE.W #$EEAC, (A0)+ -> A0 = 000037BE <-- word increments by 2
MOVE.L #$22AC0044, (A0)+ -> A0 = 000037C0 <-- long increments by 4
MOVE.B #30, -(A0) -> A0 = 000037BB <-- byte: decrement by 1 first, then move the value
Finally, note that the 68K has only these two automatic forms. If the symbol is on the left side of the register,
as in -(A2), it is called predecrement: the pointer is decreased before the data is moved. If the symbol is on
the right side, as in (A2)+, it is called postincrement: the data is moved first, then the pointer is advanced.
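The address arithmetic above can be modeled in a few lines of Python. This is a hypothetical sketch, and postincrement and predecrement are illustrative names. It also models one real special case: the stack pointer A7 is kept word-aligned, so a byte access through (A7)+ or -(A7) moves it by 2.

```python
SIZE_BYTES = {"B": 1, "W": 2, "L": 4}


def step(size: str, is_a7: bool) -> int:
    # A7 (the stack pointer) stays word-aligned, so byte accesses through it move it by 2.
    n = SIZE_BYTES[size]
    return 2 if (is_a7 and n == 1) else n


def postincrement(an: int, size: str, is_a7: bool = False) -> int:
    """Hypothetical model of (An)+: memory at An is accessed first; returns the new An."""
    return (an + step(size, is_a7)) & 0xFFFFFFFF


def predecrement(an: int, size: str, is_a7: bool = False) -> int:
    """Hypothetical model of -(An): An is decreased first; returns the address accessed."""
    return (an - step(size, is_a7)) & 0xFFFFFFFF


for size in "BWL":
    print(size, format(postincrement(0x37BC, size), "08X"))
print("B", format(predecrement(0x37BC, "B"), "08X"))
```

The examples restart from $37BC for a reason. On the 68000, a word or long access through an odd address such as $37BD causes an address error, so chaining the byte step into the word step would fail on real hardware.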
Syntax
Let’s look at the “MOVE.B #14, D0” instruction again. Notice how after the instruction, we provide a
number or register as the source, then a destination register or memory location (INSTRUCTION.SIZE
SOURCE, DEST). Not every instruction has this kind of syntax, but for now, here are some examples:
LEA $4000, A0 ;load address $4000 into A0
ADD D3, D4 ;add what’s in D3 to D4
MOVE.W #$ABCD, -(A6) ;decrement A6 by 2, then store the word at the new address
SUBI.W #4, D6 ;subtract 4 from D6
Instructions that don’t have this kind of syntax:
CLR.L D0 ;clear D0 entirely
SWAP D2 ;swap the top and bottom halves of D2 (02347800 -> 78000234)
JSR START ;jump to subroutine START
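The SWAP example can be checked with a one-function Python model. swap is a hypothetical helper name. It exchanges the upper and lower 16-bit halves of a 32-bit value, as SWAP does to a data register.

```python
def swap(dn: int) -> int:
    """Hypothetical model of SWAP Dn: exchange the upper and lower 16-bit halves."""
    dn &= 0xFFFFFFFF
    return ((dn << 16) | (dn >> 16)) & 0xFFFFFFFF


print(format(swap(0x02347800), "08X"))
```

Swapping twice gives back the original value, since the two halves simply trade places.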
Printing and reading I/O
In “Hello world”, all we did was print a simple message to the console, which Easy68K calls the simulator
window. This tutorial is closely tied to that simulator, so here’s how I/O works with it.
Notice how we worked with D0 and A1; let's examine those closely:
Moving a certain value into D0 selects which I/O task is performed.
o Example: moving #14 displays the null-terminated string at (A1) without a newline afterwards
o Moving #5 makes the simulator read a single character typed by the user (into D1.B)
Moving a value into D1, like decimal number 10 (in hex it is ‘A’), supplies the number that a display task
such as task 3 will print.
Loading an address into A1 usually means we’re going to print the string stored there.
NOTE: when all is said and done, the final step in displaying output or reading input is
writing: TRAP #15
*displaying a decimal number to the console
        move.b  #3,d0           *task #3 in D0 displays the signed number in D1.L in decimal
        move.l  #100,d1         *move the long value 100 into D1 (task 3 reads all 32 bits)
        trap    #15             *the number displayed on screen is 100
RECOMMEND REFERRING TO THIS FOR ALL KINDS OF I/O
OPERATIONS: http://www.easy68k.com/QuickStart/TrapTasks.htm
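To see what the registers mean to TRAP #15, here is a toy Python model of three tasks: 3, 13 and 14. trap15 is a hypothetical function that only imitates the printed text; it is not how the simulator is implemented. Memory is assumed to be a plain bytes object indexed by address.

```python
def trap15(d0: int, d1: int = 0, a1: int = 0, memory: bytes = b"") -> str:
    """Hypothetical toy model: the text a few TRAP #15 tasks would print."""
    if d0 == 3:
        # Task 3: the signed 32-bit number in D1.L, in decimal.
        value = d1 & 0xFFFFFFFF
        if value & 0x80000000:
            value -= 1 << 32
        return str(value)
    if d0 in (13, 14):
        # Tasks 13/14: bytes from (A1) up to, not including, the 0 terminator.
        end = memory.index(0, a1)
        text = memory[a1:end].decode("ascii")
        return (text + "\r\n") if d0 == 13 else text  # 13 adds CR, LF; 14 does not
    raise NotImplementedError(f"task {d0} is not modelled here")


# 'HELLO WORLD',0 placed at a pretend address $10, as DC.B would lay it out.
memory = bytes(0x10) + b"HELLO WORLD\x00"
print(trap15(14, a1=0x10, memory=memory))
print(trap15(3, d1=100))
print(trap15(3, d1=0xABCD0064))  # upper word not cleared: a negative number, not 100
```

The last call shows why the example uses move.l rather than move.w. If D1's upper word still holds old data, task 3 prints the whole 32-bit value.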
Moving and Loading things
Moving things around and loading data is what assembly is mostly about. I’ll go straight to the instructions:
MOVE: moves values between registers and memory
MOVEA: moves a value into an address register (the destination must be A0-A7)
MOVEM: moves several registers to or from memory in one instruction; it is most often used to save
registers on the stack addressed by A7 and restore them later:
o Example: MOVEM.L D0/D1, -(A7) <-- push D0 and D1 onto the stack
o MOVEM.L (A7)+, D0-D1 <-- pop from the stack and store into D0 and D1
o MOVEM.L A0-A6, -(A7) <-- push A0 through A6
LEA: load an effective address into an address register (LEA $5000, A1)
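Pushing and popping come down to predecrement stores and postincrement loads through A7. The Python sketch below is a hypothetical Stack class that handles one long word at a time, as MOVE.L does. MOVEM does the same for several registers in one instruction. It assumes the 68K's big-endian byte order and a stack that grows toward lower addresses.

```python
class Stack:
    """Hypothetical model of a 68K stack: A7 grows downward, values are big-endian."""

    def __init__(self, size: int = 0x100) -> None:
        self.mem = bytearray(size)
        self.a7 = size  # start with A7 just past the end of the stack area

    def push_long(self, value: int) -> None:
        # Like MOVE.L Dn,-(A7): decrement A7 by 4 first, then store the long word.
        self.a7 -= 4
        self.mem[self.a7:self.a7 + 4] = (value & 0xFFFFFFFF).to_bytes(4, "big")

    def pop_long(self) -> int:
        # Like MOVE.L (A7)+,Dn: read the long word at A7 first, then add 4 to A7.
        value = int.from_bytes(self.mem[self.a7:self.a7 + 4], "big")
        self.a7 += 4
        return value


s = Stack()
s.push_long(0x11111111)  # "D0"
s.push_long(0x22222222)  # "D1"
print(hex(s.pop_long()), hex(s.pop_long()), hex(s.a7))
```

Values come back in reverse order (last in, first out), and A7 ends where it started.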
Doing Math
68K has instructions to add, subtract, divide, multiply, complement, perform boolean logic, and shift
ADD: adds values between registers or immediate data to register/memory
ADDI: adds but you can only add immediate values, no registers allowed for source
ADDQ: add quick (an immediate value from 1 to 8), often used for incrementing an address pointer
SUB: subtracts values between registers or immediate data to register/memory
SUBI: subtract but you can only subtract immediate values, no registers allowed for source
SUBQ: subtract quick (an immediate value from 1 to 8), often used for decrementing an address pointer
MULS: multiply signed
MULU: multiply unsigned
DIVS: divide signed
DIVU: divide unsigned
ASL/ASR/LSL/LSR/ROL/ROR: shift the bits in a register, example -> ASL.B #4, D7 *shift the
value in D7 4 bits to the left
NOT: complement bits, example: 1010 —> 0101
OR: perform logical OR on bits… 1001 OR 0110 = 1111
AND: perform logical AND on bits… 1010 1100 AND 0010 1000 = 0010 1000
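The byte-sized shifts and logic operations can be modeled in Python by masking every result to 8 bits. The function names are hypothetical. The sketch shows the main difference between ASR and LSR: ASR copies the sign bit into the vacated bits, while LSR fills them with zeros.

```python
def asl_b(value: int, count: int) -> int:
    """Hypothetical model of ASL.B #count,Dn: shift left; bits pushed past bit 7 are lost."""
    return (value << count) & 0xFF


def lsr_b(value: int, count: int) -> int:
    """Hypothetical model of LSR.B #count,Dn: shift right, filling with zeros."""
    return (value & 0xFF) >> count


def asr_b(value: int, count: int) -> int:
    """Hypothetical model of ASR.B #count,Dn: shift right, copying the sign bit (bit 7)."""
    signed = (value & 0xFF) - 0x100 if value & 0x80 else value & 0xFF
    return (signed >> count) & 0xFF


def not_b(value: int) -> int:
    """Hypothetical model of NOT.B Dn: complement every bit of the low byte."""
    return ~value & 0xFF


print(format(asl_b(0x12, 4), "02X"))           # shift left 4 bits
print(format(lsr_b(0x80, 1), "08b"), format(asr_b(0x80, 1), "08b"))
print(format(not_b(0b10101010), "08b"))
print(format(0b1001 | 0b0110, "04b"), format(0b10101100 & 0b00101000, "08b"))
```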
        add.b   d0,d1           *add the byte in d0 to d1
        addi.w  #$AABB,d2       *add an immediate word to d2
        mulu    #16,d4          *multiply the low word of d4 by 16 (32-bit result in d4)
        not.l   d6              *complement the longword binary bits of d6
        lsr.w   (a0)            *shift the word pointed to by a0 one bit to the right
*                                (shifts on memory are word-sized and move exactly 1 bit)
        divu    #2,d5           *divide d5 by 2: quotient in low word, remainder in high word
        subq.l  #4,a3           *decrement a3 by 4
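MULU and DIVU pack their results in a specific way, and the Python sketch below models it. mulu and divu are hypothetical helpers. MULU multiplies two 16-bit values into a 32-bit product. DIVU divides a 32-bit dividend by a 16-bit divisor and puts the quotient in the low word and the remainder in the high word. On real hardware, dividing by zero raises an exception, and a quotient too large for 16 bits sets V and leaves the register unchanged; here both are modeled as Python exceptions.

```python
def mulu(src: int, dn: int) -> int:
    """Hypothetical model of MULU.W src,Dn: unsigned 16 x 16 -> 32-bit product."""
    return (src & 0xFFFF) * (dn & 0xFFFF)


def divu(src: int, dn: int) -> int:
    """Hypothetical model of DIVU.W src,Dn: returns the new 32-bit value of Dn."""
    divisor = src & 0xFFFF
    if divisor == 0:
        raise ZeroDivisionError("DIVU by zero raises a divide-by-zero exception")
    quotient, remainder = divmod(dn & 0xFFFFFFFF, divisor)
    if quotient > 0xFFFF:
        raise OverflowError("quotient does not fit in 16 bits (V is set, Dn unchanged)")
    return (remainder << 16) | quotient  # remainder in high word, quotient in low word


print(format(divu(7, 100), "08X"))   # 100 / 7: quotient 14 ($E), remainder 2
print(format(mulu(16, 0x1234), "08X"))
```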
Control and Checking
In high-level languages, the if statement is used for program flow and control/checking. In 68K, we
compare.
CMP: you can compare immediate data to a register, a register to a register, and more…
Example: CMP.B #4, D5 ;does the byte value in D5 equal 4?
After you compare, however, YOU MUST DO SOMETHING with the result, and that’s where
branches (often to subroutines) come in.
Usually after comparing, we branch somewhere.
        CMP.B   #4,D5           *does the byte value in D5 equal 4?
        BEQ     doStuff         *branch if equal, to a subroutine called 'doStuff'
Branching and Jumping
Branching means we go to another piece of code, usually a labeled block or subroutine. JMP also transfers
control without coming back, while JSR (jump to subroutine) additionally saves a return address so the
subroutine can come back. There are different kinds of branches and jumps.
BRA: just branch somewhere, like ‘BRA func1’ means go to the code labeled ‘func1’
BEQ: branch if equal (if D0 = 4 then go here…)
BNE: branch if not equal
BLT: branch if less than (signed comparison)
BGT: branch if greater than (signed comparison)
BCC: branch on carry clear (see Condition codes)
BCS: branch on carry set (see Condition codes)
JSR: jump to subroutine. Note: you will need to have ‘RTS’ at the end of a subroutine should you
use JSR. JSR pushes the 32-bit address of the next instruction onto the stack (A7), and RTS pops that
address back into the program counter, i.e. it returns to the caller.
JMP: jump. Yes, only jump. You can either jump to a subroutine or jump to a specific
address/program displacement. Example: JMP (A0, D0)
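The JSR/RTS pairing works because return addresses live on the stack, which also lets subroutines call other subroutines. Here is a minimal Python sketch of that mechanism. CallStack and its next_pc parameter are hypothetical, and memory is a dict to keep the model short. The example addresses assume a 6-byte JSR instruction.

```python
class CallStack:
    """Hypothetical model of JSR/RTS: return addresses are kept on the stack at A7."""

    def __init__(self, a7: int = 0x1000) -> None:
        self.a7 = a7
        self.memory: dict = {}  # address -> 32-bit value
        self.pc = 0

    def jsr(self, target: int, next_pc: int) -> None:
        # Push the address of the instruction after the JSR, then jump to the target.
        self.a7 -= 4
        self.memory[self.a7] = next_pc
        self.pc = target

    def rts(self) -> None:
        # Pop the saved return address back into the program counter.
        self.pc = self.memory[self.a7]
        self.a7 += 4


cpu = CallStack()
cpu.jsr(0x2000, next_pc=0x1006)  # main calls a subroutine at $2000
cpu.jsr(0x3000, next_pc=0x2004)  # which calls another one at $3000
cpu.rts()                        # back to $2004
cpu.rts()                        # back to $1006
print(hex(cpu.pc), hex(cpu.a7))
```

A plain BRA or JMP would not push anything, so an RTS after it would pop whatever happened to be on the stack.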
Looping
How would you implement a for loop in 68K? A while loop?
func1:                          *for loop example
        cmp.b   #$A,D0          *does D0 equal 10?
        beq     done            *if so, branch to done
        addi.b  #1,D0           *otherwise increment D0
        bra     func1           *and go back and loop

func2:                          *while loop example
        lsr.b   #1,D3           *logical shift right 1 bit; the bit shifted out goes to C
        bcs     done            *branch on carry set
        bra     func2           *loop again (if D3.B is 0, C never sets and this never ends)

done:
        SIMHALT
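For comparison, here is how the two loops read in a high-level language. This is a hypothetical Python translation that models only what the loops compute. for_loop counts the ADDI.B steps until D0.B reaches 10. shifts_until_carry counts the LSR.B steps until a 1 bit falls into C.

```python
from typing import Optional


def for_loop(d0: int) -> int:
    """Hypothetical translation of func1: ADDI.B #1,D0 steps until D0.B equals 10."""
    steps = 0
    while (d0 & 0xFF) != 0x0A:
        # ADDI.B only changes the low byte, which wraps from $FF back to $00.
        d0 = (d0 & ~0xFF) | ((d0 + 1) & 0xFF)
        steps += 1
    return steps


def shifts_until_carry(d3: int) -> Optional[int]:
    """Hypothetical translation of func2: LSR.B #1,D3 steps until BCS sees a carry."""
    value = d3 & 0xFF
    for count in range(1, 9):
        carry = value & 1      # LSR moves bit 0 into the C flag
        value >>= 1
        if carry:
            return count
    return None  # D3.B was 0: C never sets, so the 68K loop would never end


print(for_loop(0), shifts_until_carry(0b1000), shifts_until_carry(0))
```

The None case makes explicit what the assembly version hides: func2 only terminates if D3.B contains at least one 1 bit.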
Subroutines
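You can write subroutines for anything in 68K. Think of them like functions in C or methods in C++. A
subroutine must first be defined by a label (the name), then the code afterwards. A subroutine entered with
JSR should end with RTS. Example:
doStuff:
        move.b  #5,d0
        move.l  #$00ABC5F0,(a0)+        *store a long at (a0), then advance a0 by 4
        lea     message,a1              *a1 = address of message (defined elsewhere)
        cmpa.w  #$FF,a5                 *does a5 hold $FF?
        beq     goThere                 *if so, leave the loop (goThere is defined elsewhere)
        addq    #4,d0
        bra     doStuff                 *otherwise repeat

doMoreStuff:
        clr.l   d2                      *d2 = 0
        movea.l d2,a2                   *a2 = 0
        jsr     routine2                *call another subroutine (defined elsewhere)
        rts                             *return to the caller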
Macros
You can write a macro that will greatly simplify the way you do things. For example, here’s what a
print macro looks like (all credit given to Prof. Charles Kelly of Easy68K forums)
*-----------------------------------------------------------
* Written by  : Chuck Kelly
* Description : Demo of macros
* Macro definitions should be placed at the top of the source file.
*-----------------------------------------------------------
        OPT     MEX
CODE    EQU     0
TEXT    EQU     1

        SECTION TEXT
        ORG     $2000
        SECTION CODE
        ORG     $1000

* print the text string
* use ENDL as second argument to add return and linefeed
PRINT   MACRO
        SECTION TEXT
MSG\@   DC.B    \1              ; unique label for each expansion
        IFARG   2
        IFC     '\2','ENDL'
        DC.B    $D,$A
        ENDC
        ENDC
        DC.B    0
        SECTION CODE

        MOVEM.L D0/A1,-(SP)     ; save D0 and A1
        LEA     MSG\@,A1
        MOVE.B  #14,D0
        TRAP    #15
        MOVEM.L (SP)+,D0/A1     ; restore them
        ENDM

HALT    MACRO
        MOVE.B  #9,D0
        TRAP    #15
        ENDM

**********************
* Program Start
**********************
START
        PRINT   <'Macro Demonstration Program'>,ENDL

        HALT                    ; halt the program

        END     START
Memory in 68K
Easy68K provides a way of viewing what’s going on in the memory we’re working with. When you run your
program, click “Memory” under the View menu to open the memory window. In the screenshot, the blue outline
marks our current address, green marks the offset, and red marks the data residing at that location.
Figure: The current address, the data residing there, and the offsets.
Condition codes
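When you do math in 68K, certain condition codes are set. For example, if we compare two values and they are
equal, a special bit called ‘Z’ gets set to 1, which lets us know they are equal. 68K has these
condition codes:
X – extend bit: set like C by arithmetic; used by multi-precision instructions such as ADDX
N – negative bit: set when the most significant bit of the result is 1
Z – zero bit: set when the result is zero
V – overflow bit: set when a signed result does not fit in the operand size
C – carry bit: set on a carry out of (or a borrow into) the most significant bit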
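To connect CMP with the branches, here is a minimal Python sketch of the flags CMP.B sets. cmp_b is a hypothetical helper. CMP computes destination minus source, sets N, Z, V and C from that result, and leaves both X and the destination unchanged. BEQ then branches when Z is 1, BNE when Z is 0, and BCS when C is 1. The signed tests BLT and BGT combine N and V.

```python
def cmp_b(src: int, dst: int) -> dict:
    """Hypothetical model of the flags set by CMP.B src,Dn (Dn - src; Dn is unchanged)."""
    s, d = src & 0xFF, dst & 0xFF
    result = (d - s) & 0xFF
    return {
        "N": result >> 7,                          # top bit of the result
        "Z": int(result == 0),                     # equal values give zero
        "V": ((d ^ s) & (d ^ result)) >> 7 & 1,    # signed overflow
        "C": int(s > d),                           # unsigned borrow
    }


print(cmp_b(4, 4))     # CMP.B #4,D5 with D5.B = 4: Z = 1, so BEQ would branch
print(cmp_b(4, 3))     # D5.B = 3: a borrow sets C, and the result $FF sets N
print(cmp_b(1, 0x80))  # -128 - 1 does not fit in a signed byte: V = 1
```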
Variables, Constants, Misc
You can also define constants with EQU above your code, and initialized data with DC.B after it, like so…
* you can define stuff here as well
address1    EQU     $1500
address2    EQU     $2000
variableX   EQU     40              *EQU takes a plain value; use it in code as #variableX
**********************
* Program Start
**********************
            ORG     $1000
START
; put program code here

text        DC.B    'bla bla bla',0
            END     START
1. Write a 68K program to scan a region of memory and look for a specific value. If found, print the
address location.
2. Write a 68K program to take user input, such as your name, and display it to the console. Hint:
look into an ASCII table for reference.
3. Learn about subroutines, jumping, branching, and program displacement by writing small
subroutines.
4. Write a 68K program that takes number input from a user and prints to screen the hexadecimal
representation.
5. Write a 68K program that uses a macro to print the binary representation of a number.