Emu 8086
BEKKOUCHE Mohammed
Overview of Emu8086
• Emu8086 is an emulator of the 8086 microprocessor (Intel and AMD compatible)
• This program is extremely useful for those who are just beginning to study assembly language. It compiles the source code and runs it on the emulator step by step
• The emulator executes programs like the real microprocessor in single-step mode
• The visual interface is very easy to use. You can watch registers, flags (FLAGS) and memory while your program is running
• The arithmetic and logic unit (ALU) shows the internal workings of the central processing unit (CPU)
• The assembly code runs in a virtual computer (a virtual machine)
• 8086 machine code is compatible with all later generations of Intel x86 microprocessors, including the Pentium II, the Pentium 4 and the Intel Core family. This makes 8086 code very portable: it runs on both old and modern x86 processors, although a modern 64-bit operating system generally needs an emulator to run 16-bit programs. Another advantage of the 8086 instruction set is that it is easy to learn
• Emu8086 has a much simpler syntax than any of the major assemblers, but still generates a program that can run on any computer that runs 8086 machine code: a great combination for beginners
Where to start?
• Start Emu8086 by selecting its icon from the Start menu or by running
Emu8086.exe
• Select “Examples” from the “File” menu
• Click the [Emulate] button (or press the F5 shortcut key)
• Click the [Single Step] button (or press the F8 shortcut key), and watch how the code is executed
• Try opening other examples; all of them are heavily commented, which makes them a great learning tool
What is assembly language?
• Assembly language is a so-called low-level language, that is, its operation is very close to machine language
• It is closely tied to the architecture of the microprocessor: you work directly with registers, memory addresses, interrupts, system calls, and so on
• The advantage of assembly language is that you can manage memory down to the smallest byte and always know exactly which code the microprocessor is executing at a given moment
• Assembly programs also have the advantage of being fast and small, which has long made assembly popular with virus writers. A small comparison: a program displaying "Hello, World!" written in C weighs 15,839 bytes, while the same program written in assembly weighs 23 bytes
Basic structure of a computer
• You need to know the basic structure of a computer to understand assembly language. Here is the model of a simple computer:
[Figure: model of a simple computer]
General-purpose registers
The 8086 processor has 8 general-purpose registers, each with its own name:
• AX - the accumulator register (divided into AH / AL)
• BX - the base address register (divided into BH / BL)
• CX - the count register (divided into CH / CL)
• DX - the data register (divided into DH / DL)
• SI - source index register
• DI - destination index register
• BP - base pointer
• SP - stack pointer
• The programmer decides how each general-purpose register is used
• The main purpose of a register is to hold a number
• These registers are 16 bits wide
• The content of such a register looks like 0011000000111001b in binary form, or 12345 in decimal (human-readable) form
• The four general-purpose registers AX, BX, CX and DX each consist of two separate 8-bit registers. For example, if AX = 00110000 00111001b, then AH = 00110000b and AL = 00111001b
• Therefore, when you modify one of the 8-bit registers, the 16-bit register is also updated, and vice versa. The same goes for the other 3 registers: "H" stands for the "high" part and "L" for the "low" part
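The following small Python model shows why writing AL also changes AX. It is a sketch only. The `Register16` class and its `high`/`low` names are illustrative and are not part of Emu8086. The model stores only the 16-bit value and derives the two 8-bit halves from it, so the halves and the full register can never disagree.

```python
class Register16:
    """Illustrative model of AX/BX/CX/DX: one 16-bit value with two 8-bit views."""

    def __init__(self, value: int = 0) -> None:
        self.value = value & 0xFFFF

    @property
    def high(self) -> int:          # AH, BH, CH, DH
        return (self.value >> 8) & 0xFF

    @high.setter
    def high(self, byte: int) -> None:
        # replace bits 15..8, keep bits 7..0
        self.value = ((byte & 0xFF) << 8) | (self.value & 0x00FF)

    @property
    def low(self) -> int:           # AL, BL, CL, DL
        return self.value & 0xFF

    @low.setter
    def low(self, byte: int) -> None:
        # replace bits 7..0, keep bits 15..8
        self.value = (self.value & 0xFF00) | (byte & 0xFF)


ax = Register16(0b0011000000111001)   # 12345
print(bin(ax.high), bin(ax.low))       # 0b110000 0b111001
ax.low = 0xFF                          # writing AL...
print(hex(ax.value))                   # ...changes AX: 0x30ff
```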
• Since registers are located inside the processor, they are much faster than memory
• Accessing a memory location requires the system bus, so it takes much longer
• Accessing data in a register takes practically no time
• Therefore, you should try to keep variables in registers
• The number of registers is very small, and most registers have special purposes that limit their use as variables, but they are still a great place to store temporary results of calculations
Segment registers
• CS - points to the segment containing the current program
• DS - usually points to the segment where variables are defined
• ES - extra segment register; it is up to the programmer to define its use
• SS - points to the segment containing the stack
Although it is possible to store data in segment registers, it is never a good idea. Segment registers have a very special purpose: pointing to accessible blocks of memory
• Segment registers work together with a general-purpose register to access any memory location. For example, to access memory at the physical address 12345h (hexadecimal), we can set DS = 1230h and SI = 0045h
• This is useful because it lets us access much more memory than a single register limited to 16-bit values could: a 16-bit offset covers only 64 KB, whereas the 20-bit physical address covers 1 MB
• The CPU calculates the physical address by multiplying the segment register by 10h and adding the general-purpose register to it (1230h * 10h + 45h = 12345h)
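The physical address calculation fits in a one-line function. This Python sketch assumes the 8086's 20 address lines, so the result is masked to 20 bits (FFFFFh) and wraps around past 1 MB. `physical_address` is a hypothetical helper name.

```python
def physical_address(segment: int, offset: int) -> int:
    """Segment * 10h + offset, kept to 20 bits like the 8086 address bus."""
    return (segment * 0x10 + offset) & 0xFFFFF


print(hex(physical_address(0x1230, 0x0045)))   # 0x12345
print(hex(physical_address(0x1234, 0x7890)))   # 0x19bd0
# Different segment:offset pairs can name the same byte:
print(physical_address(0x1234, 0x0005) == physical_address(0x1230, 0x0045))  # True
```

The segment is only multiplied by 16, so many different segment:offset pairs point to the same byte. The last line shows one such case.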
• The offset formed by the general-purpose register(s), plus any displacement, is called the effective address; the CPU combines it with a segment register to form the physical address
• By default, the BX, SI and DI registers work with the DS segment register; BP and SP work with the SS segment register
• Other general-purpose registers cannot form an effective address
• Also, although BX can form an effective address, BH and BL cannot
Special-purpose registers
• The IP register always works with the CS segment register and points to the next instruction to be executed
• The FLAGS register is modified automatically by the CPU after mathematical operations. This makes it possible to determine the type of the result and the conditions for transferring control to other parts of the program
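The following Python sketch shows what "determine the type of the result" means in practice. The `add8` helper is hypothetical. It computes four of the FLAGS bits (CF, ZF, SF and OF) after an 8-bit addition of two values in the range 0..255. Other flags such as AF and PF are left out for brevity.

```python
def add8(a: int, b: int):
    """Add two bytes and compute CF, ZF, SF and OF the way an 8-bit ADD does."""
    total = a + b
    result = total & 0xFF
    flags = {
        "CF": int(total > 0xFF),                 # unsigned result did not fit in 8 bits
        "ZF": int(result == 0),                  # result is zero
        "SF": result >> 7,                       # copy of the result's top bit
        # signed overflow: both operands have the same sign, the result has the other
        "OF": int(((a ^ result) & (b ^ result) & 0x80) != 0),
    }
    return result, flags


print(add8(0xFF, 0x01))   # (0, {'CF': 1, 'ZF': 1, 'SF': 0, 'OF': 0})
print(add8(0x7F, 0x01))   # (128, {'CF': 0, 'ZF': 0, 'SF': 1, 'OF': 1})
```

A conditional jump such as JZ would then transfer control only when ZF is 1.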
• Usually you cannot access these registers directly, as you can AX and the other general registers, but it is possible to change the values of the system registers using some tricks that you will learn a bit later
Memory
Logically, the memory of the 8086 is organized as follows:
[Figure: logical organization of 8086 memory]
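Memory access
• To access memory, we can use these four registers: BX, SI, DI, BP
• By combining these registers inside [ ] symbols, we can address different memory locations
• These combinations (addressing modes) are supported:
[BX + SI]   [BX + DI]   [BP + SI]   [BP + DI]
[SI]   [DI]   d16 (variable offset only)   [BX]
[BX + SI + d8]   [BX + DI + d8]   [BP + SI + d8]   [BP + DI + d8]
[SI + d8]   [DI + d8]   [BP + d8]   [BX + d8]
[BX + SI + d16]   [BX + DI + d16]   [BP + SI + d16]   [BP + DI + d16]
[SI + d16]   [DI + d16]   [BP + d16]   [BX + d16]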
• d8 - an 8-bit signed immediate displacement (e.g. 22, 55h, -1)
• d16 - a 16-bit signed immediate displacement (e.g. 300, 5517h, -259)
• The displacement can be an immediate value, the offset of a variable, or both. If there are several values, the assembler evaluates them all and calculates a single immediate value
• The displacement can be inside or outside the [ ] symbols; the assembler generates the same machine code either way
• The displacement is a signed value, so it can be positive or negative
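The assembler folds all displacement parts into one value, which then needs either 8 or 16 bits. This Python sketch illustrates that rule and assumes purely numeric displacements. A real assembler may still use a 16-bit field for a variable's offset even when the number is small, and it usually omits a zero displacement entirely. `displacement_size` is a hypothetical helper.

```python
def displacement_size(*parts: int) -> str:
    """Fold all displacement parts into one value and pick d8 or d16."""
    value = sum(parts)                      # the assembler folds parts into one immediate
    if -128 <= value <= 127:
        return "d8"                         # fits in a signed byte
    if -32768 <= value <= 0xFFFF:
        return "d16"                        # 16 bits; FFFFh and -1 give the same bit pattern
    raise ValueError("displacement does not fit in 16 bits")


print(*[displacement_size(p) for p in (22, 0x55, -1, 300, 0x5517, -259)])
# d8 d8 d8 d16 d16 d16
print(displacement_size(100, 27), displacement_size(100, 28))   # d8 d16
```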
Example:
Suppose DS = 100, BX = 30, SI = 70
The addressing mode [BX + SI] + 25 is calculated by the processor as this physical address: 100 * 16 + 30 + 70 + 25 = 1725
• By default, the DS segment register is used for all modes except those using the BP register; for those, the SS segment register is used
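The following Python sketch combines the example calculation with the default-segment rule. The register dictionary and the `operand_address` helper are assumptions for illustration. The values are the decimal ones used in the example, and segment override prefixes are not modelled.

```python
def physical_address(segment: int, offset: int) -> int:
    return (segment * 16 + offset) & 0xFFFFF


def operand_address(regs: dict, base: str = "", index: str = "", disp: int = 0) -> int:
    """Physical address of [base + index + disp] using the default segment."""
    if base not in ("", "BX", "BP") or index not in ("", "SI", "DI"):
        raise ValueError("base must be BX or BP, index must be SI or DI")
    offset = disp
    if base:
        offset += regs[base]
    if index:
        offset += regs[index]
    offset &= 0xFFFF                                        # offsets wrap within 64 KB
    segment = regs["SS"] if base == "BP" else regs["DS"]    # BP implies SS
    return physical_address(segment, offset)


regs = {"DS": 100, "SS": 200, "BX": 30, "BP": 5, "SI": 70, "DI": 0}
print(operand_address(regs, "BX", "SI", 25))   # 1725, the example above
print(operand_address(regs, "BP", "SI", 25))   # 200 * 16 + 5 + 70 + 25 = 3300
```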
• There is a simple way to remember all of these possible combinations using this table:
BX   SI   + displacement
BP   DI
• All valid combinations can be formed by taking only one element from each column, or by skipping a column entirely. BX and BP never go together, and neither do SI and DI
Examples of valid addressing modes:
[BX+5]
[BX+SI]
[DI+BX-4]
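The "one element per column" rule is mechanical enough to check in code. The `is_valid_mode` function in this Python sketch is hypothetical. It treats any term that is not a register as part of the displacement, so it accepts a variable name. It does not check whether the displacement fits in 16 bits.

```python
import re

BASE_REGS = {"BX", "BP"}
INDEX_REGS = {"SI", "DI"}
# Registers that can never appear inside [ ] on the 8086
OTHER_REGS = {"AX", "CX", "DX", "SP", "AL", "AH", "BL", "BH", "CL", "CH",
              "DL", "DH", "CS", "DS", "ES", "SS", "IP"}


def is_valid_mode(operand: str) -> bool:
    """Apply the 'at most one item per column' rule to a bracketed operand."""
    text = operand.replace(" ", "").upper()
    if len(text) < 3 or text[0] != "[" or text[-1] != "]":
        return False
    bases = indexes = 0
    for sign, term in re.findall(r"([+-]?)([^+-]+)", text[1:-1]):
        if term in OTHER_REGS:
            return False
        if term in BASE_REGS or term in INDEX_REGS:
            if sign == "-":            # registers can only be added, not subtracted
                return False
            if term in BASE_REGS:
                bases += 1
            else:
                indexes += 1
        # anything else is treated as part of the displacement (number or variable)
    return bases <= 1 and indexes <= 1


for mode in ("[BX+5]", "[BX+SI]", "[DI+BX-4]", "[BX+BP]", "[SI+DI]", "[AX]"):
    print(mode, is_valid_mode(mode))
```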
• The value in the segment register (CS, DS, SS, ES) is called a segment, and the value in the general-purpose register (BX, SI, DI, BP) is called an offset
• When DS contains the value 1234h and SI the value 7890h, this can also be written as 1234h:7890h. The physical address will be 1234h * 10h + 7890h = 19BD0h
• Appending a zero to a decimal number multiplies it by 10. Since 10h = 16, appending a zero to a hexadecimal number multiplies it by 16, for example:
7h = 7
70h = 112
• To tell the compiler the data type, use these prefixes:
byte ptr - for a byte
word ptr - for a word (two bytes)
Example:
byte ptr [BX] ; access a byte
or
word ptr [BX] ; access a word
• The Emu8086 assembler also supports shorter prefixes:
b. - for byte ptr
w. - for word ptr
• In some cases, the assembler can determine the data type automatically.
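byte ptr and word ptr differ only in how many bytes are read from the same offset. In the Python sketch below, memory is a hypothetical 64 KB bytearray and the offset in BX is made up. The word read combines the two bytes low byte first, as the 8086 does.

```python
# Hypothetical 64 KB data segment modelled as a bytearray
memory = bytearray(0x10000)
BX = 0x0200                      # illustrative offset
memory[BX] = 0x34                # low byte at the lower address
memory[BX + 1] = 0x12            # high byte at the next address


def read_byte(offset: int) -> int:      # like byte ptr [BX]
    return memory[offset]


def read_word(offset: int) -> int:      # like word ptr [BX], little-endian
    return memory[offset] | (memory[(offset + 1) & 0xFFFF] << 8)


print(hex(read_byte(BX)))   # 0x34
print(hex(read_word(BX)))   # 0x1234
```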
The MOV instruction
Syntax: MOV destination, source
• copies the second operand (source) into the first operand (destination)
• the source operand can be an immediate value, a general-purpose register, or a memory location
• the destination can be a general-purpose register or a memory location
• both operands must have the same size, which can be a byte or a word
These types of operands are accepted :
MOV REG, memory
MOV memory, REG
MOV REG, REG
MOV memory, immediate
MOV REG, immediate
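REG: AX, BX, CX, DX, AH, AL, BL, BH, CH, CL, DH, DL, DI, SI, BP, SP.
Segment registers (SREG: DS, ES, SS, and CS as a source only) use separate forms: MOV SREG, memory / MOV memory, SREG / MOV REG, SREG / MOV SREG, REG. An immediate value cannot be moved into a segment register directly, which is why the example below loads 0B800h into AX first and then copies AX to DS.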
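The operand table can be turned into a small checker. This Python sketch is not Emu8086's parser, and it makes several assumptions. Memory operands must be written in brackets, so bare variable names such as var1 are not recognised. Immediate ranges are not checked. `kind`, `size` and `mov_allowed` are hypothetical names.

```python
REG8 = {"AL", "AH", "BL", "BH", "CL", "CH", "DL", "DH"}
REG16 = {"AX", "BX", "CX", "DX", "SI", "DI", "BP", "SP"}
SREG = {"CS", "DS", "ES", "SS"}

# (destination kind, source kind) pairs accepted by MOV
ALLOWED = {
    ("REG", "memory"), ("memory", "REG"), ("REG", "REG"),
    ("memory", "immediate"), ("REG", "immediate"),
    ("SREG", "memory"), ("memory", "SREG"), ("SREG", "REG"), ("REG", "SREG"),
}


def kind(operand: str) -> str:
    op = operand.strip().upper()
    if op in REG8 or op in REG16:
        return "REG"
    if op in SREG:
        return "SREG"
    if "[" in op and not op.startswith("'"):
        return "memory"
    return "immediate"             # numbers such as 0B800h, characters such as 'A'


def size(operand: str):
    op = operand.strip().upper()
    if op in REG8:
        return 8
    if op in REG16 or op in SREG:
        return 16
    return None                    # memory/immediate size comes from context


def mov_allowed(dst: str, src: str) -> bool:
    if (kind(dst), kind(src)) not in ALLOWED:
        return False
    if dst.strip().upper() == "CS":     # CS may be read but never loaded with MOV
        return False
    dst_size, src_size = size(dst), size(src)
    # when both sizes are known (two registers) they must match
    return dst_size is None or src_size is None or dst_size == src_size


print(mov_allowed("DS", "AX"))        # True
print(mov_allowed("DS", "0B800h"))    # False: no immediate into a segment register
print(mov_allowed("[BX]", "[SI]"))    # False: no memory-to-memory MOV
print(mov_allowed("AL", "BX"))        # False: 8-bit vs 16-bit
```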
ORG 100h             ; this directive is required for a simple one-segment .com program
MOV AX, 0B800h       ; set AX to the hexadecimal value B800h
MOV DS, AX           ; copy the value of AX to DS
MOV CL, 'A'          ; set CL to the ASCII code of 'A', which is 41h
MOV CH, 11011111b    ; set CH to a binary value
MOV BX, 15Eh         ; set BX to 15Eh
MOV [BX], CX         ; copy the contents of CX to memory at B800:015E
RET                  ; return to the operating system
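The following Python re-enactment follows the example step by step and tracks each register as a plain integer. It illustrates the arithmetic only and does not reflect how Emu8086 actually runs the code. It shows where the two bytes of CX end up.

```python
memory = bytearray(0x100000)     # 1 MB of 8086 address space


def physical_address(segment: int, offset: int) -> int:
    return (segment * 0x10 + offset) & 0xFFFFF


AX = 0xB800                      # MOV AX, 0B800h
DS = AX                          # MOV DS, AX
CL = ord("A")                    # MOV CL, 'A'        -> 41h
CH = 0b11011111                  # MOV CH, 11011111b  -> DFh
BX = 0x15E                       # MOV BX, 15Eh
CX = (CH << 8) | CL              # CX is just CH:CL viewed as one 16-bit value

addr = physical_address(DS, BX)  # MOV [BX], CX  (BX defaults to DS)
memory[addr] = CX & 0xFF         # low byte (CL) at the lower address
memory[addr + 1] = CX >> 8       # high byte (CH) at the next address

print(hex(addr), hex(memory[addr]), hex(memory[addr + 1]))   # 0xb815e 0x41 0xdf
```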
Variables
• A variable is a memory location. For a programmer, it is much easier to keep a value in a variable named "var1" than at address 5A73:235B, especially when you have 10 or more variables
• The compiler supports two types of variables: BYTE and WORD
Syntax of a variable declaration:
name DB value
name DW value
DB - stands for Define Byte
DW - stands for Define Word
• name - can be any combination of letters and digits, but must start with a letter. It is possible to declare an unnamed variable by omitting the name (such a variable has an address but no name)
• value - can be any numeric value in any supported numbering system (hexadecimal, binary or decimal), or the "?" symbol for uninitialized variables
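Two hypothetical Python helpers, `encode_db` and `encode_dw`, sketch what DB and DW actually put in memory. The sketch assumes that values are plain integers or quoted strings. The "?" placeholder for uninitialized variables is not modelled. DW stores the low byte first.

```python
def encode_db(*values) -> bytes:
    """Bytes produced by 'name DB v1, v2, ...' (strings become ASCII codes)."""
    out = bytearray()
    for v in values:
        if isinstance(v, str):
            out += v.encode("ascii")            # 'Hello' -> 48h 65h 6Ch 6Ch 6Fh
        elif -128 <= v <= 255:
            out.append(v & 0xFF)                # -1 is stored as FFh
        else:
            raise ValueError(f"{v} does not fit in a byte")
    return bytes(out)


def encode_dw(*values: int) -> bytes:
    """Bytes produced by 'name DW v1, v2, ...': low byte first."""
    out = bytearray()
    for v in values:
        if not -32768 <= v <= 0xFFFF:
            raise ValueError(f"{v} does not fit in a word")
        out += (v & 0xFFFF).to_bytes(2, "little")
    return bytes(out)


print(encode_db(7).hex(" "))            # 07
print(encode_dw(0x1234).hex(" "))       # 34 12
```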
The MOV instruction copies values from source to destination. Let's look at another example with the MOV instruction:
ORG 100h
MOV AL, var1
MOV BX, var2
RET ; stops the program.
VAR1 DB 7
var2 DW 1234h
Copy the code above into the source editor and press F5 to compile it and load it into the emulator. You should get something like this:
[Figure: emulator window after loading the program]
• In the memory list, the first column is the physical address, the second column the hexadecimal value, the third column the decimal value, and the last column the ASCII character value
• The compiler is not case sensitive, so "VAR1" and "var1" refer to the same variable
• The offset of VAR1 is 0108h and the full address is 0B56:0108 (the segment value depends on where the program is loaded)
• The offset of var2 is 0109h and the full address is 0B56:0109. This variable is a WORD, so it occupies 2 bytes. The low byte is stored at the lower address, so 34h comes before 12h
• You can see that there are other instructions after the RET instruction. This happens because the disassembler has no idea where the data starts: it simply processes the values in memory and interprets them as valid 8086 instructions (we'll learn about them later)
• You can even write the same program using just the DB directive:
ORG 100h
DB 0A0h
DB 08h
DB 01h
DB 8Bh
DB 1Eh
DB 09h
DB 01h
DB 0C3h
DB 7
DB 34h
DB 12h
• Copy this code into the source editor and press F5 to compile it and load it into the emulator. You should get the same disassembled code and the same behavior!
• As you might guess, the compiler simply converts the program source into a set of bytes. This set is called machine code; the processor understands machine code and executes it.
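The byte listing can be checked against the offsets reported by the emulator. The Python sketch below copies the eleven bytes in program order and assumes, per ORG 100h, that the first byte sits at offset 100h. The comments show which instruction or variable each group of bytes encodes.

```python
ORIGIN = 0x100   # ORG 100h: a .com program starts at offset 100h

# The DB listing, in program order
image = bytes([0xA0, 0x08, 0x01,        # MOV AL, var1   (address 0108h, low byte first)
               0x8B, 0x1E, 0x09, 0x01,  # MOV BX, var2   (address 0109h, low byte first)
               0xC3,                    # RET
               0x07,                    # var1 DB 7
               0x34, 0x12])             # var2 DW 1234h, low byte first

var1_offset = ORIGIN + 8                # eight bytes of code come first
var2_offset = ORIGIN + 9
var2_value = int.from_bytes(image[9:11], "little")

print(hex(var1_offset), hex(var2_offset), hex(var2_value))   # 0x108 0x109 0x1234
```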
Arrays
• Arrays can be thought of as chains of variables. A text string is an example of a byte array: each character is stored as its ASCII code value (0..255)
• Here are some example array definitions:
a DB 48h, 65h, 6Ch, 6Ch, 6Fh, 00h
b DB 'Hello', 0
• b is an exact copy of the array a: when the compiler sees a quoted string, it automatically converts it to a set of bytes
• This graphic shows the part of memory where these arrays are declared:
[Figure: memory view of arrays a and b]
• You can access the value of any array element using square brackets, for example:
MOV AL, a[3]
• You can also use one of the memory index registers BX, SI, DI, BP, for example:
MOV SI, 3
MOV AL, a[SI]
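Indexed access such as a[SI] is simply "the offset of a plus the index". In this Python sketch, placing a at offset 0200h is a made-up assumption, and `mov_al_indexed` is a hypothetical helper.

```python
# Hypothetical layout: array a placed at offset 0200h of the data segment
memory = bytearray(0x10000)
A_OFFSET = 0x0200
a = bytes([0x48, 0x65, 0x6C, 0x6C, 0x6F, 0x00])   # a DB 48h, 65h, 6Ch, 6Ch, 6Fh, 00h
memory[A_OFFSET:A_OFFSET + len(a)] = a


def mov_al_indexed(base_offset: int, index: int) -> int:
    """MOV AL, a[index]: read the byte at offset(a) + index."""
    return memory[(base_offset + index) & 0xFFFF]


SI = 3
AL = mov_al_indexed(A_OFFSET, SI)          # MOV SI, 3 / MOV AL, a[SI]
print(hex(AL), chr(AL))                    # 0x6c l
```

The index counts bytes, not elements. For a DW array the same expression still adds a byte count, so element n sits at index 2 * n.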
• If you need to declare a large array, you can use the DUP operator
DUP syntax:
number DUP ( value(s) )
number - number of duplicates to create (any constant value)
value - expression that DUP will duplicate
For example:
c DB 5 DUP (9)
is another way to declare:
c DB 9, 9, 9, 9, 9
Another example:
d DB 5 DUP (1, 2)
is another way to declare:
d DB 1, 2, 1, 2, 1, 2, 1, 2, 1, 2
• Of course, you can use DW instead of DB if you need to store values greater than 255 or less than -128. However, DW cannot be used to declare character strings
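DUP is a purely textual expansion performed by the assembler. A hypothetical Python helper, `dup`, makes the rule explicit: the list of values is repeated `count` times.

```python
def dup(count: int, *values):
    """Expand 'count DUP (values)' into the list of values it declares."""
    if count < 0:
        raise ValueError("count must be a non-negative constant")
    return list(values) * count


print(dup(5, 9))        # [9, 9, 9, 9, 9]
print(dup(5, 1, 2))     # [1, 2, 1, 2, 1, 2, 1, 2, 1, 2]
```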
Get the address of a variable
• There is a Load Effective Address (LEA) instruction and an alternative OFFSET operator. Both OFFSET and LEA can be used to get the offset address of a variable
Example:
ORG 100h
MOV AL, VAR1             ; check the value of VAR1 by moving it to AL
LEA BX, VAR1             ; get the address of VAR1 in BX
MOV BYTE PTR [BX], 44h   ; modify the contents of VAR1
MOV AL, VAR1             ; check the value of VAR1 by moving it to AL
RET
VAR1 DB 22h
END
These lines:
LEA BX, VAR1
MOV BX, OFFSET VAR1
are compiled into the same machine code: MOV BX, num
where num is the 16-bit offset of the variable.
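This example hinges on the difference between a variable's value and its address. The Python sketch below replays it with a hypothetical offset for VAR1 (010Eh is made up; the assembler decides the real offset). MOV reads the value, while LEA or OFFSET yields the address used to modify it.

```python
# Hypothetical offset for VAR1; the real one is chosen by the assembler
VAR1 = 0x010E
memory = bytearray(0x10000)
memory[VAR1] = 0x22                 # VAR1 DB 22h

AL = memory[VAR1]                   # MOV AL, VAR1           -> value 22h
BX = VAR1                           # LEA BX, VAR1 (or MOV BX, OFFSET VAR1) -> address
memory[BX] = 0x44                   # MOV BYTE PTR [BX], 44h -> write through the address
AL_after = memory[VAR1]             # MOV AL, VAR1           -> value 44h

print(hex(AL), hex(BX), hex(AL_after))   # 0x22 0x10e 0x44
```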
Constants
• Constants are like variables, but they exist only until your program is compiled (assembled). Once a constant is defined, its value cannot be changed. Constants are defined with the EQU directive:
name EQU < any expression >
For example :
k EQU 5
MOV AX, k
The example above is functionally identical to the code :
MOV AX, 5
Variables
You can display variables while your program is running by selecting 'Variables' from the emulator's 'View' menu
• To display arrays, click on a variable and set its "Elements" property to the size of the array. Assembly language has no strict data types, so any variable can be presented as an array
• A variable can be viewed in any numbering system:
HEX - hexadecimal (base 16)
BIN - binary (base 2)
OCT - octal (base 8)
SIGNED - signed decimal (base 10)
UNSIGNED - unsigned decimal (base 10)
CHAR - ASCII character code (there are 256 symbols; some of them are invisible)
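The viewer's numbering systems are just different renderings of the same byte. This Python sketch (`views` is a hypothetical helper) produces each of them for one byte. CHAR is filled in only for the printable ASCII codes 20h..7Eh, whereas the emulator can also display symbols for the other codes.

```python
def views(byte: int) -> dict:
    """Render one byte in each numbering system offered by the variable viewer."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("expected a byte value 0..255")
    return {
        "HEX": f"{byte:02X}",
        "BIN": f"{byte:08b}",
        "OCT": f"{byte:o}",
        "SIGNED": byte - 256 if byte >= 0x80 else byte,       # two's complement
        "UNSIGNED": byte,
        "CHAR": chr(byte) if 0x20 <= byte < 0x7F else None,   # printable ASCII only
    }


print(views(0x41))
print(views(0xFF)["SIGNED"], views(0xFF)["UNSIGNED"])   # -1 255
```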
Using the emulator
• If you want to load your code into the emulator, just click the "Emulate" button
• The emulator can also load executables created by other assemblers: select "Show emulator" from the "Emulator" menu
Select "Examples" from the File menu, load any example, compile it, then load it into the emulator:
[Figure: emulator window with a loaded example]
• The [Single Step] button executes the instructions one by one, stopping after each instruction
• The [Run] button executes the instructions one by one, with a delay set by [step delay] between instructions
• Double-clicking a register's text box opens the [Extended Viewer] window with the value of that register converted into all possible forms. You can change the register value directly in this window
• Double-clicking an item in the memory list opens [Extended Viewer] with the WORD value loaded from the memory list at the selected location. The least significant byte is at the lower address: the LOW BYTE is loaded from the selected position and the HIGH BYTE from the next memory address. You can modify the value of the memory word directly in the [Extended Viewer] window
• The [Flags] button lets you view and change flags at runtime