Assembly Languages
{ If you have any comments or questions regarding this tutorial, please E-mail
me }
While learning ASM, I found many tutorials to be very confusing, and they did not
cover assembly in the detail that's necessary for such a complicated programming
language. So, I wrote this rudimentary tutorial in order to ease the pain others may
have learning ASM.
The problem with most beginner level tutorials is that they assume the reader has
previous programming knowledge in one language or another. While I'll make
comments that draw connections between programming in BASIC and ASM, I hope
to write this in such a way that you can skip these remarks without affecting your
learning, therefore making this a completely newbie-level tutorial.
I am also very much a beginner, so I have recent memory of learning a lot of the stuff
I'm going to cover.
First off, I believe it is very difficult to learn programming without programming as you
learn. So, I suggest you have a copy of TASM (Turbo Assembler), the utility we'll use to
turn our assembly programs into something the computer can run.
Also before you start, it's imperative you understand hexadecimal and binary.
.MODEL SMALL
.STACK 200H
.CODE
START:
END START
That is, all your programs should include these lines. Your whole program will go in
lines between "start" and "end start".
(By the way, all things that are code on this site are red text. Not necessarily all red
text is code, but all code is red text.)
It's very important that if you type these lines into your file instead of using
Copy+Paste, you notice the periods at the beginning of the first few lines. And, notice the
colon after START. Even the smallest dot is a very important piece in programming
so never overlook them.
Now, start and end start don't really mean much to a computer. But, to us, start is the
beginning of something. And end start doesn't make a lot of logical sense to us, but
that's how it goes, so just grin and bear it. End Start tells the assembler where the
program ends, and that the program should begin running at START.
But, right now, our program does absolutely nothing! So, we may want to learn about
the different commands we can use in assembly:
2.3 - Interrupts
We can write a very simple program that puts just a character of text on the screen
using just "interrupts". If you're familiar with any higher level languages, you can
think of interrupts as essentially commands.
Interrupts each have some complicated operation(s) they perform, and all they require
is that you give them a small amount of information.
In this case, we'll be using an interrupt that can put text characters on the screen.
Because one interrupt may have many other functions it can perform, we must tell it
which one to do. Then, we give it the required information, and tell it to do whatever
it may do. We can therefore do very complicated operations while being totally
oblivious to how they work. Here is our example program:
.MODEL SMALL
.STACK 200H
.CODE
START:
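Mov ah, 2        ; function 2 of int 21h: DISPLAY OUTPUT
Mov dl, 1        ; the character to display goes in dl
Int 21h          ; call the interrupt

Mov ax, 4C00h    ; function 4Ch of int 21h: end the program
Int 21h
END START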
It may seem like a collection of completely arbitrary words and numbers, but only at
first. We soon realize that it is a VERY concrete concept. Every part along the way does its
important part. The tiny pieces of code result in one big program that does exactly
what we expected. Here's a breakdown, line by line, of what the program does:
1 ) We put the number 2 in a specific location in the computer. Later, the
computer will look at this number and, in this case, this number tells which "function
number" the interrupt should do. As mentioned before, most interrupts can do a
variety of functions. So, we must tell it which one to do. In this case, we want the
DISPLAY OUTPUT function. This is function number 2. So, we put the number 2 in
a specific place, just waiting for the computer to look it up later.
2 ) We put the number 1 in another specific location. This is the information the
DISPLAY OUTPUT function requires: the number of the character to put on the screen.
3 ) We call the interrupt, number 21h. It looks up the two numbers we stored and
puts the character on the screen.
4 ) The final commands end the program. This is necessary at the end of all your
programs, unless you want awful things to happen. If you forget this, random effects
that will more than likely freeze up the computer will result.
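If it helps to see the idea outside of assembly, here is a toy model of the steps above, written in Python purely for illustration. It is not real DOS: the `int_21h` function, the `registers` dictionary, and the `screen` list are all made-up names, and only function 2 is modelled.

```python
# A toy model of "Int 21h": NOT real DOS, just a sketch of the idea that
# the number in ah selects the function and dl carries the data.
def int_21h(registers, screen):
    """Pretend interrupt: look at ah to decide what to do."""
    if registers["ah"] == 2:                  # function 2: DISPLAY OUTPUT
        screen.append(chr(registers["dl"]))   # dl holds the character number
    else:
        raise ValueError("function not modelled in this sketch")

registers = {"ah": 0, "dl": 0}
screen = []

registers["ah"] = 2           # Mov ah, 2
registers["dl"] = 1           # Mov dl, 1
int_21h(registers, screen)    # Int 21h

print(len(screen))            # how many characters were "displayed"
```

The point is only the division of labour: we store two small numbers, and the interrupt does the complicated part.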
There we have it! We've effectively written a program doing exactly what we
expected from the outset.
A couple things you should note:
Firstly, the blank lines are just my style of separating code to make it easier to read.
The assembler, which we'll explain using in just a second, doesn't care one way or the
other if there are blank lines, as long as they don't actually hurt the code in some way.
They generally don't. You can take them out if you don't like them - you can add more at
your will - it doesn't matter, because the only important part is the code: the
commands involved in our program.
Secondly, in Mov ah....... ah is not a hexadecimal number (A in hexadecimal is 10 in
decimal). In this instance, it's the name of a place where the computer keeps a number.
The resemblance is just a coincidence. More is explained in the next section.
First, let's explain "Mov". It appears to be shorthand for the word "Move". This makes
a lot of sense.
This command, unlike an interrupt, does a tiny, simple job. However, it's a very
very very important instruction in ASM.
The first Mov in our program takes the number 2 and "Moves" it into a place the computer
explicitly calls "ah". So we can deduce that the next command moves the number 1 into a
place called "dl". It does.
The MOV instruction can be used in other ways. For example, we could say:
Mov ah, dl
The computer would take whatever is in dl and move it into ah. Well, to say "move" is
misleading, because it's not actually moved. Whatever's in dl stays there. But now, it's
also in ah. Likewise, this would work:
Mov dl, ah
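The "copy, not move" behaviour can be sketched in a few lines of Python. This is only a model for illustration: the `registers` dictionary and the `mov` helper are made-up names, not anything TASM provides.

```python
# Registers modelled as a plain dictionary (an assumption for illustration).
registers = {"ah": 2, "dl": 1}

def mov(regs, destination, source):
    """Copy a register (given by name) or a plain number into the destination."""
    value = regs[source] if isinstance(source, str) else source
    regs[destination] = value

mov(registers, "ah", "dl")    # Mov ah, dl
print(registers)              # dl keeps its value; ah now holds the same value
```

After the call, both entries hold what was in dl, which is exactly why "move" is a slightly misleading name.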
So what are ah and dl anyway? We know from many previous mentions that they're
specific places where the computer keeps numbers. They're called registers, and they
live inside the processor itself. The ones we're mainly concerned
with right now are AX, BX, CX, and DX.
They're made up of two 'pieces' each - hence, smaller registers. Ah, which we've already
encountered, is one of the parts of ax. Ax also has another part called al. The h and l in
ah and al mean "High" and "Low". They make up the higher and lower parts of the
register ax. For example, if we did this:
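Mov ah, 1
Mov al, 0FFh
then ax would contain 01FF (in hexadecimal). The h after the number tells the assembler
it's hexadecimal, and a hexadecimal number that starts with a letter needs a 0 in front.
Why? Because the "High" part contains 1, or 01, and the "low" part contains FF. So,
combined into the bigger register, they make 01FF.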
So, we conclude that many registers, or at least the ones we care about, are made of
2 smaller parts. And, to find their values, we combine them (Don't add them, though:
01 + FF = 100, not 01FF)
ah + al = ax
bh + bl = bx
ch + cl = cx
dh + dl = dx
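To check the difference between combining and adding, here is a small sketch in Python (used here only as a calculator; `combine` is a made-up helper name). Combining means the high part is shifted over by two hexadecimal digits before the low part is put next to it.

```python
def combine(high, low):
    """Join two 8-bit halves into one 16-bit value, e.g. ah and al into ax."""
    return (high << 8) | low      # same as high * 256 + low

ax = combine(0x01, 0xFF)
print(format(ax, "04X"))          # the combined register, as 4 hex digits
print(format(0x01 + 0xFF, "X"))   # plain addition gives a different number
```

The first line printed is the combined value from the example (01FF); the second is the result of merely adding the two halves (100).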
One final thing to mention - al, bl, ah, bh, and so on, can each have a value of 0 to
255. So, when combined to make ax, bx, and so on, the total value possible for those
is 0 to 65,535.
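To turn our program into something the computer can run, save it as "First.asm" in the
same directory as TASM. Then, type this into the address bar:
Tasm First.asm
As long as your program has no problems, this will make a file in the same directory
called "first.obj". Then, type this into the address bar: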
Tlink First.obj
Finally, this will make a program called "First.exe"! Hoorah! Our first successful
compile! (hopefully). Now, click on it to run it. If you have problems seeing it run
because it opens and closes itself too fast... well.... enjoy!
3.1 - Memory
True, MOV, interrupts, and registers are very important, as you just read. However,
there's not a whole lot that can be done using only them. To move on, we'll need to
understand a little bit about the computer's memory. And to do this, we also need to
just know about memory in general. We'll first start by how memory is divided up.
This can become quite complex, so just read through slowly, and go back over it if
something confuses you.
Basically, a computer's memory is a piece of circuitry; most of the time many pieces.
It has small points in the circuits called "transistors" that can either have an electric
charge of 5v, or no charge. The millions of these that the computer has are where it
stores everything.
Taking into account our previous knowledge of binary, we remember that in binary a
digit can only be either 0 or 1. So, we could think of either a transistor with a charge,
or one with no charge, as the same as 1 and 0 in binary. This turns out to be true. 1
and 0 represent each transistor of memory.
Each transistor is called a "bit". This is short for "BInary digiT".
Well, hexadecimal is also important in our discussion. You see, if we wanted to look
at memory and it was all in binary form, it would be very cryptic -
1010111010000111001100011100011100111110.... and so on. So, to make memory
easier to read, we can read it in hexadecimal numbers.
Well, recall that in hexadecimal the highest digit is F - which has a decimal equivalent
of 15. In binary, that would take up four digits to show:
1111
is the same as F in hexadecimal.
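Since every group of four binary digits is exactly one hexadecimal digit, the cryptic binary above can be shortened mechanically. Here is a sketch of that grouping in Python (used only as a calculator; `binary_to_hex` is a made-up helper name):

```python
def binary_to_hex(bits):
    """Turn a string of 0s and 1s into hex, four bits per hex digit."""
    # Pad on the left with zeros so the length is a multiple of four.
    padded = bits.zfill((len(bits) + 3) // 4 * 4)
    groups = [padded[i:i + 4] for i in range(0, len(padded), 4)]
    return "".join(format(int(group, 2), "X") for group in groups)

print(binary_to_hex("1111"))
print(binary_to_hex("1010111010000111001100011100011100111110"))
```

The forty binary digits from the example shrink to ten hexadecimal digits, which is a lot easier on the eyes.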
Since our previous unit was called a "Bit", to keep in the same naming theme, 4 bits
are called a "Nibble".
Then, it just goes up from there. All in all, it works like this:
8 bits = Byte
2 Bytes = Word
2 Words = Double Word (DWORD for short)
--- The rest I'm not sure of, but I think there are names for things up to 80 bits ---
I'm sure Terabyte can be TB, petabyte PB, and exabyte EB; but I've never seen these
used. A terabyte is such a high amount of memory that most people don't ever need to
know the term.
In any case, we just need to deal with the terms "bit", "byte", and "word". They're the
ones that'll come up most often in conversation.
As I mentioned briefly in the last section, the registers ah, al, and so on can only
have a maximum value of 255. This may seem arbitrary at first - why not 999?
Because they can only hold one byte. One byte is 8 bits, and the highest number we
can make in binary with 8 bits is this:
11111111
That is 255 in decimal. So when we put together ah and al, the highest number is 65535.
Why? Well, each register can hold 1 byte, or 2 nibbles, or 8 bits - it all means the same
thing. So, with two registers we have 2 bytes, or 4 nibbles, or 16 bits. Assuming we made
the highest possible hexadecimal number with 4 nibbles, it would look like this:
FFFF
Punch that into your computer's calculator and convert it to decimal, and, surprise
surprise: It equals 65535
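If you don't have a calculator handy, the same check can be done in a couple of lines of Python (again, just used as a calculator; `largest_value` is a made-up helper name). With n bits, the highest number is 2 to the power n, minus 1.

```python
def largest_value(bit_count):
    """Largest unsigned number that fits in the given number of bits."""
    return (1 << bit_count) - 1

print(largest_value(8))        # one byte, like ah or al
print(largest_value(16))       # one word, like ax
print(int("11111111", 2), int("FFFF", 16))   # the same two limits, converted
```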
3.2 - Addressing
In order to use the computer's memory - i.e. store numbers, text, etc - we have to
understand how the computer goes about organizing it.
It does this with something called Segments and Offsets.
These are used to communicate, between ourselves and the computer, where things
should be put and where they should be gotten from. Whenever we want to read or
write to memory, we must use numbers pointing to the exact location of the BYTE
we want to read; specifically, 2 numbers, called the Segment and Offset.
Usually these two numbers are WORDs [16 bits each]. One points to the general area
of memory, the "Segment". And the other says how many bytes into that segment, known
as the "Offset".
Since the offset is a 16 bit number, one segment lets us use up to 64KB of memory at
once [65536 bytes]. For example, say these numbers [hexadecimal] were stored
somewhere in memory:
00 AB D2 AC 98 4E 67
and so on.. Now, say we wanted to read those numbers. Well, the computer has
millions of bytes of memory - so we must have some way of specifying what part of
memory they're in. This is called their "Address". Compare it to a real life address, with
the street name and the number. Essentially the street name is what "part" of the city
you live in. We do the same with computer memory, but both parts are numbers. So, that
data above may be located at:
FE00:0000
FE00 would be the "segment", or part. And 0000 would be how far in the data starts.
So the address FE00:0000 would "point" to the hexadecimal number 00.
In that case, FE00:0001 would point to the hexadecimal number AB. FE00:0002
points to D2, and so on.
Bear this in mind as we cover just one more section, before making use of what we
now know about memory.
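Here is the same lookup as a small Python sketch. It is only a pretend memory holding the example bytes: `SEGMENT`, `DATA`, and `read_byte` are made-up names, and a real computer has far more than one segment.

```python
# The example bytes from the text, pretending they sit at segment FE00.
SEGMENT = 0xFE00
DATA = bytes.fromhex("00 AB D2 AC 98 4E 67")

def read_byte(segment, offset):
    """Return the byte at segment:offset in this tiny pretend memory."""
    if segment != SEGMENT:
        raise ValueError("this sketch only models one segment")
    return DATA[offset]

for offset in range(3):
    value = read_byte(0xFE00, offset)
    print(f"{SEGMENT:04X}:{offset:04X} -> {value:02X}")
```

The segment picks the area, and the offset simply counts bytes from the start of it, beginning at 0.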
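Say we wanted to print text stored at FE00:0000. In theory, a program to do that would
look like this:
.MODEL SMALL
.STACK 200H
.CODE
START:
Mov ax, 0FE00h
Mov ds, ax
Mov dx, 0
Mov ah, 9
Int 21h
Mov ax, 4C00h
Int 21h
END START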
So, what does this program do? Well, first, it puts fe00, the segment of the text, into
AX. We use the MOV instruction to do this, which in this case 'moves' (or rather
copies) fe00 into the ax register.
But, we wanted ds to have the segment. Well, that's one quirk of the segment registers
- you're not allowed to put a number into them directly. So, you can't just put a number
right into ds. You can, however, put another register into them.
So, we just put the segment number first into ax. Now, we move it into ds, thereby
bypassing this charming little fact of DS.
Then we put 0 into dx. This interrupt requires that we have the segment of the
text in DS and the offset in dx. Since the offset is 0, we put 0 into dx.
Next, we put 9 into ah. Since int 21 has a lot of different functions it can do, we must
specify which one we want. The one to print text, by specifying a segment and offset,
is #9. Then int 21 does the printing.
And finally, we use int 21 again, but this time to end the program.
In theory, this is just great. But, memory doesn't work like that. We generally don't
just put whatever we want, where we want. At least not at this stage.
For example, when you run a program like this one [if you were to compile and run it,
which I don't recommend], the computer picks out a free space in memory to load the
program itself. You don't specify this. So, what have we accomplished then with
Segments and Offsets if we can't use them? We can, as you will see.
3.4 - Variables
DS was important to introduce in the previous section, because when you write a
program, you can have things called "variables". And whatever you put in these
variables is usually put in the "Data Segment", which is what DS points to.
When the compiler/assembler is done changing your program into something the
computer can actually read, it doesn't actually use variable names, only addresses. But
they make life a lot easier for programmers.
So, what are variables and how do we use them? Well, a variable is where you can
store data. Strings (text), numbers, etc. They're called variables because, well, they
can vary. Not only can they contain a number or something like that, but you can change
them as much as you need during your program. This makes programs much
more versatile and useful.
For example, let's rewrite that last program so that it does actually work:
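.MODEL SMALL
.STACK 200H
.DATA                        ; <--- This is a new part! Make sure to include this
Textstring db "I'm a string$"
.CODE
START:
Mov ax, SEG Textstring       ; segment of the variable, via ax
Mov ds, ax                   ; ds can't take the value directly
Mov dx, OFFSET Textstring    ; offset of the variable
Mov ah, 9                    ; function 9 of int 21h: print text
Int 21h
Mov ax, 4C00h                ; function 4Ch of int 21h: end the program
Int 21h
END START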
Wow. A lot of things to explain here. Let's start from the top downward. You'll notice
there's a new part that should be included in the beginning. The part called .DATA
declares what variables we have.
As always, the period in front of DATA is very important. Also, make sure
that .DATA comes before .CODE, because .CODE says that everything after it is part
of the code.
Again, we put the segment into ax first, since we can't move it straight into ds. One
very convenient feature of the assembler is that we don't have to figure out the segment
and offset that our variable is at. Which is good, because as we said, the computer
decides for itself where to load the program - it would be tough to find where our
variables are in memory. So, by saying SEG Textstring, we move the segment of that variable
into ax instead of what's actually in the variable. The same goes for OFFSET Textstring. It
puts the offset of the variable Textstring into the register, instead of the actual variable.
One more unexplained part - What's with that line after .DATA?
Well, Textstring is the name of the variable - we must specify the name we want to
call the variable first.
Next, db stands for "Define Byte(s)". It can be used either if we want our variable to
be one byte long, or multiple bytes. In this case, it's multiple bytes, because each
character of text takes up one byte.
Finally, we tell the assembler what we want to be in the variable. This can be changed
by your program, but we just tell it what we want the variable to start as.
One more little detail of int 21, function 9 is that the text you're printing must have a
dollar sign at the end. It doesn't actually print a dollar sign on the screen, it just tells
where the text ends.
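To see what the dollar sign does, here is a small Python sketch of the rule (for illustration only; `text_for_function_9` is a made-up name, and the real work is done by DOS, not by this code). It also counts the bytes that db sets aside for our string.

```python
def text_for_function_9(stored_bytes):
    """Return what would be printed: everything before the first '$'."""
    end = stored_bytes.index(b"$")    # raises ValueError if '$' is missing
    return stored_bytes[:end].decode("ascii")

textstring = b"I'm a string$"         # one byte per character, as with db
print(len(textstring))                # bytes stored, '$' included
print(text_for_function_9(textstring))
```

Notice that the sketch fails loudly when the dollar sign is missing. The real function 9 has no such safety net: it just keeps printing whatever is in memory until it happens to find one.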
Go ahead and compile and run this program. Unlike the last one, it should work.