Programming in Assembly
GETTING STARTED
KEYWORDS: ASSEMBLY, INTEL 8086
This tutorial provides some help with assembly programming. It is inspired by Tom Swan's book, Mastering Turbo Assembler. It is assumed that the reader:
- has some basic knowledge of processor architectures;
- wants to begin working in assembly language;
- has some programming knowledge.

As you will see, this tutorial has several steps. In this one, we present several background topics before we proceed to assembly programming.

OVERVIEW OF THE 80X86 FAMILY
The 80x86 family started in 1978 with the 8086; the newest member is the Pentium, which was released fifteen years later in 1993. Each member is backward compatible with its predecessors, but each new generation has added features and speed. Today there are very few computers in use that have 8088 or 8086 chips in them, as they are very outdated and slow. The number of 286 or 386 based computers is declining as today's software becomes more and more demanding. Even 486s are being replaced by Pentiums. With the Pentium PRO and the MMX based CPUs, Intel keeps increasing performance and features.

As computers are used to manipulate various types of data, there have to be physical places where these data can be stored. Some of these places are the registers.

REGISTERS

Registers are places in the CPU where a number can be stored and manipulated. There are three sizes of registers: 8-bit, 16-bit and, on the 386 and above, 32-bit. There are four different types of registers:
- general purpose registers,
- segment registers,
- index registers,
- stack registers.

First, we are going to describe the main registers. Later, we are going to describe the stack registers.

The main registers are 16-bit registers. There are four general purpose registers: AX, BX, CX and DX. Each is split into two 8-bit registers. AX, for example, is split into AH, which contains the high byte, and AL, which contains the low byte. On the 386 and above there are also 32-bit registers; these have the same names as the 16-bit registers but with an 'E' in front, e.g. EAX. You can use AL, AH, AX and EAX separately and treat them as separate registers for some tasks, keeping in mind that they overlap: writing to AL changes the low byte of AX.

INDEX REGISTERS

These registers are sometimes called pointer registers. They are 16-bit registers and are mainly used for string instructions. There are three index registers: SI (source index), DI (destination index) and IP (instruction pointer). Unlike SI and DI, IP cannot be read or written directly; it always holds the offset of the next instruction to execute. On the 386 and above there are also 32-bit index registers: EDI and ESI. You can also use BX to index strings.

STACK REGISTERS

BP and SP are stack registers and are used when dealing with the stack. We will discuss them when we talk about the stack.

SEGMENTS AND OFFSETS

The designers of the 8088 decided that the maximum need for memory space would be one megabyte, so the chip they built cannot address more than that. The problem is that 20 bits are needed to address a whole megabyte, but registers only have 16 bits, so 4 more bits are needed. They came up with what they thought was a clever way to solve this problem: segments and offsets. This is a way to form an address from two registers without using a full 32 bits. To compute any address, both the segment address and the offset are needed. The segment address is held in one of the segment registers (CS, DS, SS or ES); the processor multiplies it by 16 (a shift left by 4 bits) and adds the offset to obtain the 20-bit address. A segment is usually 64 KB long. To address a location inside a segment, we need the offset (16 bits long).

THE STACK

As the number of registers is low, how is all the data managed? There is something called a stack, which is an area of memory that you can save values to and restore values from. To picture this, consider a stack of plates: the last one you put on is the first one you take off. This is sometimes referred to as Last On First Off (LOFO) or Last In First Out (LIFO). You have to be sure, however, that you do not put too much data on the stack.

MEMORY MODELS

There are several memory models available. In stand-alone assembly language programming, the small model is usually the best choice. The models are described in the table below.
Memory Model  Code  Data  Description
tiny          near  near  code, data and stack in one 64k segment
small         near  near  code and data in separate 64k segments; it is used for small-medium .EXE files
medium        far   near  data is limited to one 64k segment; code size is not limited
compact       near  far   the code is limited to one 64k segment; the size of the data is not limited
large         far   far   the code and data size is not limited
huge          far   far   identical to large memory model
tchuge        far   far   same as large model but with different register assumptions; it is mostly used for Turbo C and Borland C programming
tpascal       near  far   provided for support of early versions of Turbo Pascal
flat          near  near  for use with OS/2 only
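
The register halves, the segment and offset scheme and the stack described in the sections above can be modelled in a few lines. The sketch below is Python, not assembly, and is only an illustration: the function names split_word and physical_address are made up for this example, and the list-based stack ignores SP and the fact that the real stack lives in memory. It also shows why the table distinguishes near from far: a near reference carries only the 16-bit offset, while a far reference needs both the segment and the offset.

```python
def split_word(value):
    """Return the (high, low) bytes of a 16-bit value, as AX splits into AH and AL."""
    value &= 0xFFFF
    return (value >> 8) & 0xFF, value & 0xFF


def physical_address(segment, offset):
    """Combine a 16-bit segment and a 16-bit offset into a 20-bit address.

    The segment is multiplied by 16 (shifted left 4 bits) and the offset is
    added; the result is kept to 20 bits, the one-megabyte limit of the 8086/8088.
    """
    return (((segment & 0xFFFF) << 4) + (offset & 0xFFFF)) & 0xFFFFF


# AX = 0x1234 gives AH = 0x12 and AL = 0x34.
ah, al = split_word(0x1234)

# Two different segment:offset pairs can name the same byte of memory.
addr_a = physical_address(0x1234, 0x0010)
addr_b = physical_address(0x1235, 0x0000)

# The stack is last in, first out (hypothetical values).
stack = []
stack.append(0x1111)          # like PUSH
stack.append(0x2222)          # like PUSH
first_popped = stack.pop()    # like POP: returns the value pushed last
second_popped = stack.pop()
```

Because segments start every 16 bytes but are 64 KB long, they overlap, which is why addr_a and addr_b above refer to the same location.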
EQUATES

In assembly language, constant values are known as equates, referring to the EQU directive that associates values with identifiers such as "BaseAddress" and "MaxValue". It is important to note that equates may appear anywhere in the program without restriction. Several examples are given below:

    Count   EQU 10
    Number  EQU 5
    Price   = Count * Number
    MyCat   EQU "Giully"

- after declaring a symbol with EQU you cannot change its associated value,
- the values declared with "=" can be changed as often as you want,
- EQU can declare all kinds of equates including numbers, expressions and characters.
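
The difference between EQU and "=" stated in the list above can be sketched as a toy symbol table. This is Python, not assembler source, and the class SymbolTable and its methods are hypothetical names invented for this illustration; it models only the rule as the tutorial states it, not how any real assembler is implemented.

```python
class SymbolTable:
    """Toy model of assembler symbols: EQU values are fixed, '=' values may change."""

    def __init__(self):
        self.values = {}
        self.fixed = set()    # names declared with EQU

    def equ(self, name, value):
        """Declare a symbol with EQU; it cannot be declared or changed again."""
        if name in self.values:
            raise ValueError(f"{name} is already defined")
        self.values[name] = value
        self.fixed.add(name)

    def assign(self, name, value):
        """Declare or redefine a symbol with '='."""
        if name in self.fixed:
            raise ValueError(f"{name} was declared with EQU and cannot change")
        self.values[name] = value


symbols = SymbolTable()
symbols.equ("Count", 10)
symbols.equ("Number", 5)
symbols.assign("Price", symbols.values["Count"] * symbols.values["Number"])
symbols.assign("Price", 60)       # allowed: Price was declared with "="

try:
    symbols.equ("Count", 11)      # rejected: Count was declared with EQU
except ValueError as error:
    rejected = str(error)
```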
Assembly language is more human-readable than machine language. Generally, statements in assembly language are written using short codes for the instruction and its arguments, such as "MOV $12 SP", as opposed to machine language, where everything is written as numbers. Assembly language can have comments and macros as well, to ease programming and understanding. Programs called "assemblers" transform assembly language into machine language. This is a relatively straightforward process, because there is a clear 1-to-1 transformation between assembly and machine language. Compilers, by contrast, perform a complicated transformation from a high-level language to assembly.

Assembly language is a symbolic representation of a processor's native code. Working at this level allows the programmer to control precisely what the processor does. It offers a great deal of power to use all of the features of the processor. The resulting program is normally very fast and very compact. In small programs it is also very predictable. Timings, for example, can be calculated very precisely and program flow is easily controlled. It is often used for small, real-time applications.

However, the programmer needs to have a good understanding of the hardware being used. As programs become larger, assembly language gets very cumbersome. Maintenance of assembly language is notoriously difficult, especially if another programmer is brought in to carry out modifications after the code has been written. Assembly language also has no built-in support from an operating system, nor does it have any complex instructions. Storing and retrieving data is a simple task in high-level languages; assembly needs the whole process to be programmed step by step. Mathematical processes also have to be performed with binary addition and subtraction when using assembly, which can get very complex. Finally, every processor has its own assembly language. Use a new processor and you need to learn a new language each time. Assembly is a great language to use for certain applications, rotten for others and never for the faint-hearted.
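
The 1-to-1 relationship between an assembly statement and its machine code can be made concrete with a very small sketch of what an assembler does. The Python below handles a single 8086 instruction form, "MOV reg16, imm16", whose encoding is the opcode byte B8 (hex) plus the register number, followed by the 16-bit value with its low byte first. The function name assemble_mov_imm16 is made up for this example, and a real assembler of course handles many more instruction forms, labels and directives.

```python
# 8086 register numbers used in the "MOV reg16, imm16" encoding.
REG16 = {"AX": 0, "CX": 1, "DX": 2, "BX": 3, "SP": 4, "BP": 5, "SI": 6, "DI": 7}


def assemble_mov_imm16(register, value):
    """Encode 'MOV register, value' as three machine-code bytes."""
    opcode = 0xB8 + REG16[register.upper()]
    value &= 0xFFFF
    # The immediate value is stored little-endian: low byte, then high byte.
    return bytes([opcode, value & 0xFF, value >> 8])


# "MOV AX, 1234h" becomes the bytes B8 34 12.
mov_ax = assemble_mov_imm16("AX", 0x1234)

# Loading 12 into SP, as in the example statement above, becomes BC 0C 00.
mov_sp = assemble_mov_imm16("SP", 12)

print(mov_ax.hex(" "))   # b8 34 12
print(mov_sp.hex(" "))   # bc 0c 00
```

One statement goes in and one instruction comes out, which is the property that makes an assembler so much simpler than a compiler.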
COMPARISON OF ASSEMBLY AND HIGH LEVEL LANGUAGES

Assembly languages have close to a one-to-one correspondence between symbolic instructions and executable machine codes. Assembly languages also include directives to the assembler, directives to the linker, directives for organizing data space, and macros. Macros can be used to combine several assembly language instructions into a high-level-language-like construct (as well as for other purposes). There are cases where a symbolic instruction is translated into more than one machine instruction. But in general, symbolic assembly language instructions correspond to individual executable machine instructions.
High level languages are abstract. Typically a single high level instruction is translated into several (sometimes dozens or in rare cases even hundreds) executable machine language instructions. Some early high level languages had a close correspondence between high level instructions and machine language instructions. For example, most of the early COBOL instructions translated into a very obvious and small set of machine instructions. The trend over time has been for high level languages to increase in abstraction. Modern object oriented programming languages are highly abstract (although, interestingly, some key object oriented programming constructs do translate into a very compact set of machine instructions).

Assembly language is much harder to program in than high level languages. The programmer must pay attention to far more detail and must have an intimate knowledge of the processor in use. But high quality hand crafted assembly language programs can run much faster and use much less memory and other resources than a similar program written in a high level language. Speedups of 2 to 20 times are fairly common, and speedups of hundreds of times are occasionally possible. Assembly language programming also gives direct access to key machine features essential for implementing certain kinds of low level routines, such as an operating system kernel or microkernel, device drivers, and machine control.

High level programming languages are much easier for less skilled programmers to work in and for semi-technical managers to supervise. High level languages also allow faster development than assembly language, even with highly skilled programmers. Development that is 10 to 100 times faster is fairly common.
Programs written in high level languages (especially object oriented programming languages) are much easier and less expensive to maintain than similar programs written in assembly language (and for a successful software project, the vast majority of the work and expense is in maintenance, not initial development).
Computer Organization refers to the level of abstraction above the digital logic level, but below the operating system level.
At this level, the major components are functional units or subsystems that correspond to specific pieces of hardware built from the lower level building blocks described in the previous module. A closely related term, computer architecture, emphasizes the engineering decisions and tradeoffs that must be made in order to produce a "good" design. The computer architect answers questions like...