Computer Science Machine Language
Computer Science Machine Language
Kaeli
MACHINE LANGUAGE
David R. Kaeli
Department of Electrical and Computer Engineering, Northeastern University, USA
Contents
1. Summary
2. Introduction
2.1 Assembler Directives
3. Assembly Language
S
TE S
3.1 ELF
R
3.2 COFF
AP LS
3.3 Java Class File
4. Binary Code
4.1 Microcode
C EO
4.2 Control storage
4.3 Microinstruction decoding
4.4 Macroinstructions
4.5 Optimizations
E –
5. Conclusion
H
Glossary
PL O
Bibliography
M SC
Summary
Machine Language is the representation of a program that either executes directly on the
central processing unit or is represented in an assembly language that is later compiled
SA NE
down to binary code. This article discusses machine language, its different forms, and
the microarchitectural data path that supports the execution of the machine language.
1. Introduction
U
model is commonly used in the embedded computing field, where a small, compact
program controls the underlying hardware.
Both binary and assembly code are forms of machine language. This article will
provide an overview of a typical assembly language, as well as a description a typical
binary code format. Finally, we will discuss how the machine language is executed on
the hardware, and describe some of the issues associated with machine code execution.
2. Assembly Language
S
TE S
An assembly language (sometimes referred to assembler or assembly) provides the
R
AP LS
program almost complete control with the actual realization of their program. The
benefits of writing in assembly code are that the programmer has a lot more control over
the contents of the final binary code. The assembler (the program which transforms
C EO
assembly code to binary code) can still apply some optimizations, though the executed
binary code will closely resemble the assembly code.
three fields:
H
PL O
While fields 1 and 3 are optional, every line in an assembly language contains an
SA NE
instruction field.
High-level languages will generate multiple assembly instructions per source code line,
while a single binary level instruction. Presently, most embedded software is developed
U
in assembly since the program has a lot more control over the final binary.
The reason why many programmers also use high-level languages is a tradeoff between
high performance and productivity.
The reason why programmers do not write their programs at the binary level is that they
can still obtain the control over the final binary by writing in assembly code, though
they do not have to worry about the actual bit values in each instruction, and can instead
work with symbols (e.g., Loop1) and instruction operation abbreviations (e.g., MOV,
ADD and CMP, to denote move, addition and compare operations, respectively).
Note that the above assembly program is written in a format that is compatible with a
particular assembler. Each assembler will demand a particular specification of the
assembly code format. The code can be assembled and run on a number of different
versions of the Intel 80X86 microprocessor family (e.g., the 80486, PentiumI, PentiumII,
PentiumIII, PentiumIV, etc.).
S
TE S
R
AP LS
C EO
Figure 1 shows an example of an assembly code format.
Some of these tasks include allocating storage, aligning code or data on a memory
boundary, or performing conditional execution of portions of a program.
M SC
SA NE
U
In Figure 2 we provide an example of assembly directives. The first two lines tell the
assembler to allocate enough storage for two doublewords (a doubleword on the Intel
Pentium is 32 bits), give these locations the labels of A and B, and initialize their values
to 100 and 200, respectively. Most assembly languages provide the programmer with a
number of shortcuts that reduce the
3. Binary Code
Once programs have been compiled, they will generate object files. A single program
many comprise many object files. These files are linked together to form an executable
binary. There are a number of formats for executable files. Some of these include
ELF
COEFF
S
TE S
Java Class File
R
AP LS
3.1 ELF
The executable and linking format (ELF) was originally developed by Unix System
C EO
Laboratories and is now a very popular executable file format. ELF is the default binary
format on operating systems such as Linux and Solaris. ELF executable files contain
executable code, sometimes referred to as text and data sections. Each executable
image contains tables, which describe how the image should be mapped into virtual
memory. Some of the capabilities of ELF include: dynamic linking, dynamic loading,
E –
imposing runtime control on a program, and an improved method for creating shared
H
libraries. The ELF representation of control data in an object file is platform
PL O
making the ELF object files compatible across multiple platforms and architectures of
different size.
SA NE
The three main types of ELF files are executable, relocatable, and shared object files.
These file types hold the code, data, and information about the program that the
operating system and/or link editor needs to read. The three types of files are
summarized as follows:
U
3.2 COFF
The Common Object File Format (COFF) is the format used by Microsoft for its object
files. A full specification of COFF can be found at the Microsoft Developer Network.
File Header – contains an overview of the file and controls the layout of other
sections
Optional Header – used to store the instruction pointer address for executable
files
Section Header - maintains the location and size information about code and
data sections
Section Data - contains code and data for the program
Relocation Directives – contains fix-up information needed when relocating a
section
Line Numbers – holds the address of each line number in the code/data section
Symbol Table – contains symbol information relating locations in the source
code to code in the executable
S
TE S
String Table – stores symbol names
R
AP LS
Some of the main features of COFF include:
-
-
M SC
Visit: http://www.eolss.net/Eolss-sampleAllChapter.aspx
U
Bibliography
1. (1985). IEEE Standard for Microprocessor Assembly Language. (IEEE Std. 694-1985), New York,
NY, Institute of Electrical and Electronic Engineers [IEEE specification of microprocessor assembly
language format]
2. Irvine K.R. (1999). Assembly Language for Intel-Based Computers, 3rd edition, Prentice Hall, Upper
Saddle River, NJ [A reference guide for assembly code targeting the 80X86 family of microprocessors.]
3. Microsoft Developers Network, http://msdn.microsoft.com [Extensive information on the COFF
binary format.]
4. (1990). UNIX Software Operations, UNIX System V Release 4 Programmers Guide: Ansi C and
Programming Support Tools, STREAMS, Prentice Hall [Extensive information on the ELF binary format.]
5. Venners B. (1996). The Java Class File Lifestyle, Artima Software Company [Extensive information
on the Java Class File format.]
6. Wilkes M.V. (1951). The Best Way to Design an Automatic Calculating Machine, Report of
Manchester University Computer Inaugural Conference, 16-18, July [The seminal work on
microprogramming.]
7. Mano M.M. and Kime C.R. (2000). Logic and Computer Design Fundamentals, 2nd edition,
Prentice-Hall, Inc., Upper Saddle River, NJ [A standard reference for machine organization and design.]
8. Tannenbaum A.S. (1990). Structured Systems Architecture, 3rd edition, Prentice Hall, Upper Saddle
River, NJ [A standard reference for machine organization and design.]
9. Stallings W. (2000). Computer Organization and Architecture, 5th edition, Prentice Hall, Upper
Saddle River, NJ [A higher level text on computer architecture.]
10. Hennessy J.L. and Patterson D.A. (1996). Computer Architecture: A Quantitative Approach, 2nd
edition, Morgan Kaufman, San Francisco, CA [An advanced text on computer architecture.]
11. Patterson D.A. and Hennessy J.L. (1994). Computer Organization and Design: The
Hardware/Software Interface, Morgan Kaufman, San Francisco, CA [An intermediate level text on
computer architecture.]
S
TE S
R
AP LS
C EO
E –
H
PL O
M SC
SA NE
U