What Compiler Is
The role of computers in daily life is growing each year. Modern microprocessors are found in cars,
microwave ovens, dishwashers, mobile telephones, GPS navigation systems, video games and personal
computers. Each of these devices must be programmed to perform its job. Those programs are written in
some “programming” language – a formal language with mathematical properties and well-defined
meanings – rather than a natural language with evolved properties and many ambiguities. Programming
languages are designed for expressiveness, conciseness, and clarity. A program written in a programming
language must be translated before it can execute directly on a computer; this translation is accomplished
by a software system called a compiler.
Definition (Compiler)
A compiler is a computer program that takes as input a program written in one language (the source
language) and produces as output an equivalent program in another language (the target language).
While many issues in compiler design are amenable to several different solutions, there are two principles
that should not be compromised.
The first principle that a well-designed compiler must observe is inviolable.
“The compiler must preserve the meaning of the program being compiled.”
The code produced by the compiler must faithfully implement the “meaning” of the source-code
program being compiled. If the compiler can take liberties with meaning, then it can always generate the
same code, independent of input. For example, the compiler could simply emit a nop or a return
instruction.
The second principle that a well-designed compiler must observe is quite practical.
“The compiler must improve the source code in some discernible way.”
If the compiler does not improve the code in some way, why should anyone invoke it? A
traditional compiler improves the code by making it directly executable on some target machine. Other
“compilers” improve their input in different ways. For example, tpic is a program that takes the
specification for a drawing written in the graphics language pic and converts it into LaTeX; the
“improvement” lies in LaTeX’s greater availability and generality. Some compilers produce output
programs in the same language as their input; we call these “source-to-source” translators. In general,
these systems try to restate the program in a way that will lead, eventually, to an improvement.
You may never write a commercial compiler, but that's not why we study compilers. We study
compiler construction for the following reasons:
1. Writing a compiler gives a student experience with large-scale applications development. Your
compiler program may be the largest program you write as a student. Experience working with
really big data structures and complex interactions between algorithms will help you out on your
next big programming project.
2. Compiler writing is one of the shining triumphs of CS theory. It demonstrates the value of theory
over the impulse to just "hack up" a solution.
3. Compiler writing is a basic element of programming language research. Many language researchers
write compilers for the languages they design.
4. Many applications have similar properties to one or more phases of a compiler, and compiler
expertise and tools can help an application programmer working on other projects besides
compilers.
The name "compiler" is primarily used for programs that translate source code from a high-level
programming language to a lower level language (e.g., assembly language or machine language).
A program that translates from a low level language to a higher level one is a decompiler.
A program that translates between high-level languages is usually called a language translator,
source to source translator, or language converter.
A language rewriter is usually a program that translates the form of expressions without a change
of language.
History
Software for early computers was exclusively written in assembly language for many years. Higher level
programming languages were not invented until the benefits of being able to reuse software on different
kinds of CPUs started to become significantly greater than the cost of writing a compiler. The very limited
memory capacity of early computers also created many technical problems when implementing a
compiler.
Towards the end of the 1950s, machine-independent programming languages were first proposed.
Subsequently, several experimental compilers were developed. The first compiler was written by Grace
Hopper, in 1952, for the A-0 programming language. The FORTRAN team led by John Backus at IBM is
generally credited as having introduced the first complete compiler, in 1957. COBOL was an early language
to be compiled on multiple architectures, in 1960.
In many application domains the idea of using a higher level language quickly caught on. Because of the
expanding functionality supported by newer programming languages and the increasing complexity of
computer architectures, compilers have become more and more complex.
Early compilers were written in assembly language. The first self-hosting compiler — capable of compiling
its own source code in a high-level language — was created for Lisp by Hart and Levin at MIT in 1962. Since
the 1970s it has become common practice to implement a compiler in the language it compiles, although
both Pascal and C have been popular choices for implementation language. Building a self-hosting compiler
is a bootstrapping problem -- the first such compiler for a language must either be compiled by a
compiler written in a different language or (as in Hart and Levin's Lisp compiler) be compiled by running
the compiler in an interpreter.
Compilers in education
Compiler construction and compiler optimization are taught at universities as part of the computer science
curriculum. Such courses are usually supplemented with the implementation of a compiler for an
educational programming language. A well-documented example is Niklaus Wirth's PL/0 compiler, which
Wirth used to teach compiler construction in the 1970s. [3] In spite of its simplicity, the PL/0 compiler
introduced several influential concepts to the field:
1. Program development by stepwise refinement (also the title of a 1971 paper by Wirth)
2. The use of a recursive descent parser
3. The use of EBNF to specify the syntax of a language
4. A code generator producing portable P-code
5. The use of T-diagrams in the formal description of the bootstrapping problem
Compiler output
One method used to classify compilers is by the platform on which their generated code executes. This is
known as the target platform.
A native or hosted compiler is one whose output is intended to directly run on the same type of computer
and operating system as the compiler itself runs on. The output of a cross compiler is designed to run on a
different platform. Cross compilers are often used when developing software for embedded systems that
are not intended to support a software development environment.
The output of a compiler that produces code for a virtual machine (VM) may or may not be executed on
the same platform as the compiler that produced it. For this reason such compilers are not usually
classified as native or cross compilers.
Higher-level programming languages are generally divided for convenience into compiled languages and
interpreted languages. However, there is rarely anything about a language that requires it to be exclusively
compiled, or exclusively interpreted. The categorization usually reflects the most popular or widespread
implementations of a language — for instance, BASIC is thought of as an interpreted language, and C a
compiled one, despite the existence of BASIC compilers and C interpreters.
In a sense, all languages are interpreted, with "execution" being merely a special case of interpretation
performed by transistors switching on a CPU. Modern trends toward just-in-time compilation and bytecode
interpretation also blur the traditional categorizations.
There are exceptions. Some language specifications spell out that implementations must include a
compilation facility; for example, Common Lisp. Other languages have features that are very easy to
implement in an interpreter, but make writing a compiler much harder; for example, APL, SNOBOL4, and
many scripting languages allow programs to construct arbitrary source code at runtime with regular string
operations, and then execute that code by passing it to a special evaluation function. To implement these
features in a compiled language, programs must usually be shipped with a runtime library that includes a
version of the compiler itself.
Hardware compilation
The output of some compilers may target hardware at a very low level, for example a Field Programmable
Gate Array (FPGA) or a structured Application-Specific Integrated Circuit (ASIC). Such compilers are said to be
hardware compilers or synthesis tools because the programs they compile effectively control the final
configuration of the hardware and how it operates; the output of the compilation is not a sequence of
instructions to be executed but an interconnection of transistors or lookup tables. For example, XST is the
Xilinx Synthesis Tool used for configuring FPGAs. Similar tools are available from Altera, Synplicity,
Synopsys and other vendors.
Compiler design
The approach taken to compiler design is affected by the complexity of the processing that needs to be
done, the experience of the person(s) designing it, and the resources (e.g., people and tools) available.
A compiler for a relatively simple language written by one person might be a single, monolithic piece of
software. When the source language is large and complex and high-quality output is required, the design
may be split into a number of relatively independent phases, or passes. Having separate phases means
development can be parceled up into small parts and given to different people. It also becomes much
easier to replace a single phase by an improved one, or to insert new phases later (e.g., additional
optimizations).
The division of the compilation process into phases (or passes) was championed by the Production Quality
Compiler-Compiler Project (PQCC) at Carnegie Mellon University. This project introduced the terms front
end, middle end (rarely heard today), and back end.
All but the smallest of compilers have more than two phases. However, these phases are usually regarded
as being part of the front end or the back end. The exact point where these two ends meet is open to
debate. The front end is generally considered to be where syntactic and semantic processing takes place,
along with translation to a lower level of representation (than source code).
The middle end is usually designed to perform optimizations on a form other than the source code or
machine code. This source code/machine code independence is intended to enable generic optimizations
to be shared between versions of the compiler supporting different languages and target processors.
The back end takes the output from the middle end. It may perform more analysis, transformations and
optimizations that are for a particular computer. Then, it generates code for a particular processor and OS.
This front-end/middle/back-end approach makes it possible to combine front ends for different languages
with back ends for different CPUs. Practical examples of this approach are the GNU Compiler Collection,
LLVM, and the Amsterdam Compiler Kit, which have multiple front-ends, shared analysis and multiple
back-ends.
Classifying compilers by number of passes has its background in the hardware resource limitations of
computers. Compiling involves performing lots of work and early computers did not have enough memory
to contain one program that did all of this work. So compilers were split up into smaller programs which
each made a pass over the source (or some representation of it) performing some of the required analysis
and translations.
The ability to compile in a single pass is often seen as a benefit because it simplifies the job of writing a
compiler, and one-pass compilers generally compile faster than multi-pass compilers. Many languages were
designed so that they could be compiled in a single pass (e.g., Pascal).
In some cases the design of a language feature may require a compiler to perform more than one pass
over the source. For instance, consider a declaration appearing on line 20 of the source which affects the
translation of a statement appearing on line 10. In this case, the first pass needs to gather information
about declarations appearing after statements that they affect, with the actual translation happening
during a subsequent pass.
The disadvantage of compiling in a single pass is that it is not possible to perform many of the sophisticated
optimizations needed to generate high quality code. It can be difficult to count exactly how many passes an
optimizing compiler makes. For instance, different phases of optimization may analyse one expression
many times but only analyse another expression once.
Splitting a compiler up into small programs is a technique used by researchers interested in producing
provably correct compilers. Proving the correctness of a set of small programs often requires less effort
than proving the correctness of a larger, single, equivalent program.
While the typical multi-pass compiler outputs machine code from its final pass, there are several other
types:
A "source-to-source compiler" is a type of compiler that takes a high-level language as its input and
outputs a high-level language. For example, an automatic parallelizing compiler will frequently take
in a high-level language program as an input and then transform the code and annotate it with
parallel code annotations (e.g., OpenMP) or language constructs (e.g., Fortran's DOALL statements).
A "stage compiler" compiles to the assembly language of a theoretical machine, as some Prolog
implementations do. This Prolog machine is also known as the Warren Abstract Machine (WAM).
Bytecode compilers for Java, Python, and many other languages are also a subtype of this.
A "just-in-time compiler" is used by Smalltalk and Java systems, and also by Microsoft .NET's Common
Intermediate Language (CIL). Applications are delivered in bytecode, which is compiled to native
machine code just prior to execution.
There are two main stages in the compiling process: analysis and synthesis. The analysis stage breaks up
the source program into pieces and creates a generic (language-independent) intermediate representation
of the program. Then, the synthesis stage constructs the desired target program from the intermediate
representation. Typically, a compiler's analysis stage is called its front end and the synthesis stage its back
end. Each of the stages is broken down into a set of "phases" that handle different parts of the task. (Why
do you think typical compilers separate the compilation process into front-end and back-end phases?)
1) Lexical Analysis or Scanning: The stream of characters making up a source program is read from left to
right and grouped into tokens, which are sequences of characters that have a collective meaning.
Examples of tokens are identifiers (user-defined names), reserved words, integers, doubles or floats,
delimiters, operators, and special symbols.
Example of lexical analysis:
int a;
a = a + 2;
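Scanning this fragment produces a stream of tokens roughly like the following (the token names are
illustrative; every compiler uses its own conventions):

int    reserved word
a      identifier
;      special symbol
a      identifier
=      operator (assignment)
a      identifier
+      operator (addition)
2      integer constant
;      special symbol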
2) Syntax Analysis or Parsing: The tokens found during scanning are grouped together
using a context-free grammar. A grammar is a set of rules that define valid structures in the programming
language. Each token is associated with a specific rule, and grouped together accordingly. This process is
called parsing. The output of this phase is called a parse tree or a derivation, i.e., a record of which
grammar rules were used to create the source program.
Part of a grammar for simple arithmetic expressions in C might look like this:
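One illustrative fragment (the exact nonterminal names vary from grammar to grammar):

expression -> expression + term
expression -> term
term -> term * factor
term -> factor
factor -> ( expression )
factor -> identifier
factor -> constant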
The symbol on the left side of the "->" in each rule can be replaced by the symbols on the right. To parse a
+ 2, we would apply the following rules:
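expression
-> expression + term      (apply expression -> expression + term)
-> term + term            (apply expression -> term)
-> factor + term          (apply term -> factor)
-> a + term               (apply factor -> identifier, matching the token a)
-> a + factor             (apply term -> factor)
-> a + 2                  (apply factor -> constant, matching the token 2)

(This derivation uses the illustrative grammar fragment above.)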
When we reach a point in the parse where we have only tokens, we have finished.
By knowing which rules are used to parse, we can determine the structures present in the source program.
A source program which can be parsed is syntactically correct.
3) Semantic Analysis: The parse tree or derivation is checked for semantic errors. A statement can be
syntactically correct (it associates with a grammar rule correctly) and yet disobey the semantic rules of the
source language.
Semantic analysis is the phase where we detect such things as use of an undeclared variable, a function
called with improper arguments, access violations, and incompatible operands and type mismatches.
int arr[2], c;
c = arr * 10;
A lot of the semantic analysis work pertains to type checking. Although the C fragment above will scan into
valid tokens and successfully match the rules for a valid expression, it isn't semantically valid. In the
semantic analysis phase, the compiler checks the types and reports that you cannot use an array variable in
a multiplication expression.
4) Intermediate Code Generation: This is where the intermediate representation of the source program is
created. We want this representation to be easy to generate, and easy to translate into the target
program. The representation can have a variety of forms, but a common one is called three-address code
(TAC), which is a lot like a generic assembly language. Three-address code is a sequence of simple
instructions, each of which can have at most three operands.
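For example, consider this C assignment statement and one possible TAC translation of it (the temporary
names _t1, _t2, _t3 are invented by the compiler):

a = b * c + d / e;

_t1 = b * c
_t2 = d / e
_t3 = _t1 + _t2
a = _t3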
The single C statement above is translated into a sequence of four instructions in three-address code.
Note the use of temporary variables that are created by the compiler as needed to keep the number of
operands down to three. Of course, it's a little more complicated than this, because we have to
translate branching and looping instructions, as well as function calls. Here is some TAC for a branching
translation:
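if (a <= b)
    max = b;
else
    max = a;

_t1 = a <= b
ifz _t1 goto L1
max = b
goto L2
L1:
max = a
L2:

(The label names and the ifz "branch if zero" instruction are one common TAC convention; the details vary
from compiler to compiler.)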
1) Intermediate Code Optimization: The optimizer accepts input in the intermediate representation (e.g.,
TAC) and outputs a streamlined version still in the intermediate representation. In this phase, the compiler
attempts to produce the smallest, fastest and most efficient code by applying various techniques
such as:
• inhibiting code generation of unreachable code segments
• getting rid of unused variables
• eliminating multiplication by 1 and addition by 0
• loop optimization (e.g., moving computations whose values do not change inside the loop out of the loop)
• common sub-expression elimination
• strength reduction
.....
The optimization phase can really slow down a compiler, so most compilers allow this feature to be
suppressed. The compiler may have fine-grain controls that allow a developer to make tradeoffs between
compilation time and optimization quality.
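For instance (the variable and temporary names are illustrative):

Before optimization:
_t1 = x + y
_t2 = _t1 + 0
_t3 = x + y
_t4 = _t2 * _t3
a = _t4

After optimization:
_t1 = x + y
_t2 = _t1 * _t1
a = _t2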
In the example shown above, the optimizer eliminated an addition of zero and a re-evaluation of the same
expression, allowing the original five TAC statements to be rewritten as just three statements that use two
fewer temporary variables.
2) Object Code Generation: This is where the target program is generated. The output of this phase is
usually machine code or assembly code. Memory locations are selected for each variable. Instructions are
chosen for each operation. The three-address code is translated into a sequence of assembly or machine
language instructions that perform the same tasks.
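As a rough illustration (the register choices and stack offsets below are invented for the example), consider
the TAC statements _t1 = b + c and a = _t1:

ld   [%fp-8], %l0     ! load b from its stack slot
ld   [%fp-12], %l1    ! load c
add  %l0, %l1, %l2    ! _t1 = b + c
st   %l2, [%fp-4]     ! a = _t1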
In the example above, the code generator translated the TAC input into SPARC assembly output.
3) Object Code Optimization: There may also be another optimization pass that follows code generation,
this time transforming the object code into tighter, more efficient object code. This is where we consider
features of the hardware itself to make efficient usage of the processor(s) and registers. The compiler can
take advantage of machine-specific idioms (specialized instructions, instruction scheduling for the pipeline,
branch handling) and peephole optimizations in reorganizing and streamlining the object code itself. As with IR
optimization, this phase of the compiler is usually configurable or can be skipped entirely.
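A classic peephole example (SPARC-style syntax, register names illustrative): a store immediately followed
by a load of the same location can drop the load, because the value is already in a register:

st  %l2, [%fp-4]      ! store a
ld  [%fp-4], %l2      ! redundant load of a -- removed by the peephole optimizer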
Front end
The front end analyzes the source code to build an internal representation of the program, called the
intermediate representation or IR. It also manages the symbol table, a data structure mapping each symbol
in the source code to associated information such as location, type and scope. This is done over several
phases, which include some of the following:
1. Line reconstruction. Languages which strop their keywords or allow arbitrary spaces within
identifiers require a phase before parsing, which converts the input character sequence to a
canonical form ready for the parser. The top-down, recursive-descent, table-driven parsers used in
the 1960s typically read the source one character at a time and did not require a separate
tokenizing phase. Atlas Autocode, and Imp (and some implementations of Algol and Coral66) are
examples of stropped languages whose compilers would have a Line Reconstruction phase.
2. Lexical analysis breaks the source code text into small pieces called tokens. Each token is a single
atomic unit of the language, for instance a keyword, identifier or symbol name. The token syntax is
typically a regular language, so a finite state automaton constructed from a regular expression can
be used to recognize it. This phase is also called lexing or scanning, and the software doing lexical
analysis is called a lexical analyzer or scanner. (A minimal scanner sketch in C appears after this list.)
3. Preprocessing. Some languages, e.g., C, require a preprocessing phase which supports macro
substitution and conditional compilation. Typically the preprocessing phase occurs before syntactic
or semantic analysis; e.g. in the case of C, the preprocessor manipulates lexical tokens rather than
syntactic forms. However, some languages such as Scheme support macro substitutions based on
syntactic forms.
4. Syntax analysis involves parsing the token sequence to identify the syntactic structure of the
program. This phase typically builds a parse tree, which replaces the linear sequence of tokens with
a tree structure built according to the rules of a formal grammar which define the language's
syntax. The parse tree is often analyzed, augmented, and transformed by later phases in the
compiler.
5. Semantic analysis is the phase in which the compiler adds semantic information to the parse tree
and builds the symbol table. This phase performs semantic checks such as type checking (checking
for type errors), or object binding (associating variable and function references with their
definitions), or definite assignment (requiring all local variables to be initialized before use),
rejecting incorrect programs or issuing warnings. Semantic analysis usually requires a complete
parse tree, meaning that this phase logically follows the parsing phase, and logically precedes the
code generation phase, though it is often possible to fold multiple phases into one pass over the
code in a compiler implementation.
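To make the lexical-analysis phase concrete, here is a minimal scanner sketch in C. It recognizes only
identifiers, integer constants and single-character operator/symbol tokens; the token names are invented
for the example, and a real scanner would also look identifiers up in a table of reserved words.

#include <ctype.h>
#include <stdio.h>

/* Token categories for this toy scanner (names are illustrative). */
enum token { TOK_IDENT, TOK_NUMBER, TOK_OP, TOK_EOF };

static char lexeme[64];                      /* text of the most recent token */

/* Return the next token from stdin, skipping whitespace. */
static enum token next_token(void)
{
    int c = getchar();
    while (c == ' ' || c == '\t' || c == '\n' || c == '\r')
        c = getchar();
    if (c == EOF)
        return TOK_EOF;

    size_t n = 0;
    if (isalpha(c) || c == '_') {            /* identifier: letter (letter | digit | _)* */
        do {
            if (n < sizeof lexeme - 1) lexeme[n++] = (char)c;
            c = getchar();
        } while (isalnum(c) || c == '_');
        ungetc(c, stdin);
        lexeme[n] = '\0';
        return TOK_IDENT;
    }
    if (isdigit(c)) {                        /* integer constant: digit+ */
        do {
            if (n < sizeof lexeme - 1) lexeme[n++] = (char)c;
            c = getchar();
        } while (isdigit(c));
        ungetc(c, stdin);
        lexeme[n] = '\0';
        return TOK_NUMBER;
    }
    lexeme[0] = (char)c;                     /* anything else: one-character operator or symbol */
    lexeme[1] = '\0';
    return TOK_OP;
}

int main(void)
{
    enum token t;
    while ((t = next_token()) != TOK_EOF)
        printf("%-8s %s\n",
               t == TOK_IDENT ? "IDENT" : t == TOK_NUMBER ? "NUMBER" : "OP",
               lexeme);
    return 0;
}

Fed the fragment int a; a = a + 2; it prints one line per token (int comes out as IDENT because this sketch
has no reserved-word table), which is essentially the stream the parser consumes.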
Back end
The term back end is sometimes confused with code generator because of the overlapped functionality of
generating assembly code. Some literature uses middle end to distinguish the generic analysis and
optimization phases in the back end from the machine-dependent code generators.
1. Analysis: This is the gathering of program information from the intermediate representation
derived from the input. Typical analyses are data flow analysis to build use-define chains,
dependence analysis, alias analysis, pointer analysis, escape analysis etc. Accurate analysis is the
basis for any compiler optimization. The call graph and control flow graph are usually also built
during the analysis phase.
2. Optimization: the intermediate language representation is transformed into functionally equivalent
but faster (or smaller) forms. Popular optimizations are inline expansion, dead code elimination,
constant propagation, loop transformation, register allocation or even automatic parallelization.
3. Code generation: the transformed intermediate language is translated into the output language,
usually the native machine language of the system. This involves resource and storage decisions,
such as deciding which variables to fit into registers and memory and the selection and scheduling
of appropriate machine instructions along with their associated addressing modes (see also Sethi-
Ullman algorithm).
Compiler analysis is the prerequisite for any compiler optimization, and they tightly work together. For
example, dependence analysis is crucial for loop transformation.
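As a small illustration of how the two work together, constant propagation relies on data-flow analysis to
prove that a variable holds a known constant at a use; only then can the use be replaced and the now-dead
assignment removed:

Before:
x = 3
y = x + 4
return y

After constant propagation and dead code elimination:
return 7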
In addition, the scope of compiler analysis and optimization varies greatly, from as small as a basic block to
the procedure/function level, or even over the whole program (interprocedural optimization). Obviously, a
compiler can potentially do a better job using a broader view. But that broad view is not free: large scope
analysis and optimizations are very costly in terms of compilation time and memory space; this is especially
true for interprocedural analysis and optimizations.
Interprocedural analysis and optimizations are common in modern commercial compilers from HP, IBM,
SGI, Intel, Microsoft, and Sun Microsystems. The open source GCC was criticized for a long time for lacking
powerful interprocedural optimizations, but it is changing in this respect. Another good open source
compiler with full analysis and optimization infrastructure is Open64, which is used by many organizations
for research and commercial purposes.
Due to the extra time and space needed for compiler analysis and optimizations, some compilers skip them
by default. Users have to use compilation options to explicitly tell the compiler which optimizations should
be enabled.
Related techniques
Assembly language is not a high-level language, and a program that translates it into machine code is more
commonly known as an assembler, with the inverse program known as a disassembler.
A program that translates from a low level language to a higher level one is a decompiler.
A program that translates between high-level languages is usually called a language translator, source to
source translator, language converter, or language rewriter. The last term is usually applied to translations
that do not involve a change of language.
Cross compiler
A cross compiler is a compiler capable of creating executable code for a platform other than the one on
which the compiler is run. Cross compilers are generally used to build code for embedded systems or for
multiple platforms. They are necessary when it is inconvenient or impossible to compile on the target
platform itself, as with microcontrollers that have only a minimal amount of memory. Cross compilation has
also become more common for paravirtualization, where a single system may run software built for several
platforms.
Source-to-source translators, which are sometimes loosely referred to as cross compilers, are not covered
by this definition.
The fundamental use of a cross compiler is to separate the build environment from the target
environment. This is useful in a number of situations:
Embedded computers where a device has extremely limited resources. For example, a microwave
oven will have an extremely small computer to read its touchpad and door sensor, provide output
to a digital display and speaker, and to control the machinery for cooking food. This computer will
not be powerful enough to run a compiler, a file system, or a development environment. Since
debugging and testing may also require more resources than are available on an embedded system,
cross-compilation can be less involved and less prone to errors than native compilation.
Compiling for multiple machines. For example, a company may wish to support several different
versions of an operating system or to support several different operating systems. By using a cross
compiler, a single build environment can be set up to compile for each of these targets.
Compiling on a server farm. Similar to compiling for multiple machines, a complicated build that
involves many compile operations can be executed across any machine that is free regardless of its
brand or current version of an operating system.
Bootstrapping to a new platform. When developing software for a new platform, or the emulator of
a future platform, one uses a cross compiler to compile necessary tools such as the operating
system and a native compiler.
Compiling native code for emulators for older, now-obsolete platforms like the Commodore 64 or
Apple II by enthusiasts who use cross compilers that run on a current platform (such as Aztec C's
MS DOS 6502 cross compilers running under Windows XP).
Use of virtual machines (such as Java's JVM) resolves some of the reasons for which cross compilers were
developed. The virtual machine paradigm allows the same compiler output to be used across multiple
target systems.
Typically the hardware architecture differs (e.g. compiling a program destined for the MIPS architecture on
an x86 computer) but cross-compilation is also applicable when only the operating system environment
differs, as when compiling a FreeBSD program under Linux, or even just the system library, as when
compiling programs with uClibc on a glibc host.
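For example, on an x86 Linux host with an ARM cross toolchain installed (the toolchain prefix below is one
common naming convention, not the only one), cross-compiling looks like an ordinary compile invoked
under a different compiler name:

arm-linux-gnueabihf-gcc -O2 -o hello hello.c   # produces an ARM executable on the x86 host
file hello                                     # reports an ARM binary, not an x86 one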
Bootstrapping (compilers)
Bootstrapping is a term used in computer science to describe the techniques involved in writing a compiler
(or assembler) in the same programming language that it is intended to compile.
One may then wonder how the chicken and egg problem of creating the compiler was solved: if one needs
a compiler for language X to obtain a compiler for language X, how did the first compiler get written?
Possible methods include:
Implementing an interpreter or compiler for language X in language Y. Niklaus Wirth reported that
he wrote the first Pascal compiler in Fortran. Language Y could also be hand coded machine code or
assembly language.
Another interpreter or compiler for X has already been written in another language Y; this is how
Scheme is often bootstrapped.
Earlier versions of the compiler were written in a subset of X for which there existed some other
compiler; this is how some supersets of Java are bootstrapped.
The compiler for X is cross compiled from another architecture where there exists a compiler for X;
this is how compilers for C are usually ported to other platforms.
Writing the compiler in X; then hand-compiling it from source (most likely in a non-optimized way)
and running that on the code to get an optimized compiler. Donald Knuth used this for his WEB
literate programming system.
Methods for distributing compilers in source code include providing a portable bytecode version of the
compiler, so as to bootstrap the process of compiling the compiler with itself.
The first language to provide such a bootstrap was NELIAC. The first commercial language to do so was
PL/I. Today, a large proportion of programming languages are bootstrapped, including Basic, C, Pascal,
Factor, Haskell, Modula-2, Oberon, OCaml, Common Lisp, Scheme and more.
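In practice a self-hosting compiler is usually built in stages; the three-stage pattern below (used, for
example, by GCC) is a common way to both bootstrap and sanity-check the result:

1. Build the new compiler from its source using an existing compiler (written in another language or an
earlier release).
2. Use that stage-1 compiler to compile the same source again, producing a stage-2 compiler.
3. Use the stage-2 compiler to compile the source once more; the stage-2 and stage-3 binaries should be
identical, which is a useful consistency check.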
Assembler
Typically a modern assembler creates object code by translating assembly instruction mnemonics into
opcodes, and by resolving symbolic names for memory locations and other entities. [1] The use of symbolic
references is a key feature of assemblers, saving tedious calculations and manual address updates after
program modifications. Most assemblers also include macro facilities for performing textual substitution—
e.g., to generate common short sequences of instructions to run inline, instead of in a subroutine.
Assemblers are generally simpler to write than compilers for high-level languages, and have been available
since the 1950s. Modern assemblers, especially for RISC based architectures, such as MIPS, Sun SPARC and
HP PA-RISC, optimize instruction scheduling to exploit the CPU pipeline efficiently.
Note that, in normal professional usage, the term assembler is often used ambiguously: It is frequently
used to refer to an assembly language itself, rather than to the assembler utility. Thus: "CP/CMS was
written in S/360 assembler" as opposed to "ASM-H was a widely-used S/370 assembler."
Linker or link editor
In computer science, a linker or link editor is a program that takes one or more object files generated by a
compiler and combines them into a single executable program.
In IBM mainframe environments such as OS/360 this program is known as a linkage editor.
On Unix variants the term loader is often used as a synonym for linker. Because this usage blurs the
distinction between the compile-time process and the run-time process, this article will use linking for the
former and loading for the latter. However, in some operating systems the same program handles both the
jobs of linking and loading a program; see dynamic linking.
Computer programs are typically composed of several parts or modules; these parts, if not all contained
within a single object file, refer to each other by means of symbols. Typically, an object file can contain
three kinds of symbols: defined symbols, which it exports so that other modules may refer to them;
undefined symbols, which name things it uses but expects to find defined elsewhere; and local symbols,
used internally within the object file to simplify relocation.
When a program comprises multiple object files, the linker combines these files into a unified executable
program, resolving the symbols as it goes along.
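A minimal C illustration (the file and symbol names are invented for the example): main.o contains an
undefined reference to counter, which the linker resolves against the definition in counter.o.

/* counter.c -- defines the symbol "counter" */
int counter = 42;

/* main.c -- uses "counter", which is undefined within main.o */
extern int counter;

int main(void)
{
    return counter;
}

Compiling and linking with a standard Unix-style driver:

cc -c main.c counter.c           # produces main.o and counter.o
cc -o program main.o counter.o   # the link step resolves the reference to counter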
Linkers can take objects from a collection called a library. Some linkers do not include the whole library in
the output; they only include its symbols that are referenced from other object files or libraries. Libraries
exist for diverse purposes, and one or more system libraries are usually linked in by default.
The linker also takes care of arranging the objects in a program's address space. This may involve
relocating code that assumes a specific base address to another base. Since a compiler seldom knows
where an object will reside, it often assumes a fixed base location (for example, zero). Relocating machine
code may involve re-targeting of absolute jumps, loads and stores.
The executable output by the linker may need another relocation pass when it is finally loaded into
memory (just before execution). This pass is usually omitted on hardware offering virtual memory — every
program is put into its own address space, so there is no conflict even if all programs load at the same base
address. This pass may also be omitted if the executable is a position independent executable.
Dynamic linking
Modern operating system environments allow dynamic linking, that is, postponing the resolution of some
undefined symbols until a program is run. That means that the executable still contains undefined
symbols, plus a list of objects or libraries that will provide definitions for these. Loading the program will
load these objects/libraries as well, and perform a final linking.
Dynamic linking has two main advantages. Often-used libraries (for example the standard system libraries)
need to be stored in only one location, not duplicated in every single binary. And if an error in a library
function is corrected by replacing the library, all programs using it dynamically will benefit from the
correction after being restarted; programs that included this function by static linking would have to be
re-linked first.
Loader
In computing, a loader is the part of an operating system that is responsible for loading programs from
executables (i.e., executable files) into memory, preparing them for execution and then executing them.
The loader is usually a part of the operating system's kernel and usually is loaded at system boot time and
stays in memory until the system is rebooted, shut down, or powered off. Some operating systems that
have a pageable kernel may have the loader in the pageable part of memory and thus the loader
sometimes may be swapped out of memory. All operating systems that support program loading have
loaders. Some embedded operating systems in highly specialized computers run only one program and
have no program loading capabilities and thus no loaders, for example embedded systems in cars or stereo
equipment.
In Unix, the loader is the handler for the system call execve().[1] The loader's tasks under Unix include: (1)
validation (permissions, memory requirements etc.); (2) copying the program image from the disk into
main memory; (3) copying the command-line arguments onto the stack; (4) initializing registers (e.g., the
stack pointer); (5) jumping to the program entry point (_start).
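A small C illustration (the program path and arguments are made up for the example): a process asks the
loader to replace its own image by calling execve():

#include <stdio.h>
#include <unistd.h>

int main(void)
{
    char *argv[] = { "/bin/ls", "-l", NULL };   /* program to load, plus its arguments */
    char *envp[] = { NULL };                    /* empty environment */

    execve("/bin/ls", argv, envp);              /* on success this call never returns */
    perror("execve");                           /* reached only if the load failed */
    return 1;
}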
Loader programs are useful for prototyping, testing, and one-off applications. One such program was an
integral part of Gene Amdahl's original OS/360 operating system, and this loader facility was continued
through OS/360's descendants including MVT, MVS and z/OS.
Runtime
In computer science, runtime or run time describes the operation of a computer program, the duration of
its execution, from beginning to termination (compare compile time). The term runtime can also refer to a
virtual machine that manages a program written in a computer language while it is running. Run time is
sometimes also used to mean a runtime library, a library of basic support code used with a particular
compiler, but when that is the intended meaning, the term runtime library is more accurate.
A runtime environment is a virtual machine state which provides software services for processes or
programs while a computer is running. It may pertain to the operating system itself, or the software that
runs beneath it. Its primary purpose is to enable "platform independent" programming.
Runtime activities include loading and linking of the classes needed to execute a program, optional
machine code generation and dynamic optimization of the program, and actual program execution.
For example, a program written in Java would receive services from the Java Runtime Environment by
issuing commands from which the expected result is returned by the Java software. By providing these
services, the Java software is considered the runtime environment of the program. Both the program and
the Java software combined request services from the operating system. The operating system kernel
provides services for itself and all processes and software running under its control. The Operating System
may be considered as providing a runtime environment for itself.
Compile time
In computer science, compile time refers to either the operations performed by a compiler (the "compile-
time operations") or programming language requirements that must be met by source code for it to be
successfully compiled (the "compile-time requirements").
The operations performed at compile time usually include syntax analysis, various kinds of semantic
analysis (e.g., type checks and template instantiation) and code generation.
The definition of a programming language will specify compile-time requirements that source code must
meet to be successfully compiled.
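For instance, in C a type mismatch is a compile-time error, while an error that depends on values computed
during execution only shows up at runtime (the variable names below are illustrative):

int n = "hello";   /* rejected at compile time: the initializer has the wrong type */

int d = 0;
int q = 10 / d;    /* compiles, but dividing by zero fails (undefined behaviour) at runtime */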
Compile time occurs before link time (when the outputs of one or more compiled files are joined together)
and runtime (when a program is executed). In some programming languages it may be necessary for some
compilation and linking to occur at runtime.