Compiler and Assembler

The document discusses the process of compiling, linking, and loading a C/C++ program from source code into an executable file. It involves 4 main stages: preprocessing, compilation, assembly, and linking. Preprocessing handles includes, macros, and conditional compilation. Compilation generates assembly code from the preprocessed source. Assembly produces object files from the assembly code. Linking combines object files and libraries to produce a single executable file, resolving external symbols. The executable and object files contain sections like text, data, BSS, and symbols that are loaded and used to execute the program.

Uploaded by

Gideon Moyo

COMPILER, ASSEMBLER, LINKER AND LOADER: A BRIEF STORY

My Training Period: xx hours

Note:
This module presents a fairly detailed story of a process (a running program).
It is an excerpt from the more complete Tenouk buffer overflow tutorial. It
investigates how C/C++ source code is preprocessed, compiled, linked and loaded
as a running program. It is based on GCC (the GNU Compiler Collection). When you
use IDE (Integrated Development Environment) compilers such as Microsoft Visual
C++ or Borland C++ Builder, the processes discussed here are quite transparent.
The commands and examples for gcc, gdb, g++, gas and friends are discussed in
Linux gnu gcc, g++, gdb and gas 1 and Linux gnu gcc, g++, gdb and gas 2. Have a
nice day!
The C compiler ability:
Able to understand and appreciate the processes involved in preprocessing,
compiling, linking, loading and running C/C++ programs.
W.1 COMPILERS, ASSEMBLERS and LINKERS
Normally, building a C program involves four stages and uses different tools:
a preprocessor, a compiler, an assembler, and a linker.
At the end there should be a single executable file. Below are the stages, which
happen in this order regardless of the operating system/compiler and are
illustrated graphically in Figure w.1.
1. Preprocessing is the first pass of any C compilation. It processes
include-files, conditional compilation instructions and macros.
2. Compilation is the second pass. It takes the output of the preprocessor
(the preprocessed source code) and generates assembler source code.
3. Assembly is the third stage of compilation. It takes the assembly source
code and produces an assembly listing with offsets. The assembler
output is stored in an object file.

4. Linking is the final stage of compilation. It takes one or more object files
or libraries as input and combines them to produce a single (usually
executable) file. In doing so, it resolves references to external symbols,
assigns final addresses to procedures/functions and variables, and
revises code and data to reflect new addresses (a process called
relocation).

Bear in mind that if you use IDE-type compilers, these processes are quite
transparent.
Now we are going to examine in more detail the processes that happen
before and after the linking stage. For any given input file, the file name suffix
(file extension) determines what kind of compilation is done; the suffixes for
GCC are listed in Table w.1.
In UNIX/Linux, the executable or binary file doesn't have an extension, whereas in
Windows the executables may have, for example, .exe, .com and .dll.
File extension     Description
file_name.c        C source code which must be preprocessed.
file_name.i        C source code which should not be preprocessed.
file_name.ii       C++ source code which should not be preprocessed.
file_name.h        C header file (not to be compiled or linked).
file_name.cc       C++ source code which must be preprocessed. In
file_name.cp       file_name.cxx, both x's must be the literal
file_name.cxx      character x; in file_name.C, the C is a capital
file_name.cpp      letter.
file_name.c++
file_name.C
file_name.s        Assembler code.
file_name.S        Assembler code which must be preprocessed.
file_name.o        Object file. By default, the object file name for a
                   source file is made by replacing the extension .c,
                   .i, .s etc. with .o.

Table w.1
The following Figure shows the steps involved in the process of building the C
program starting from the compilation until the loading of the executable image
into the memory for program running.

Figure w.1: Compile, link and execute stages for running program (a process)
W.2 OBJECT FILES and EXECUTABLE
After the source code has been assembled, it produces object files
(e.g. .o, .obj), which are then linked to produce an executable file.
Object and executable files come in several formats such as ELF (Executable and
Linking Format) and COFF (Common Object-File Format). For example, ELF is
used on Linux systems, while COFF is used on Windows systems.
Other object file formats are listed in the following Table.

Object File Format   Description

a.out       The a.out format is the original file format for Unix. It consists
            of three sections: text, data, and bss, which are program code,
            initialized data, and uninitialized data, respectively. This format
            is so simple that it doesn't have any reserved place for debugging
            information. The only debugging format for a.out is stabs, which is
            encoded as a set of normal symbols with distinctive attributes.
COFF        The COFF (Common Object File Format) format was introduced with
            System V Release 3 (SVR3) Unix. COFF files may have multiple
            sections, each prefixed by a header. The number of sections is
            limited. The COFF specification includes support for debugging, but
            the debugging information was limited. There is no file extension
            for this format.
ECOFF       A variant of COFF. ECOFF is an Extended COFF originally introduced
            for Mips and Alpha workstations.
XCOFF       The IBM RS/6000 running AIX uses an object file format called XCOFF
            (eXtended COFF). The COFF sections, symbols, and line numbers are
            used, but debugging symbols are dbx-style stabs whose strings are
            located in the .debug section (rather than the string table). The
            default name for an XCOFF executable file is a.out.
PE          Windows 9x and NT use the PE (Portable Executable) format for their
            executables. PE is basically COFF with additional headers. The
            extension is normally .exe.
ELF         The ELF (Executable and Linking Format) format came with System V
            Release 4 (SVR4) Unix. ELF is similar to COFF in being organized
            into a number of sections, but it removes many of COFF's
            limitations. ELF is used on most modern Unix systems, including
            GNU/Linux, Solaris and Irix. It is also used on many embedded
            systems.
SOM/ESOM    SOM (System Object Module) and ESOM (Extended SOM) are HP's object
            file and debug formats (not to be confused with IBM's SOM, which is
            a cross-language Application Binary Interface - ABI).
Table w.2

When we examine the content of these object files there are areas called
sections. Sections can hold executable code, data, dynamic linking information,
debugging data, symbol tables, relocation information, comments, string tables,
and notes.
Some sections are loaded into the process image and some provide information
needed in the building of a process image while still others are used only in
linking object files.
There are several sections that are common to all executable formats (may be
named differently, depending on the compiler/linker) as listed below:
Section              Description

Text                 This section contains the executable instruction codes and
                     is shared among every process running the same program.
                     This section usually has READ and EXECUTE permissions only.
                     This section is the one most affected by optimization.
BSS                  BSS stands for Block Started by Symbol. It holds
                     un-initialized global and static variables. Since the BSS
                     only holds variables that don't have any values yet, it
                     doesn't actually need to store the image of these
                     variables. The size that BSS requires at runtime is
                     recorded in the object file, but the BSS (unlike the data
                     section) doesn't take up any actual space in the object
                     file.
Data                 Contains the initialized global and static variables and
                     their values. It is usually the largest part of the
                     executable. It usually has READ/WRITE permissions.
Rdata                Also known as the .rodata (read-only data) section. This
                     contains constants and string literals.
Reloc                Stores the information required for relocating the image
                     while loading.
Symbol table         A symbol is basically a name and an address. The symbol
                     table holds information needed to locate and relocate a
                     program's symbolic definitions and references. The symbol
                     table contains an array of symbol entries. A symbol table
                     index is a subscript into this array. Index 0 both
                     designates the first entry in the table and serves as the
                     undefined symbol index.
Relocation records   Relocation is the process of connecting symbolic references
                     with symbolic definitions. For example, when a program
                     calls a function, the associated call instruction must
                     transfer control to the proper destination address at
                     execution. Relocatable files must have relocation entries,
                     which are necessary because they contain information that
                     describes how to modify their section contents, thus
                     allowing executable and shared object files to hold the
                     right information for a process's program image. Simply
                     said, relocation records are information used by the linker
                     to adjust section contents.
Table w.3: Segments in executable file
The following is an example of dumping the content of an object file using the
readelf program. Another utility that can be used is objdump. These utilities
are presented in Linux gcc, g++, gdb and gas 1 and Linux gcc, g++, gdb and gas 2.
For Windows, the dumpbin utility (which comes with the Visual C++ compiler) or
the more powerful, free PEBrowse program can be used for the same purpose.
/* testprog1.c */
#include <stdio.h>

/* display() has internal linkage, so it shows up as a LOCAL symbol. */
static void display(int i, int *ptr);

int main(void)
{
    int x = 5;
    int *xptr = &x;

    printf("In main() program:\n");
    /* %p expects a void *, hence the casts. */
    printf("x value is %d and is stored at address %p.\n", x, (void *)&x);
    printf("xptr pointer points to address %p which holds a value of %d.\n",
           (void *)xptr, *xptr);
    display(x, xptr);
    return 0;
}

static void display(int y, int *yptr)
{
    char var[7] = "ABCDEF";    /* unused local buffer kept on the stack */

    printf("In display() function:\n");
    printf("y value is %d and is stored at address %p.\n", y, (void *)&y);
    printf("yptr pointer points to address %p which holds a value of %d.\n",
           (void *)yptr, *yptr);
}
[bodo@bakawali test]$ gcc -c testprog1.c
[bodo@bakawali test]$ readelf -a testprog1.o
ELF Header:
  Magic:   7f 45 4c 46 01 01 01 00 00 00 00 00 00 00 00 00
  Class:                             ELF32
  Data:                              2's complement, little endian
  Version:                           1 (current)
  OS/ABI:                            UNIX - System V
  ABI Version:                       0
  Type:                              REL (Relocatable file)
  Machine:                           Intel 80386
  Version:                           0x1
  Entry point address:               0x0
  Start of program headers:          0 (bytes into file)
  Start of section headers:          672 (bytes into file)
  Flags:                             0x0
  Size of this header:               52 (bytes)
  Size of program headers:           0 (bytes)
  Number of program headers:         0
  Size of section headers:           40 (bytes)
  Number of section headers:         11
  Section header string table index: 8

Section Headers:
  [Nr] Name              Type       Addr     Off    Size   ES Flg Lk Inf Al
  [ 0]                   NULL       00000000 000000 000000 00      0   0  0
  [ 1] .text             PROGBITS   00000000 000034 0000de 00  AX  0   0  4
  [ 2] .rel.text         REL        00000000 00052c 000068 08      9   1  4
  [ 3] .data             PROGBITS   00000000 000114 000000 00  WA  0   0  4
  [ 4] .bss              NOBITS     00000000 000114 000000 00  WA  0   0  4
  [ 5] .rodata           PROGBITS   00000000 000114 00010a 00   A  0   0  4
  [ 6] .note.GNU-stack   PROGBITS   00000000 00021e 000000 00      0   0  1
  [ 7] .comment          PROGBITS   00000000 00021e 000031 00      0   0  1
  [ 8] .shstrtab         STRTAB     00000000 00024f 000051 00      0   0  1
  [ 9] .symtab           SYMTAB     00000000 000458 0000b0 10     10   9  4
  [10] .strtab           STRTAB     00000000 000508 000021 00      0   0  1
Key to Flags:
  W (write), A (alloc), X (execute), M (merge), S (strings)

Relocation section '.rel.text' at offset 0x52c contains 13 entries:
  Type          Sym.Value   Sym. Name
  R_386_32      00000000    .rodata
  R_386_PC32    00000000    printf
  R_386_32      00000000    .rodata
  R_386_PC32    00000000    printf
  R_386_32      00000000    .rodata
  R_386_PC32    00000000    printf
  R_386_32      00000000    .rodata
  R_386_32      00000000    .rodata
  R_386_PC32    00000000    printf
  R_386_32      00000000    .rodata
  R_386_PC32    00000000    printf
  R_386_32      00000000    .rodata
  R_386_PC32    00000000    printf

There are no unwind sections in this file.

Symbol table '.symtab' contains 11 entries:
   Num:    Value  Size Type    Bind   Vis      Ndx Name
     0: 00000000     0 NOTYPE  LOCAL  DEFAULT  UND
     1: 00000000     0 FILE    LOCAL  DEFAULT  ABS testprog1.c
     2: 00000000     0 SECTION LOCAL  DEFAULT    1
     3: 00000000     0 SECTION LOCAL  DEFAULT    3
     4: 00000000     0 SECTION LOCAL  DEFAULT    4
     5: 00000000     0 SECTION LOCAL  DEFAULT    5
     6: 00000080    94 FUNC    LOCAL  DEFAULT    1 display
     7: 00000000     0 SECTION LOCAL  DEFAULT    6
     8: 00000000     0 SECTION LOCAL  DEFAULT    7
     9: 00000000   128 FUNC    GLOBAL DEFAULT    1 main
    10: 00000000     0 NOTYPE  GLOBAL DEFAULT  UND printf

No version information found in this file.

When writing a program in assembly language, it should be compatible
with the sections used in the assembler directives (x86). The partial list that
is of interest to us is shown below:

Section                              Description
Text (.section .text)                Contains code (instructions).
                                     Contains the _start label.
Read-Only Data (.section .rodata)    Contains pre-initialized constants.
Read-Write Data (.section .data)     Contains pre-initialized variables.
BSS (.section .bss)                  Contains un-initialized data.

Table w.4
The assembler directives in assembly programming can be used to identify code
and data sections, allocate/initialize memory and make symbols externally
visible or invisible.
An example of the assembly code with some of the assembler directives (Intel) is
shown below:

;initializing data
.section .data
.byte    128               ;one byte initialized to 128
.long    1,1000,10000      ;3 long words

;initializing ascii data
.ascii   "hello"           ;ascii without null character
.asciz   "hello"           ;ascii with \0

;allocating memory in bss
.section .bss
.equ     BUFFSIZE, 1024    ;define a constant
.comm    z, 4, 4           ;allocate 4 bytes for z with
                           ;4-byte alignment

;making symbols externally visible
.section .data
.globl   w                 ;declare externally visible
                           ;e.g: int w = 10
.text
.globl   fool              ;e.g: fool(void) {}

leave
ret
W.3 RELOCATION RECORDS
Because the various object files include references to each other's code
and/or data, these references need to be combined (resolved) at link
time.
For example in Figure w.2, the object file that has main() includes calls to
functions funct() and printf().
After linking all of the object files together, the linker uses the relocation records
to find all of the addresses that need to be filled in.
W.4 SYMBOL TABLE

Since assembling to machine code removes all traces of labels from the code,
the object file format has to keep these around in different places.
It is accomplished by the symbol table that contains a list of names and their
corresponding offsets in the text and data segments.
A disassembler provides support for translating back from an object file or
executable.

Figure w.2: The relocation record

W.5 LINKING
The linker actually enables separate compilation. As shown in Figure w.3, an
executable can be made up of a number of source files which can be compiled
and assembled into their object files respectively, independently.

Figure w.3: The object files linking process

W.5.1 SHARED OBJECTS
In a typical system, a number of programs will be running. Each program
relies on a number of functions, some of which will be standard C library
functions, like printf(), malloc(), strcpy(), etc. and some of which are
non-standard or user-defined functions.
If every program uses the standard C library, it means that each program
would normally have its own copy of this particular library present within it.
Unfortunately, this results in wasted resources and degrades efficiency and
performance.
Since the C library is common, it is better to have each program reference
one common instance of that library, instead of having each program contain
a copy of the library.
This is implemented during the linking process, where some of the objects are
linked at link time whereas others are linked at run time
(deferred/dynamic linking).
W.5.2 STATICALLY LINKED
The term statically linked means that the program and the particular library
that it's linked against are combined together by the linker at link time.
This means that the binding between the program and the particular library is
fixed and known at link time, before the program runs. It also means that we
can't change this binding, unless we re-link the program with a new version of
the library.
Programs that are linked statically are linked against archives of objects
(libraries) that typically have the extension .a. An example of such a collection
of objects is the standard C library, libc.a.
You might consider linking a program statically, for example, in cases where you
aren't sure whether the correct version of a library will be available at runtime,
or if you are testing a new version of a library that you don't yet want to install
as shared.

For gcc, the -static option can be used during the compilation/linking of the
program.
gcc -static filename.c -o filename
The drawback of this technique is that the executable is quite big, because all
the needed information has to be brought together.
W.5.3 DYNAMICALLY LINKED
The term dynamically linked means that the program and the particular library it
references are not combined together by the linker at link time.
Instead, the linker places information into the executable that tells the loader
which shared object module the code is in and which runtime linker should be
used to find and bind the references.
This means that the binding between the program and the shared object is done
at runtime; that is, before the program starts, the appropriate shared objects
are found and bound.
This type of program is called a partially bound executable, because it isn't fully
resolved. The linker, at link time, didn't cause all the referenced symbols in the
program to be associated with specific code from the library.
Instead, the linker simply said something like: This program calls some functions
within a particular shared object, so I'll just make a note of which shared object
these functions are in, and continue on.
Symbols for the shared objects are only verified for their validity to ensure that
they do exist somewhere and are not yet combined into the program.
The linker stores in the executable program, the locations of the external libraries
where it found the missing symbols. Effectively, this defers the binding until
runtime.
Programs that are linked dynamically are linked against shared objects that have
the extension .so. An example of such an object is the shared object version of
the standard C library, libc.so.
The advantages of deferring some of the objects/modules from the static linking
step until they are finally needed (during run time) include:
1. Program files (on disk) become much smaller because they need not hold
all necessary text and data segments information. It is very useful for
portability.
2. Standard libraries may be upgraded or patched without every
program needing to be re-linked. This clearly requires some agreed
module-naming convention that enables the dynamic linker to find the newest
installed module, such as some version specification. Furthermore the
distribution of the libraries is in binary form (no source), including
dynamically linked libraries (DLLs), and when you change your program
you only have to recompile the file that was changed.
3. Software vendors need only provide the related library modules
required. Additional runtime linking functions allow such programs to
programmatically link the required modules only.
4. In combination with virtual memory, dynamic linking permits two or more
processes to share read-only executable modules such as standard C
libraries. Using this technique, only one copy of a module needs to be
resident in memory at any time, and multiple processes can each
execute this shared code (read only). This results in a considerable
memory saving, although it demands an efficient swapping policy.
W.6 HOW SHARED OBJECTS ARE USED
To understand how a program makes use of shared objects, let's first examine
the format of an executable and the steps that occur when the program starts.
W.6.1 SOME ELF FORMAT DETAILS
Executable and Linking Format (ELF) is binary format, which is used in SVR4
Unix and Linux systems.
It is a format for storing programs or fragments of programs on disk, created as a
result of compiling and linking.
ELF not only simplifies the task of making shared libraries, but also enhances
dynamic loading of modules at runtime.
W.6.2 ELF SECTIONS
The Executable and Linking Format used by GNU/Linux and other operating
systems, defines a number of sections in an executable program.
These sections provide instructions to the binary file and allow
inspection. Important function sections include the Global Offset Table (GOT),
which stores addresses of system functions; the Procedure Linking
Table (PLT), which stores indirect links to the GOT; .init/.fini, for internal
initialization and shutdown; and .ctors/.dtors, for constructors and destructors.
The data sections are .rodata for read-only data, .data for initialized data,
and .bss for uninitialized data.
A partial list of the ELF sections, organized from low to high, is as follows:
1. .init   - Startup
2. .text   - Program code (instructions)
3. .fini   - Shutdown
4. .rodata - Read Only
5. .data   - Initialized Data
6. .tdata  - Initialized Thread Data
7. .tbss   - Uninitialized Thread Data
8. .ctors  - Constructors
9. .dtors  - Destructors
10. .got   - Global Offset Table
11. .bss   - Uninitialized Data
You can use the readelf or objdump program against the object or
executable files in order to view the sections.
In the following Figure, two views of an ELF file are shown: the linking
view and the execution view.

Figure w.4: Simplified object file format: linking view and execution view.
Keep in mind that the full format of the ELF contains many more items. As
explained previously, the linking view, which is used when the program or library
is linked, deals with sections within an object file.
Sections contain the bulk of the object file information: data, instructions,
relocation information, symbols, debugging information, etc.
The execution view, which is used when the program runs, deals with
segments. Segments are a way of grouping related sections.
For example, the text segment groups executable code, the data segment
groups the program data, and the dynamic segment groups information relevant
to dynamic loading.
Each segment consists of one or more sections. A process image is created by
loading and interpreting segments.
The operating system logically copies a file's segment to a virtual memory
segment according to the information provided in the program header table. The
OS can also use segments to create a shared memory resource.
At link time, the program or library is built by merging together sections with
similar attributes into segments.
Typically, all the executable and read-only data sections are combined into a
single text segment, while the data and BSS are combined into the data
segment.
These segments are normally called load segments, because they need to be
loaded in memory at process creation. Other sections such as symbol
information and debugging sections are merged into other, non-load segments.
W.7 PROCESS LOADING
In Linux processes loaded from a file system (using either
the execve() or spawn() system calls) are in ELF format.
If the file system is on a block-oriented device, the code and data are loaded into
main memory.
If the file system is memory mapped (e.g. ROM/Flash image), the code needn't
be loaded into RAM, but may be executed in place.
This approach makes all RAM available for data and stack, leaving the code in
ROM or Flash. In all cases, if the same process is loaded more than once, its
code will be shared.
Before we can run an executable, we first have to load it into memory.

This is done by the loader, which is generally part of the operating system. The
loader does the following things (among other things):
1. Memory and access validation - Firstly, the OS kernel reads in the
program file's header information and does the validation for type, access
permissions, memory requirement and its ability to run its instructions. It
confirms that the file is an executable image and calculates memory
requirements.
2. Process setup includes:
i. Allocates primary memory for the program's execution.
ii. Copies address space from secondary to primary memory.
iii. Copies the .text and .data sections from the executable into primary
memory.
iv. Copies program arguments (e.g., command line arguments) onto
the stack.
v. Initializes registers: sets the esp (stack pointer) to point to top of
stack, clears the rest.
vi. Jumps to start routine, which: copies main()'s arguments off of the
stack, and jumps to main().

Address space is the memory space that contains program code, stack, and data
segments, or in other words, all data the program uses as it runs.
The memory layout, consisting of three segments (text, data, and stack), is
shown in simplified form in Figure w.5.
The dynamic data segment is also referred to as the heap, the place dynamically
allocated memory (such as from malloc() and new) comes from. Dynamically
allocated memory is memory allocated at run time instead of compile/link time.
This organization enables any division of the dynamically allocated memory
between the heap (explicitly) and the stack (implicitly). This explains why the
stack grows downward and the heap grows upward.

Figure w.4: Process memory layout

W.8 RUNTIME DATA STRUCTURE: From Sections to Segments

A process is a running program. This means that the operating system has
loaded the executable file for the program into memory, has arranged it to have
access to its command-line arguments and environment variables, and has
started it running.
Typically a process has 5 different areas of memory allocated to it as listed in
Table w.5 (refer to Figure w.4):
Segment              Description

Code - text          Often referred to as the text segment, this is the area in
segment              which the executable instructions reside. Linux/Unix
                     arranges things so that multiple running instances of the
                     same program share their code; only one copy of the
                     instructions for the same program resides in memory at any
                     time. The portion of the executable file containing the
                     text segment is the text section.
Initialized data -   Statically allocated and global data that are initialized
data segment         with nonzero values live in the data segment. Each process
                     running the same program has its own data segment. The
                     portion of the executable file containing the data segment
                     is the data section.
BSS                  BSS stands for Block Started by Symbol. Global and
                     statically allocated data that are initialized to zero by
                     default are kept in what is called the BSS area of the
                     process. Each process running the same program has its own
                     BSS area. When running, the BSS data are placed in the data
                     segment. In the executable file, they are stored in the BSS
                     section. In the Linux/Unix executable format, only
                     variables that are initialized to a nonzero value occupy
                     space in the executable's disk file.
Heap                 The heap is where dynamic memory (obtained by malloc(),
                     calloc(), realloc() and new for C++) comes from. Everything
                     on a heap is anonymous, thus you can only access parts of it
                     through a pointer. As memory is allocated on the heap, the
                     process's address space grows. Although it is possible to
                     give memory back to the system and shrink a process's
                     address space, this is almost never done because the memory
                     is likely to be allocated again. Freed memory (free() and
                     delete) goes back to the heap, creating what are called
                     holes. It is typical for the heap to grow upward. This means
                     that successive items that are added to the heap are added
                     at addresses that are numerically greater than previous
                     items. It is also typical for the heap to start immediately
                     after the BSS area of the data segment. The end of the heap
                     is marked by a pointer known as the break. You cannot
                     reference past the break. You can, however, move the break
                     pointer (via brk() and sbrk() system calls) to a new
                     position to increase the amount of heap memory available.
Stack                The stack segment is where local (automatic) variables are
                     allocated. In a C program, local variables are all
                     variables declared inside the opening left curly brace of a
                     function body, including main(), or other left curly brace,
                     that aren't defined as static. The data is pushed onto or
                     popped off the stack following the Last In First Out (LIFO)
                     rule. The stack holds local variables, temporary
                     information, function parameters, return addresses and the
                     like. When a function is called, a stack frame (or a
                     procedure activation record) is created and PUSHed onto the
                     top of the stack. This stack frame contains information
                     such as the address from which the function was called and
                     where to jump back to when the function is finished (return
                     address), parameters, local variables, and any other
                     information needed by the invoked function. The order of
                     the information may vary by system and compiler. When a
                     function returns, the stack frame is POPped from the stack.
                     Typically the stack grows downward, meaning that items
                     deeper in the call chain are at numerically lower addresses
                     and toward the heap.

Table w.5
When a program is running, the initialized data, BSS and heap areas are usually
placed into a single contiguous area called a data segment.

The stack segment and code segment are separate from the data segment and
from each other as illustrated in Figure w.4.
Although it is theoretically possible for the stack and heap to grow into each
other, the operating system prevents that event.
The relationship among the different sections/segments is summarized in Table
w.6, executable program segments and their locations.
Executable file section   Address space segment   Program memory segment
(disk file)
.text                     Text                    Code
.data                     Data                    Initialized data
.bss                      Data                    BSS
-                         Data                    Heap
-                         Stack                   Stack

Table w.6
W.9 THE PROCESS (IMAGE)
The diagram below shows the memory layout of a typical C process. The
process's load segments (corresponding to "text" and "data" in the diagram) are
placed at the process's base address.
The main stack is located just below and grows downwards. Any additional
threads that are created will have their own stacks, located
below the main stack.
Each of the stacks is separated by a guard page to detect stack overflows
between stacks. The heap is located above the process and grows upwards.
In the middle of the process's address space, there is a region reserved for
shared objects. When a new process is created, the process manager first maps
the two segments from the executable into memory.
It then decodes the program's ELF header. If the program header indicates that
the executable was linked against a shared library, the process manager will
extract the name of the dynamic interpreter from the program header.
The dynamic interpreter points to a shared library that contains the runtime linker
code. The process manager will load this shared library in memory and will then
pass control to the runtime linker code in this library.

Figure w.5: Cs process memory layout on an x86.

W.10 RUNTIME LINKER AND SHARED LIBRARY LOADING
The runtime linker is invoked when a program that was linked against a shared
object is started or when a program requests that a shared object be dynamically
loaded.
So the resolution of the symbols can be done at one of the following times:
1. Load-time dynamic linking - the application program is read from the
disk (disk file) into memory and unresolved references are located. The
load-time loader finds all necessary external symbols and alters all
references to each symbol (all previously zeroed) to memory references
relative to the beginning of the program.
2. Run-time dynamic linking - the application program is read from disk
(disk file) into memory and unresolved references are left as invalid
(typically zero). The first access of an invalid, unresolved reference
results in a software trap. The run-time dynamic linker determines why
this trap occurred and seeks the necessary external symbol. Only this
symbol is loaded into memory and linked into the calling program.

The runtime linker is contained within the C runtime library. The runtime linker
performs several tasks when loading a shared library (.so file).
The dynamic section provides information to the linker about other libraries that
this library was linked against.
It also gives information about the relocations that need to be applied and the
external symbols that need to be resolved. The runtime linker will first load any
other required shared libraries (which may themselves reference other shared
libraries).
It will then process the relocations for each library. Some of these relocations
are local to the library, while others require the runtime linker to resolve a global
symbol.
In the latter case, the runtime linker will search through the list of libraries for this
symbol. In ELF files, hash tables are used for the symbol lookup, so the search
is very fast.
Once all relocations have been applied, any initialization functions that have
been registered in the shared library's init section are called. This is used in
some implementations of C++ to call global constructors.
W.11 SYMBOL NAME RESOLUTION
When the runtime linker loads a shared library, the symbols within that library
have to be resolved. Here, the order and the scope of the symbol resolution are
important.
If a shared library calls a function that happens to exist by the same name in
several libraries that the program has loaded, the order in which these libraries
are searched for this symbol is critical. This is why the OS defines several
options that can be used when loading libraries.
All the objects (executables and libraries) that have global scope are stored on
an internal list (the global list).
Any global-scope object, by default, makes available all of its symbols to any
shared library that gets loaded.
The global list initially contains the executable and any libraries that are loaded at
the program's startup.
W.12 DYNAMIC ADDRESS TRANSLATION
From the memory management point of view, modern multitasking operating systems
normally implement dynamic relocation instead of static relocation.

The layout of every program in its address space is virtually the same. This dynamic
relocation (in processor terms it is called dynamic address translation) provides
the illusion that:
1. Each process can use addresses starting at 0, even if other processes are
running, or even if the same program is running more than once.
2. Address spaces are protected from each other.
3. Each process has memory that is much larger than the available physical
memory (virtual memory).

In dynamic relocation the address is translated dynamically on every
reference. The virtual address (also called the logical address) is the one
generated by a process, and the physical address is the actual address in
physical memory at run time.
The address translation is normally done by the Memory Management Unit (MMU),
which is incorporated in the processor itself.
Virtual addresses are relative to the process. Each process believes that its
virtual addresses start from 0. The process does not even know where it is
located in physical memory; the code executes entirely in terms of virtual
addresses.
The MMU can refuse to translate virtual addresses that are outside the range of
memory for the process, for example by generating a segmentation fault. This
provides the protection for each process.
During translation, one can even move parts of the address space of a process
between disk and memory as needed (normally called swapping or paging).
This allows the virtual address space of the process to be much larger than the
physical memory available to it.
Graphically, this dynamic relocation for a process is shown in Figure w.6.

Figure w.6: Physical and virtual address: Address translation

More complete related information can be found in Tenouk's buffer overflow
Tutorial, which includes the stack construction and destruction for a function call.
You may also want to explore the Windows .NET Framework from the system
perspective, where the executable is called an assembly, with all new terms and
features.

Further related reading:

1. To view the content of a Windows executable file, you can use the dumpbin
tool that comes with Microsoft Visual Studio, or the more powerful, free
PEBrowse utility.
2. For Linux/Unix/Fedora you can use readelf or other tools that can be found in
Linux gnu gcc, g++, gdb and gas 1 or Linux gnu gcc, g++, gdb and gas 2.
3. Windows implementation of processes, threads and synchronization using C can
be found in the Win32 processes and threads tutorials.
4. Windows Dynamic-Link Library (DLL) story and program examples can be
found in the Win32 DLL tutorials.
5. Windows Services story can be found in the Windows Services tutorials.
