0% found this document useful (0 votes)
14 views

Lectures-Linkload CSS2

This document summarizes a lecture on linking and loading programs from multiple C source files. It discusses how linking combines object files into an executable by resolving external references. Loading then runs the executable by resolving any remaining symbolic references through dynamic linking. An example C program with two source files is used to illustrate how the compiler generates separate object files and how the linker resolves references between them to create a runnable executable.

Uploaded by

Deepthi Jayanth
Copyright
© © All Rights Reserved
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
14 views

Lectures-Linkload CSS2

This document summarizes a lecture on linking and loading programs from multiple C source files. It discusses how linking combines object files into an executable by resolving external references. Loading then runs the executable by resolving any remaining symbolic references through dynamic linking. An example C program with two source files is used to illustrate how the compiler generates separate object files and how the linker resolves references between them to create a runnable executable.

Uploaded by

Deepthi Jayanth
Copyright
© © All Rights Reserved
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 42

CS61: Systems Programming and Machine Organization

Harvard University, Fall 2009

Lecture 8:
Linking and Loading

Prof. Matt Welsh


September 29, 2009
Topics for today
● Getting from C programs to running executables
● Linking:
● Process of combining multiple code modules into a runnable program
● Example: Using standard library code (printf, malloc, etc.) in your own program

● Loading:
● Process of getting an executable running on the machine
● Requires resolving references to external symbols in the program (dynamic linking)

© 2009 Matt Welsh – Harvard University 2


Example program with two C files

main.c swap.c

int buf[2] = {1, 2}; extern int buf[];

int main() void swap()


{ {
swap(); int temp;
return 0;
} temp = buf[1];
buf[1] = buf[0];
buf[0] = temp;
}

© 2009 Matt Welsh – Harvard University 3


Example program with two C files

main.c swap.c

int buf[2] = {1, 2}; extern int buf[];

int main() void swap()


{ {
swap(); int temp;
return 0;
} temp = buf[1];
buf[1] = buf[0];
buf[0] = temp;
Definition of buf[] }

Definition of swap()
Definition of main()

Declaration of buf[]

© 2009 Matt Welsh – Harvard University 4


Example program with two C files

main.c swap.c

int buf[2] = {1, 2}; extern int buf[];

int main() void swap()


{ {
swap(); int temp;
return 0;
} temp = buf[1];
buf[1] = buf[0];
buf[0] = temp;
}

Reference to swap()

References
to buf[]

© 2009 Matt Welsh – Harvard University 5


Static Linking
Programs are translated and linked using a compiler driver:
● unix> gcc -c main.c # Compile .c to .o
● unix> gcc -c swap.c
● unix> gcc -o myprog main.o swap.o # This runs ld with
# the right flags

main.c swap.c Source files

gcc -c main.c gcc -c swap.c

Separately compiled
main.o swap.o relocatable object files

gcc -o myprog ...

Fully linked executable object file


myprog (contains code and data for both
main.o and swap.o)

© 2009 Matt Welsh – Harvard University 6


The linker's problem...
main.c swap.c

int buf[2] = {1, 2}; extern int buf[];

int main() void swap()


{ {
swap(); int temp;
return 0;
} temp = buf[1];
buf[1] = buf[0];
buf[0] = temp;
}

● Find the definition of each undefined reference in an object file


● Example: The use of “swap()” in main.o needs to be resolved to the definition
found in swap.o

● Resolve all missing references in the code


● We can compile main.c to main.o without knowing the memory location of swap()
● So, main.o will contain a “dummy” address in the call instruction.
● The linker must patch the assembly code in main.o to reference the correct location of
swap() at link time.

© 2009 Matt Welsh – Harvard University 7


Why Linkers?
Reason 1: Modularity
● Program can be written as a collection of smaller source files,
rather than one monolithic mass.
● Can build libraries of common functions (more on this later)
● e.g., Math library, standard C library

© 2009 Matt Welsh – Harvard University 8


Why Linkers? (cont)
Reason 2: Efficiency
● Time efficiency: Separate Compilation
● Change one source file, recompile just that file, and then relink the executable.
● No need to recompile other source files.

● Space efficiency: Libraries


● Common functions can be combined into a single library file...
● Yet executable files contain only code for the functions they actually use.

© 2009 Matt Welsh – Harvard University 9


Disassembly of main.o
int buf[2] = {1,2};

int main()
{
swap();
return 0;
}
00000000 <main>:
...
a: 55 push %ebp
b: 89 e5 mov %esp,%ebp
d: 51 push %ecx
e: 83 ec 04 sub $0x4,%esp
11: e8 fc ff ff ff call 12 <main+0x12>
16: b8 00 00 00 00 mov $0x0,%eax
...

This is a bogus address!

Just a placeholder for


“swap” (which we don't know the
address of ... yet!)
© 2009 Matt Welsh – Harvard University 10
main.o object file
main.o
main.c .text section
code for main()
int buf[2] = {1,2};

int main()
{
swap(); .data section
return 0; buf[]
}
symbol table
name section off
main .text 0
buf .data 0
swap undefined

relocation info
for .text
Relocation info tells the linker name offset
where references to external swap 12
symbols are in the code

© 2009 Matt Welsh – Harvard University 11


Disassembly of swap.o
extern int buf[];

void swap()
{
int temp; Again, these are placeholders.
temp = buf[1]; 0x0 refers to buf[0]
buf[1] = buf[0]; 0x4 refers to buf[1]
buf[0] = temp;

00000000 <swap>:
0: 55 push %ebp
1: 89 e5 mov %esp,%ebp
3: 8b 15 04 00 00 00 mov 0x4,%edx
9: a1 00 00 00 00 mov 0x0,%eax
e: a3 04 00 00 00 mov %eax,0x4
13: 89 15 00 00 00 00 mov %edx,0x0
19: 5d pop %ebp
1a: c3 ret

© 2009 Matt Welsh – Harvard University 12


swap.o object file
swap.o
.text section
swap.c code for swap()

extern int buf[];

void swap()
{
int temp; symbol table

temp = buf[1]; name section off


swap .text 0
buf[1] = buf[0]; buf undefined
buf[0] = temp;
relocation info
} for .text
name offset
buf 0x5
buf 0xa
buf 0xf
buf 0x15

© 2009 Matt Welsh – Harvard University 13


The linker
● The linker takes multiple object files and combines them into a single
executable file.
● Three basic jobs:
● 1) Copy code and data from each object file to the executable
● 2) Resolve references between object files
● 3) Relocate symbols to use absolute memory addresses, rather than
relative addresses.

© 2009 Matt Welsh – Harvard University 14


00000000 <swap>:
0: push %ebp
Linker operation
1: mov %esp,%ebp swap.o
main.o
... myprog
.text section .text section .text section
code for main() code for swap() code for main()

.data section code for swap()


buf[] symbol table
name section off
symbol table swap .text 0
buf undefined
name section off
main .text 0
buf .data 0 relocation info
swap undefined for .text .data section
name offset 08048368 buf[]
<swap>:
relocation info buf 0x5 8048368: push %ebp
for .text buf 0xa 8048369: mov %esp,%ebp
name offset buf 0xf ...
swap 12 buf 0x15

The executable file contains the absolute memory addresses


of the code and data (that is, where they will be located when
the program actually runs!)
© 2009 Matt Welsh – Harvard University 15
Linker operation
swap.o Where's
main.o myprog swap()?
.text section .text section .text section
code for main() code for swap() code for main()

.data section code for swap()


buf[] symbol table
name section off
symbol table swap .text 0
buf undefined
name section off
main .text 0
buf .data 0 relocation info Oh, there
swap undefined for .text it is! .data section
name offset buf[]
relocation info buf 0x5
for .text buf 0xa
name offset buf 0xf
swap 12 buf 0x15

© 2009 Matt Welsh – Harvard University 16


Linker operation
main.o swap.o myprog
.text section .text section .text section
code for main() code for swap() code for main()

Where's
.data section code for swap() buf?
buf[] symbol table
name section off
symbol table swap .text 0
buf undefined
name section off
main .text 0
buf .data 0 relocation info
swap undefined for .text .data section
name offset buf[]
relocation info Right here
buf 0x5
for .text buf 0xa
name offset buf 0xf
swap 12 buf 0x15

© 2009 Matt Welsh – Harvard University 17


Disassembly after linking
08048344 <main>: What's up with this?
... Not the same as
804834e: 55 0x8048368... push %ebp
804834f: 89 e5 mov %esp,%ebp
8048351: 51 push %ecx
8048352: 83 ec 04 sub $0x4,%esp
8048355: e8 0e 00 00 00 call 8048368 <swap>
804835a: b8 00 00 00 00 mov $0x0,%eax
...

08048368 <swap>:
8048368: 55 push %ebp
8048369: 89 e5 mov %esp,%ebp
804836b: 8b 15 5c 95 04 08 mov 0x804955c,%edx
8048371: a1 58 95 04 08 mov 0x8049558,%eax
8048376: a3 5c 95 04 08 mov %eax,0x804955c
804837b: 89 15 58 95 04 08 mov %edx,0x8049558
8048381: 5d pop %ebp
8048382: c3 ret

$ objdump -t myprog
...
08049558 g O .data 00000008 buf
...
Address Section Size Symbol name

© 2009 Matt Welsh – Harvard University 18


Strong vs. Weak Symbols
● The compiler exports each global symbol as either strong or weak
● Strong symbols:
● Functions
● Initialized global variables

Weak symbols:
● Uninitialized global variables

© 2009 Matt Welsh – Harvard University 19


Strong vs. Weak Symbols
Weak Not a global symbol!
(Just a reference.)
main.c Strong swap.c

int buf[2] = {1, 2}; int myvar;


extern int buf[];
int main()
{ Strong Strong
swap(); void swap()
return 0; {
} int temp;
Not a global symbol.
temp = buf[1];
buf[1] = buf[0];
buf[0] = temp;
}

© 2009 Matt Welsh – Harvard University 20


Linker rules
● Rule 1: Multiple strong symbols with the same name are not
allowed.
foo1.c foo2.c

int somefunc(){ int somefunc() {


return 0; return 1;
} }

$ gcc foo1.c foo2.c


ld: duplicate symbol _somefunc in /var/folders/Yt/Yta+
ZMkTGjWQDZZ0edwfCE++UHg/-Tmp-//ccAsmhEJ.o and
/var/folders/Yt/Yta+ZMkTGjWQDZZ0edwfCE++UHg/-Tmp-//ccVxxcKX.o
collect2: ld returned 1 exit status

© 2009 Matt Welsh – Harvard University 21


Linker rules
● Rule 2: Given a strong symbol and multiple weak symbols,
choose the strong symbol.
foo1.c foo2.c
Weak
void f(void); int x;
int x = 38;
Strong void f() {
int main() { x = 42;
f(); }
printf(“x = %d\n”, x);
return 0;
}

$ gcc -o myprog foo1.c foo2.c


$ ./myprog
x = 42

You might not expect this to happen but it does!

© 2009 Matt Welsh – Harvard University 22


Linker rules
● This can lead to some pretty weird bugs!!!

foo1.c foo2.c

double x; Weak
void f(void);
int x = 38;
int y = 39; void f() {
Strong x = 42.0;
int main() { }
f();
printf(“x = %d\n”, x);
printf(“y = %d\n”, y);
return 0;
}
“double x” is 8 bytes in size!
But resolves to address
$ gcc -o myprog foo1.c foo2.c of “int x” in foo1.c.
$ ./myprog
x = 0
y = 1078263808

© 2009 Matt Welsh – Harvard University 23


Linker rules
● Rule 3: Given multiple weak symbols, pick any one of them

foo1.c foo2.c
Weak
void f(void); int x;
int x; Weak
void f() {
int main() { x = 42;
x = 38; }
f();
printf(“x = %d\n”, x);
return 0;
}

$ gcc -o myprog foo1.c foo2.c


$ ./myprog
x = 42

© 2009 Matt Welsh – Harvard University 24


Executable and Linkable Format (ELF)
Standard binary format for object files
Originally proposed by AT&T System V Unix
● Later adopted by BSD Unix variants and Linux

One unified format for


● Relocatable object files (.o)
● Shared object files (.so)
● Executable files

© 2009 Matt Welsh – Harvard University 25


ELF Object File Format
Elf header 0
ELF header
● Magic number, type (.o, exec, .so), machine, byte
ordering, etc. Segment header table
(required for executables)
Segment header table
● Page size, virtual addresses memory segments .text section
(sections), segment sizes.
.data section
.text section
.bss section
● Code
.symtab section
.data section
● Initialized global variables .rel.txt section

.bss section .rel.data section


● Uninitialized global variables .debug section
● “Block Started by Symbol”
● “Better Save Space” Section header table
● Has section header but occupies no space

© 2009 Matt Welsh – Harvard University 26


ELF Object File Format (cont)
.symtab section 0
ELF header
● Symbol table
● Procedure and static variable names Segment header table
● Section names and locations (required for executables)

.rel.text section .text section


● Relocation info for .text section
.data section
● Addresses of instructions that will need to be
modified in the executable .bss section
● Instructions for modifying.
.symtab section
.rel.data section
● Relocation info for .data section .rel.text section
● Addresses of pointer data that will need to be .rel.data section
modified in the merged executable
.debug section .debug section
● Info for symbolic debugging (gcc -g)
Section header table
Section header table
● Offsets and sizes of each section

© 2009 Matt Welsh – Harvard University 27


objdump: Looking at ELF files
● Use the “objdump” tool to peek inside of ELF files (.o, .so, and
executable files)
● objdump -h: print out list of sections in the file.

$ objdump -h myprog
myprog: file format elf32-i386

Sections:
Idx Name Size VMA LMA File off Algn
0 .interp 00000013 08048134 08048134 00000134 2**0
CONTENTS, ALLOC, LOAD, READONLY, DATA
1 .note.ABI-tag 00000020 08048148 08048148 00000148 2**2
CONTENTS, ALLOC, LOAD, READONLY, DATA
2 .hash 00000028 08048168 08048168 00000168 2**2
CONTENTS, ALLOC, LOAD, READONLY, DATA
...

© 2009 Matt Welsh – Harvard University 28


objdump: Looking at ELF files
● Use the “objdump” tool to peek inside of ELF files (.o, .so, and
executable files)
● objdump -s: print out full contents of each section.

$ objdump -s myprog
myprog: file format elf32-i386

Contents of section .interp:


8048134 2f6c6962 2f6c642d 6c696e75 782e736f /lib/ld-linux.so
8048144 2e3200 .2.
Contents of section .note.ABI-tag:
8048148 04000000 10000000 01000000 474e5500 ............GNU.
8048158 00000000 02000000 06000000 08000000 ................
Contents of section .hash:
8048168 03000000 05000000 01000000 02000000 ................
8048178 03000000 00000000 00000000 00000000 ................

© 2009 Matt Welsh – Harvard University 29


objdump: Looking at ELF files
● Use the “objdump” tool to peek inside of ELF files (.o, .so, and
executable files)
● objdump -t: print out contents of symbol table.

$ objdump -t myprog
myprog: file format elf32-i386

SYMBOL TABLE:
08048134 l d .interp 00000000 .interp
08048148 l d .note.ABI-tag 00000000 .note.ABI-tag
08048168 l d .hash 00000000 .hash
08048190 l d .gnu.hash 00000000 .gnu.hash
...

© 2009 Matt Welsh – Harvard University 30


Packaging Commonly Used Functions
How to package functions commonly used by programmers?
● printf, malloc, strcmp, all that stuff?

Awkward, given the linker framework so far:


● Option 1: Put all functions in a single source file
● Programmers link big object file into their programs:
● gcc -o myprog myprog.o somebighonkinglibraryfile.o
● This would be really inefficient!

● Option 2: Put each routine in a separate object file


● Programmers explicitly link appropriate object files into their programs
● gcc -o myprog myprog.o printf.o malloc.o free.o strcmp.o
strlen.o strerr.o .....
● More efficient, but a real pain to the programmer

© 2009 Matt Welsh – Harvard University 31


Static Libraries
Solution: Static libraries
● Combine multiple object files into a single archive file (file extension “.a”)
● Example: libc.a contains a whole slew of object files bundled together.

Linker can also take archive files as input:


● gcc -o myprog myprog.o /usr/lib/libc.a
● Linker searches the .o files within the .a file for needed references and links them
into the executable.

© 2009 Matt Welsh – Harvard University 32


Creating Static Libraries
atoi.c printf.c random.c

gcc -c gcc -c ... gcc -c

atoi.o printf.o random.o

ar

libc.a Static library

Create a static library file using the UNIX ar command


● ar rs libc.a atoi.o printf.o random.o ...

● Can list contents of a library using “ar t libc.a”

© 2009 Matt Welsh – Harvard University 33


Commonly Used Libraries
libc.a (the C standard library)
● 2.8 MB archive of 1400 object files.
● I/O, memory allocation, signal handling, string handling, data and time, random
numbers, integer math
libm.a (the C math library)
● 0.5 MB archive of 400 object files.
● floating point math (sin, cos, tan, log, exp, sqrt, …)

% ar -t /usr/lib/libc.a | sort % ar -t /usr/lib/libm.a | sort


… …
fork.o e_acos.o
… e_acosf.o
fprintf.o e_acosh.o
fpu_control.o e_acoshf.o
fputc.o e_acoshl.o
freopen.o e_acosl.o
fscanf.o e_asin.o
fseek.o e_asinf.o
fstab.o e_asinl.o
… …

© 2009 Matt Welsh – Harvard University 34


Using Static Libraries
Linker’s algorithm for resolving external references:
● Scan .o files and .a files in command line order.
● During the scan, keep a list of the current unresolved references.
● As each new .o or .a file is encountered, try to resolve each unresolved
reference in the list against the symbols in the new file.
● If any entries unresolved when done, then return an error.

Problem:
● Command line order matters!
● Moral: put libraries at the end of the command line.

unix> gcc mylib.a libtest.o


libtest.o: In function `main':
libtest.o(.text+0x4): undefined reference to `libfun'

unix> gcc libtest.o mylib.a


# works fine!!!

© 2009 Matt Welsh – Harvard University 35


Loading an executable
● Loading is the process of reading code and data from an executable
file and placing it in memory, where it can run.
● This is done by the operating system.
● In UNIX, you can use the execve() system call to load and run an executable.

● What happens when the OS runs a program:


● 1) Create a new process to contain the new program (more later this semester!)
● 2) Allocate memory to hold the program code and data
● 3) Copy contents of executable file to the newly-allocated memory
● 4) Jump to the executable's entry point (which calls the main() function)

© 2009 Matt Welsh – Harvard University 36


Address space layout
Process address space Memory
invisible to
OS memory user code
Executable 0xc0000000
0
Stack
ELF header
%esp
Program header table (stack
(required for executables) pointer)

.text section
Shared libraries
.data section 0x40000000

.bss section

.symtab

.rel.text Heap
(used by malloc)
.rel.data
Loaded
.debug Data from
the
Section header table executable
(required for relocatables) file
Code
0x08048000

0 Unused
© 2009 Matt Welsh – Harvard University 37
Shared Libraries
Static libraries have the following disadvantages:
● Lots of code duplication in the resulting executable files
● Every C program needs the standard C library.

● e.g., Every program calling printf() would have a copy of the printf() code in

the executable. Very wasteful!


● Lots of code duplication in the memory image of each running program
● OS would have to allocate memory for the standard C library routines

being used by every running program!


● Any changes to system libraries would require relinking every binary!

Solution: Shared libraries


● Libraries that are linked into an application dynamically, when a program runs!
● On UNIX, “.so” filename extension is used
● On Windows, “.dll” filename extension is used

© 2009 Matt Welsh – Harvard University 38


Shared Libraries
When the OS runs a program, it checks whether the executable
was linked against any shared library (.so) files.
● If so, it performs the linking and loading of the shared libraries on the fly.

● Example: gcc -o myprog main.o /usr/lib/libc.so


● Can use UNIX ldd command to see which shared libs an
executable is linked against
unix> ldd myprog

linux-gate.so.1 => (0xffffe000)


libc.so.6 => /lib/tls/i686/cmov/libc.so.6 (0xb7df5000)
/lib/ld-linux.so.2 (0xb7f52000)

Can create your own shared libs using gcc -shared:


gcc -shared -o mylib.so main.o swap.o

© 2009 Matt Welsh – Harvard University 39


Runtime dynamic linking
● You can also link against shared libraries at run time!
● Three UNIX routines used for this:
● dlopen() -- Open a shared library file
● dlsym() -- Look up a symbol in a shared library file
● dlclose() -- Close a shared library file
● Why would you ever want to do this?
● Can load new functionality on the fly, or extend a program after it was originally
compiled and linked.
● Examples include things like browser plug-ins, which the user might download from the
Internet well after the original program was compiled and linked.

© 2009 Matt Welsh – Harvard University 40


Dynamic linking example
#include <stdio.h>
#include <dlfcn.h>
int main() {
void *handle;
void (*somefunc)(int, int);
char *error;

/* dynamically load the shared lib that contains somefunc() */


handle = dlopen("./libvector.so", RTLD_LAZY);
if (!handle) {
fprintf(stderr, "%s\n", dlerror());
exit(1);
}
/* get a pointer to the somefunc() function we just loaded */
somefunc = dlsym(handle, "somefunc");
if ((error = dlerror()) != NULL) {
fprintf(stderr, "%s\n", error);
exit(1);
}
/* Now we can call somefunc() just like any other function */
somefunc(42, 38);

/* unload the shared library */


if (dlclose(handle) < 0) {
fprintf(stderr, "%s\n", dlerror());
exit(1);
}
return 0;
}
© 2009 Matt Welsh – Harvard University 41
Next time
● Memory and storage technologies
● All about SRAM and DRAM
● How disks work

© 2009 Matt Welsh – Harvard University 42

You might also like