Program encoding
• Suppose we write a C program as two files, p1.c and p2.c. We can then compile this code using a Unix command line:
    linux> gcc -Og -o p p1.c p2.c
• The command gcc invokes the GCC C compiler.
• The command-line option -Og instructs the compiler to apply a level of optimization that yields machine code that follows the overall structure of the original C code.
• We use -Og optimization as a learning tool and then see what happens as we increase the level of optimization.
• In practice, higher levels of optimization (e.g., specified with the option -O1 or -O2) are considered a better choice in terms of the resulting program performance.
• First, the C preprocessor expands the source code to include any files specified with #include commands and to expand any macros, specified with #define declarations.
• Second, the compiler generates assembly code versions of the two source files, having names p1.s and p2.s.
• Next, the assembler converts the assembly code into binary object-code files p1.o and p2.o.
• Object code is one form of machine code: it contains binary representations of all of the instructions.
• Finally, the linker merges these two object-code files along with code implementing library functions (e.g., printf) and generates the final executable code file p (as specified by the command-line directive -o p).
Machine Level Code
• The machine code for x86-64 differs greatly from the original C code.
• Parts of the processor state are visible that normally are hidden from the C programmer:
• The program counter (commonly referred to as the PC, and called %rip in x86-64) indicates the address in memory of the next instruction to be executed.
• The integer register file contains 16 named locations storing 64-bit values.
• These registers can hold addresses (corresponding to C pointers) or integer data.
• Some registers are used to keep track of critical parts of the program state, while others are used to hold temporary data, such as the arguments and local variables of a procedure, as well as the value to be returned by a function.
• The condition code registers hold status information about the most recently executed arithmetic or logical instruction. These are used to implement conditional changes in the control or data flow, such as is required to implement if and while statements.
• A set of vector registers can each hold one or more integer or floating-point values.
Example
• Suppose we write a C code file mstore.c containing the following function definition:
    long mult2(long x, long y);
    void multstore(long x, long y, long *dest) {
        long t = mult2(x, y);
        *dest = t;
    }
• The C code shows a function that makes a procedure call to mult2, passing arguments x and y. It then stores the return value to the location pointed to by dest.
• The C type long is 64 bits wide on x86-64 Linux.
• To see the assembly code generated by the C compiler, we can use the -S option on the command line:
    linux> gcc -Og -S mstore.c
• This will cause gcc to run the compiler, generating an assembly file mstore.s, and go no further.
• The assembly file contains the following lines:
    multstore:
        pushq %rbx
        movq %rdx, %rbx
        call mult2
        movq %rax, (%rbx)
        popq %rbx
        ret
• Calling conventions come from the System V AMD64 ABI (Application Binary Interface).
• Up to six integer or pointer arguments are passed in registers, in the order rdi, rsi, rdx, rcx, r8, r9. Here x is in rdi, y is in rsi, and dest is in rdx.
• rbx is a callee-saved register, so multstore pushes it before using it and pops it before returning.
• rdx is caller-saved, so its value may be overwritten by the call to mult2. The instruction movq %rdx, %rbx copies dest from rdx to rbx before the call.
• rax is used to return a value of up to 64 bits. rdx is used together with rax for return values of up to 128 bits.
• The pushq instruction indicates that the contents of register %rbx should be pushed onto the program stack. All information about local variable names or data types has been stripped away.
• If we use the -c command-line option, gcc will both compile and assemble the code:
    linux> gcc -Og -c mstore.c
• This will generate an object-code file mstore.o that is in binary format and hence cannot be viewed directly.
• Embedded within the 1,368 bytes of the file mstore.o is a 14-byte sequence with the hexadecimal representation:
    53 48 89 d3 e8 00 00 00 00 48 89 03 5b c3
• This is the object code corresponding to the assembly instructions listed previously.
• A key lesson to learn from this is that the program executed by the machine is simply a sequence of bytes encoding a series of instructions.
• The machine has very little information about the source code from which these instructions were generated.
Disassembler
• To inspect the contents of machine-code files, a class of programs known as disassemblers can be invaluable.
• These programs generate a format similar to assembly code from the machine code.
• With Linux systems, the program objdump (for "object dump") can serve this role given the -d command-line flag:
    linux> objdump -d mstore.o
Data Formats