Slide Set 1 Language Types and Software Development
Goal of this course
To understand the practical and scalable theory and methods needed to build a
wide variety of program analysis tools, such as:
a. Compilers
b. Debuggers
c. Bug Detection tools
d. Application Performance Monitoring (APM) tools
e. Memory Leak Detection tools
f. Vulnerability Detection tools
g. Malware Detection tools
● Translation: Convert the high-level program to a low-level form (e.g., machine code) and run that directly on
the hardware.
● Interpretation: On-the-fly execution of a high-level program by a software
program called an interpreter.
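As a minimal sketch of the two execution models above (not taken from the slides), the code below handles one arithmetic expression both ways. `interpret` evaluates the syntax tree on the fly; `translate` emits instructions for a hypothetical stack machine once, and `run` stands in for the hardware that later executes them. The instruction names (`PUSH`, `LOAD`, `ADD`, `MUL`) are assumptions made for illustration.

```python
import ast
import operator

# Supported operators: (low-level mnemonic, Python function that implements it).
BINOPS = {ast.Add: ("ADD", operator.add), ast.Mult: ("MUL", operator.mul)}

def interpret(node, env):
    """Interpretation: evaluate the high-level program on the fly."""
    if isinstance(node, ast.Constant):
        return node.value
    if isinstance(node, ast.Name):
        return env[node.id]
    if isinstance(node, ast.BinOp):
        _, fn = BINOPS[type(node.op)]
        return fn(interpret(node.left, env), interpret(node.right, env))
    raise ValueError("unsupported syntax")

def translate(node):
    """Translation: emit low-level instructions once, to be run later."""
    if isinstance(node, ast.Constant):
        return [("PUSH", node.value)]
    if isinstance(node, ast.Name):
        return [("LOAD", node.id)]
    if isinstance(node, ast.BinOp):
        name, _ = BINOPS[type(node.op)]
        return translate(node.left) + translate(node.right) + [(name, None)]
    raise ValueError("unsupported syntax")

def run(code, env):
    """Stand-in for the hardware executing the translated (low-level) form."""
    stack = []
    for op, arg in code:
        if op == "PUSH":
            stack.append(arg)
        elif op == "LOAD":
            stack.append(env[arg])
        else:
            right, left = stack.pop(), stack.pop()
            stack.append(left + right if op == "ADD" else left * right)
    return stack.pop()

tree = ast.parse("x * 3 + 1", mode="eval").body
env = {"x": 2}
print(interpret(tree, env))   # interpreted result
code = translate(tree)
print(code)                   # translated low-level form
print(run(code, env))         # result of running the translated form
```

Both paths compute the same value; the difference is that translation pays the analysis cost once up front, while interpretation repeats it every time the program runs.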
Example of interpreted languages
Pure interpreted language:
Hybrid:
● Instrumentable
● Portable across platforms. (In contrast, machine code only runs on a
computer if its ISA and its OS match those of the computer.)
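To illustrate what "instrumentable" can mean in practice, here is a small sketch that uses Python's own interpreter hook `sys.settrace` to count function calls without modifying the program being observed. The helper name `count_calls` and the example function `fib` are hypothetical; only `sys.settrace` and `sys.gettrace` are real interpreter facilities.

```python
import sys
from collections import Counter

def count_calls(func, *args):
    """Run func(*args) and count how often each Python function is entered."""
    calls = Counter()

    def tracer(frame, event, arg):
        if event == "call":
            calls[frame.f_code.co_name] += 1
        return None  # no per-line tracing is needed for call counting

    previous = sys.gettrace()      # remember any tracer that is already installed
    sys.settrace(tracer)
    try:
        result = func(*args)
    finally:
        sys.settrace(previous)     # always restore the previous tracer
    return result, calls

def fib(n):
    return n if n < 2 else fib(n - 1) + fib(n - 2)

result, calls = count_calls(fib, 5)
print(result, calls["fib"])
```

The interpreter reports each call event to the hook, so the profiling logic lives entirely outside the analyzed program. Achieving the same effect on machine code generally requires rewriting the binary or relying on hardware or OS support.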
● Machine code: The only type of code that can directly run on the CPU hardware.
● Instruction Set Architecture: The format of machine code instructions. Examples include:
○ x86 (including x86-32 and x86-64): used in Intel and AMD desktops, laptops, and servers.
○ ARM: Used in iOS and Android mobile devices
○ MIPS: Used in embedded devices including routers
○ SPARC: Used in Sun (later Oracle) workstations and servers.
○ PowerPC: Used in video game console processors.
○ z/Architecture: Used in IBM mainframes.
● Object code: the output of the compiler for one file. It has the format of machine code but lacks
relocation and external symbol resolution.
● Assembly code: A human-readable text representation of machine code.
● Assembler: A piece of software that converts assembly code into machine code.
○ Needed when a developer hand-writes assembly code instead of using a compiler, or when
the compiler emits assembly code rather than object code directly.
● Disassembler: A piece of software that converts machine code into assembly language.
○ Needed if a developer wants to read machine code produced by a compiler.
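The assembler and disassembler definitions above can be sketched with a toy example. The ISA below is hypothetical (invented for illustration, not a real architecture): every instruction is two bytes, an opcode followed by a one-byte operand.

```python
# Hypothetical toy ISA: each instruction is 2 bytes, [opcode, operand].
OPCODES = {"LOAD": 0x01, "ADD": 0x02, "STORE": 0x03, "HALT": 0xFF}
MNEMONICS = {code: name for name, code in OPCODES.items()}

def assemble(source: str) -> bytes:
    """Convert assembly text into machine code bytes."""
    out = bytearray()
    for line in source.splitlines():
        line = line.split(";")[0].strip()   # drop comments and blank lines
        if not line:
            continue
        parts = line.split()
        mnemonic = parts[0].upper()
        operand = int(parts[1], 0) if len(parts) > 1 else 0
        out += bytes([OPCODES[mnemonic], operand & 0xFF])
    return bytes(out)

def disassemble(code: bytes) -> str:
    """Convert machine code bytes back into assembly text."""
    lines = []
    for i in range(0, len(code), 2):
        opcode, operand = code[i], code[i + 1]
        lines.append(f"{MNEMONICS[opcode]} {operand:#04x}")
    return "\n".join(lines)

program = "LOAD 0x10\nADD 0x11\nSTORE 0x12\nHALT"
machine_code = assemble(program)
print(machine_code.hex())
print(disassemble(machine_code))
```

Because the mapping between mnemonics and opcodes is one-to-one, assembling the disassembler's output reproduces the original bytes. Comments and label names in the source, however, are not recoverable from machine code.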
Software development process (compilation)
[Figure: compilation pipeline. High-level language program file -> Compiler (Front end -> Intermediate representation (IR) -> Back end) -> Object file, either directly or via Assembly language and the Assembler -> Linker, which also takes other object files (one per HLL file) -> Machine-code file (aka binary code, or executable code) -> Loader (part of OS) -> Hardware. A Disassembler converts machine code back into assembly language. The Debugger is also shown.]
IR is language- and ISA-independent, i.e., any high-level language is compiled to the same
IR, and the IR can be compiled into instructions for any ISA.
The compiler itself is compiled in the above manner, first using an earlier compiler
and, after that, using itself.
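The front end / back end split can be sketched as follows. Everything here is hypothetical and chosen for brevity: two toy source languages (postfix and prefix arithmetic), a three-address IR made of tuples, and two invented target ISAs. The point is only that both front ends produce the same IR and either back end can consume it.

```python
OPS = {"+": "add", "*": "mul"}

def front_end_postfix(src):
    """Hypothetical language 1: postfix notation, e.g. 'a b 2 * +'."""
    ir, stack = [], []
    for tok in src.split():
        if tok in OPS:
            right, left = stack.pop(), stack.pop()
            dest = f"t{len(ir) + 1}"
            ir.append((OPS[tok], dest, left, right))
            stack.append(dest)
        else:
            stack.append(tok)
    return ir

def front_end_prefix(src):
    """Hypothetical language 2: prefix notation, e.g. '+ a * b 2'."""
    ir, tokens = [], iter(src.split())

    def parse():
        tok = next(tokens)
        if tok not in OPS:
            return tok
        left = parse()
        right = parse()
        dest = f"t{len(ir) + 1}"
        ir.append((OPS[tok], dest, left, right))
        return dest

    parse()
    return ir

def back_end_three_operand(ir):
    """Hypothetical ISA A: one three-operand instruction per IR tuple."""
    return [f"{op.upper()} {dest}, {a}, {b}" for op, dest, a, b in ir]

def back_end_two_operand(ir):
    """Hypothetical ISA B: move the first operand, then operate in place."""
    out = []
    for op, dest, a, b in ir:
        out += [f"mov {dest}, {a}", f"{op} {dest}, {b}"]
    return out

ir_from_postfix = front_end_postfix("a b 2 * +")
ir_from_prefix = front_end_prefix("+ a * b 2")
print(ir_from_postfix == ir_from_prefix)
print(back_end_three_operand(ir_from_postfix))
print(back_end_two_operand(ir_from_prefix))
```

With M source languages and N target ISAs, this structure needs M front ends plus N back ends rather than M times N separate compilers.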
Linker and loader functionality
Tasks of the linker: Combines object files into a single executable file. This includes:
● Relocation: Adjusting addresses when files are combined by adding the size of previous
files to the addresses in the current file being linked.
● External symbol resolution: Filling in the addresses of external symbols referenced in a
file.
● This is also called static linking because any linked routines become part of the
executable.
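The two linker tasks above can be sketched with a toy model. The object-file layout used here (a dict with `code`, `defs`, `relocs`, and `externs`) is an assumption made for illustration and does not correspond to a real format such as ELF.

```python
def link(objects):
    """Combine toy object files into one image.

    Each object is a dict with:
      'code'    : list of ints (instruction or data words)
      'defs'    : {symbol: offset within this object}
      'relocs'  : offsets of words that hold file-local addresses
      'externs' : {offset: symbol}, words to fill with that symbol's address
    """
    image, symtab, bases = [], {}, []
    # Pass 1: lay the files out one after another and record symbol addresses.
    for obj in objects:
        base = len(image)              # total size of all previous files
        bases.append(base)
        for name, off in obj["defs"].items():
            symtab[name] = base + off
        image.extend(obj["code"])
    # Pass 2: patch addresses in the combined image.
    for obj, base in zip(objects, bases):
        for off in obj["relocs"]:
            image[base + off] += base            # relocation
        for off, name in obj["externs"].items():
            if name not in symtab:
                raise KeyError(f"undefined symbol: {name}")
            image[base + off] = symtab[name]     # external symbol resolution
    return image, symtab

main_o = {"code": [10, 0, 99], "defs": {"main": 0}, "relocs": [], "externs": {1: "helper"}}
util_o = {"code": [20, 1, 30], "defs": {"helper": 0}, "relocs": [1], "externs": {}}

image, symtab = link([main_o, util_o])
print(image)
print(symtab)
```

In this example `util_o` is placed after the three words of `main_o`, so its local address 1 becomes 4, and the placeholder in `main_o` is filled with the final address of `helper`.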
Tasks of the loader:
● Loads a machine code (i.e., binary) file from disk and stores it in main memory.
● Performs dynamic linking, which fills in the addresses of routines from dynamically linked libraries
(aka DLLs) into programs. (E.g., the address of the symbol printf() may be filled in from
the C standard library, whose interface is declared in the header stdio.h.)
● Finally, it starts execution of the binary file at its entry point (usually the symbol _main, or a startup routine that calls it).
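A toy sketch of the loader steps above follows. The executable layout (`image`, `imports`, `entry`), the library table, and the addresses are all hypothetical; a real loader works with formats such as ELF or PE and with virtual memory.

```python
def load(executable, shared_libs, memory_size=64, load_address=16):
    """Toy loader: copy the image into memory, bind imports, return the entry address."""
    memory = [0] * memory_size
    image = executable["image"]
    # 1. Copy the file contents from "disk" into main memory.
    memory[load_address:load_address + len(image)] = image
    # 2. Dynamic linking: fill each import slot with the library routine's address.
    for slot, (lib, symbol) in executable["imports"].items():
        memory[load_address + slot] = shared_libs[lib][symbol]
    # 3. Execution would begin at this address.
    return memory, load_address + executable["entry"]

# Word 1 of the image is a placeholder for the address of printf.
exe = {"image": [7, 0, 8], "imports": {1: ("libc", "printf")}, "entry": 0}
libs = {"libc": {"printf": 40}}   # hypothetical address of printf in memory

memory, entry = load(exe, libs)
print(memory[16:19], entry)
```

Unlike static linking, the library routine is not copied into the executable; only its address is filled in at load time.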
Software development component details

Compiler: Converts program files into object code files. Object code is similar to executable code, but has unresolved external symbol and library references.
Eg usage:
    gcc -c file1.c -o file1.o
    // -c means do not link.
    // -o means specify output file.

Linker: Combines multiple object files into a single executable. Resolves the external references mentioned above.
Eg usage:
    gcc file1.o file2.o -o file.exe
    // Can also combine compiling and linking into one step:
    gcc file1.c file2.c -o file.exe

Loader: Not part of the compiler, but part of the OS. It loads the executable into memory, allocates the memory segments needed, and performs dynamic linking of libraries. In machines without virtual memory, it sets the relocation register.

Debugger: Not part of the compiler, but often sold by compiler vendors as part of a "Software Development Kit" (SDK). SDK = compiler, linker, debugger, and a GUI for all of these (aka an Integrated Development Environment (IDE)). The debugger needs compiler support to associate machine code entities (registers, memory locations, hex addresses) with high-level program entities (symbols, line numbers). Eg: The optional "-g" flag in gcc produces an executable with debug information present (it is absent by default). This information maps variables to registers and memory locations. An executable without debug info is called a "stripped binary".