Slide Set 1 Language Types and Software Development

This course focuses on practical program analysis techniques needed to build tools like compilers, debuggers, and vulnerability detectors. It covers topics like lexical analysis, parsing, control flow analysis, and alias analysis. Interpreted languages like Python are executed on the fly by an interpreter, while compiled languages like C are translated to machine code. A compiler translates in two main steps: its front end produces an intermediate representation, and its back end turns that into object code. The linker combines object files and resolves external references to produce an executable, while the loader loads the executable into memory.


Language types and software development

Prof Rajeev Barua


CS 3101 -- Slide Set 1

Goal of this course
To understand the practical and scalable theory and methods needed to build a
wide variety of program analysis tools, such as:
a. Compilers
b. Debuggers
c. Bug Detection tools
d. Application Performance Monitoring (APM) tools.
e. Memory Leak Detection tools
f. Vulnerability Detection tools
g. Malware Detection tools

● This course focuses on the practical theory that practitioners need to build
the tools above.
● Less focus on theory not needed by practitioners, because it lies in portions of
tools that you can use as a black box.
Topics include

1. Software development process
2. Program analysis needs of non-standard architectures
3. Lexical analysis
4. Parsing
5. Abstract syntax
6. Bootstrapping
7. Stack memory: layout, usage, and compilation
8. Control flow analysis and optimizations
9. Traditional dataflow analysis
10. Static Single Assignment (SSA)
11. Alias analysis

We will understand these as the course progresses.


What is a compiler?

High-level language program (e.g., C, Java)
    -> Compiler
    -> Low-level language program (e.g., object code, machine code, or bytecode)

● Building a compiler generally requires expertise in all of the previously mentioned
topics.
● The other tools listed require expertise in some of these topics, especially
topics 7-11.
Translation and interpretation

To execute a high-level language program, we have two main choices:

● Translation: to a low-level form (e.g., machine code) and run that directly on
the hardware.
● Interpretation: on-the-fly execution of a high-level program by a software
program called an interpreter.

E.g.:

● C is compiled to machine code.
● Python is interpreted by its interpreter.
● Java is a hybrid: it is compiled to Java bytecode, which is then interpreted.

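To make "on-the-fly execution" concrete, here is a minimal sketch of an interpreter. The three-command language (set, add, print) and the function name interpret are invented for illustration only; they are not from the course. The point is that each statement is executed as soon as it is read, and no machine code is generated for the input program.

```python
# Toy interpreter for a hypothetical three-command language (illustration only).
def interpret(source):
    """Execute the program statement by statement and return what it printed."""
    variables = {}
    outputs = []
    for line_number, line in enumerate(source.splitlines(), start=1):
        words = line.split()
        if not words:
            continue  # skip blank lines
        command, args = words[0], words[1:]
        if command == "set":      # set NAME VALUE
            variables[args[0]] = int(args[1])
        elif command == "add":    # add NAME VALUE
            variables[args[0]] += int(args[1])
        elif command == "print":  # print NAME
            outputs.append(variables[args[0]])
        else:
            raise SyntaxError(f"line {line_number}: unknown command {command!r}")
    return outputs


program = """
set x 2
add x 40
print x
"""
print(interpret(program))  # [42]
```

A compiler for the same toy language would instead emit lower-level code for the whole program first, and that code would be run later without the source being present.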
Example of interpreted languages
Pure interpreted language:

Python program -> Python interpreter (machine code) -> outputs of the Python program
(The interpreter's machine code is produced by a C compiler; the Python interpreter is written in C.)

Hybrid:

Java program -> Java compiler -> Java bytecode -> Interpreter (machine code), aka Java Virtual Machine (JVM) -> outputs of the Java program
(The JVM's machine code is produced by a C/C++ compiler; the JVM is written in C/C++.)

No machine code is produced for the input program in either case!
But the interpreters and compilers themselves are in machine code.
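The hybrid scheme can be sketched in a few lines, assuming a made-up stack bytecode with three instructions (PUSH, ADD, MUL). This is not real Java bytecode, and compile_to_bytecode and run_bytecode are hypothetical names. Python's standard ast module is borrowed as a ready-made parser so that the sketch stays short.

```python
import ast


def compile_to_bytecode(expression):
    """'Compiler' step: translate source text into stack-machine instructions."""
    def emit(node):
        if isinstance(node, ast.Constant) and isinstance(node.value, int):
            return [("PUSH", node.value)]
        if isinstance(node, ast.BinOp) and isinstance(node.op, (ast.Add, ast.Mult)):
            opcode = "ADD" if isinstance(node.op, ast.Add) else "MUL"
            # Operands first, operator last (postfix order).
            return emit(node.left) + emit(node.right) + [(opcode, None)]
        raise ValueError("only integer literals, + and * are supported")
    return emit(ast.parse(expression, mode="eval").body)


def run_bytecode(bytecode):
    """'Virtual machine' step: interpret the bytecode one instruction at a time."""
    stack = []
    for opcode, operand in bytecode:
        if opcode == "PUSH":
            stack.append(operand)
        else:
            right, left = stack.pop(), stack.pop()
            stack.append(left + right if opcode == "ADD" else left * right)
    return stack.pop()


bytecode = compile_to_bytecode("2 + 3 * 4")
print(bytecode)
print(run_bytecode(bytecode))  # 14
```

The bytecode list plays the role of the file that is distributed: it can be run by any machine that has the small run_bytecode loop, which is where the portability of the hybrid approach comes from.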
Interpreted vs. compiled languages

Advantages of Interpreted languages:

● Instrumentable: monitoring or analysis code can be added in the interpreter without changing the program.
● Portable across platforms. (In contrast, machine code only runs on a
computer if its ISA and its OS match those of the computer.)

(Used for internal corporate code and SaaS code)

Advantages of Compiled languages:

● Protect intellectual property (IP) by not revealing source code.
● Much faster (especially vs. Python)

(Used for externally distributed software, or when speed is important.)

Important terms

● Machine code: The only type of code that can run directly on the CPU hardware.
● Instruction Set Architecture (ISA): The format of machine code instructions. Examples include:
○ x86 (including x86-32 and x86-64): used in Intel and AMD desktops, laptops, and servers.
○ ARM: used in iOS and Android mobile devices.
○ MIPS: used in embedded devices, including routers.
○ SPARC: used in Sun (SunOS) computers.
○ PowerPC: used in video game console processors.
○ z/Architecture: used in IBM mainframes.
● Object code: The output of the compiler for one file. It has the format of machine code, but
relocation and external symbol resolution have not yet been performed.
● Assembly code: A human-readable ASCII text representation of machine code.
● Assembler: A piece of software that converts assembly code into machine code.
○ Needed when a developer hand-writes assembly code instead of using a
compiler. (Some compilers, such as gcc, also run an assembler internally.)
● Disassembler: A piece of software that converts machine code into assembly code.
○ Needed if a developer wants to read machine code produced by a compiler.
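The assembler/disassembler pair can be illustrated with a minimal sketch. It assumes an invented ISA in which every instruction is two bytes (one opcode byte, one operand byte); the mnemonics and opcode values below are hypothetical and do not correspond to x86, ARM, or any other real ISA.

```python
# Hypothetical ISA: each instruction is one opcode byte plus one operand byte.
OPCODES = {"LOAD": 0x01, "ADD": 0x02, "STORE": 0x03, "HALT": 0xFF}
MNEMONICS = {code: name for name, code in OPCODES.items()}


def assemble(assembly_text):
    """Convert human-readable assembly text into machine-code bytes."""
    machine_code = bytearray()
    for line in assembly_text.splitlines():
        line = line.split(";")[0].strip()  # drop comments and whitespace
        if not line:
            continue
        mnemonic, *operands = line.split()
        operand = int(operands[0], 0) if operands else 0  # base 0 accepts 0x.. too
        machine_code += bytes([OPCODES[mnemonic], operand])
    return bytes(machine_code)


def disassemble(machine_code):
    """Convert machine-code bytes back into assembly text, one line per instruction."""
    lines = []
    for offset in range(0, len(machine_code), 2):
        opcode, operand = machine_code[offset], machine_code[offset + 1]
        mnemonic = MNEMONICS[opcode]
        lines.append(mnemonic if mnemonic == "HALT" else f"{mnemonic} {operand}")
    return lines


source = """
LOAD 7      ; load the constant 7
ADD 35
STORE 0x10  ; operand written in hex
HALT
"""
binary = assemble(source)
print(binary.hex())         # 010702230310ff00
print(disassemble(binary))  # ['LOAD 7', 'ADD 35', 'STORE 16', 'HALT']
```

Note that the round trip is lossy: comments and the hex spelling of 0x10 do not survive, which is one reason disassembled code is harder to read than the original assembly.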
Software development process (compilation)
[Figure: compilation flow, reconstructed from the slide diagram]

High-level language program file
    -> Compiler: Front end -> Intermediate representation (IR) -> Back end
    -> Object file
    -> Linker (also takes other object files, one per HLL file)
    -> Machine-code file (aka binary code, or executable code)
    -> Loader (part of OS)
    -> Hardware

Other elements of the diagram:
● Assembly language -> Assembler -> Object file
● Machine code -> Disassembler -> Assembly language
● Debugger

IR is language- and ISA-independent. I.e., any HL language is compiled to the same IR, and the IR can be compiled into the instructions of any ISA.

The compiler itself is compiled in the above manner, using an earlier compiler or, after that, using itself.
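The language- and ISA-independence of the IR can be sketched as follows. Everything here is an assumption made for illustration: the IR is a nested tuple, the two "languages" are infix and postfix arithmetic, and the two "ISAs" are invented stack-style and register-style instruction sets. Python's standard ast module is used only as a convenient parser for the infix front end.

```python
import ast

# Hypothetical IR: ("const", n) or (op, left, right) with op in {"add", "mul"}.


def front_end_infix(text):
    """Front end 1: infix source such as "2 + 3 * 4"."""
    def lower(node):
        if isinstance(node, ast.Constant):
            return ("const", node.value)
        if isinstance(node, ast.BinOp) and isinstance(node.op, (ast.Add, ast.Mult)):
            op = "add" if isinstance(node.op, ast.Add) else "mul"
            return (op, lower(node.left), lower(node.right))
        raise ValueError("unsupported construct")
    return lower(ast.parse(text, mode="eval").body)


def front_end_postfix(text):
    """Front end 2: postfix source such as "2 3 4 * +"."""
    stack = []
    for token in text.split():
        if token in ("+", "*"):
            right, left = stack.pop(), stack.pop()
            stack.append(("add" if token == "+" else "mul", left, right))
        else:
            stack.append(("const", int(token)))
    return stack.pop()


def back_end_stack(ir):
    """Back end 1: code for an invented stack machine."""
    if ir[0] == "const":
        return [f"push {ir[1]}"]
    return back_end_stack(ir[1]) + back_end_stack(ir[2]) + [ir[0]]


def back_end_register(ir):
    """Back end 2: code for an invented register machine (fresh register per result)."""
    code = []

    def gen(node):
        if node[0] == "const":
            code.append(f"li r{len(code)}, {node[1]}")
        else:
            left, right = gen(node[1]), gen(node[2])
            code.append(f"{node[0]} r{len(code)}, {left}, {right}")
        return f"r{len(code) - 1}"  # register written by the instruction just emitted

    gen(ir)
    return code


ir_from_infix = front_end_infix("2 + 3 * 4")
ir_from_postfix = front_end_postfix("2 3 4 * +")
print(ir_from_infix == ir_from_postfix)  # True
print(back_end_stack(ir_from_infix))
print(back_end_register(ir_from_infix))
```

With a shared IR, supporting a new source language only needs a new front end, and supporting a new ISA only needs a new back end, rather than one compiler per language/ISA pair.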
Linker and loader functionality
Tasks of the linker: It combines object files into a single executable file. This includes:

● Relocation: Adjusting addresses when files are combined, by adding the total size of the previous
files to the addresses in the current file being linked.
● External symbol resolution: Filling in the addresses of external symbols referenced in a
file.

This is also called static linking, because any linked routines become part of the
executable.

Tasks of the loader:

● Loads a machine code (i.e., binary) file from disk and stores it in main memory.
● Performs dynamic linking, which fills in the addresses of symbols from dynamically linked libraries
(aka DLLs) in the program. (E.g., the address of the symbol printf() may be filled in from
the C standard library; stdio.h is only the header that declares it.)
● Finally, it starts execution of the binary file at its entry point (for a C program, typically a startup routine that then calls main).
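Relocation and external symbol resolution can be sketched with a toy linker. The object-file layout used here (a dict with "code", "defined", "local_refs", and "external_refs") is a simplification invented for this example; real formats such as ELF are far richer. Code is modeled as a list of integer words, and addresses are simply indices into that list.

```python
def link(object_files):
    """Combine toy object files into one executable image."""
    # Pass 1: place the files one after another and build a global symbol table.
    bases, symbol_table, next_base = [], {}, 0
    for obj in object_files:
        bases.append(next_base)
        for name, offset in obj["defined"].items():
            symbol_table[name] = next_base + offset
        next_base += len(obj["code"])

    # Pass 2: copy each file's code, patching the words that hold addresses.
    executable = []
    for obj, base in zip(object_files, bases):
        code = list(obj["code"])
        for slot in obj["local_refs"]:
            code[slot] += base  # relocation: add the size of the previous files
        for slot, name in obj["external_refs"].items():
            if name not in symbol_table:
                raise NameError(f"undefined symbol: {name}")
            code[slot] = symbol_table[name]  # external symbol resolution
        executable.extend(code)
    return executable, symbol_table


# main.o refers to two symbols it does not define (slots 1 and 3 are placeholders).
main_o = {"code": [10, 0, 11, 0, 99], "defined": {"main": 0},
          "local_refs": [], "external_refs": {1: "helper", 3: "counter"}}
# util.o defines both; its slot 1 holds the file-local address 3 (of counter).
util_o = {"code": [20, 3, 30, 0], "defined": {"helper": 0, "counter": 3},
          "local_refs": [1], "external_refs": {}}

executable, symbol_table = link([main_o, util_o])
print(symbol_table)  # {'main': 0, 'helper': 5, 'counter': 8}
print(executable)    # [10, 5, 11, 8, 99, 20, 8, 30, 0]
```

Because util.o is placed after the five words of main.o, its local address 3 becomes 8, and the placeholders in main.o are filled with the final addresses of helper and counter.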
Software development component details
Compiler: Converts program files into object code files.

Object code is similar to executable code, but has unresolved external symbol and library references.

E.g., usage:
    gcc -c file1.c -o file1.o
    // -c means do not link.
    // -o means specify output file.

Linker: Combines multiple object files into a single executable. Resolves the external references above.

E.g., usage:
    gcc file1.o file2.o -o file.exe
    // Can also combine into one step:
    gcc file1.c file2.c -o file.exe

Loader: Not part of the compiler, but part of the OS. It loads the executable into memory, allocates the memory segments needed, and performs dynamic linking of libraries. In machines without virtual memory, it assigns the relocation register.

Debugger: Not part of the compiler, but often sold by compiler vendors as part of a "Software Development Kit" (SDK). SDK = compiler, linker, debugger, and a GUI for all of these (aka an Integrated Development Environment (IDE)).

The debugger needs compiler support to associate machine code entities (registers, memory locations, hex addresses) with high-level program entities (symbols, line numbers). E.g., the optional "-g" flag in gcc produces an executable with debug information present (absent by default). This information maps variables to registers and memory locations. An executable without debug info is called a "stripped binary".
