COMPILER
What is a Compiler?
A compiler is a software program that translates code written in a high-level
programming language (source code) into a lower-level language, typically machine code
or intermediate code. The output from the compiler can be directly executed by a
computer's hardware or interpreted by another program.
⦁ Translation: Converts high-level code (C, C++, Java, etc.) into machine code
(binary instructions that the processor can execute) or, for languages such as
Java, into bytecode.
⦁ Error Checking: Detects and reports errors in the source code, such as syntax or
semantic errors.
⦁ Code Generation: Produces the final machine code or bytecode for execution.
⦁ Correctness: The compiler must generate correct machine code that faithfully
executes the behavior of the source code.
⦁ Efficiency: The generated code should be efficient in terms of speed and resource
usage.
1. Compiler
⦁ How it works: Compilers convert the entire source code at once into machine
code. If there are any errors, they are reported during this compilation process.
2. Interpreter
⦁ How it works: An interpreter reads the source code and executes it directly,
without producing a machine code file. Errors are reported during execution.
⦁ Output: It doesn't produce machine code files; instead, it executes the code
dynamically.
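The difference in when errors surface can be illustrated with Python's built-in compile() and exec(). This is only an analogy, since Python compiles to bytecode rather than machine code, and the two source snippets and the helper names are made up for the illustration.

```python
# Source with a syntax error: rejected up front, before any statement runs.
bad_syntax = "print('start')\nx = (1 +\n"
# Source that is syntactically valid but fails part-way through execution.
bad_runtime = "steps.append('start')\nresult = 1 / 0\nsteps.append('end')\n"


def translate_whole_program(source):
    """Compiler-style: translate the entire source first, reporting errors early."""
    try:
        return compile(source, "<example>", "exec"), None
    except SyntaxError as err:
        return None, f"compile-time error: {err.msg}"


def run(code):
    """Execute translated code; errors here surface only during execution."""
    steps = []
    try:
        exec(code, {"steps": steps})
    except ZeroDivisionError as err:
        return steps, f"runtime error: {err}"
    return steps, None


_, compile_error = translate_whole_program(bad_syntax)
code, _ = translate_whole_program(bad_runtime)
steps, runtime_error = run(code)
print(compile_error)          # reported before anything ran
print(steps, runtime_error)   # the first statement ran before the failure
```

In the second case the first statement has already executed by the time the error is reported, which is the behavior described for interpreters.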
3. Assembler
⦁ Definition: An assembler translates assembly language (low-level code close to
machine code) into actual machine code (binary instructions that the processor
understands).
⦁ How it works: Assembly language is closely tied to the hardware's instruction set,
and the assembler directly converts this code into machine-specific instructions.
4. Preprocessor
⦁ Definition: The preprocessor is a tool that processes the source code before
compilation. It handles directives (e.g., #include, #define) and performs tasks like
macro substitution, file inclusion, and conditional compilation.
⦁ How it works: It runs before the actual compilation begins, modifying the source
code by expanding macros or including files. This modified code is then passed to
the compiler.
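A minimal sketch of the preprocessing step, assuming only quoted #include directives and object-like #define macros (no conditional compilation, no function-like macros). The function name preprocess and the in-memory include_files mapping are hypothetical stand-ins for real file access.

```python
import re


def preprocess(source, include_files):
    """Expand #include "name" and simple #define macros before compilation."""
    macros = {}   # macro name -> replacement text
    output = []   # lines that would be handed on to the compiler

    def process(text):
        for line in text.splitlines():
            words = line.split(maxsplit=2)
            if words and words[0] == "#include":
                # File inclusion: process the included text in place.
                process(include_files[words[1].strip('"')])
            elif words and words[0] == "#define":
                macros[words[1]] = words[2] if len(words) > 2 else ""
            else:
                # Macro substitution on whole identifiers only.
                for name, value in macros.items():
                    line = re.sub(rf"\b{re.escape(name)}\b",
                                  lambda _m, v=value: v, line)
                output.append(line)

    process(source)
    return "\n".join(output)


files = {"limits.h": "#define MAX 100"}
src = '#include "limits.h"\n#define SIZE 10\nint a[SIZE];\nint m = MAX;'
expanded = preprocess(src, files)
print(expanded)
```

The directives themselves disappear from the output; only the modified source text reaches the compiler.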
5. Linker
⦁ How it works: After compilation, programs often consist of several object files.
The linker brings them together and resolves symbols or addresses that need to
interact between these files.
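Symbol resolution can be sketched roughly as follows. The object-file layout used here (a dictionary with name, size, defines, and references) is invented for the example and is far simpler than real formats such as ELF or COFF.

```python
def link(object_files):
    """Place object files back to back and resolve cross-file symbol references."""
    symbol_addresses = {}
    base = 0
    # First, assign a final address to every symbol each file defines.
    for obj in object_files:
        for name, offset in obj["defines"].items():
            if name in symbol_addresses:
                raise ValueError(f"duplicate symbol: {name}")
            symbol_addresses[name] = base + offset
        base += obj["size"]
    # Then, resolve every reference against the combined symbol map.
    resolved = []
    for obj in object_files:
        for name in obj["references"]:
            if name not in symbol_addresses:
                raise ValueError(f"undefined symbol: {name}")
            resolved.append((obj["name"], name, symbol_addresses[name]))
    return symbol_addresses, resolved


main_o = {"name": "main.o", "size": 32, "defines": {"main": 0}, "references": ["add"]}
math_o = {"name": "math.o", "size": 16, "defines": {"add": 0}, "references": []}
addresses, resolved = link([main_o, math_o])
print(addresses)
print(resolved)
```

Leaving math_o out of the list would raise the "undefined symbol" error, which mirrors the familiar linker diagnostic.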
6. Loader
⦁ Definition: A loader is responsible for loading the executable into memory and
preparing it for execution by the operating system.
⦁ How it works: It reads the executable file, loads the necessary code and data into
memory, and passes control to the starting point of the program.
7. Virtual Machine
⦁ How it works: For languages like Java or C#, the compiler produces bytecode
that runs on a VM (like the JVM or .NET CLR). The VM translates the bytecode
into machine code or interprets it at runtime.
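To make the idea of interpreting bytecode concrete, here is a toy stack-based virtual machine. Its three-instruction set (PUSH, ADD, MUL) is invented for the example and does not correspond to real JVM or CLR bytecode.

```python
def run_bytecode(program):
    """Interpret a list of (opcode, *args) tuples on an operand stack."""
    stack = []
    for opcode, *args in program:
        if opcode == "PUSH":
            stack.append(args[0])
        elif opcode == "ADD":
            right, left = stack.pop(), stack.pop()
            stack.append(left + right)
        elif opcode == "MUL":
            right, left = stack.pop(), stack.pop()
            stack.append(left * right)
        else:
            raise ValueError(f"unknown opcode: {opcode}")
    return stack.pop()


# Bytecode a compiler might emit for the expression 2 + 3 * 4.
program = [("PUSH", 2), ("PUSH", 3), ("PUSH", 4), ("MUL",), ("ADD",)]
result = run_bytecode(program)
print(result)  # 14
```

A JIT-enabled VM would translate frequently executed bytecode sequences into machine code instead of dispatching on each opcode as this loop does.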
Summary of Differences:
⦁ Loader: Loads the executable into memory and prepares it for execution.
Phases and Passes during the compilation process
Compiler Phases
A compiler typically processes the source code in multiple phases, each responsible for a
specific task in transforming source code into executable code. These phases can be
broadly grouped into front-end and back-end categories.
The analysis phase (the front end) breaks down the source code into an intermediate
representation (IR) and checks it for correctness.
⦁ Task: Checks whether the sequence of tokens follows the grammatical structure
(syntax) of the language.
⦁ Example: Verifies if int x = 10; is valid according to the grammar rules of the
language and builds a tree structure representing the statement.
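The int x = 10; example can be sketched with a small tokenizer and a parser for a single grammar rule. The rule (declaration -> TYPE IDENT '=' NUMBER ';'), the token names, and the tree layout are assumptions chosen for the illustration, not a full language grammar.

```python
import re

TOKEN_PATTERN = re.compile(
    r"(?P<NUMBER>\d+)|(?P<IDENT>[A-Za-z_]\w*)|(?P<SYMBOL>[=;])"
    r"|(?P<SPACE>\s+)|(?P<MISMATCH>.)"
)
TYPE_KEYWORDS = {"int", "float"}


def tokenize(source):
    """Split source text into (kind, text) tokens."""
    tokens = []
    for match in TOKEN_PATTERN.finditer(source):
        kind, text = match.lastgroup, match.group()
        if kind == "SPACE":
            continue
        if kind == "MISMATCH":
            raise SyntaxError(f"unexpected character {text!r}")
        if kind == "IDENT" and text in TYPE_KEYWORDS:
            kind = "TYPE"
        tokens.append((kind, text))
    return tokens


def parse_declaration(tokens):
    """Check tokens against: declaration -> TYPE IDENT '=' NUMBER ';'."""
    expected = ["TYPE", "IDENT", "=", "NUMBER", ";"]
    for index, want in enumerate(expected):
        if index >= len(tokens):
            raise SyntaxError(f"expected {want}, found end of input")
        kind, text = tokens[index]
        if (text if kind == "SYMBOL" else kind) != want:
            raise SyntaxError(f"expected {want}, found {text!r}")
    if len(tokens) > len(expected):
        raise SyntaxError("unexpected tokens after ';'")
    # Build a small tree node representing the statement.
    return {"node": "Declaration", "type": tokens[0][1],
            "name": tokens[1][1], "value": int(tokens[3][1])}


tree = parse_declaration(tokenize("int x = 10;"))
print(tree)
```

Feeding it int x 10; raises a SyntaxError because the token sequence no longer matches the rule.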
⦁ Semantic Analysis
⦁ Task: Ensures the semantic correctness of the code, such as type checking and
variable declarations.
⦁ Output: Annotated syntax tree with semantic information (e.g., type information).
⦁ Example: Checks if the variable x is declared and if its type matches the assigned
value.
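A rough sketch of these two checks (declaration and type agreement) follows. The statement tuples and the strict rule that a literal's type must equal the declared type are assumptions made for the example; many real languages allow implicit conversions instead.

```python
def literal_type(value):
    """Classify a literal as 'float' or 'int' (the only types in this sketch)."""
    return "float" if isinstance(value, float) else "int"


def analyze(statements):
    """Check declarations and assignments; return declared types and errors."""
    declared = {}   # variable name -> declared type
    errors = []
    for stmt in statements:
        if stmt[0] == "declare":
            _, type_name, name, value = stmt
            if name in declared:
                errors.append(f"'{name}' is already declared")
            declared[name] = type_name
        else:  # ("assign", name, value)
            _, name, value = stmt
            if name not in declared:
                errors.append(f"'{name}' is not declared")
                continue
            type_name = declared[name]
        if literal_type(value) != type_name:
            errors.append(
                f"cannot assign {literal_type(value)} value to {type_name} '{name}'"
            )
    return declared, errors


program = [
    ("declare", "int", "x", 10),   # int x = 10;
    ("assign", "x", 2.5),          # x = 2.5;  type mismatch under the strict rule
    ("assign", "y", 1),            # y = 1;    y was never declared
]
declared, errors = analyze(program)
print(errors)
```

Note that the analysis keeps going after the first error, so both problems are reported in one run.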
⦁ Phases of a Compiler
⦁ Example: Eliminates dead code (unused variables or unreachable code) and
optimizes loops.
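Two classic optimizations, constant folding and dead code elimination, can be sketched on a tiny intermediate representation. The (target, op, left, right) tuple format is invented here, and the sketch assumes each target is assigned only once; loop optimizations are not covered.

```python
import operator

OPS = {"+": operator.add, "*": operator.mul}


def fold_constants(instructions):
    """Replace operations whose operands are known constants with the result."""
    known = {}    # variable name -> constant value
    result = []
    for target, op, left, right in instructions:
        left, right = known.get(left, left), known.get(right, right)
        if op in OPS and isinstance(left, int) and isinstance(right, int):
            op, left, right = "const", OPS[op](left, right), None
        if op == "const":
            known[target] = left
        result.append((target, op, left, right))
    return result


def eliminate_dead_code(instructions, live_outputs):
    """Drop instructions whose results never reach a live output."""
    needed = set(live_outputs)
    kept = []
    for target, op, left, right in reversed(instructions):
        if target in needed:
            kept.append((target, op, left, right))
            needed.update(x for x in (left, right) if isinstance(x, str))
    kept.reverse()
    return kept


program = [
    ("a", "const", 2, None),
    ("b", "const", 3, None),
    ("c", "*", "a", "b"),
    ("unused", "+", "a", 1),   # result is never read
    ("d", "+", "c", "n"),      # n is an input unknown at compile time
]
folded = fold_constants(program)
optimized = eliminate_dead_code(folded, {"d"})
print(optimized)
```

After folding, d no longer depends on a, b, or c, so the backward sweep removes them together with the unused variable.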
⦁ Task: Converts the optimized intermediate code into machine code (or assembly
code) specific to the target architecture (CPU).
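As an illustration of this step, the sketch below walks an expression tree and emits instructions for a hypothetical stack machine. The mnemonics (PUSH, LOAD, ADD, MUL) are made up and do not belong to any real CPU; a real back end would also handle register allocation and instruction selection.

```python
MNEMONICS = {"+": "ADD", "*": "MUL"}


def generate(node):
    """Emit stack-machine instructions for an expression tree (post-order walk)."""
    if isinstance(node, int):
        return [f"PUSH {node}"]       # integer literal
    if isinstance(node, str):
        return [f"LOAD {node}"]       # variable reference
    op, left, right = node            # (operator, left subtree, right subtree)
    return generate(left) + generate(right) + [MNEMONICS[op]]


# Tree for the expression 2 + x * 4.
code = generate(("+", 2, ("*", "x", 4)))
print("\n".join(code))
```

Operands are emitted before their operator, so the order of the output already encodes operator precedence.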
⦁ Task: Involves assembling the machine code into an executable and linking with
other object files or libraries.
⦁ Example: Linking multiple object files into a final .exe or .out file.
Phase | Purpose
Syntax Analysis | Checks token sequences against grammar rules to build a syntax tree.
Intermediate Code Generation | Converts the source code into an intermediate representation (IR).
Code Optimization | Improves the efficiency of the intermediate code (optional phase).
Code Linking | Links external libraries and resolves symbol references (if needed).
Compiler Passes
A pass is one complete traversal of the source code (or of its intermediate
representation) by the compiler. A single-pass compiler traverses the program once,
performing every phase during that traversal, while a multi-pass compiler traverses
it several times for additional analysis or optimization.
Types of Passes
⦁ Single-Pass Compiler:
⦁ Definition: A compiler that reads the source code and performs all
compilation tasks (lexical analysis, syntax analysis, semantic analysis,
code generation) in one go.
⦁ Multi-Pass Compiler:
⦁ Definition: A compiler that processes the source code in several
passes, often going over the intermediate representation multiple times to
apply further optimizations or transformations.
⦁ First Pass: The compiler reads the source code, performs lexical and syntax
analysis, and generates an intermediate representation.
⦁ Second Pass: The compiler processes the intermediate representation to check for
semantic errors and performs basic optimizations.
⦁ Third Pass: The compiler further optimizes the intermediate code and generates
the target machine code.
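The three passes above can be sketched as three functions, each consuming the output of the previous one. The source language (lines of the form name = a + b), the IR tuples, and the LOAD/ADD/STORE target instructions are all simplifications invented for the example.

```python
def first_pass(source):
    """Lexical and syntax analysis: turn 'name = a + b' lines into IR tuples."""
    ir = []
    for line in source.strip().splitlines():
        target, expression = (part.strip() for part in line.split("="))
        operands = [int(t) if t.isdigit() else t
                    for t in (p.strip() for p in expression.split("+"))]
        ir.append((target, operands))
    return ir


def second_pass(ir):
    """Semantic check plus a basic optimization (summing constant operands)."""
    defined, checked = set(), []
    for target, operands in ir:
        for operand in operands:
            if isinstance(operand, str) and operand not in defined:
                raise NameError(f"'{operand}' used before definition")
        constant = sum(o for o in operands if isinstance(o, int))
        names = [o for o in operands if isinstance(o, str)]
        checked.append((target, names + ([constant] if constant or not names else [])))
        defined.add(target)
    return checked


def third_pass(ir):
    """Code generation for a hypothetical accumulator machine."""
    code = []
    for target, operands in ir:
        code.append(f"LOAD {operands[0]}")
        code.extend(f"ADD {operand}" for operand in operands[1:])
        code.append(f"STORE {target}")
    return code


source = "x = 2 + 3\ny = x + 1 + 4"
code = third_pass(second_pass(first_pass(source)))
print(code)
```

Because each pass sees the whole program, the second pass can simplify x = 2 + 3 before any target code is produced.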
Summary
⦁ Compiler Phases refer to the distinct steps in the compilation process, like lexical
analysis, syntax analysis, code generation, and optimization.
⦁ Compiler Passes describe how many times the source code or intermediate code
is traversed during compilation. Single-pass compilers typically compile faster
but have fewer opportunities for optimization, while multi-pass compilers allow
for greater optimization and more thorough error checking.
Classification of Compilers
Compilers can be classified based on different criteria such as the target language they
produce, the number of passes they make, the platform they run on, and how they execute
the source program. Here are the major classifications of compilers:
⦁ Description: Produces machine code or executable code for the same machine on
which the compiler is running.
⦁ Use Case: Used when the same system is used for development and execution.
⦁ Use Case: Common in embedded systems development where the target platform
differs from the development environment.
⦁ Use Case: Used when migrating code between languages or converting
higher-level abstractions into more widely supported languages.
⦁ Example: The Java compiler that generates Java bytecode, which runs on the Java
Virtual Machine (JVM).
⦁ Description: The compiler processes the source code in a single pass, performing
lexical analysis, syntax analysis, and code generation in one go.
⦁ Use Case: Suitable for simple languages and environments where fast compilation
is critical, but limited in optimization capabilities.
⦁ Description: The compiler goes through the source code multiple times,
performing different tasks in each pass (e.g., one pass for syntax checking, another
for optimization, etc.).
⦁ Example: Modern C and C++ compilers, which optimize code across multiple
passes.
⦁ Use Case: Used for more complex languages, where optimization and detailed
error checking are important.
⦁ Use Case: Provides better optimization than single-pass compilers without the
overhead of multiple passes.
⦁ Example: Java JIT compiler within the Java Virtual Machine (JVM), or .NET
CLR JIT compiler.
⦁ Use Case: Used in environments like Java or .NET where the code is first
compiled to an intermediate format and then executed by the JIT compiler for
performance optimization.
⦁ Description: Compiles the code directly into machine code before execution
(during build time), rather than at runtime.
⦁ Example: Compilers for statically typed languages like C, C++, or Rust.
⦁ Use Case: Suitable for languages where compilation happens before execution,
optimizing for speed and resource usage at runtime.
4. Based on Platform
4.2. Cross-Compiler
⦁ Description: Generates machine code for a platform other than the one on which
the compiler itself is running. This is particularly useful when developing
software for systems that have different architectures or operating environments
than the development system.
⦁ Use Case: Critical for embedded system development and environments where
the development and target platforms differ.
5. Based on Output
⦁ Description: Directly generates machine code or object code that the processor
can execute.
⦁ Use Case: Used in system-level programming where the compiled code runs
directly on the hardware.
Criterion | Types | Examples
Output | Machine Code, Intermediate | GCC, Java compiler
Purpose | General, Specific | Clang, VHDL compiler
Incremental vs Batch | Batch, Incremental | GCC, IDE incremental compiler
Retargetable Compiler | Retargetable | LLVM
Symbol Table in Compiler Design
The symbol table is a critical data structure used by the compiler during the compilation
process. It stores information about identifiers (like variables, functions, classes, etc.)
declared in the source code. The symbol table helps the compiler keep track of scope,
type information, memory locations, and other attributes of identifiers.
⦁ Stores Information:
⦁ Types: Data types associated with the identifiers (e.g., int, float, class).
⦁ Manages Scope:
it is used, matching types, etc.
⦁ The symbol table helps to enforce type safety and name resolution
(binding uses of variables to their declarations).
Example:
int x = 5;
float y = 3.14;
The symbol table might look like:
⦁ Hash Table: The most common structure for symbol tables, allowing fast
insertion and lookup operations.
| name: x | type: int | scope: 1 | memory: 0x100 | ->
| name: y | type: float | scope: 1 | memory: 0x104 | -> NULL
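A hash-table-based symbol table with nested scopes might be sketched as follows, using Python dictionaries as the hash tables. The class and method names, the 4-byte sizes, and the 0x100 base address are assumptions chosen to match the x and y example; real compilers usually store offsets rather than absolute addresses.

```python
class SymbolTable:
    SIZES = {"int": 4, "float": 4}   # assumed sizes in bytes

    def __init__(self, base_address=0x100):
        self.scopes = [{}]           # stack of hash tables, innermost scope last
        self.next_address = base_address

    def enter_scope(self):
        self.scopes.append({})

    def exit_scope(self):
        self.scopes.pop()

    def declare(self, name, type_name):
        """Insert an identifier into the current scope."""
        scope = self.scopes[-1]
        if name in scope:
            raise NameError(f"'{name}' is already declared in this scope")
        entry = {"type": type_name, "scope": len(self.scopes),
                 "memory": self.next_address}
        scope[name] = entry
        self.next_address += self.SIZES[type_name]
        return entry

    def lookup(self, name):
        """Resolve a name, searching from the innermost scope outward."""
        for scope in reversed(self.scopes):
            if name in scope:
                return scope[name]
        raise NameError(f"'{name}' is not declared")


table = SymbolTable()
table.declare("x", "int")     # int x = 5;
table.declare("y", "float")   # float y = 3.14;
table.enter_scope()
table.declare("x", "float")   # an inner declaration shadows the outer x
inner_x = table.lookup("x")
table.exit_scope()
outer_x = table.lookup("x")
print(hex(outer_x["memory"]), hex(table.lookup("y")["memory"]))
```

Keeping one dictionary per scope makes leaving a block cheap: the innermost table is simply discarded.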
Error Handling in Compiler Design
Types of Errors:
⦁ Lexical Errors:
⦁ Syntax Errors:
⦁ Occur during syntax analysis when the source code does not conform to
the grammar rules of the language.
Handling: The compiler can use panic-mode recovery, in which the parser skips ahead
in the input to a pre-designated set of synchronizing tokens (such as semicolons or
braces) where it expects to be in a syntactically valid state, or phrase-level
recovery, which attempts to fix the structure in place, e.g., by inserting a missing
semicolon.
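Panic-mode recovery can be sketched for a toy statement form, IDENT = NUMBER ;, with the semicolon as the only synchronizing token. The token list, the statement grammar, and the error message format are assumptions made for the example.

```python
def parse_statements(tokens):
    """Parse 'IDENT = NUMBER ;' statements, resynchronizing at ';' after errors."""
    statements, errors = [], []
    position = 0
    while position < len(tokens):
        window = tokens[position:position + 4]
        if (len(window) == 4 and window[0].isidentifier() and window[1] == "="
                and window[2].isdigit() and window[3] == ";"):
            statements.append((window[0], int(window[2])))
            position += 4
            continue
        errors.append(f"syntax error near token {position}: {tokens[position]!r}")
        # Panic mode: discard tokens up to and including the next ';'.
        while position < len(tokens) and tokens[position] != ";":
            position += 1
        position += 1
    return statements, errors


tokens = "x = 1 ; y = = 2 ; z = 3 ;".split()
statements, errors = parse_statements(tokens)
print(statements)
print(errors)
```

The malformed second statement is reported once and skipped, and parsing resumes cleanly with z = 3 ; instead of producing a cascade of follow-on errors.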
⦁ Semantic Errors:
⦁ Example: Type mismatches (e.g., assigning a string to an int variable),
undeclared variables, or calling a function with the wrong number of
arguments.
Handling: The compiler reports the error and provides information about the type
mismatch, undeclared variables, or incorrect function calls. It may attempt to recover
from this error by continuing to check the remaining code.
⦁ Logical Errors:
⦁ These are not typically caught by the compiler but are logical mistakes in
the code that lead to incorrect behavior at runtime.
⦁ Runtime Errors:
⦁ Example: int a = 10 / 0;
Handling: These are typically handled by the runtime environment (e.g., throwing
exceptions or crashing the program), but the compiler might insert code to check for such
conditions in certain languages.
int x = 5;
y = 10;
⦁ Lexical error: If y contained invalid characters (e.g., @y = 10;), it would result
in a lexical error.
⦁ Syntax error: Missing semicolon or an unmatched bracket.
Note: This allows the programmer to fix the problem before proceeding further.
Compiler Design Tools