The document provides an overview of compiler construction, focusing on lexical analysis, which converts source code into tokens. It discusses key functions such as removal of whitespace, tokenization, and error detection, along with input buffering techniques to optimize reading operations. Additionally, it covers context-free grammar (CFG) for defining syntax rules and the importance of writing grammar for effective parsing and compilation.

NOTES

SUBJECT :
Compiler
CLASS :
BSCS 6TH Semester
WRITTEN BY :
Hamza Zahoor

Lexical analysis is the first phase of compiler construction. Its primary goal is to convert
the source code into a stream of tokens, which are the smallest units of meaning in
programming. The lexical analyzer, also called a lexer or scanner, reads the source code
character by character and groups them into meaningful sequences known as lexemes,
which are then classified into tokens.

Key Functions of Lexical Analysis:

1. Removal of Whitespace and Comments: It discards unnecessary spaces, newlines, and comments that do not affect the execution of the code.

2. Tokenization: It converts sequences of characters (lexemes) into tokens such as keywords, operators, identifiers, literals, etc.

3. Error Detection: The lexer checks for illegal characters and informs the compiler of any lexical errors.

Example:

Consider the following piece of C code:

int a = 5;

During lexical analysis, the code is broken down into the following tokens:

Lexeme Token

int KEYWORD

a IDENTIFIER

= OPERATOR

5 CONSTANT

; SEMICOLON

Token Types:

• Keywords: Reserved words in the programming language (e.g., int, if, for).

• Identifiers: Names of variables, functions, etc. (e.g., a, myFunction).

• Operators: Symbols representing operations (e.g., +, =, *).

• Literals: Constant values (e.g., 5, "Hello").

• Punctuation: Special characters like semicolons, commas, etc.



Working:

1. The lexer reads the first three characters: i, n, t, and recognizes them as the
keyword int.

2. It encounters the identifier a, followed by the assignment operator =.

3. Next, the number 5 is recognized as a constant.

4. Finally, the semicolon ; is recognized as punctuation.

Regular Expressions in Lexical Analysis:

Lexical analyzers often use regular expressions to define patterns for recognizing tokens.
For example:

• Keywords: int|float|if|else|while (a set of reserved words)

• Identifiers: [a-zA-Z_][a-zA-Z0-9_]* (starts with a letter or underscore, followed by letters, digits, or underscores)

• Integer Literals: [0-9]+ (one or more digits)
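A quick sketch of how these patterns could be combined in practice, using Python's re module purely for illustration (the token names and the tokenize helper are assumptions, not part of any specific tool):

import re

# Token patterns taken from the rules above; order matters (keywords before identifiers).
TOKEN_SPEC = [
    ("KEYWORD",    r"\b(?:int|float|if|else|while)\b"),
    ("IDENTIFIER", r"[a-zA-Z_][a-zA-Z0-9_]*"),
    ("CONSTANT",   r"[0-9]+"),
    ("OPERATOR",   r"[=+\-*/]"),
    ("SEMICOLON",  r";"),
    ("SKIP",       r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))

def tokenize(code):
    # Scan left to right and report each non-whitespace match as a (token, lexeme) pair.
    for match in MASTER.finditer(code):
        if match.lastgroup != "SKIP":
            yield (match.lastgroup, match.group())

print(list(tokenize("int a = 5;")))
# [('KEYWORD', 'int'), ('IDENTIFIER', 'a'), ('OPERATOR', '='), ('CONSTANT', '5'), ('SEMICOLON', ';')]

This reproduces the lexeme/token table shown above for int a = 5;.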

Example of Lexical Error:

If the code contains an illegal character or token, for example:

int a = 5$;

The lexer will flag $ as an illegal character since it does not belong to the valid token set of
the language.

Lexical analysis plays a critical role by cleaning the input and preparing it for the next phase
of the compiler, which is syntax analysis.

In compiler construction, input buffering is a technique used to efficiently handle the reading of source code during lexical analysis (the first phase of compilation). The goal is to minimize the number of I/O operations while scanning the input stream, ensuring that the source code is read quickly and processed effectively.

Why Input Buffering is Needed

Reading the source program character by character from a file can be slow because each
character read from the disk involves an I/O operation. Frequent disk accesses can
significantly impact performance. To mitigate this, compilers use input buffering to
optimize reading operations.

How Input Buffering Works

1. Buffer Setup: Instead of reading one character at a time, the compiler reads
the input file in chunks or blocks of data into a buffer in memory. This reduces the number
of disk I/O operations because multiple characters are loaded into memory in a single
read.

2. Dual Buffering: A common strategy is the use of two buffers (called double
buffering). The source code is divided into two parts, each part being loaded into one of the
buffers:

• When the first buffer is exhausted (all characters have been processed), the
second buffer is loaded with the next chunk of the input file.

• While one buffer is being processed, the other is being loaded from disk in
parallel, improving efficiency.

This setup avoids the need to wait for the disk to load the next portion of the input, making
the processing of the source code continuous.

3. Sentinel Characters: A special character, known as a sentinel, is placed at the end of each buffer to mark its end. This is typically a character that cannot appear in the source program, such as eof, which lets the scanner detect the end of a buffer without accidentally running past it.

Example of Input Buffering

Let’s consider a situation where a compiler is reading a file:

• The source file is loaded in two buffers. Suppose each buffer holds 1KB of
data.

• The lexical analyzer reads characters sequentially from the first buffer. When
it reaches the end, it switches to the second buffer, which has already been filled.

• When both buffers are exhausted, the first buffer is reloaded with the next
part of the input file while the second buffer is being processed.
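A minimal sketch of the double-buffering idea in Python (illustrative only; BUF_SIZE, SENTINEL, and read_chars are assumed names, and a real scanner would typically implement this in C for speed):

BUF_SIZE = 1024      # assumed chunk size (1 KB, matching the example above)
SENTINEL = "\0"      # assumed sentinel: a character that never appears in source code

def read_chars(path):
    # Yields source characters one at a time, refilling two buffers alternately.
    with open(path, "r") as f:
        buffers = ["", ""]
        active = 0
        buffers[active] = f.read(BUF_SIZE) + SENTINEL
        i = 0
        while True:
            ch = buffers[active][i]
            if ch == SENTINEL and i == len(buffers[active]) - 1:
                # End of the active buffer: refill the other one and switch to it.
                other = 1 - active
                buffers[other] = f.read(BUF_SIZE) + SENTINEL
                if buffers[other] == SENTINEL:   # nothing left on disk: true end of file
                    return
                active, i = other, 0
                continue
            yield ch
            i += 1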

Advantages of Input Buffering

1. Efficiency: By reducing the number of I/O operations, input buffering


significantly speeds up the lexical analysis process.

2. Parallelism: With double buffering, one buffer is loaded while the other is
being processed, reducing idle time for the compiler.

3. Simpler Code Management: Buffering allows for efficient character


management in memory, especially when backtracking is needed during lexical analysis.

Challenges

• Buffer Overruns: Care must be taken to ensure that the lexical analyzer
doesn’t exceed the buffer’s boundaries.

• End of File Handling: Proper management is required when the file’s end is
reached, ensuring no data is missed or redundant characters are read.

Conclusion

Input buffering is crucial in compiler design for optimizing the reading and processing of
source code. It reduces overhead, ensures efficient memory management, and speeds up
the overall compilation process by reducing the number of direct I/O operations.

In compiler construction, token specification and recognition are key steps in the lexical
analysis phase, where the source code is converted into tokens.

1. Token Specification

A token is a string of characters that are grouped together as a meaningful element. For
example, keywords (if, while), operators (+, -), identifiers (variable names), and punctuation
(semicolon ;) are tokens.

Tokens are specified by regular expressions (regex). A regular expression defines a pattern
that matches specific sequences of characters in the input. Each token type, like an
identifier or an operator, has its own regular expression. For example:

• An identifier might be defined as: [a-zA-Z_][a-zA-Z0-9_]*

• A number might be defined as: [0-9]+

Example Token Types:



1. Keywords: if, else, while, for

2. Identifiers: Names given to variables or functions (e.g., x, sum)

3. Operators: +, -, *, ==

4. Punctuation: ;, ,, (, )

5. Literals: Numbers like 123, 3.14, or string literals like "hello"

2. Token Recognition

Once tokens are specified by regular expressions, the lexical analyzer (also known as the
scanner) reads the source code character by character to match these patterns and
produce a sequence of tokens. This process is called token recognition.

Steps of Token Recognition:

1. Input Scanning: The lexical analyzer scans the source code from left to right.

2. Matching: It attempts to match the longest possible sequence of characters to a valid token definition (as per the regular expressions).

3. Token Creation: When a match is found, the corresponding token is created and passed to the parser.

4. Handling Errors: If no matching pattern is found for a sequence of characters, the lexical analyzer reports a lexical error.
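A minimal sketch of the longest-match (maximal munch) rule together with lexical error reporting (the patterns and the scan helper are illustrative assumptions, not a specific tool's algorithm):

import re

# KEYWORD is tried before ID so that "int" is a keyword on a tie,
# while the longest match still makes "intx" a single identifier.
PATTERNS = [("KEYWORD", r"int|float|if|else|while"),
            ("ID",      r"[a-zA-Z_][a-zA-Z0-9_]*"),
            ("NUMBER",  r"[0-9]+"),
            ("OP",      r"==|=|\+|\*"),
            ("PUNCT",   r"[;(),]"),
            ("WS",      r"\s+")]

def scan(code):
    pos = 0
    while pos < len(code):
        # Try every pattern at the current position and keep the longest match.
        best_name, best_text = None, ""
        for name, pattern in PATTERNS:
            m = re.match(pattern, code[pos:])
            if m and len(m.group()) > len(best_text):
                best_name, best_text = name, m.group()
        if best_name is None:
            raise SyntaxError(f"lexical error: illegal character {code[pos]!r} at position {pos}")
        if best_name != "WS":
            yield (best_name, best_text)
        pos += len(best_text)

print(list(scan("int x = 10;")))
# [('KEYWORD', 'int'), ('ID', 'x'), ('OP', '='), ('NUMBER', '10'), ('PUNCT', ';')]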

Example of Token Recognition:

Consider the following input:

int x = 10;

• The scanner identifies int as a keyword.

• x is recognized as an identifier.

• = is an operator.

• 10 is a literal.

• ; is a punctuation token.

Summary:

• Token Specification: Defines the pattern (using regular expressions) that


tokens should match.

• Token Recognition: The process of identifying and grouping input characters


into tokens based on the patterns defined.

These tokens are then used by the parser for syntax analysis.

In compiler construction, a Context-Free Grammar (CFG) is a formal way to describe the syntax of a programming language. It is a set of recursive rules used to generate patterns of strings within a language, enabling the compiler to understand and process code according to predefined syntactic structures.

Key Concepts of CFG

1. Terminals: These are the basic symbols from which strings are formed. In
programming languages, terminals are typically tokens like keywords, operators, and
symbols.

2. Non-terminals: These are variables that represent patterns of terminals and


are used to help define the syntax structure of the language.

3. Production Rules: These are rules that describe how terminals and non-
terminals can be combined to form valid strings. Each rule specifies how a non-terminal
can be replaced by a combination of terminals and non-terminals.

4. Start Symbol: This is the initial non-terminal from which production begins. It
serves as the root of the derivation tree that represents the structure of the code.

Structure of a CFG

A context-free grammar can be represented as a 4-tuple G = (N, T, P, S) where:

• N: Set of non-terminals.

• T: Set of terminals.

• P: Set of production rules of the form A → α, where A is a non-terminal and α is a string of terminals and/or non-terminals.

• S: Start symbol, which is a special non-terminal from where the parsing begins.

Example

Consider a simple CFG for arithmetic expressions:

• Terminals: id, +, *, (, )

• Non-terminals: E, T, F

• Start Symbol: E

• Production Rules:

• E → E + T | T

• T → T * F | F

• F → (E) | id

This CFG describes expressions where:

• E represents expressions,

• T represents terms,

• F represents factors (like an identifier or a sub-expression in parentheses).
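As an illustration, the string id + id * id can be derived from this grammar using a leftmost derivation:

E ⇒ E + T ⇒ T + T ⇒ F + T ⇒ id + T ⇒ id + T * F ⇒ id + F * F ⇒ id + id * F ⇒ id + id * id

Here multiplication ends up lower in the parse tree than addition, which is how the grammar encodes operator precedence.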

Role in Compiler Construction

In compilers, CFGs are used to define the syntax of the source language. The syntax
analyzer, or parser, uses the CFG to verify that the source code follows the syntactic
structure. This helps the compiler to detect errors early in the process and build a parse
tree or syntax tree, which serves as the foundation for semantic analysis and further
compilation stages.

Understanding CFGs is fundamental for compiler construction because they allow the
compiler to understand the rules governing the structure of valid statements, expressions,
and program blocks in the language.

In compiler construction, writing grammar refers to designing or specifying the syntax rules of a programming language, which allows the compiler to correctly parse and understand source code. This grammar, often a context-free grammar (CFG), defines how tokens (such as keywords, operators, and symbols) combine to form valid statements and expressions in a programming language.

Key Aspects of Writing Grammar

1. Identify Language Constructs:



• Start by listing all language constructs like statements, expressions, loops,


functions, classes, etc., that the programming language should support.

• Each construct will have specific syntax rules that need to be defined in the
grammar.

2. Define Terminals and Non-terminals:

• Terminals: These are the basic symbols (tokens) of the language, like
keywords (if, while, etc.), operators (+, -, *, etc.), and punctuation (commas, semicolons,
etc.).

• Non-terminals: These represent groups or patterns of terminals. Non-


terminals define language constructs like expressions, statements, or blocks of code.

3. Create Production Rules:

• Define production rules that describe how terminals and non-terminals can
combine to form valid code structures.

• Production rules follow a specific format, like A → B C, where A is a non-terminal, and B and C can be terminals or non-terminals. This means “A” can be replaced by “B followed by C.”

• Example: In a simple arithmetic grammar:

• Expression → Expression + Term | Term

• Term → Term * Factor | Factor

• Factor → (Expression) | id

4. Specify the Start Symbol:

• Choose a start symbol, the top-level non-terminal, which serves as the root
of the parse tree. This represents the complete program or main structure.

• The start symbol enables the parser to start building the syntax tree from the
top and expand through other production rules.

5. Consider Ambiguity and Recursion:

• Ambiguity occurs when a grammar allows multiple parse trees for the same
string, making it unclear how to interpret certain statements. This can lead to issues in
parsing.

• Left Recursion is a common issue where a non-terminal in the grammar calls itself as the first element of its production, potentially causing infinite loops in top-down parsers. For example, E → E + T is left-recursive and needs to be rewritten to remove left recursion for efficient parsing.
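A standard rewrite for such a rule (shown here for E → E + T | T) removes the left recursion by introducing a new non-terminal E':

E → T E'

E' → + T E' | ε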

Example of a Simple Grammar

For a language with basic arithmetic and assignment, a possible grammar might look like
this:

• Terminals: id, num, =, +, *, (, ), ;

• Non-terminals: Stmt, Expr, Term, Factor

• Start Symbol: Stmt

• Production Rules:

• Stmt → id = Expr ;

• Expr → Expr + Term | Term

• Term → Term * Factor | Factor

• Factor → (Expr) | id | num

Purpose of Writing Grammar in Compiler Construction

Defining grammar in compiler construction helps create a parser that can systematically
break down source code into its syntactic components, check its correctness, and
construct a syntax tree or parse tree. This process enables the compiler to:

1. Validate the structure of code.

2. Identify syntax errors early.

3. Construct a framework for semantic analysis, code generation, and


optimization.

Tools for Grammar Writing

• BNF (Backus-Naur Form) and EBNF (Extended Backus-Naur Form): Notations for expressing grammar rules in a structured way (see the example after this list).

• Grammar Checkers: Tools to help verify and visualize grammar, ensuring it is


unambiguous and suitable for parsing.
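For example, the arithmetic expression rules used earlier in these notes could be written in BNF as:

<expr> ::= <expr> "+" <term> | <term>

<term> ::= <term> "*" <factor> | <factor>

<factor> ::= "(" <expr> ")" | <identifier>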

In summary, writing grammar is crucial in compiler construction as it lays the foundation


for interpreting and translating a programming language accurately.

In *compiler construction*, various tools are used to automate different parts of the compilation process, making it easier to develop, maintain, and extend compilers. These tools help in tasks like lexical analysis, parsing, code generation, optimization, and error handling.

Here’s a detailed look at the common *compiler construction tools*, along with examples
of each.

### 1. *Lexical Analyzer Generators (Lexer Tools)*

- *Purpose*: Automates the generation of the *lexical analyzer* or *scanner*, which breaks down the source code into *tokens*.

- *How It Works*: The developer specifies patterns for tokens using *regular expressions*, and the tool generates a lexical analyzer based on these patterns.

- *Example*:

- *Lex* and *Flex* (Fast Lexical Analyzer) are popular lexer tools.

- *Process*: You define a set of token patterns like keywords, operators, identifiers, etc.
The tool generates C code to recognize and return these tokens.

- *Usage*: For instance, a C-like token definition in Flex might look like:

digit [0-9]

letter [a-zA-Z]

%%

"int" return INT;

{letter}({letter}|{digit})* return IDENTIFIER;

{digit}+ return NUMBER;

%%

This would recognize keywords like int, identifiers, and numbers in a source program.

### 2. *Parser Generators (Syntax Analyzers)*

- *Purpose*: Automates the generation of the *parser*, which checks the syntactic structure of the code and builds the *parse tree* or *syntax tree*.

- *How It Works*: The developer defines the *grammar* of the language using *context-free grammar (CFG)* rules. The tool generates a parser that recognizes sentences (programs) that match the grammar.

- *Example*:

- *Yacc* (Yet Another Compiler Compiler) and *Bison* (GNU version of Yacc) are widely
used parser generators.

- *Process*: In Yacc or Bison, you define grammar rules like:

expr: expr '+' term

| term;

term: term '*' factor

| factor;

factor: '(' expr ')'

| NUMBER;

This would generate a parser that understands arithmetic expressions involving addition
and multiplication.

### 3. *Syntax-Directed Translation Engines*

- *Purpose*: Automates the translation of high-level language constructs into *intermediate representations* based on the syntax of the language.

- *How It Works*: These tools extend the parser to not only check for syntactic correctness but also perform *actions* (like generating intermediate code) while parsing.

- *Example*: Syntax-directed translation engines like Yacc and Bison allow actions to be
associated with grammar rules. For instance:

expr: expr '+' term { $$ = $1 + $3; }

| term { $$ = $1; };

Here, an action ($$ = $1 + $3;) is triggered when the parser matches the expr + term rule,
meaning intermediate code or values can be generated as parsing proceeds.

### 4. *Intermediate Code Generators*

- *Purpose*: Tools that help generate an *intermediate representation (IR)* of the source code.

- *How It Works*: These tools transform the syntax tree or parse tree generated by the
parser into an intermediate form that is easier to optimize and target across different
machine architectures.

- *Example*: LLVM (Low-Level Virtual Machine) provides a powerful infrastructure for


generating intermediate code. The IR generated by LLVM is platform-independent and can
be further optimized before generating machine-specific code.

### 5. *Code Generators*

- *Purpose*: Automates the generation of the final *machine code* or *assembly code* from the intermediate representation.

- *How It Works*: These tools take the intermediate code and translate it into the specific
machine instructions for a target architecture.

- *Example*: LLVM's backend generates machine-specific code (e.g., x86, ARM) from the
intermediate representation. Other code generators include GCC (GNU Compiler
Collection), which produces optimized machine code for different platforms.

### 6. *Code Optimizers*

- *Purpose*: Tools that help improve the intermediate or final machine code by applying *optimizations* to enhance performance or reduce resource usage.

- *How It Works*: These tools take the intermediate or machine code and apply various transformations such as *loop unrolling*, *dead code elimination*, and *constant propagation*.

- *Example*: The *GCC optimizer* (-O1, -O2, -O3 flags) optimizes the code for different levels of performance. LLVM also includes powerful optimization passes for intermediate code.

- For example, constant folding might turn:

int x = 2 * 3;

Into:

int x = 6;

### 7. *Error Handling Tools*

- *Purpose*: Tools or frameworks to automatically detect and report *syntax*, *semantic*, and *runtime errors* during compilation.

- *How It Works*: These tools generate error messages and diagnostics that help developers understand the nature of the errors in their code.

- *Example*: In *Yacc/Bison*, you can define error-handling rules. For instance:

expr: expr '+' term { $$ = $1 + $3; }

| term { $$ = $1; }

| error { printf("Syntax error in expression\n"); };

This would trigger an error message if the expression does not follow the correct
grammar.

### 8. *Assembly and Linking Tools*

- *Purpose*: Tools that help in converting the *assembly code* into *machine code* and linking various code modules together to form an executable.

- *How It Works*: After code generation, the assembly code needs to be converted into an
object file and linked with libraries and other code modules.

- *Example*:

- *Assembler*: Converts assembly language into machine code. Examples include GNU
as (assembler).

- *Linker*: Combines object files and libraries into an executable. Examples include GNU
ld (linker).

### 9. *Profiling and Debugging Tools*



- *Purpose*: Tools used to *profile* the performance of the code and *debug* it by tracing errors, runtime exceptions, and crashes.

- *Example*:

- *GDB* (GNU Debugger) allows you to debug programs by setting breakpoints, stepping
through the code, and examining variables.

- *Valgrind* is a profiling tool that helps detect memory leaks and analyze performance.

---

### Example of Tool Workflow

Here’s how these tools might work together to build a simple compiler:

1. *Lexical Analysis* (Flex):

- Source code int x = 10; is broken into tokens like int, x, =, 10, and ;.

2. *Syntax Analysis* (Bison):

- The tokens are parsed according to a grammar that recognizes the structure of variable
declarations and statements.

3. *Intermediate Code Generation* (LLVM):

- The parse tree is transformed into an intermediate representation like:

%1 = alloca i32

store i32 10, i32* %1

4. *Optimization* (LLVM):

- The intermediate code is optimized, possibly by removing unnecessary instructions or


improving the memory layout.

5. *Code Generation* (LLVM):

- The optimized intermediate code is converted into machine-specific code like:

mov eax, 10

mov [rbp-4], eax

6. *Assembly and Linking* (GCC):

- The machine code is assembled and linked to produce an executable file.

---

### Conclusion

*Compiler construction tools* greatly simplify the process of building a compiler by automating various tasks such as lexical analysis, syntax analysis, code generation, and optimization. Tools like *Lex*, *Yacc/Bison*, *LLVM*, and *GCC* are widely used in building efficient and reliable compilers for different programming languages.

In compiler construction, a translator is a program that converts one form of language (source language) into another (target language). For simple expressions, like those involving arithmetic operations, a typical translation process involves parsing and generating intermediate or target code. Here's a breakdown of how this works for a simple expression like a + b * c:

Steps in Translating Simple Expressions



1. Lexical Analysis: Breaks down the input into tokens.

• For a + b * c, the tokens are:

• a (identifier)

• + (plus operator)

• b (identifier)

• * (multiplication operator)

• c (identifier)

2. Syntax Analysis (Parsing): Organizes tokens into a structure (typically an


Abstract Syntax Tree, AST) according to grammar rules.

• The tree for a + b * c would prioritize b * c (multiplication happens first) and


then a + (result).

3. Intermediate Code Generation: Converts the tree into intermediate code


(usually in three-address code form).

• The expression a + b * c could be translated to:

t1 = b * c

t2 = a + t1

• Here, t1 and t2 are temporary variables.

4. Optimization: (Optional) Simplifies the intermediate code, though simple


expressions often don’t require much optimization.

5. Target Code Generation: Produces machine code or assembly code.

• In assembly, the translation might look like:

MOV R1, b

MUL R1, c

MOV R2, a

ADD R2, R1

6. Assembly/Linking: Converts the target code into binary executable form.

Example Translator (Pseudo Code)

Here’s a simplified pseudo code example of a translator for arithmetic expressions:

class Translator:

    def __init__(self, expression):
        self.expression = expression
        self.temp_counter = 0
        self.intermediate_code = []

    def generate_temp(self):
        # Create a fresh temporary name: t1, t2, ...
        self.temp_counter += 1
        return f"t{self.temp_counter}"

    def translate(self, expr):
        # Split on '+' first so that '*' binds more tightly (is generated first).
        if '+' in expr:
            left, right = expr.split('+', 1)
            left_temp = self.translate(left.strip())
            right_temp = self.translate(right.strip())
            temp = self.generate_temp()
            self.intermediate_code.append(f"{temp} = {left_temp} + {right_temp}")
            return temp
        elif '*' in expr:
            left, right = expr.split('*', 1)
            left_temp = self.translate(left.strip())
            right_temp = self.translate(right.strip())
            temp = self.generate_temp()
            self.intermediate_code.append(f"{temp} = {left_temp} * {right_temp}")
            return temp
        else:
            # A plain operand (identifier or number) needs no code.
            return expr.strip()

    def print_code(self):
        for code in self.intermediate_code:
            print(code)

# Example usage:
expr = "a + b * c"
translator = Translator(expr)
translator.translate(expr)
translator.print_code()

For the expression a + b * c, the output would be:

t1 = b * c

t2 = a + t1

This basic structure can be extended to handle more complex expressions, optimizations,
and different target languages.

Incorporation of a symbol table refers to the process of creating and maintaining a data structure (symbol table) used by a compiler or interpreter to store information about the identifiers (symbols) in a program. A symbol table typically contains information such as variable names, function names, objects, and their associated attributes like data types, scopes, memory addresses, etc.

Why Use a Symbol Table?

In programming, compilers need to keep track of identifiers, their attributes, and their
scopes. The symbol table facilitates:

• Efficient lookup of symbols during various phases of compilation.

• Semantic analysis, ensuring proper use of variables and functions.

• Code generation, where memory addresses are assigned based on the


information stored in the table.

Structure of a Symbol Table

Symbol tables are typically implemented as hash tables, binary search trees, or other data
structures that allow efficient insertions and lookups.

Example

Let’s take a simple example in C to understand how a symbol table is used.



int a = 10;

int b = 20;

void foo() {
    int c = 30;
    a = a + b;
}

Symbol Table Construction

Symbol Type Scope Memory Address Value

a int Global 0x001 10

b int Global 0x002 20

foo void Global 0x100 N/A

c int Local (in foo) 0x003 30

How it Works:

1. Global Scope: When the compiler encounters the global variables a and b, it
adds them to the symbol table with their data types (int), scope (Global), memory
addresses (assigned during code generation), and initial values (10 and 20).

2. Function Declaration: The function foo is added to the table. It has its own
scope (global), and further analysis of its contents takes place when the function is
compiled.

3. Local Variables: Inside the function foo, the local variable c is added to the
table with its scope marked as Local (in foo).

4. Usage: When the compiler processes the statement a = a + b;, it checks the
symbol table for a and b, retrieves their types, and generates code based on their memory
addresses.
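A minimal sketch of how such a table could be organized (illustrative only; real compilers typically use hash tables or trees, and the class and field names here are assumptions):

class SymbolTable:
    def __init__(self, parent=None):
        self.symbols = {}      # name -> attributes (type, address, value, ...)
        self.parent = parent   # enclosing scope, e.g. the global scope for a function body

    def insert(self, name, **attributes):
        self.symbols[name] = attributes

    def lookup(self, name):
        # Search the current scope first, then walk outward to enclosing scopes.
        if name in self.symbols:
            return self.symbols[name]
        if self.parent is not None:
            return self.parent.lookup(name)
        return None   # undeclared identifier

# Mirroring the C example above:
global_scope = SymbolTable()
global_scope.insert("a", type="int", scope="Global", address="0x001", value=10)
global_scope.insert("b", type="int", scope="Global", address="0x002", value=20)
global_scope.insert("foo", type="void", scope="Global", address="0x100")

foo_scope = SymbolTable(parent=global_scope)
foo_scope.insert("c", type="int", scope="Local (in foo)", address="0x003", value=30)

print(foo_scope.lookup("a"))   # found in the enclosing (global) scope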

Conclusion:

The symbol table plays a crucial role in resolving identifiers during the different phases of
compilation, ensuring correct code generation and type checking. It helps the compiler
know which identifiers refer to which variables or functions, even when they have local or
global scope distinctions.

The *phases of a compiler* represent the distinct stages a compiler goes through in order to translate high-level programming code (source code) into machine code (target code). Each phase processes the input from the previous phase, and together they ensure correct transformation from human-readable code to a format that a machine can execute. The phases are often categorized into two main groups: *analysis* and *synthesis*.

### 1. *Lexical Analysis (Scanner)*

- *Objective:* To break the source code into *tokens* (basic syntactical units such as
keywords, operators, identifiers, etc.).

- *Output:* A sequence of tokens, which is passed on to the next phase.

- *Example:* For the input int x = 5;, the output tokens might be:

- int (keyword)

- x (identifier)

- = (assignment operator)

- 5 (literal)

- ; (delimiter)

### 2. *Syntax Analysis (Parser)*



- *Objective:* To analyze the sequence of tokens based on the grammar of the


programming language, constructing a *syntax tree* or *parse tree*.

- *Process:* It checks if the source program follows the correct syntactic structure (based
on rules like BNF or CFG).

- *Output:* A parse tree or abstract syntax tree (AST), which represents the hierarchical
structure of the code.

- *Example:* For int x = 5;, the syntax analysis will check if this is a valid variable
declaration according to the language’s grammar.

### 3. *Semantic Analysis*

- *Objective:* To check the *semantic consistency* of the code, ensuring that operations
and data types make sense together.

- *Process:* This phase involves type checking, verifying variable declarations, and
function calls for correctness.

- *Output:* The AST is annotated with type information, ensuring that semantic rules (e.g.,
no addition of integers with strings) are followed.

- *Example:* The semantic analyzer checks if int x = "hello"; is valid, and would flag it as
an error because of type mismatch.

### 4. *Intermediate Code Generation*

- *Objective:* To generate an intermediate representation (IR) of the source code, which is


independent of any machine architecture.

- *Process:* This IR is closer to machine code but is still abstract and can be optimized
more easily.

- *Output:* The IR is often in the form of three-address code, quadruples, or another


simple code structure.

- *Example:* For x = a + b * c;, the intermediate code might look like:

t1 = b * c

t2 = a + t1

x = t2

### 5. *Code Optimization*

- *Objective:* To improve the intermediate code so that it runs more efficiently by


minimizing resource usage like memory and CPU time.

- *Types of Optimization:*

- *Peephole Optimization:* A localized optimization applied to small sections of code (see the example below).

- *Loop Optimization:* Improving loops to reduce redundant calculations.

- *Output:* An optimized intermediate representation that reduces unnecessary


operations or computations.
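For instance, a peephole optimizer might remove a store that immediately rewrites the value just loaded (the instructions below are illustrative pseudo-assembly, not the output of any particular compiler):

Before:

MOV R1, x

MOV x, R1    ; redundant, stores back the value that was just loaded

After:

MOV R1, x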

### 6. *Code Generation*

- *Objective:* To convert the optimized intermediate code into *target code* (assembly or
machine code) for the specific machine architecture.

- *Process:* This involves generating machine-level instructions for the corresponding


architecture (like x86 or ARM).

- *Output:* Assembly code or machine code instructions.

- *Example:* A high-level assignment like x = a + b; might generate machine instructions


like:

MOV R1, a

ADD R1, b

MOV x, R1

### 7. *Code Linking and Loading*

- *Objective:* To handle external function calls, link libraries, and assign memory
addresses to program variables and functions.

- *Process:* The linker resolves external references and combines object code into a
single executable.

- *Output:* The final *executable machine code*, ready for execution by the operating
system.

### Summary of the Compiler Phases:

1. *Lexical Analysis*: Converts source code into tokens.

2. *Syntax Analysis*: Creates a parse tree or syntax tree from tokens.

3. *Semantic Analysis*: Ensures the logical validity of the syntax tree.

4. *Intermediate Code Generation*: Produces an intermediate, platform-independent


representation of the code.

5. *Code Optimization*: Refines the intermediate code for efficiency.

6. *Code Generation*: Translates the optimized intermediate code into machine code.

7. *Code Linking and Loading*: Links external references and produces the final
executable.

These phases ensure that a compiler can correctly and efficiently translate high-level code
into machine code.

The *grouping of phases* in compiler design refers to how different phases of the compilation process can be logically grouped based on the nature of the tasks they perform. Generally, the phases of a compiler are grouped into two major categories:

### 1. *Analysis Phase*

The goal of the analysis phase is to *understand* and *validate* the source code. This
phase takes the high-level source code and breaks it down into a form that the compiler can more easily process, while also ensuring that it is syntactically and semantically
correct.

The analysis phase is often divided into three sub-phases:

- *Lexical Analysis*

- *Syntax Analysis*

- *Semantic Analysis*

These phases can be summarized as follows:

- *Lexical Analysis*: Converts source code into tokens, which are the basic building
blocks of the code (keywords, operators, identifiers, etc.).

- *Syntax Analysis*: Checks if the sequence of tokens forms a syntactically valid


structure, creating a parse tree or abstract syntax tree.

- *Semantic Analysis*: Ensures that the syntax follows the rules of meaning (semantics)
for the programming language, such as type checking and variable declarations.

After the analysis phase, the compiler has a clear understanding of the structure and
meaning of the source code. The output of this phase is typically a well-formed *abstract
syntax tree (AST)*, possibly annotated with type and scope information.

### 2. *Synthesis Phase*

The goal of the synthesis phase is to *translate* the intermediate representation into
target code, often with optimizations to improve performance or resource efficiency.

The synthesis phase typically involves:

- *Intermediate Code Generation*

- *Code Optimization*

- *Code Generation*

- *Code Linking and Loading*

These can be summarized as follows:

- *Intermediate Code Generation*: Converts the high-level abstract syntax tree into an
intermediate form, which is easier to manipulate and optimize.

- *Code Optimization*: Improves the intermediate code by removing inefficiencies,


minimizing resource use like memory and CPU cycles.

- *Code Generation*: Produces the final machine code or assembly code from the
optimized intermediate representation.

- *Code Linking and Loading*: Combines external libraries and object files, resolving
references and creating the final executable.

### Why Group the Phases?

- *Clarity and Modularity*: By grouping the phases into analysis and synthesis, it becomes easier to conceptualize how the compiler works. The analysis phase is all about *understanding* the source code, while the synthesis phase focuses on *translating* and *optimizing* it for execution.

- *Separation of Concerns*: This grouping allows for better *separation of concerns*. The analysis phase is concerned with ensuring that the source code is correct and meaningful, whereas the synthesis phase focuses on translating and improving performance.

- *Modular Compiler Design*: In practice, many modern compilers are built in a modular
fashion where the analysis and synthesis phases are separated. This allows for easier
debugging, code reuse, and even support for multiple target architectures (through
separate synthesis phases).

### Example of Grouping:

- *Analysis Phase*:

1. *Lexical Analysis*: Breaks down the source code into tokens.

2. *Syntax Analysis*: Constructs a parse tree from the tokens.

3. *Semantic Analysis*: Ensures the parse tree adheres to the language’s semantics.

- *Synthesis Phase*:

1. *Intermediate Code Generation*: Produces a platform-independent intermediate


representation.

2. *Code Optimization*: Optimizes the intermediate representation.

3. *Code Generation*: Converts the optimized code into target machine code.

4. *Code Linking/Loading*: Produces the final executable.

### Front-End and Back-End Compilers

- *Front-End (Analysis)*:

- This includes *lexical analysis*, *syntax analysis*, and *semantic analysis*.

- The front-end checks for correctness and builds an intermediate representation of the
code.

- It is concerned with the structure, syntax, and semantics of the program.

- *Back-End (Synthesis)*:

- This includes *intermediate code generation*, *optimization*, and *code generation*.

- The back-end translates the intermediate representation into machine-specific code


and optimizes it for better performance.

- It is focused on resource management, efficiency, and actual machine-level code


generation.

### Conclusion

In summary, the *grouping of phases* into *analysis* and *synthesis* helps streamline the
compilation process. The *analysis phase* handles understanding and validating the
source code, while the *synthesis phase* deals with translating it into efficient machine
code. This separation simplifies the design and implementation of compilers, making them
more modular and easier to manage.

In compiler construction, top-down translation is a parsing strategy where the compiler begins analyzing a source program from the highest-level construct (the start symbol) and works its way down to the basic elements (tokens). This approach follows a hierarchical or recursive structure, breaking down expressions and statements into smaller parts, eventually translating the entire program into an intermediate or target form.

Key Concepts in Top-Down Translation

1. Recursive Descent Parsing:

• This is a common method of top-down parsing where each non-terminal in


the grammar is implemented as a function. The functions call each other based on the
grammar rules, constructing the parse tree as they proceed.

• Recursive descent parsers are easy to implement but only work on grammars
that are free of left recursion. If the grammar has left recursion, it can lead to infinite
recursion.

2. Predictive Parsing:

• A more specific form of recursive descent parsing, known as LL parsing (Left-to-right, Leftmost derivation).

• Predictive parsers use lookahead tokens to decide which rule to apply. They
typically need the grammar to be LL(1), meaning they only require a single lookahead token
to make parsing decisions.

• Predictive parsers can handle a restricted set of grammars but are efficient
for many practical languages.

3. Translation Process:

• In a top-down approach, translation often happens in tandem with parsing.


As each rule is recognized, corresponding code (intermediate representation or machine
code) may be generated immediately.

• For example, if the grammar rule matches an arithmetic expression, the


parser can generate code for evaluating that expression at that point in the parse.

Example of Top-Down Translation Process

Consider the grammar for a simple arithmetic expression:

Expr → Term + Expr | Term

Term → Factor * Term | Factor

Factor → (Expr) | number

Using top-down parsing:

1. The parser starts with Expr and tries to match the input tokens.

2. If Expr requires parsing a Term, it calls the Term rule.

3. For each recognized rule, the parser can translate the expression directly into
intermediate code.
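A minimal recursive-descent sketch for this grammar (illustrative; the Parser class and its token list are assumptions, and here the parser evaluates the expression directly, though the same structure could emit intermediate code instead):

class Parser:
    def __init__(self, tokens):
        self.tokens = tokens   # e.g. ["2", "+", "3", "*", "4"]
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def eat(self, token):
        assert self.peek() == token, f"expected {token}, got {self.peek()}"
        self.pos += 1

    def expr(self):            # Expr -> Term + Expr | Term
        value = self.term()
        if self.peek() == "+":
            self.eat("+")
            value += self.expr()
        return value

    def term(self):            # Term -> Factor * Term | Factor
        value = self.factor()
        if self.peek() == "*":
            self.eat("*")
            value *= self.term()
        return value

    def factor(self):          # Factor -> (Expr) | number
        if self.peek() == "(":
            self.eat("(")
            value = self.expr()
            self.eat(")")
            return value
        value = int(self.peek())
        self.pos += 1
        return value

print(Parser(["2", "+", "3", "*", "4"]).expr())   # prints 14

Each non-terminal becomes one function, and each function both checks the syntax and performs the translation work for its rule, which is exactly the "translation in tandem with parsing" described above.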

Pros and Cons of Top-Down Translation

Pros:

• Easy to implement for grammars that are suitable (LL grammars).

• Good for simple, small languages or specific sections of a language.

• Each parsing function is directly tied to a rule, making it easy to maintain.

Cons:

• Limited to grammars without left recursion (or requiring grammar


transformations to eliminate it).

• Generally less efficient than bottom-up parsers for more complex languages,
especially those requiring significant lookahead.

Practical Use

Top-down translation is often used in compilers for languages with a simpler syntax or in
specific stages of larger compilers (e.g., expression parsing). It’s widely seen in interpreters
or lightweight language processors where fast implementation and readability are more
important than parsing complex syntax.

Syntax-Directed Definition (SDD) in Compiler Construction

In compiler design, Syntax-Directed Definitions (SDDs) are used to specify the semantic
rules associated with the grammar of a language. These rules define how the syntax of a
language is translated into intermediate representations or how computations are carried
out during parsing.

SDDs integrate two components:

1. Grammar: A context-free grammar (CFG) that defines the syntactic structure.



2. Semantic Rules: Attributes and rules attached to grammar symbols to


compute values, generate code, or check conditions.

Components of SDD

1. Attributes:

• Synthesized Attributes: Computed from the attributes of children in a parse


tree.

• Inherited Attributes: Computed from the attributes of the parent or siblings in


a parse tree.

2. Semantic Rules: Rules that define how the attributes are evaluated.

Example: Arithmetic Expressions

Consider a grammar for simple arithmetic expressions:

E → E1 + T

E → T

T → T1 * F

T → F

F → (E)

F → num

Now, we add semantic rules to compute the value of an expression. Let’s use a
Synthesized Attribute val to hold the value of a symbol.

Syntax-Directed Definition:

1. E → E1 + T { E.val = E1.val + T.val }

2. E → T { E.val = T.val }

3. T → T1 * F { T.val = T1.val * F.val }

4. T → F { T.val = F.val }

5. F → (E) { F.val = E.val }

6. F → num { F.val = num.lexval }

Explanation:

1. E → E1 + T: The value of E is the sum of E1.val and T.val.

2. E → T: The value of E is directly inherited from T.val.

3. T → T1 * F: The value of T is the product of T1.val and F.val.

4. T → F: The value of T is directly inherited from F.val.

5. F → (E): The value of F is the same as E.val inside the parentheses.

6. F → num: The value of F is the numerical value of the token num.

Example Input and Parse Tree

Input: 2 + 3 * 4

1. Parse tree:

              E
            / | \
           E  +  T
           |    / | \
           T   T  *  F
           |   |      |
           F   F    num (4)
           |   |
       num (2) num (3)

2. Attribute evaluation:

• num = 2 → F.val = 2 → T.val = 2 → E1.val = 2

• num = 3 → F.val = 3 → T1.val = 3

• num = 4 → F.val = 4

• T.val = T1.val * F.val = 3 * 4 = 12

• E.val = E1.val + T.val = 2 + 12 = 14

Output: 14
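A small sketch of how the synthesized attribute val could be evaluated over an explicit parse tree (the Node class and rule strings are assumptions for illustration; each semantic rule computes a parent's val from its children's values):

class Node:
    def __init__(self, rule, children=None, lexval=None):
        self.rule = rule            # e.g. "E -> E1 + T" or "F -> num"
        self.children = children or []
        self.lexval = lexval        # only used for num leaves

def evaluate(node):
    # Synthesized attributes: compute the children first, then the parent.
    vals = [evaluate(child) for child in node.children]
    if node.rule == "F -> num":
        return node.lexval                 # F.val = num.lexval
    if node.rule in ("E -> T", "T -> F", "F -> ( E )"):
        return vals[0]                     # value passed up unchanged
    if node.rule == "E -> E1 + T":
        return vals[0] + vals[1]           # E.val = E1.val + T.val
    if node.rule == "T -> T1 * F":
        return vals[0] * vals[1]           # T.val = T1.val * F.val
    raise ValueError(f"unknown rule {node.rule}")

# Parse tree for 2 + 3 * 4:
two   = Node("F -> num", lexval=2)
three = Node("F -> num", lexval=3)
four  = Node("F -> num", lexval=4)
tree = Node("E -> E1 + T", [
    Node("E -> T", [Node("T -> F", [two])]),
    Node("T -> T1 * F", [Node("T -> F", [three]), four]),
])
print(evaluate(tree))   # prints 14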

Applications:

• Abstract Syntax Tree (AST) generation

• Semantic analysis (e.g., type checking)

• Intermediate code generation

Syntax-Directed Definitions are fundamental in defining the behavior of a compiler and


serve as a bridge between syntax and semantics.
