Lect 02

Uploaded by

ruhinalmuhit

Phases (or structure) of a Compiler

2 Dec 2024 1
HLL (high-level language source)
    ↓
Lexical analysis
    ↓ Stream of tokens
Syntax analysis
    ↓ Parse tree
Semantic analysis
    ↓ Parse tree (semantically checked)
Intermediate code generation
    ↓ Three-address code
Code optimization
    ↓ Three-address code (modified)
Code generation
    ↓
Assembly code

• Front-end: lexical analysis, syntax analysis, semantic analysis and intermediate code generation.
• Back-end: code optimization and code generation.
• The symbol table manager and the error handler sit alongside the phases rather than between them.


General Structure of a compiler
Front-end:
Source → Lexical Analysis → tokens → Syntax Analysis → Parse Tree → Semantic Analysis → Annotated AST → Intermediate code generation → I.R.
Back-end:
I.R. → I.C. Optimisation → IR → Code Generation → symbolic instructions → Target code Optimisation → optimised symbolic instructions → Target code Generation → Target
Conceptual Structure: two major phases

Source code → Front-End → Intermediate Representation → Back-End → Target code

• Front-end performs the analysis of the source language:
– Recognises legal and illegal programs and reports errors.
– “Understands” the input program and collects its semantics in an IR.
– Produces IR and shapes the code for the back-end.
– Much can be automated.

• Back-end does the target language synthesis:
– Chooses instructions to implement each IR operation.
– Translates IR into target code.
– Needs to conform with system interfaces.
– Automation has been less successful.
mn compilers with m+n components!
Fortran target 1
Front-end Back-end
Pascal target 2
Front-end Back-end
C I.R. target 3
Front-end Back-end
Java target 4
Front-end Back-end
• All language specific knowledge must be encoded in the
front-end
• All target specific knowledge must be encoded in the back-
end
But: in practice, this strict separation is not free of charge.
2 Dec 2024 15
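The m+n argument can be sketched in a few lines. This is only an illustration: the helpers make_front_end and make_back_end, and the tuple used as the shared I.R., are hypothetical stand-ins and do no real translation.

```python
from itertools import product

# Hypothetical front-end factory: maps source text to a shared I.R. (a tuple here).
def make_front_end(language):
    def front_end(source):
        return ("IR", language, source)
    return front_end

# Hypothetical back-end factory: maps the shared I.R. to "target code" (a string here).
def make_back_end(target):
    def back_end(ir):
        _, language, source = ir
        return f"[{target}] code for {language} program: {source}"
    return back_end

languages = ["Fortran", "Pascal", "C", "Java"]
targets = ["target 1", "target 2", "target 3", "target 4"]

front_ends = {lang: make_front_end(lang) for lang in languages}  # m components
back_ends = {tgt: make_back_end(tgt) for tgt in targets}         # n components

# Any front-end composes with any back-end, giving m*n compilers.
compilers = {
    (lang, tgt): (lambda src, fe=front_ends[lang], be=back_ends[tgt]: be(fe(src)))
    for lang, tgt in product(languages, targets)
}

print(len(front_ends) + len(back_ends), "components,", len(compilers), "compilers")
print(compilers[("C", "target 3")]("a=b+c"))
```

Adding a fifth language here means writing one more front-end, not four more compilers.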
Phases with example

Lexical Analysis (Scanning)
• Reads characters in the source program and groups them into words (the basic unit of syntax).
• Produces words and recognises what sort they are.
• Each output item is called a token and is a pair of the form <type, lexeme> or <token_class, attribute>.
• E.g.: a=b+c becomes <id,a> <=,> <id,b> <+,> <id,c>
• Needs to record each id attribute: keep a symbol table.
• Lexical analysis eliminates white space, etc.
• Speed is important, so use a specialised tool: e.g., flex, a tool for generating scanners (programs which recognise lexical patterns in text); for more info: % man flex
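A hand-written scanner for the a=b+c example might look like the sketch below. The token classes (id, const, single-character operators) and the symbol-table layout are assumptions made for illustration; a flex-generated scanner would be organised differently.

```python
import re

# Assumed token classes for this sketch.
TOKEN_SPEC = [
    ("id", r"[A-Za-z_]\w*"),
    ("const", r"\d+"),
    ("op", r"[=+\-*/()]"),
    ("skip", r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))

def scan(text):
    """Return (tokens, symbol_table) for the given source text."""
    tokens, symbol_table, pos = [], {}, 0
    while pos < len(text):
        m = MASTER.match(text, pos)
        if m is None:
            raise SyntaxError(f"illegal character {text[pos]!r} at position {pos}")
        kind, lexeme = m.lastgroup, m.group()
        pos = m.end()
        if kind == "skip":
            continue  # white space is eliminated
        if kind == "id":
            # Record each identifier once in the symbol table.
            symbol_table.setdefault(lexeme, len(symbol_table))
            tokens.append(("id", lexeme))
        elif kind == "const":
            tokens.append(("const", lexeme))
        else:
            tokens.append((lexeme, ""))  # operators carry no attribute
    return tokens, symbol_table

tokens, table = scan("a = b + c")
print(" ".join(f"<{t},{a}>" for t, a in tokens))  # <id,a> <=,> <id,b> <+,> <id,c>
```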
Syntax (or syntactic) Analysis (Parsing)
• Imposes a hierarchical structure on the token stream.
• This hierarchical structure is usually expressed by recursive
rules.
• Context-free grammars formalise these recursive rules and
guide syntax analysis.
• Example:
expression → expression ‘+’ term | expression ‘-’ term | term
term → term ‘*’ factor | term ‘/’ factor | factor
factor → identifier | constant | ‘(’ expression ‘)’
(this grammar defines simple algebraic expressions)
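One way to check sentences against this grammar by hand is a recursive-descent recogniser. The sketch below is an assumption-laden illustration, not what bison would generate: the left-recursive rules are rewritten as loops (expression → term (('+' | '-') term)*), and the names tokenize, Parser and is_valid are hypothetical.

```python
import re

def tokenize(text):
    # Identifiers, integer constants and single-character operators.
    tokens = re.findall(r"[A-Za-z_]\w*|\d+|[-+*/()]", text)
    if "".join(tokens) != re.sub(r"\s+", "", text):
        raise SyntaxError("illegal character in input")
    return tokens

class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def advance(self):
        tok = self.peek()
        self.pos += 1
        return tok

    def expression(self):
        # expression -> term (('+' | '-') term)*
        self.term()
        while self.peek() in ("+", "-"):
            self.advance()
            self.term()

    def term(self):
        # term -> factor (('*' | '/') factor)*
        self.factor()
        while self.peek() in ("*", "/"):
            self.advance()
            self.factor()

    def factor(self):
        # factor -> identifier | constant | '(' expression ')'
        tok = self.advance()
        if tok is None:
            raise SyntaxError("unexpected end of input")
        if tok == "(":
            self.expression()
            if self.advance() != ")":
                raise SyntaxError("expected ')'")
        elif not re.fullmatch(r"[A-Za-z_]\w*|\d+", tok):
            raise SyntaxError(f"unexpected token {tok!r}")

def is_valid(text):
    """True if the whole input is a sentence of the expression grammar."""
    try:
        parser = Parser(tokenize(text))
        parser.expression()
    except SyntaxError:
        return False
    return parser.peek() is None

print(is_valid("b*b-4*a*c"), is_valid("a+*b"))
```

The loops accept the same sentences as the left-recursive rules; a plain recursive function for expression → expression ‘+’ term would call itself forever.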
Parsing: parse tree for b*b-4*a*c
expression
    expression
        term
            term
                factor
                    <id,b>
            *
            factor
                <id,b>
    -
    term
        term
            term
                factor
                    <const,4>
            *
            factor
                <id,a>
        *
        factor
            <id,c>
• Useful to recognise a valid sentence!
• Contains a lot of unneeded information!
AST for b*b-4*a*c
-
    *
        <id,b>
        <id,b>
    *
        *
            <const,4>
            <id,a>
        <id,c>
• An Abstract Syntax Tree (AST) is a more useful data structure for internal representation. It is a compressed version of the parse tree (summary of grammatical structure without details about its derivation).
• ASTs are one form of IR.
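As a data structure, the AST for b*b-4*a*c needs only two node shapes. The class names Leaf and BinOp and the helper functions are hypothetical, chosen for this sketch.

```python
from dataclasses import dataclass
from typing import Union

@dataclass
class Leaf:
    kind: str   # "id" or "const"
    value: str

@dataclass
class BinOp:
    op: str
    left: "Node"
    right: "Node"

Node = Union[Leaf, BinOp]

# AST for b*b-4*a*c, built by hand to mirror the tree above.
tree = BinOp(
    "-",
    BinOp("*", Leaf("id", "b"), Leaf("id", "b")),
    BinOp("*", BinOp("*", Leaf("const", "4"), Leaf("id", "a")), Leaf("id", "c")),
)

def to_prefix(node):
    """Render the tree in prefix form, which makes the nesting explicit."""
    if isinstance(node, Leaf):
        return node.value
    return f"({node.op} {to_prefix(node.left)} {to_prefix(node.right)})"

def count_nodes(node):
    if isinstance(node, Leaf):
        return 1
    return 1 + count_nodes(node.left) + count_nodes(node.right)

print(to_prefix(tree))    # (- (* b b) (* (* 4 a) c))
print(count_nodes(tree))  # 9
```

The nine nodes are the five leaves and four operators; the expression, term and factor nodes of the parse tree have no counterpart here.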
Semantic Analysis (context handling)
• Collects context (semantic) information, checks
for semantic errors, and annotates nodes of the
tree with the results.
• Examples:
– type checking: report error if an operator is applied
to an incompatible operand.
– flow-of-control checks.
– uniqueness or name-related checks.
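A type check and a name-related check can be sketched as a walk over the tree. The tuple node layout, the type names "int" and "string", and the rule that arithmetic needs two int operands are all assumptions for this illustration.

```python
def check(node, symbols):
    """Return the type of `node`, or raise on a semantic error.

    Assumed node shapes: ("const", text), ("id", name), (operator, left, right).
    `symbols` maps declared names to their types.
    """
    kind = node[0]
    if kind == "const":
        return "int"
    if kind == "id":
        name = node[1]
        if name not in symbols:
            # Name-related check: every identifier must be declared.
            raise NameError(f"undeclared identifier {name!r}")
        return symbols[name]
    op, left, right = node
    left_type, right_type = check(left, symbols), check(right, symbols)
    if left_type != "int" or right_type != "int":
        # Type check: the operator is applied to an incompatible operand.
        raise TypeError(f"operator {op!r} applied to {left_type} and {right_type}")
    return "int"

# b*b-4*a*c
tree = ("-",
        ("*", ("id", "b"), ("id", "b")),
        ("*", ("*", ("const", "4"), ("id", "a")), ("id", "c")))

print(check(tree, {"a": "int", "b": "int", "c": "int"}))  # int
```

Calling check with c declared as "string" raises TypeError, and leaving a name out of the table raises NameError.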
Intermediate code generation
• Translate language-specific constructs in the AST into
more general constructs.
• A criterion for the level of “generality”: it should be
straightforward to generate the target code from the
intermediate representation chosen.
• Example of a form of IR (3-address code):
tmp1=4
tmp2=tmp1*a
tmp3=tmp2*c
tmp4=b*b
tmp5=tmp4-tmp3
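The listing above can be produced by a post-order walk of the AST. In this sketch the tuple node layout and the function gen_tac are assumptions; the walk visits the right operand first purely so that the temporaries come out numbered as in the listing.

```python
def gen_tac(ast):
    """Translate an expression AST into a list of three-address statements.

    Assumed node shapes: ("const", text), ("id", name), (operator, left, right).
    """
    code = []

    def new_temp():
        # Every statement defines exactly one temporary, so count statements.
        return f"tmp{len(code) + 1}"

    def visit(node):
        kind = node[0]
        if kind == "id":
            return node[1]          # identifiers are used directly
        if kind == "const":
            temp = new_temp()
            code.append(f"{temp}={node[1]}")
            return temp
        op, left, right = node
        right_name = visit(right)   # right operand first (see lead-in)
        left_name = visit(left)
        temp = new_temp()
        code.append(f"{temp}={left_name}{op}{right_name}")
        return temp

    visit(ast)
    return code

# b*b-4*a*c
tree = ("-",
        ("*", ("id", "b"), ("id", "b")),
        ("*", ("*", ("const", "4"), ("id", "a")), ("id", "c")))

for line in gen_tac(tree):
    print(line)
```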
Code Optimisation
• The goal is to improve the intermediate code and, thus, the effectiveness of code generation and the performance of the target code.
• Optimisations can range from trivial (e.g. constant folding) to highly sophisticated (e.g. in-lining).
• For example: replace the first two statements in the example of the previous slide with: tmp2=4*a
• Modern compilers perform such a range of optimisations that one could argue for:
Source code → Front-End → IR → Middle-End (optimiser) → IR → Back-End → Target code
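The tmp2=4*a rewrite is an instance of constant propagation, sketched below. It assumes straight-line code in which every target is a temporary assigned exactly once and no dropped temporary is needed after the block; the function name is hypothetical.

```python
import re

def propagate_constants(code):
    """Substitute temporaries that hold a constant into their uses and drop
    the now-redundant assignments (single-assignment, straight-line code only)."""
    constants = {}  # temporary name -> constant text
    result = []
    for line in code:
        target, expr = line.split("=", 1)
        # Replace any operand known to be a constant.
        expr = re.sub(r"[A-Za-z_]\w*",
                      lambda m: constants.get(m.group(), m.group()),
                      expr)
        if expr.isdigit():
            constants[target] = expr  # remember the constant, emit nothing
        else:
            result.append(f"{target}={expr}")
    return result

tac = ["tmp1=4", "tmp2=tmp1*a", "tmp3=tmp2*c", "tmp4=b*b", "tmp5=tmp4-tmp3"]
for line in propagate_constants(tac):
    print(line)
# tmp2=4*a
# tmp3=tmp2*c
# tmp4=b*b
# tmp5=tmp4-tmp3
```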
Code Generation Phase
• Map the AST onto a linear list of target machine
instructions in a symbolic form:
– Instruction selection: a pattern matching problem.
– Register allocation: each value should be in a register when it is used (but there is only a limited number of registers): an NP-complete problem.
– Instruction scheduling: take advantage of multiple functional units: an NP-complete problem.
• Target, machine-specific properties may be used to
optimise the code.
• Finally, machine code and associated information
required by the Operating System are generated.

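The simplest form of instruction selection matches each three-address statement against one fixed pattern. The sketch below does only that: the mnemonics (LOAD, STORE, ADD, SUB, MUL, DIV) and the registers R1 and R2 belong to a hypothetical machine, and there is no register allocation or scheduling, so every value goes through memory.

```python
import re

# Hypothetical mnemonics for a made-up target machine.
OPCODES = {"+": "ADD", "-": "SUB", "*": "MUL", "/": "DIV"}

def select_instructions(code):
    """Map three-address statements to symbolic instructions, one pattern each."""
    asm = []
    for line in code:
        target, expr = line.split("=", 1)
        m = re.fullmatch(r"(\w+)([-+*/])(\w+)", expr)
        if m:
            # Pattern: target = left op right
            left, op, right = m.groups()
            asm += [
                f"LOAD R1, {left}",
                f"LOAD R2, {right}",
                f"{OPCODES[op]} R1, R1, R2",
                f"STORE {target}, R1",
            ]
        else:
            # Pattern: target = value (plain copy)
            asm += [f"LOAD R1, {expr}", f"STORE {target}, R1"]
    return asm

for instruction in select_instructions(["tmp2=4*a", "tmp3=tmp2*c"]):
    print(instruction)
```

A real register allocator would keep tmp2 in a register between the two statements instead of storing and reloading it.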
Qualities of a Good Compiler
What qualities would you want in a compiler?
– generates correct code (first and foremost!)
– generates fast code
– conforms to the specifications of the input language
– copes with essentially arbitrary input size, variables, etc.
– compilation time (linearly) proportional to size of source
– good diagnostics
– consistent optimisations
– works well with the debugger

2 Dec 2024 COMP36512 Lecture 1 26


Principles of Compilation
The compiler must:
• preserve the meaning of the program being compiled.
• “improve” the source code in some way.
Other issues (depending on the setting):
• Speed (of compiled code)
• Space (size of compiled code)
• Feedback (information provided to the user)
• Debugging (transformations obscure the relationship between source code and target code)
• Compilation time efficiency (fast or slow compiler?)



Historical Notes:
The Move to Higher-Level Programming Languages
• Machine Languages (1st generation)
• Assembly Languages (2nd generation) – early 1950s
• High-Level Languages (3rd generation) – later 1950s
• 4th generation higher level languages (SQL, PostScript)
• 5th generation languages (logic based, e.g. Prolog)
• Other classifications:
– Imperative (how); declarative (what)
– Object-oriented languages
– Scripting languages
Finally...

Parts of a compiler can be generated automatically using generators based on formalisms. E.g.:
• Scanner generators: flex
• Parser generators: bison
