Chapter 1
• Classes of Translator:
– Compiler
– Assembler
– Interpreter
– Preprocessor
Compiler Vs Assembler
• Compiler:
• Assembler:
Compiler Vs Interpreter
• Compiler:
• Front-End: Analysis
– Read source program and understand its structure and meaning.
– A front-end of a compiler is responsible for the analysis of source code.
• Back-End: Synthesis
– Generate equivalent target language program.
– A back-end of a compiler is responsible for the synthesis of target code.
Front-end
• Identifier (sum)
• Assignment (=)
• Identifier (sum)
• Operator (+)
• Identifier (unit)
• Operator (∗)
• Numeric constant (1.2e-12)
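The token list above appears to come from a statement such as sum = sum + unit * 1.2e-12 (the statement itself is not shown here, so this is an assumption). A minimal sketch of a lexer that produces that token stream might look like the following; the token class names and regular expressions are illustrative choices, not a fixed standard.

```python
import re

# Token classes mirror the list above; the patterns are assumptions for this sketch.
TOKEN_SPEC = [
    ("NUMBER", r"\d+(?:\.\d+)?(?:[eE][+-]?\d+)?"),  # e.g. 60, 1.2, 1.2e-12
    ("IDENTIFIER", r"[A-Za-z_]\w*"),
    ("ASSIGNMENT", r"="),
    ("OPERATOR", r"[+\-*/]"),
    ("SKIP", r"\s+"),                                # whitespace is discarded
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(text):
    """Return a list of (token_class, lexeme) pairs for the input text."""
    tokens = []
    pos = 0
    while pos < len(text):
        match = MASTER.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at position {pos}")
        if match.lastgroup != "SKIP":
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens
```

Calling tokenize("sum = sum + unit * 1.2e-12") yields seven tokens in the same order as the list above.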
2. Syntax Analysis
• The syntax analysis phase is often called the parser.
• The parser has two functions:
– It receives a stream of tokens from the lexer and groups them
into phrases that match specified grammatical patterns.
– It outputs an abstract syntax tree representing the
syntactic structure of the tokens.
• Syntax analysis or parsing is about discovering structure in
text and is used to determine whether or not a sequence of tokens
conforms to an expected format (grammatical patterns).
• Grammatical patterns are described by a context-free grammar.
• For example, an assignment statement may be defined as:
stmt → id = expr ;
expr → expr + expr | expr * expr | id | num
• The compiler checks to make sure the statements and
expressions are correctly formed:
– Example: “Is this a correct assignment statement?”
position = initial + rate * 60 ;
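The check described above can be sketched as a small recursive-descent parser. The grammar as written is ambiguous, so this sketch assumes the usual precedence (* binds tighter than +) and left associativity. The names lex, Parser, and parse are hypothetical helpers for illustration, and the lexer is deliberately simplified: it silently skips characters it does not recognize.

```python
import re

SYMBOLS = {"=", "+", "*", ";"}


def lex(text):
    # Identifiers, integer constants, and single-character symbols only.
    return re.findall(r"[A-Za-z_]\w*|\d+|[=+*;]", text)


class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def expect(self, symbol):
        if self.peek() != symbol:
            raise SyntaxError(f"expected {symbol!r}, found {self.peek()!r}")
        self.pos += 1

    def stmt(self):
        # stmt -> id = expr ;
        name = self.peek()
        if name is None or not (name[0].isalpha() or name[0] == "_"):
            raise SyntaxError(f"expected identifier, found {name!r}")
        self.pos += 1
        self.expect("=")
        value = self.expr()
        self.expect(";")
        if self.peek() is not None:
            raise SyntaxError(f"unexpected input after ';': {self.peek()!r}")
        return ("=", name, value)

    def expr(self):
        # expr -> term { + term }
        node = self.term()
        while self.peek() == "+":
            self.pos += 1
            node = ("+", node, self.term())
        return node

    def term(self):
        # term -> factor { * factor }
        node = self.factor()
        while self.peek() == "*":
            self.pos += 1
            node = ("*", node, self.factor())
        return node

    def factor(self):
        # factor -> id | num
        token = self.peek()
        if token is None or token in SYMBOLS:
            raise SyntaxError(f"expected id or num, found {token!r}")
        self.pos += 1
        return token


def parse(text):
    """Return a nested-tuple syntax tree, or raise SyntaxError."""
    return Parser(lex(text)).stmt()
```

For the statement position = initial + rate * 60 ; the parser accepts the input and returns ("=", "position", ("+", "initial", ("*", "rate", "60"))). Dropping the semicolon raises SyntaxError.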
Push-down Automata/Machine (PDA)
• Pushdown machines can be used for syntax analysis, just as
finite state machines are used for lexical analysis.
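A pushdown machine is essentially a finite state machine with a stack. As a minimal sketch of why the stack matters, the following recognizer checks that parentheses are balanced, something a finite state machine alone cannot do for arbitrary nesting depth. The function name balanced is hypothetical.

```python
def balanced(text):
    """Return True if every '(' in text has a matching ')'."""
    stack = []
    for ch in text:
        if ch == "(":
            stack.append(ch)      # push on an opening parenthesis
        elif ch == ")":
            if not stack:
                return False      # nothing to match: reject
            stack.pop()           # pop the matching opening parenthesis
    return not stack              # accept only if the stack is empty
```

For example, balanced("(A / B) * C") is True, while balanced("A / (B * C") is False.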
• Syntax tree (parse tree):
– A parse tree is also known as a concrete syntax tree.
– In an abstract syntax tree, each interior node represents an
operation and the children of the node represent the arguments
of the operation.
• Example: Show a syntax tree for each of the following statements:
a) (A / B) * C
b) A / (B * C)
c) position = initial + rate * 60
d) if x>100 then y:=1 else y:=2; (Assume that an if statement
consists of three subtrees, one for the condition, one for the
consequent statement, and one for the else statement, if necessary.)
Solution
[Figure: syntax trees for statements (a) through (d)]
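One way to write down the four trees is as nested tuples of the form (operation, child, child, ...), following the rule that each interior node is an operation and its children are the arguments. This is only a sketch: the tuple encoding, the variable names, and the show helper are assumptions made for illustration.

```python
# (a) (A / B) * C : the division is evaluated first, so it sits below '*'.
tree_a = ("*", ("/", "A", "B"), "C")

# (b) A / (B * C) : the multiplication is evaluated first, so it sits below '/'.
tree_b = ("/", "A", ("*", "B", "C"))

# (c) position = initial + rate * 60 : '*' binds tighter than '+'.
tree_c = ("=", "position", ("+", "initial", ("*", "rate", 60)))

# (d) if x>100 then y:=1 else y:=2 : condition, consequent, else part.
tree_d = ("if", (">", "x", 100), (":=", "y", 1), (":=", "y", 2))


def show(node, depth=0):
    """Return the tree as a list of indented lines, one node per line."""
    pad = "  " * depth
    if not isinstance(node, tuple):
        return [pad + str(node)]          # leaf: identifier or constant
    lines = [pad + str(node[0])]          # interior node: the operation
    for child in node[1:]:
        lines.extend(show(child, depth + 1))
    return lines
```

Printing "\n".join(show(tree_a)) lists * at the root, with / and C as its children and A and B under /.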
3. Semantic Analyzer
• A semantic analyzer traverses the abstract syntax tree,
checking that each node is appropriate for its context, i.e., it
checks for semantic errors. It outputs a refined abstract syntax
tree.
• Find remaining errors that would make the program invalid:
– undefined variables and types
– type errors that can be caught statically
• Figure out useful information for later phases:
– types of all expressions
– data layout
Kinds of Checks
• Uniqueness checks
– Certain names must be unique
– Many languages require variable declarations
• Flow-of-control checks
– Match control-flow operators with structures
– Example: break applies to innermost loop/switch
• Type checks
– Check compatibility of operators and operands
• Logical checks
– Program is syntactically and semantically correct, but does
not do the “correct” thing
Examples of Reported Errors
• Undeclared identifier
• Multiply declared identifier
• Index out of bounds
• Wrong number or types of args to call
• Incompatible types for operation
• Break statement outside switch/loop
• Goto with no label
• etc…
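A few of the errors listed above can be detected by a simple walk over the program tree. The sketch below is illustrative only: the statement encoding (tuples tagged "decl", "assign", "while", "break") and the function names are assumptions, and it uses a single flat scope rather than nested scopes.

```python
def check_expr(expr, declared, errors):
    """Report identifiers used in an expression without a declaration."""
    if isinstance(expr, str):
        if expr not in declared:
            errors.append(f"undeclared identifier: {expr}")
    elif isinstance(expr, tuple):
        _, left, right = expr            # (operator, left operand, right operand)
        check_expr(left, declared, errors)
        check_expr(right, declared, errors)
    # Numeric constants need no check.


def check(stmts, declared=None, in_loop=False, errors=None):
    """Walk a list of statements and return a list of error messages."""
    declared = {} if declared is None else declared
    errors = [] if errors is None else errors
    for stmt in stmts:
        kind = stmt[0]
        if kind == "decl":
            _, name, type_name = stmt
            if name in declared:
                errors.append(f"multiply declared identifier: {name}")
            else:
                declared[name] = type_name
        elif kind == "assign":
            _, name, expr = stmt
            if name not in declared:
                errors.append(f"undeclared identifier: {name}")
            check_expr(expr, declared, errors)
        elif kind == "while":
            _, cond, body = stmt
            check_expr(cond, declared, errors)
            check(body, declared, True, errors)   # break is legal inside the body
        elif kind == "break":
            if not in_loop:
                errors.append("break statement outside switch/loop")
    return errors
```

For a program that declares x twice, assigns to an undeclared y, and uses break at top level, check returns one message for each of those three problems, while a break inside a while body is accepted.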
4. Intermediate Code Generation
• Intermediate code is code that represents the semantics of a
program, but is machine-independent.
• An intermediate code generator receives the abstract syntax
tree and it outputs intermediate code that semantically
corresponds to the abstract syntax tree.
• This stage marks the boundary between the front end and the
back end.
• An intermediate representation should have two important
properties: it should be easy to produce, and easy to translate
into the target program.
• One popular type of intermediate-language representation is
“Three Address Code (TAC)”.
• A three-address code statement has the form: A := B op C
where A, B and C are operands and op is a binary operator.
• The parse tree for (A/B) * C might be converted into the three-
address sequence:
T1 := A/B;
T2 := T1 * C;
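The conversion above can be sketched as a post-order walk of the tree that emits one statement per interior node and names each result with a fresh temporary. The tuple encoding of the tree and the function names gen_tac and to_tac are assumptions for illustration.

```python
import itertools


def gen_tac(node, code, temps):
    """Append three-address statements for node to code.

    Returns the name (variable or temporary) that holds the node's value.
    """
    if not isinstance(node, tuple):
        return str(node)                      # leaf: already a name or constant
    op, left, right = node
    a = gen_tac(left, code, temps)            # operands are generated first
    b = gen_tac(right, code, temps)
    result = f"T{next(temps)}"                # fresh temporary: T1, T2, ...
    code.append(f"{result} := {a} {op} {b}")
    return result


def to_tac(tree):
    code = []
    gen_tac(tree, code, itertools.count(1))
    return code
```

For the tree ("*", ("/", "A", "B"), "C") this returns the two statements T1 := A / B and T2 := T1 * C, matching the sequence shown above.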
• Optimized code:
– executes faster
– uses memory more efficiently
– yields better overall performance.
Example (Optimization)
Before optimization              After optimization
temp1 = inttoreal(60)            temp1 = rate * 60.0
temp2 = rate * temp1             position = initial + temp1
temp3 = initial + temp2
position = temp3
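The rewrite in this example combines two simple steps: the conversion inttoreal(60) is done once at compile time, and the final copy through temp3 is removed. A minimal sketch of such a pass follows. The instruction encoding (dest, op, arg1, arg2) and the function name optimize are assumptions, and the copy removal assumes the temporary is not used anywhere else. Unlike the example, this sketch does not renumber the remaining temporaries.

```python
def optimize(code):
    """Fold inttoreal of an integer constant and remove a trailing copy.

    code is a list of (dest, op, arg1, arg2) tuples; arg2 is None for
    one-operand instructions.
    """
    constants = {}   # temporary name -> constant it is known to hold
    out = []
    for dest, op, a, b in code:
        # Replace operands that are known constants.
        a = constants.get(a, a)
        b = constants.get(b, b)
        if op == "inttoreal" and a.isdigit():
            constants[dest] = a + ".0"        # do the conversion at compile time
            continue
        if op == "copy" and out and out[-1][0] == a:
            # The previous instruction only fed this copy (assumed single use):
            # make it write to the final destination directly.
            _, prev_op, prev_a, prev_b = out.pop()
            out.append((dest, prev_op, prev_a, prev_b))
            continue
        out.append((dest, op, a, b))
    return out


before = [
    ("temp1", "inttoreal", "60", None),
    ("temp2", "*", "rate", "temp1"),
    ("temp3", "+", "initial", "temp2"),
    ("position", "copy", "temp3", None),
]
after = optimize(before)
```

Here after holds two instructions, temp2 = rate * 60.0 and position = initial + temp2, which is the optimized sequence from the example up to the name of the temporary.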
TAC                          Assembly
temp1 = inttoreal(60)        MOVF rate, R2
temp2 = rate * temp1         MULF #60.0, R2
temp3 = initial + temp2      MOVF initial, R1
position = temp3             ADDF R2, R1
                             MOVF R1, position
Symbol Table
• The symbol table is a data structure that collects information
about the names appearing in the source program.
• It keeps track of the scope and binding information of
names.
• It is used during all phases of compilation.
• Each entry in the symbol table is a pair of the form (name,
information).
• The information consists of attributes (e.g. type, location)
that depend on the language.
• It may or may not be constructed during lexical and syntax
analysis, depending on the compiler.
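A symbol table of (name, information) pairs with scope tracking can be sketched as a stack of dictionaries, one per open scope. The class and method names below are hypothetical, and the attributes stored (type, location) are just the examples mentioned above.

```python
class SymbolTable:
    """A stack of scopes; each scope maps a name to its attributes."""

    def __init__(self):
        self.scopes = [{}]                 # start with the global scope

    def enter_scope(self):
        self.scopes.append({})

    def exit_scope(self):
        self.scopes.pop()

    def declare(self, name, **info):
        scope = self.scopes[-1]
        if name in scope:
            raise KeyError(f"multiply declared identifier: {name}")
        scope[name] = info                 # the (name, information) entry

    def lookup(self, name):
        # Search from the innermost scope outward; None means undeclared.
        for scope in reversed(self.scopes):
            if name in scope:
                return scope[name]
        return None


table = SymbolTable()
table.declare("rate", type="real", location=0)
table.enter_scope()
table.declare("rate", type="int", location=8)   # shadows the outer rate
inner = table.lookup("rate")
table.exit_scope()
outer = table.lookup("rate")
```

Inside the inner scope the lookup finds the int entry. After exit_scope the same name resolves to the outer real entry again.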
Error Handling
• One of the most important functions of a compiler is the
detection and reporting of errors in the source program.
• Errors can be encountered by all of the phases of a compiler.
• What Happens When an Error Is Found?
• Whenever a phase of the compiler discovers an error, it
must report the error to the error handler, which issues an
appropriate diagnostic message.
• What Kinds of Errors Are Found During the Analysis Phase?
The Phases of a Compiler
Phase                                  Output                               Sample
Programmer (source code producer)      Source string                        A=B+C;
Scanner (performs lexical analysis)    Token string, and symbol table       ‘A’, ‘=’, ‘B’, ‘+’, ‘C’, ‘;’
                                       with names
Parser (performs syntax analysis       Parse tree or abstract syntax tree       ;
based on the grammar of the                                                     |
programming language)                                                           =
                                                                               / \
                                                                              A   +
                                                                                 / \
                                                                                B   C