0% found this document useful (0 votes)
22 views22 pages

Compiler Front End

Uploaded by

Nikhil Ashok
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PPTX, PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
22 views22 pages

Compiler Front End

Uploaded by

Nikhil Ashok
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PPTX, PDF, TXT or read online on Scribd
You are on page 1/ 22

Compiler Construction

1
Two Pass Compiler

2
Two Pass Compiler

• The figure above shows the structure of a two-pass


compiler.
• The front end maps legal source code into an
intermediate representation (IR).
• The back end maps IR into target machine code.
• An immediate advantage of this scheme is that it allows
multiple front ends and multiple passes.

3
Front End

4
Front End

• The front end recognizes legal and illegal programs


presented to it.
• When it encounters errors, it attempts to report errors
in a useful way.
• For legal programs, front end produces IR and
preliminary storage map for the various data structures
declared in the program.

5
Front End

• The front end consists of two modules:


1. Scanner
2. Parser

6
Scanner

• The scanner takes a program as input and maps the


character stream into “words” that are the basic unit of
syntax.
• It produces pairs – a word and its part of speech.

7
Scanner

• For example,
the input
x = x + y
becomes
<id,x>
<assign,=>
<id,x>
<op,+>
<id,y> 8
Scanner

• We call the pair “<token type, word>” a token.


• Typical tokens are: number, identifier, +,
- , new, while, if.

9
Parser

• The parser takes in the stream of tokens, recognizes


context- free syntax and reports errors.
• It guides context-sensitive (“semantic”) analysis for
tasks like type checking.
• The parser builds IR for source program.

10
Context Free Grammars

• The syntax of most programming languages is


specified using Context-Free Grammars (CFG).
• Context- free syntax is specified with a grammar
G=(S,N,T,P) where
• S is the start symbol
• N is a set of non-terminal symbols
• T is set of terminal symbols or words
• P is a set of productions or rewrite rules

11
Context Free Grammars

• For example, the Context-Free Grammar for arithmetic


expressions is
1. goal → expr
2. expr → expr op term
3. | term
4. term → number
5. | id
6. op → +
7. | –
12
CFG

For This CFG


• S = goal
• T = { number, id, +, -}
• N = { goal, expr, term, op}
• P = { 1, 2, 3, 4, 5, 6, 7}

13
Context Free Grammar

• Given a CFG, we can derive sentences by repeated


substitution.
• Consider the sentence
x+2–y

14
Context Free Grammars

• For example, the Context-Free Grammar for arithmetic


expressions is
1. goal → expr
2. expr → expr op term
3. | term
4. term → number
5. | id
6. op → +
7. | –
15
Derivation
Production Result
goal

1. goal → expr expr

2. expr → expr op term expr op term

5. term → id expr op y

7. op → - expr - y

2. expr → expr op term expr op term - y

4. term → number expr op 2 - y

6. op → + expr + 2 - y

3. expr → term term + 2 - y

5. term → id x + 2 - y
16
Front End

• To recognize a valid sentence in some CFG, we reverse


this process and build up a parse, thus the name
“parser”.
• A parse can be represented by a tree: parse tree or
syntax tree.

17
Parse

18
Parse Tree

• A parse can be represented by a tree: parse tree or syntax


tree. For example, here is the parse tree for the expression
x+2–y

19
Parse Tree

• The parse tree captures all rewrite during the


derivation.
• The derivation can be extracted by starting at the root
of the tree and working towards the leaf nodes.

20
Abstract Syntax Tree

• The parse tree contains a lot of unneeded information.


Compilers often use an abstract syntax tree (AST).
• For example, the AST for the above parse tree is

21
Abstract Syntax Tree

• This is much more concise.


• AST summarizes grammatical structure without the
details of derivation.
• ASTs are one kind of intermediate representation
(IR).

22

You might also like