
CSC 421:

Compiler Construction
Course Objectives
• By the end of the course unit, the student
should be able to:
1. Appreciate the basic concepts of compiler
construction.
2. Construct compilers.
Course Contents
• WEEK 1
– Overview of the Compilation Process
– The phases of compilation
– Review of necessary concepts from programming languages
• WEEK 2
– Lexical Analysis
– The role of lexical analyzers
– Regular expressions
– Conversion of regular expressions to finite automata and lexical
analyzers
– The use of Lex in developing lexical analyzers under Unix
• WEEK 3 to 7
– Parsing
– Basic bottom-up and top-down parsing techniques
– LR, SLR, and LALR parsing
– YACC under Unix
– Other parser generating schemes
• WEEK 8 to 9
– Syntax Directed Translation
– Use of attributes in translation
– Syntax directed translation schemes for the common constructs of
programming languages
– Intermediate code representations
• WEEK 10
– Supporting Considerations
– Symbol table management
– Run time support
– Error detection and recovery techniques
• WEEK 11
– Optimization and Code Generation
– Brief introduction to code generation issues
• WEEK 12
– Reviews and Examinations
Course Info
• Meeting time:
Monday 4:00-7:00 p.m.

• Meeting Room: N6
• Prerequisites:
• Automata and Languages
• Computer Programming
• Computer Architecture
• Assembly Language Programming
Textbooks
• Primary Textbooks:
– Compilers: Principles, Techniques and Tools by Aho, Sethi, and
Ullman; Addison-Wesley Pub Co, ISBN: 0201100886
– Compiler Design by Santanu Chattopadhyay; Prentice Hall of India
Private Limited, ISBN: 81-203-2725-X
• Recommended Textbooks:
– The Theory And Practice Of Compiler Writing by Jean-Paul
Tremblay and Paul G. Sorenson;
– Systems Software: An Introduction to Systems Programming
by Leland L Beck; Addison-Wesley Pub Co, ISBN: 0201423006
– Constructing Language Processors for Little Languages by
Randy M. Kaplan; John Wiley & Sons, ISBN: 0471597546
Lecturer Info
• Name: Dr. Shikali
• Office: Northern
• Phone: 0720-832863
• E-mail: [email protected]
• Office Hours:
• Mondays: 10:00 - 1:00,
• Tuesday: 11:00 – 1:30,
• Wednesday: 11:00 – 1:30, and
• By appointment.
Delivery & Grading

• Delivery: Lectures

• Evaluation
– Continuous Assessment - 10%
– Written assignments & Projects - 20%
– Final Examination - 70%
– Total - 100%
Projects
• Basically, one big project in 5 parts.
• You must work in small groups of 2-3 students
per group. Hand in only one written set of
answers per group.
Projects (cont’d)
• Project 1: Lexical Analysis (Scanner)
• Project 2: Syntax Analysis (Parser)
• Project 3: Semantic Analysis (Compile-time error handling)
• Project 4: Intermediate Code Generation
• Project 5: Target Code Generation
Why take this course?
• Compilers draw together all of the theory and
techniques that you’ve learned about in most of your
previous computer science courses.
• We will focus on “little languages” - you will be
writing simple compilers to solve the kinds of
problems you may face in a career as a programmer.
• You will gain a deeper understanding of how
compilers work, and be able to write better code.
• You will learn to write other useful tools, such as
parsers, interpreters, and debuggers.
Programming Languages
• Humans use natural languages to
communicate with each other
– Kiswahili, English, French, etc.
• Humans use programming languages to
communicate with computers
– Perl, Pascal, C++, …
The translation process
1. The sequence of characters of a source text is translated
into a corresponding sequence of symbols of the
vocabulary of the language. For instance, identifiers
consisting of letters and digits, numbers consisting of
digits, delimiters and operators consisting of special
characters are recognized in this phase, which is called
lexical analysis.
2. The sequence of symbols is transformed into a
representation that directly mirrors the syntactic structure
of the source text and lets this structure easily be
recognized. This phase is called syntax analysis
(parsing).
The translation process
3. High-level languages are characterized by the fact that
objects of programs, for example variables and functions,
are classified according to their type. Therefore, in
addition to syntactic rules, compatibility rules among
types of operators and operands define the language.
Hence, verification of whether these compatibility rules
are observed by a program is an additional duty of a
compiler. This verification is called type
checking/Semantic analysis.
4. On the basis of the representation resulting from step 2, a
sequence of instructions taken from the instruction set of
the target computer is generated. This phase is called code
generation. In general it is the most involved part, not
least because the instruction sets of many computers lack
the desirable regularity. Often, the code generation part is
therefore subdivided further.
Language and Syntax
• Every language displays a structure called
its grammar or syntax.
• For example, a correct sentence always
consists of a subject followed by a
predicate, correct here meaning well
formed.
• This fact can be described by the following
formula:
sentence = subject predicate.
Cont..
• If we add to this formula the two further formulas
subject = "John" | "Mary".
predicate = "eats" | "talks".
• then we define herewith exactly four possible
sentences, namely
John eats    Mary eats
John talks   Mary talks
• where the symbol | is to be pronounced as or. We
call these formulas syntax rules, productions, or
simply syntactic equations.
Cont..
• Subject and predicate are syntactic classes. A
shorter notation for the above omits meaningful
identifiers:
S = A B.
A = "a" | "b".
B = "c" | "d".
L = {ac, ad, bc, bd}
• The set L of sentences which can be generated in
this way, that is, by repeated substitution of the
left hand sides by the right-hand sides of the
equations, is called the language.
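For example, the sentence ac is obtained by the
derivation
S ⇒ A B ⇒ "a" B ⇒ "a" "c" = ac
and the other three sentences of L follow by
choosing "b" and/or "d" instead.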
A language is defined by the
following:
1. The set of terminal symbols. These are the symbols that
occur in its sentences. They are said to be terminal,
because they cannot be substituted by any other symbols.
The substitution process stops with terminal symbols. In
our first example this set consists of the elements a, b, c
and d. The set is also called vocabulary.
2. The set of nonterminal symbols. They denote syntactic
classes and can be substituted. In our first example this set
consists of the elements S, A and B.
3. The set of syntactic equations (also called productions).
These define the possible substitutions of nonterminal
symbols. An equation is specified for each nonterminal
symbol.
4. The start symbol. It is a nonterminal symbol, in the
examples above denoted by S.
Computer Organization
• Applications
• Translator
• Operating System
• Hardware Machine
What is a compiler?
• A computer program is a set of instructions that the
computer can understand and execute.
• In reality computers don’t understand the instructions, they
simply process data
• Computer languages need to be unambiguous and have an
exactly defined syntax and semantics (unlike human
languages)
• High level programming languages have been developed
for human convenience and readability
• A compiler is a program that reads the high level input
program and translates the high level language into
machine code.
Compiler

Program text → Compiler → Machine code
                  ↓
                Errors
What is a language?
• Major elements of a language
– Syntax – determines what phrases there are in the language
– Semantics – determines what a phrase means
– Pragmatics – how the language is used
• Language has two parts
– “Words” of the language, or tokens (e.g. “if”, “{“)
– “Phrases” of the language (e.g. if (x<y) then x++;)
– Words and phrases define language syntax
• Tokens themselves may be specified using regular
expressions
• The language structure can be defined using context free
grammars (see later)
What is a compiler? Contd.
Compilers are large, complicated programs that can only convert
programs that conform to the syntax and semantic rules for a
particular language.
• Analysis - Front End
» Lexical Analysis
» Syntax Analysis
» Semantic Analysis
1. Recognises legal programs
2. Reports errors
3. Produces Intermediate Language, preliminary storage map
4. Shapes code for back end
• Synthesis - Back End
» Intermediate Code Generation
» Code Optimization
» Code Generation
1. Translates the intermediate language into target machine code
2. Chooses the instructions required for each IL operation
3. Decides what information to keep in the processor registers
4. Ensures that the resulting program uses the target system interfaces correctly
How are languages
implemented?
• Various strategies depend on how much pre-
processing is done before a program can be
run, and how CPU-specific the program is.
• Interpreters run a program “as is” with little or
no pre-processing, but no changes need to be
made to run on a different platform.
• Compilers do extensive pre-processing, but
will run a program 2 to 20 times faster.
Compilation Process

Source Program → Compiler → Object Program
Object Program + Data → Executing Computer → Result

• The source program and its data are processed at
different times: compile time and runtime, respectively.
Interpretative Process

Source Program + Data → Interpreter → Result

• Processes an internal form of the source program
and data at the same time. Interpretation of the
internal source form occurs at runtime, and no
object program is generated.
Language implementations
cont’d
• Some newer languages use a combination of
compiler and interpreter to get many of the benefits
of each.
• Examples are Java and Microsoft’s .NET, which
compile into a virtual assembly language (while
being optimized), which can then be interpreted on
any computer.
• Some languages (such as Basic or Lisp) have both
compilers and interpreters written for them.
• Recently, “Just-in-Time” compilers are becoming
more common - compile code only when it’s used!
History of compiler
development
1953: IBM develops the 701 EDPM (Electronic Data
Processing Machine), the first general-purpose
computer, built as a “defense calculator” in the
Korean War.

No high-level languages were available, so all
programming was done in assembly.
History of compilers (cont’d)
As expensive as these early computers were, most of the
money companies spent was for software development,
due to the complexities of assembly.

In 1953, John Backus came up with the idea of “speed
coding”, and developed the first interpreter.
Unfortunately, this was 10-20 times slower than
programs written in assembly.

He was sure he could do better.
History of compilers (cont’d)
In 1954, Backus and his team released a research paper
titled “Preliminary Report, Specifications for the IBM
Mathematical FORmula TRANslating System,
FORTRAN.”
The initial release of FORTRAN I was in 1956, totaling
25,000 lines of assembly code. Compiled programs ran
almost as fast as handwritten assembly!
Projects that had taken two weeks to write now took
only 2 hours. By 1958 more than half of all software
was written in FORTRAN.
Modern Compilers
Compilers have not changed a great deal since the days
of Backus. They still consist of two main components:

• The front-end reads in the program in the source
language, makes sense of it, and stores it in an internal
representation…

• …and the back-end takes the internal representation and
converts it into the target language, perhaps with some
optimizations. The target language is typically
assembly, but it is often easier to use an established,
higher-level language.
A Compiler Model

Analysis:
Source Program → Scanner → Parser → Semantic Analyzer
→ Intermediate Form

Synthesis:
Intermediate Form → Initial Code Generator → Code Generator
→ Object Code

All phases consult the Symbol Table, and emit analysis and
error diagnostics / error messages.
Structure of a Compiler

Source Language
↓
Front End: Lexical Analyzer → Syntax Analyzer →
Semantic Analyzer → Int. Code Generator
↓
Intermediate Code
↓
Back End: Code Optimizer → Target Code Generator
↓
Target Language

(Errors and warnings are reported along the way.)
Example Compilation

Source Code:
cur_time = start_time + cycles * 60

Lexical Analysis:
ID(1) ASSIGN ID(2) ADD ID(3) MULT INT(60)

Syntax Analysis:
ASSIGN
  ID(1)
  ADD
    ID(2)
    MULT
      ID(3)
      INT(60)

Semantic Analysis (an int2real conversion is inserted
over INT(60)):
ASSIGN
  ID(1)
  ADD
    ID(2)
    MULT
      ID(3)
      int2real
        INT(60)

Intermediate Code:
temp1 = int2real(60)
temp2 = id3 * temp1
temp3 = id2 + temp2
id1 = temp3

Optimized Code (step 1 - the conversion of the
constant is folded):
temp1 = 60.0
temp2 = id3 * temp1
temp3 = id2 + temp2
id1 = temp3

Optimized Code (step 2 - the constant is propagated
and temp1 eliminated):
temp2 = id3 * 60.0
temp3 = id2 + temp2
id1 = temp3

Optimized Code (step 3 - the copy through temp3 is
eliminated):
temp2 = id3 * 60.0
id1 = id2 + temp2

Optimized Code (final, with temporaries renumbered):
temp1 = id3 * 60.0
id1 = id2 + temp1

Target Code:
MOVF id3, R2
MULF #60.0, R2
MOVF id2, R1
ADDF R2, R1
MOVF R1, id1
Example II
Refer to Section 1.5 of Santanu Chattopadhyay’s
Compiler Design.
Lexical Analysis

Structure of a Compiler (recap):
Source Language → Front End (Lexical Analyzer →
Syntax Analyzer → Semantic Analyzer → Int. Code
Generator) → Intermediate Code → Back End (Code
Optimizer → Target Code Generator) → Target Language

Today: the Lexical Analyzer.
What exactly is lexing?
Consider the code:
if (i==j);
z=1;
else;
z=0;
endif;

This is really nothing more than a string of characters:

i f _ ( i = = j ) ; \n\tz = 1 ; \ne l s e ; \n\tz = 0 ; \ne n d i f ;

During our lexical analysis phase we must divide this
string into meaningful sub-strings.
Tokens
The output of our lexical analysis phase is a
streams of tokens.
A token is a syntactic category.
In English this would be types of words or
punctuation, such as a “noun”, “verb”,
“adjective” or “end-mark”.
In a program, this could be an identifier, a floating-point
number, a math symbol or a keyword.
Identifying Tokens
A sub-string that represents an instance of a token
is called a lexeme.
The class of all possible lexemes for a token is
described by the use of a pattern.
For example, the pattern to describe an identifier
(a variable) is a string of letters, numbers, or
underscores, beginning with a non-number.
Patterns are typically described using regular
expressions.
Implementation
A lexical analyzer must be able to do three things:
1. Remove all whitespace and comments.
2. Identify tokens within a string.
3. Return the lexeme of a found token, as
well as the line number it was found on.

How do we go about implementing this? (See the
sketch below.)
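As a first cut, here is a minimal sketch in C of a scanner loop that
does exactly these three things. It is an illustration only: the token
classes (T_ID, T_NUMBER, T_PUNCT) and the use of # for comments are
assumptions, not part of the course language.

#include <ctype.h>
#include <stdio.h>

/* Minimal scanner sketch (illustrative): skips whitespace and
 * '#'-to-end-of-line comments, then returns one token per call,
 * with its lexeme and the line number it was found on. */

typedef enum { T_ID, T_NUMBER, T_PUNCT, T_EOF } TokenType;

typedef struct {
    TokenType type;
    char lexeme[64];
    int line;
} Token;

static int line_no = 1;

Token next_token(FILE *in) {
    Token t = { T_EOF, "", 0 };
    int c, n = 0;

    /* 1. Remove all whitespace and comments. */
    for (;;) {
        c = fgetc(in);
        if (c == '#')                        /* comment: skip to line end */
            while ((c = fgetc(in)) != '\n' && c != EOF)
                ;
        if (c == '\n') { line_no++; continue; }
        if (c == ' ' || c == '\t' || c == '\r') continue;
        break;                               /* c begins a token, or is EOF */
    }

    t.line = line_no;
    if (c == EOF) return t;

    /* 2. Identify the token that the character c begins. */
    if (isalpha(c) || c == '_') {            /* identifier */
        t.type = T_ID;
        do { t.lexeme[n++] = (char)c; c = fgetc(in); }
        while (n < 63 && (isalnum(c) || c == '_'));
        ungetc(c, in);                       /* one character of lookahead */
    } else if (isdigit(c)) {                 /* integer */
        t.type = T_NUMBER;
        do { t.lexeme[n++] = (char)c; c = fgetc(in); }
        while (n < 63 && isdigit(c));
        ungetc(c, in);
    } else {                                 /* anything else: punctuation */
        t.type = T_PUNCT;
        t.lexeme[n++] = (char)c;
    }

    /* 3. Return the lexeme and the line number. */
    t.lexeme[n] = '\0';
    return t;
}

int main(void) {
    Token t;
    while ((t = next_token(stdin)).type != T_EOF)
        printf("line %d: type %d, lexeme \"%s\"\n", t.line, t.type, t.lexeme);
    return 0;
}

A scanner for the course language would additionally distinguish
keywords and multi-character operators, as in the example that follows.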
Example
i f _ ( i = = j ) ; \n\tz = 1 ; \ne l s e ; \n\tz = 0 ; \ne n d i f ;

Line-number - Token - Lexeme
1 COM_BLOCK if
1 OPEN (
1 ID i
1 OP_RELATION ==
1 CLOSE )
1 ENDLINE ;
2 ID z
2 OP_ASSIGN =
2 NUMBER 1
Etc…
Lookahead
Lookahead will typically be important to a
lexical analyzer.
Tokens are typically read in from left-to-right,
recognized one at a time from the input string.
It is not always possible to instantly decide if a
token is finished without looking ahead at the
next character. For example…
Is “i” a variable, or the first character of “if”?
Is “=” an assignment or the beginning of “==”?
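One way to resolve the “=” vs “==” case is to read one extra
character and push it back if it does not extend the token. A
sketch in C (the token names are illustrative):

#include <stdio.h>

/* Called after an initial '=' has been read: look one character
 * ahead, and push it back if it is not part of this token. */
const char *scan_equals(FILE *in) {
    int c = fgetc(in);
    if (c == '=')
        return "OP_RELATION";   /* the token was "==" */
    ungetc(c, in);              /* not ours: undo the lookahead */
    return "OP_ASSIGN";         /* the token was "=" */
}

int main(void) {
    /* reads the character that follows an initial '=' from stdin */
    printf("%s\n", scan_equals(stdin));
    return 0;
}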
Lookahead example
Some languages require more lookahead than
others. Example: Fortran removes all whitespace
before processing and cannot get clues from it.

DO 5 I = 1.25
Here, “DO5I” is a variable!

DO 5 I = 1,25
Here, “DO” is a keyword!
Uglier Lookahead example
PL/I is another example of a difficult-to-lex
language because it allows identifiers to be the
same as keywords. Consider this legal statement:

IF THEN THEN THEN = ELSE;
ELSE ELSE = THEN;

ELSE and THEN were both previously defined as
variables.
Details…
How much lookahead will we need? And how
do we figure out ambiguities?
If we see the characters “intact”, how do we
know if we are declaring an integer called “act”
(“int act”) or making use of a previously defined
identifier by the name “intact”?
We need specific rules that will ensure we never
have more than one possible answer, ideally with
only one character of lookahead.
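A common such rule (the one Lex uses, and an assumption for our
language) is “longest match”: keep consuming characters while they
can still extend the current lexeme, and only afterwards decide
what the lexeme is. A sketch of the final classification step in C:

#include <string.h>
#include <stdio.h>

/* Longest-match classification: the scanner first consumes the
 * whole identifier-shaped lexeme, then checks it against the
 * keyword list. Under this rule "intact" is always one token,
 * never "int" followed by "act". The keyword list is illustrative. */
const char *classify(const char *lexeme) {
    static const char *keywords[] = { "if", "else", "while", "int" };
    for (size_t i = 0; i < sizeof keywords / sizeof keywords[0]; i++)
        if (strcmp(lexeme, keywords[i]) == 0)
            return "KEYWORD";
    return "ID";
}

int main(void) {
    printf("%s %s\n", classify("int"), classify("intact")); /* KEYWORD ID */
    return 0;
}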
To do…
Introduction to Lexical Analysis
Lexing examples
Regular expressions
Project 1 overview
Review of our language
Finite automata (Refer to what was taught…)
Languages
An alphabet is a well-defined set of characters.
The symbol Σ is typically used to represent an
alphabet.

A language over Σ is a set of strings made up of
characters drawn from Σ.

Examples:
Alphabet: A-Z      Language: English
Alphabet: ASCII    Language: C++
Regular Expressions
Each regular expression is a notation for a
regular language (a well-defined set of possible
words).
If A is a regular expression, then L(A) is the
language defined by that regular expression.
L(“c”) is the language with the single word “c”.
Concatenation:
L(AB) = { ab | a ∈ L(A) and b ∈ L(B) }
L(“i” “f”) is the language with just “if” in it.
Regular Expressions (cont’d)
Union:
L(A | B) = { s | s ∈ L(A) or s ∈ L(B) }

L(“if” | “then” | “else”) is the language with just
the words “if”, “then”, and “else”.

L((“0” | “1”)(“0” | “1”)) is the language
consisting of “00”, “01”, “10” and “11”.
Regular Expressions (cont’d)
There are several special symbols:

“+” indicates one or more repeats can be used.
L(A+) = { L(A) or L(AA) or L(AAA) or … }

ε is the empty string.

“*” indicates zero or more repeats can be used.
L(A*) = { ε or L(A+) }
Defining our language…
The first thing we can define in our language are
keywords. These are easy:

L(“if” | “else” | “while” | “find” | …)

When we scan a file, we can either have a single
token represent all keywords, or else break them
down into groups, such as “commands”, “types”,
etc.
Language def cont’d
Next we will define integers in our language:

digit = “0” | “1” | “2” | “3” | “4” | “5” | “6” | “7” | “8” | “9”
integer = digit+

Note that we can abbreviate ranges using the dash (“-”).
Thus, digit = 0-9

Float is not much more complicated:

float = digit+ “.” digit+
Language def cont’d
Identifiers are strings of letters, underscores, or digits
beginning with a non-digit. Identifiers are used for
names the programmer comes up with, such as variable
or function names.

letter = a-z | A-Z
identifier = (letter | “_”)(letter | “_” | digit)*

Note that in most languages (including ours) keywords
are reserved and cannot be used as identifiers.
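These patterns carry over directly to POSIX extended regular
expression syntax. A small sketch in C (assuming a Unix system with
the standard <regex.h> interface, in the same spirit as the Lex tool
from Week 2); the ^ and $ anchors force the whole string, not just
part of it, to match:

#include <regex.h>
#include <stdio.h>

/* The token patterns defined above, in POSIX extended regex syntax. */
static const char *integer_re    = "^[0-9]+$";
static const char *float_re      = "^[0-9]+\\.[0-9]+$";
static const char *identifier_re = "^[A-Za-z_][A-Za-z0-9_]*$";

static int matches(const char *pattern, const char *s) {
    regex_t re;
    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return 0;                    /* pattern failed to compile */
    int ok = (regexec(&re, s, 0, NULL, 0) == 0);
    regfree(&re);
    return ok;
}

int main(void) {
    const char *samples[] = { "42", "3.14", "cur_time", "2bad" };
    for (int i = 0; i < 4; i++)
        printf("%-8s integer=%d float=%d identifier=%d\n", samples[i],
               matches(integer_re, samples[i]),
               matches(float_re, samples[i]),
               matches(identifier_re, samples[i]));
    return 0;
}

Note that “2bad” matches none of the three patterns (it begins with a
digit); a real scanner would also check identifier-shaped lexemes
against the reserved keyword list.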
Real-world example
What is the regular expression that defines all phone
numbers?

Σ = { 0-9, (, ), - }

area = digit digit digit
exchange = digit digit digit
local = digit digit digit digit

phone_number = “(” area “)” exchange “-” local

How many strings are defined by L(phone_number)?
Problems for you
What is the regular expression that defines all e-mail
addresses?

Describe what binary strings are defined by the
following languages:
0 (0 | 1)* 0
(0 | 1)* 0 (0 | 1)*
(ε | 0) 1*
( (ε | 0) 1* )*
Lexical Analysis
Identifiers and integers are recognized
directly as single tokens:
e.g.
<ident> ::= <letter> | <ident> <letter> | <ident> <digit>
<letter> ::= A | B | C | D | … | Z
<digit> ::= 0 | 1 | 2 | 3 | … | 9

Without a scanner, the parser would have to interpret a
sequence of such characters as the language construct
<ident>.
Lexical Analysis
Since a large part of a source program consists
of such multiple-character tokens, recognizing
them in the scanner saves compilation time.
The scanner recognizes both single- and
multiple-character tokens directly.
The output of a scanner is a sequence of
tokens.
Example of a PASCAL program
PROGRAM STATS
VAR
SUM, SUMSQ, I, VALUE, MEAN, VARIANCE: INTEGER
BEGIN
SUM:=0;
SUMSQ:=0;
FOR I:=1 TO 100 DO
BEGIN
READ (VALUE);
SUM:=SUM+VALUE;
SUMSQ:=SUMSQ+VALUE*VALUE
END;
MEAN:=SUM DIV 100;
VARIANCE:=SUMSQ DIV 100 - MEAN*MEAN;
WRITE (MEAN, VARIANCE)
END.
BNF

A BNF grammar contains a set of rules that
defines the syntax of some construct in the
programming language.

::= means “is defined as” / “is (to be)”
< > enclose non-terminal symbols
(constructs defined in the grammar)
Symbols without angle brackets are terminal symbols
SIMPLIFIED PASCAL GRAMMAR
1. <program> ::= PROGRAM <prog-name> VAR <dec-list> BEGIN <stmt-
list> END.
2. <prog-name>::= id
3. <dec-list> ::= <dec> | <dec-list>; <dec>
4. <dec> ::= <id-list> : <type>
5. <type> ::= INTEGER
6. <id-list> ::= id | <id-list> , id
7. <stmt-list> ::= <stmt> | <stmt-list> ; <stmt>
8. <stmt> ::= <assign> | <read> | <write> | <for>
9. <assign> ::= id := <exp>
10. <exp> ::= <term> | <exp> + <term> | <exp> - <term>
11. <term> ::= <factor> | <term> * <factor> | <term> DIV <factor>
12. <factor> ::= id | int | ( <exp> )
13. <read> ::= READ ( <id-list> )
14. <write> ::= WRITE ( <id-list> )
15. <for> ::= FOR <index-exp> DO <body>
16. <index-exp> ::= id := <exp> TO <exp>
17. <body> ::= <stmt> | BEGIN <stmt-list> END
Lexical Analysis
• The scanner reads a character stream and converts it into
a token stream
• White space is ignored (space, tab, return, formfeed)
• Comments are ignored
• Tokens are the basic entities of the language (identifiers,
numbers, operators, keywords, punctuation symbols etc.)
• The character string associated with a token is called its
lexeme
• The scanner may produce error messages (strings that
don't match any known token)
• The scanner may store information in the symbol table
Token Coding Scheme for the above grammar
Token       Code
PROGRAM 1
VAR 2
BEGIN 3
END 4
END. 5
INTEGER 6
FOR 7
READ 8
WRITE 9
TO 10
DO 11
; 12
: 13
, 14
:= 15
+ 16
- 17
* 18
DIV 19
( 20
) 21
Id 22
int 23
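In code, such a coding scheme is just an enumeration. A sketch in C
mirroring the table (the TK_ names are illustrative; the explicit
values keep the codes identical to the table):

/* Token codes from the table above; explicit values match the table. */
typedef enum {
    TK_PROGRAM = 1, TK_VAR = 2,    TK_BEGIN = 3,  TK_END = 4,    TK_ENDDOT = 5,
    TK_INTEGER = 6, TK_FOR = 7,    TK_READ = 8,   TK_WRITE = 9,  TK_TO = 10,
    TK_DO = 11,     TK_SEMI = 12,  TK_COLON = 13, TK_COMMA = 14, TK_ASSIGN = 15,
    TK_PLUS = 16,   TK_MINUS = 17, TK_STAR = 18,  TK_DIV = 19,
    TK_LPAREN = 20, TK_RPAREN = 21, TK_ID = 22,   TK_INT = 23
} TokenCode;

With this scheme, the scanner would emit the statement SUM:=0; from
the STATS program as the code sequence 22 15 23 12 (id, :=, int, ;).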
Syntactic Analysis
• Syntax refers to the structure (or grammar) of the language
(layout, statements, blocks etc.)
• The parser groups tokens into grammatical phrases
corresponding to the structure of the language
• Syntactic errors are things like "missing ;"
Example of a grammar for arithmetic expressions:
<exp> -> <exp> + <term> | <exp> - <term> | <term>
<term> -> <term> * <factor> | <term> / <factor> | <factor>
<factor> -> ( <exp> ) | id | num

tokens: id, num, (, ), +, -, *, /
structures: <exp>, <term>, <factor>
-> means “is composed of”; | means “or”
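A parser for this grammar can be sketched by recursive descent. The
left-recursive rules (<exp> -> <exp> + <term>, …) cannot be coded as
direct recursion, so the sketch below rewrites them as loops, which
preserves left associativity. Its simplifications are assumptions for
illustration: the “tokens” are single characters of a string, num is a
single digit, and the id alternative is omitted.

#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>

/* Recursive-descent sketch for the expression grammar above.
 * Left recursion (<exp> -> <exp> + <term>) is rewritten as a loop. */

static const char *p;                 /* next unread character (token) */

static double parse_exp(void);        /* forward declaration */

static double parse_factor(void) {    /* <factor> -> ( <exp> ) | num */
    if (*p == '(') {
        p++;
        double v = parse_exp();
        if (*p != ')') { fprintf(stderr, "missing )\n"); exit(1); }
        p++;
        return v;
    }
    if (isdigit((unsigned char)*p))
        return (double)(*p++ - '0');
    fprintf(stderr, "unexpected '%c'\n", *p);
    exit(1);
}

static double parse_term(void) {      /* <term> -> <factor> { */ or / <factor> } */
    double v = parse_factor();
    while (*p == '*' || *p == '/') {
        char op = *p++;
        double r = parse_factor();
        v = (op == '*') ? v * r : v / r;
    }
    return v;
}

static double parse_exp(void) {       /* <exp> -> <term> { + or - <term> } */
    double v = parse_term();
    while (*p == '+' || *p == '-') {
        char op = *p++;
        double r = parse_term();
        v = (op == '+') ? v + r : v - r;
    }
    return v;
}

int main(void) {
    p = "(1+2)*3-4";
    printf("(1+2)*3-4 = %g\n", parse_exp());   /* prints 5 */
    return 0;
}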
Semantic Analysis
• Semantics refers to meaning
• Type checking and function argument checking
(sketched below):
– variables defined before use
– operands are compatible (coercion may be applied)
– reals can't be used to index arrays
– the right number and type of function arguments
• The Symbol Table has the required information for semantic
analysis
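As a minimal illustration of the operand-compatibility check, here is
a sketch in C. The two-type system (int, real) and the specific rules
are assumptions chosen to match the bullet points above:

#include <stdio.h>

/* Minimal type-compatibility sketch: int operands may be coerced
 * to real in arithmetic, but a real can never be used where an
 * int is required (e.g. as an array index). */
typedef enum { TY_INT, TY_REAL, TY_ERROR } Type;

Type binary_op_type(Type a, Type b) {
    if (a == TY_ERROR || b == TY_ERROR) return TY_ERROR;
    if (a == TY_REAL || b == TY_REAL)   return TY_REAL;  /* coerce int to real */
    return TY_INT;
}

Type index_type(Type t) {
    return (t == TY_INT) ? TY_INT : TY_ERROR;  /* reals can't index arrays */
}

int main(void) {
    printf("%d\n", binary_op_type(TY_INT, TY_REAL));  /* 1 = TY_REAL */
    printf("%d\n", index_type(TY_REAL));              /* 2 = TY_ERROR */
    return 0;
}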
Code Generation and
Optimization
• Possible intermediate code representations:
– syntax trees
– directed acyclic graphs
– postfix notation
– 3-address code (see the sketch below)
• Possible optimizations:
– remove redundant or unreachable code
– propagate constant values
– optimize loops
• Target code generation:
– optimal use of registers for frequently used data
– taking advantage of specific architectural features
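3-address code is the representation used in the example compilation
earlier; it is commonly stored as quadruples. A minimal sketch in C
(the field names are an assumption):

#include <stdio.h>

/* 3-address code stored as quadruples: result = arg1 op arg2. */
typedef struct {
    char op;             /* '+', '-', '*', '=' (copy), ... */
    const char *arg1;
    const char *arg2;    /* NULL for unary or copy operations */
    const char *result;
} Quad;

int main(void) {
    /* temp2 = id3 * temp1, from the example compilation earlier */
    Quad q = { '*', "id3", "temp1", "temp2" };
    printf("%s = %s %c %s\n", q.result, q.arg1, q.op, q.arg2);
    return 0;
}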
Compiler Structure
A pass is a reading of the program from a file.
Most compilers are 1- or 2-pass compilers.
Most C compilers are 3-pass:
1st pass generates intermediate code
2nd pass generates assembler code
3rd pass generates optimized assembler
Some languages make a 1-pass compiler structure
almost impossible because they allow undeclared
variables or forward jumps (gotos).
1-Pass Structure
• Easy to implement
• Requires large memory in order to store the intermediate
representation
• Produces relatively inefficient code
• Example: the first Pascal compilers
2-Pass Structure
• 1st Pass: The front end or analysis phases
– The parser is the main routine
– it calls the scanner to get the next token
– groups structures and calls generator subroutines
to write the intermediate representation
• 2nd Pass: The back end or synthesis phases
– Reads the intermediate representation
– optimizes and writes target code
Example of a 2-pass
Assembler
• Source program:
mov a, R1
add #2, R1
mov R1, b

• 1st pass: identifiers added to symbol table with relocatable addresses
id location
a 0
b 4

• 2nd pass: generate opcodes + addresses
00000001 00000000 11000001
00000010 10000010 11000001
00000001 11000001 00000100
Retargetable Compilers
• Advantages of having a 2-pass structure:
– Increased opportunities for optimization
– Platform-independent languages
– Language-independent programs
Symbol Table
A data structure with a record for each identifier used in the program
(variables, user-defined type names, functions, formal arguments
etc.)

• Attributes may include:
– Storage size
– Type
– Scope (visible within what language blocks)
– Number and types of arguments

• Possible structures:
– Array
– Linked List
– Binary Search Tree
– Hash Table (sketched below)
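As one concrete possibility, here is a sketch of the hash-table
variant in C. The bucket count, hash function, and attribute fields
are illustrative choices, and strdup assumes a POSIX system:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* A chained hash table holding one record per identifier, with a
 * few of the attributes listed above. */
#define NBUCKETS 211

typedef struct Symbol {
    char *name;
    char *type;              /* e.g. "INTEGER" */
    int   size;              /* storage size in bytes */
    int   scope;             /* nesting level of the defining block */
    struct Symbol *next;     /* chain of entries that hash alike */
} Symbol;

static Symbol *table[NBUCKETS];

static unsigned hash(const char *s) {
    unsigned h = 0;
    while (*s)
        h = h * 31 + (unsigned char)*s++;
    return h % NBUCKETS;
}

Symbol *lookup(const char *name) {
    for (Symbol *s = table[hash(name)]; s != NULL; s = s->next)
        if (strcmp(s->name, name) == 0)
            return s;
    return NULL;
}

Symbol *insert(const char *name, const char *type, int size, int scope) {
    unsigned h = hash(name);
    Symbol *s = malloc(sizeof *s);
    s->name = strdup(name);          /* strdup: POSIX */
    s->type = strdup(type);
    s->size = size;
    s->scope = scope;
    s->next = table[h];              /* prepend to the bucket's chain */
    table[h] = s;
    return s;
}

int main(void) {
    insert("SUM", "INTEGER", 4, 0);  /* 4-byte INTEGER: an assumption */
    Symbol *s = lookup("SUM");
    if (s != NULL)
        printf("%s: %s, %d bytes, scope %d\n", s->name, s->type, s->size, s->scope);
    return 0;
}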
Error Handling
• Each analysis phase may produce errors
• Error messages should be meaningful
– "Syntax Error" isn't very helpful
• Error messages should indicate the location in the source file
– Often the error is not detected until the compiler is already past it
• Ideally, the compiler should recover and report as many errors as
possible rather than die the first time it encounters a problem