lecture-1 & 2 compiler
Compiler Construction
Course Objectives
• By the end of the course unit, the student
should be able to:
1. Appreciate the basic concepts of compiler
construction.
2. Construct compilers.
Course Contents
• WEEK 1
– Overview of the Compilation Process
– The phases of compilation
– Review of necessary concepts from programming languages
• WEEK 2
– Lexical Analysis
– The role of lexical analyzers
– Regular expressions
– Conversion of regular expressions to finite automata and lexical
analyzers
– The use of Lex in developing lexical analyzers under Unix
• WEEK 3 to 7
– Parsing
– Basic bottom-up and top-down parsing techniques
– LR, SLR, and LALR parsing
– YACC under Unix
– Other parser generating schemes
• WEEK 8 to 9
– Syntax Directed Translation
– Use of attributes in translation
– Syntax directed translation schemes for the common constructs of
programming languages
– Intermediate code representations
• WEEK 10
– Supporting Considerations
– Symbol table management
– Run time support
– Error detection and recovery techniques
• WEEK 11
– Optimization and Code Generation
– Brief introduction to code generation issues
• WEEK 12
– Reviews and Examinations
Course Info
• Meeting time:
Monday 4:00-7:00 p.m.
• Meeting Room: N6
• Prerequisites:
• Automata and Languages
• Computer Programming
• Computer Architecture
• Assembly Language Programming
Textbooks
• Primary Textbooks:
– Compilers: Principles, Techniques and Tools by Aho, Sethi, and
Ullman; Addison-Wesley Pub Co, ISBN: 0201100886
– Compiler Design by Santanu Chattopadhyay; Prentice Hall of India
Private Limited, ISBN: 81-203-2725-X
• Recommended Textbooks:
– The Theory and Practice of Compiler Writing by Jean-Paul
Tremblay and Paul G. Sorenson
– Systems Software: An Introduction to Systems Programming
by Leland L Beck; Addison-Wesley Pub Co, ISBN: 0201423006
– Constructing Language Processors for Little Languages by
Randy M. Kaplan; John Wiley & Sons, ISBN: 0471597546
Lecturer Info
• Name: Dr. Shikali
• Office: Northern
• Phone: 0720-832863
• E-mail: [email protected]
• Office Hours:
• Mondays: 10:00 - 1:00,
• Tuesdays: 11:00 - 1:30,
• Wednesdays: 11:00 - 1:30, and
• By appointment.
Delivery & Grading
• Delivery: Lectures
• Evaluation
– Continuous Assessment - 10%;
– Written assignments & Projects - 20%;
– Final Examination - 70%
-----
100%
Projects
• Basically, one big project in 5 parts.
• You must work in small groups of 2-3 students.
Each group hands in only one written set of
answers.
Projects (cont’d)
[Figure: system layers, top to bottom: Applications, Translator, Operating System, Hardware Machine]
What is a compiler?
• A computer program is a set of instructions that the
computer can understand and execute.
• In reality, computers don’t understand the instructions; they
simply process data.
• Computer languages need to be unambiguous and have
exactly defined syntax and semantics (unlike human
languages).
• High-level programming languages have been developed
for human convenience and readability.
• A compiler is a program that reads a program written in a
high-level language and translates it into
machine code.
[Figure: Compiler, with Errors as an output]
What is a language?
• Major elements of a language
– Syntax – determines what phrases there are in the language
– Semantics – determines what a phrase means
– Pragmatics – how the language is used
[Figure: compilation: the Object Program executes on the Computer at Runtime and produces the Result]
Interpretative Process
[Figure: the Interpreter takes the Source Program and Data and produces the Result]
History of compilers
No high-level languages were available, so all
programming was done in assembly.
History of compilers (cont’d)
As expensive as these early computers were, most of the
money companies spent was for software development,
due to the complexities of assembly.
In 1953, John Backus came up with the idea of “speed coding” and
developed the first interpreter. Unfortunately, this was 10-20 times
slower than programs written in assembly.
[Photo: John Backus]
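[Figure: organization of a compiler. Analysis: Parser and Semantic Analyzer, with error diagnostics. Intermediate Form. Synthesis: Initial code generator and Code generator, producing Object Code. The Symbol Table runs alongside all phases; Error Messages are a separate output.]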
Structure of a Compiler
• Source Language -> Compiler -> Target Language, with Errors and Warnings reported by the compiler
• Inside the compiler: Source Language -> Front End -> Intermediate Code -> Back End -> Target Language
• Front End: Lexical Analyzer -> Syntax Analyzer -> Semantic Analyzer
Example Compilation
[Figure: phases from Source Language to Target Language: Lexical Analyzer, Syntax Analyzer, Semantic Analyzer, Intermediate Code, Code Optimizer]
Source Code:
    cur_time = start_time + cycles * 60
Example Compilation (cont’d)
Lexical Analysis:
    ID(1) ASSIGN ID(2) ADD ID(3) MULT INT(60)
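The lexical analysis step can be sketched in a few lines of Python. This is only an illustration, not the course's lexer: the regular expressions, the function name tokenize, and the choice to number identifiers in order of first appearance are all assumptions made here.

```python
import re

# Hypothetical token patterns; the token names mirror the slide.
TOKEN_SPEC = [
    ("INT",    r"\d+"),
    ("ID",     r"[A-Za-z_]\w*"),
    ("ASSIGN", r"="),
    ("ADD",    r"\+"),
    ("MULT",   r"\*"),
    ("SKIP",   r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))

def tokenize(source):
    """Return (tokens, symbols) for a one-line assignment statement."""
    symbols = []   # simple symbol table: list position + 1 is the ID number
    tokens = []
    pos = 0
    while pos < len(source):
        m = MASTER.match(source, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {source[pos]!r} at offset {pos}")
        kind, text = m.lastgroup, m.group()
        pos = m.end()
        if kind == "SKIP":
            continue                      # white space separates tokens only
        if kind == "ID":
            if text not in symbols:
                symbols.append(text)
            tokens.append(f"ID({symbols.index(text) + 1})")
        elif kind == "INT":
            tokens.append(f"INT({text})")
        else:
            tokens.append(kind)
    return tokens, symbols

tokens, symbols = tokenize("cur_time = start_time + cycles * 60")
print(" ".join(tokens))   # ID(1) ASSIGN ID(2) ADD ID(3) MULT INT(60)
print(symbols)            # ['cur_time', 'start_time', 'cycles']
```

Under this numbering, ID(1) is cur_time, ID(2) is start_time and ID(3) is cycles.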
Example Compilation (cont’d)
Syntax Analysis:
    ASSIGN
        ID(1)
        ADD
            ID(2)
            MULT
                ID(3)
                INT(60)
Example Compilation (cont’d)
Semantic Analysis:
    ASSIGN
        ID(1)
        ADD
            ID(2)
            MULT
                ID(3)
                int2real
                    INT(60)
Example Compilation (cont’d)
Intermediate Code:
    temp1 = int2real(60)
    temp2 = id3 * temp1
    temp3 = id2 + temp2
    id1 = temp3
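One way to get from the annotated tree to the intermediate code is a post-order walk that creates a new temporary for each operator. The sketch below assumes the tree is stored as nested Python tuples; that representation and the function name gen are choices made for this example.

```python
from itertools import count

# Tree after semantic analysis, written as (operator, operand, ...) tuples.
tree = ("ASSIGN", "id1",
        ("ADD", "id2",
         ("MULT", "id3", ("int2real", 60))))

OPS = {"ADD": "+", "MULT": "*"}

def gen(node, code, temps):
    """Append three-address code for node; return the name holding its value."""
    if not isinstance(node, tuple):
        return str(node)                  # leaf: identifier or constant
    op = node[0]
    if op == "ASSIGN":
        value = gen(node[2], code, temps)
        code.append(f"{node[1]} = {value}")
        return node[1]
    if op == "int2real":
        arg = gen(node[1], code, temps)
        temp = f"temp{next(temps)}"
        code.append(f"{temp} = int2real({arg})")
        return temp
    left = gen(node[1], code, temps)      # operands first ...
    right = gen(node[2], code, temps)
    temp = f"temp{next(temps)}"           # ... then the operator itself
    code.append(f"{temp} = {left} {OPS[op]} {right}")
    return temp

code = []
gen(tree, code, count(1))
print("\n".join(code))
# temp1 = int2real(60)
# temp2 = id3 * temp1
# temp3 = id2 + temp2
# id1 = temp3
```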
Example II
Refer to Section 1.5 of Compiler Design by Santanu Chattopadhyay.
Lexical Analysis
[Figure: structure of a compiler: Source Language, Front End (Lexical Analyzer, Syntax Analyzer, Semantic Analyzer), Intermediate Code, Target Language. Today’s topic: the Lexical Analyzer.]
What exactly is lexing?
Consider the code:
if (i==j);
z=1;
else;
z=0;
endif;
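Lexing is not always easy. In Fortran, blanks are not significant:
DO 5 I = 1.25     an assignment: “DO5I” is a variable!
DO 5 I = 1,25     the start of a DO loop (I runs from 1 to 25, ending at label 5)
The lexer cannot tell the two apart until it reaches the “.” or the “,”.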
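The two Fortran statements differ only in one character, and because blanks carry no meaning in fixed-form Fortran, a lexer has to look ahead to the "." or "," before it can decide. A rough sketch of that decision follows; the function name classify and the simplified patterns are assumptions, and real Fortran has many more cases.

```python
import re

def classify(statement):
    """Classify a fixed-form Fortran line as a DO loop or an assignment."""
    text = statement.replace(" ", "")     # blanks are not significant
    # DO <label> <variable> = <start> , <end>  (the comma makes it a loop)
    loop = re.fullmatch(r"DO(\d+)([A-Z][A-Z0-9]*)=([^,]+),(.+)", text)
    if loop:
        label, var, start, end = loop.groups()
        return ("DO_LOOP", label, var, start, end)
    assign = re.fullmatch(r"([A-Z][A-Z0-9]*)=(.+)", text)
    if assign:
        return ("ASSIGN", assign.group(1), assign.group(2))
    raise SyntaxError(f"cannot classify {statement!r}")

print(classify("DO 5 I = 1.25"))   # ('ASSIGN', 'DO5I', '1.25')
print(classify("DO 5 I = 1,25"))   # ('DO_LOOP', '5', 'I', '1', '25')
```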
Examples:
Alphabet: A-Z Language: English
Alphabet: ASCII Language: C++
Regular Expressions
Each regular expression is a notation for a
regular language (a well-defined set of possible
words.)
If A is a regular expression, then L(A) is the
language defined by that regular expression.
L(“c”) is the language with the single word “c”.
Concatenation:
L(AB) = { ab | a ∈ L(A) and b ∈ L(B) }
L(“i” “f”) is the language with just “if” in it.
Regular Expressions (cont’d)
Union:
L(A | B) = { s | s ∈ L(A) or s ∈ L(B) }
digit = “0” | “1” | “2” | “3” | “4” | “5” | “6” | “7” | “8” | “9”
integer = digit+
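For finite languages, concatenation and union can be tried out directly with Python sets, and digit and integer can be written with Python's re module. The helper names concat and union are made up for this sketch.

```python
import re

def concat(A, B):
    """L(AB): every word of A followed by every word of B."""
    return {a + b for a in A for b in B}

def union(A, B):
    """L(A | B): words that are in A or in B."""
    return A | B

L_i, L_f = {"i"}, {"f"}
assert concat(L_i, L_f) == {"if"}      # L("i" "f") contains just "if"
assert union(L_i, L_f) == {"i", "f"}

# digit and integer as Python regular expressions (an illustrative choice).
digit = r"[0-9]"
integer = re.compile(f"{digit}+")

print(bool(integer.fullmatch("2024")))   # True
print(bool(integer.fullmatch("")))       # False: + needs at least one digit
print(bool(integer.fullmatch("12a")))    # False
```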
Σ = { 0-9, (, ), - }
area = digit^3
exchange = digit^3
local = digit^4
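Reading the three definitions as 3, 3 and 4 repetitions of digit, a telephone number pattern can be put together as below. The overall shape "(" area ")" exchange "-" local is an assumption suggested by the alphabet; the slide text does not spell it out.

```python
import re

digit = "[0-9]"
area = f"{digit}{{3}}"        # digit repeated 3 times
exchange = f"{digit}{{3}}"
local = f"{digit}{{4}}"

# Assumed composition: "(" area ")" exchange "-" local
phone = re.compile(rf"\({area}\){exchange}-{local}")

print(bool(phone.fullmatch("(555)123-4567")))   # True
print(bool(phone.fullmatch("555-123-4567")))    # False: parentheses missing
```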
Token      Code
PROGRAM    1
VAR        2
BEGIN      3
END        4
END.       5
INTEGER    6
FOR        7
READ       8
WRITE      9
TO         10
DO         11
;          12
:          13
,          14
:=         15
+          16
-          17
*          18
DIV        19
(          20
)          21
Id         22
int        23
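A scanner driven by this token table might look like the sketch below. The codes are the ones listed; the function name scan, the regular expression, and treating keywords as case-sensitive are assumptions made for the example.

```python
import re

# Token codes copied from the table.
KEYWORDS = {"PROGRAM": 1, "VAR": 2, "BEGIN": 3, "END": 4, "END.": 5,
            "INTEGER": 6, "FOR": 7, "READ": 8, "WRITE": 9, "TO": 10,
            "DO": 11, "DIV": 19}
SYMBOLS = {";": 12, ":": 13, ",": 14, ":=": 15, "+": 16, "-": 17,
           "*": 18, "(": 20, ")": 21}
ID, INT = 22, 23

# Longer alternatives come first so "END." beats "END" and ":=" beats ":".
LEXEME = re.compile(r"\s*(END\.|[A-Za-z][A-Za-z0-9]*|[0-9]+|:=|[;:,+\-*()])")

def scan(source):
    """Return a list of (code, lexeme) pairs; keywords are case-sensitive here."""
    result, pos = [], 0
    source = source.rstrip()
    while pos < len(source):
        m = LEXEME.match(source, pos)
        if m is None:
            raise SyntaxError(f"illegal character near offset {pos}")
        lexeme = m.group(1)
        pos = m.end()
        if lexeme in KEYWORDS:
            code = KEYWORDS[lexeme]
        elif lexeme in SYMBOLS:
            code = SYMBOLS[lexeme]
        elif lexeme[0].isdigit():
            code = INT
        else:
            code = ID
        result.append((code, lexeme))
    return result

print(scan("FOR i := 1 TO 10 DO"))
# [(7, 'FOR'), (22, 'i'), (15, ':='), (23, '1'), (10, 'TO'), (23, '10'), (11, 'DO')]
```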
Syntactic Analysis
Syntax refers to the structure (or grammar) of the language
(layout, statements, blocks etc.)
The parser groups tokens into grammatical phrases
corresponding to the structure of the language
Syntactic errors are things like "missing ;"
Example of a grammar for arithmetic expressions:
<exp> -> <exp> + <term> | <exp> - <term> | <term>
<term> -> <term> * <factor> | <term> / <factor> | <factor>
<factor> -> ( <exp> ) | id | num
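A small recursive-descent parser for this grammar is sketched below. The grammar as written is left-recursive, which a recursive-descent parser cannot follow directly, so each left-recursive rule is rewritten here as a loop; that rewrite, the tuple-based tree and all names are choices made for this example.

```python
import re

TOKEN = re.compile(r"\s*(?:(\d+)|([A-Za-z_]\w*)|(.))")

def tokenize(text):
    """Split text into (kind, lexeme) pairs, ending with an eof marker."""
    tokens = []
    for num, ident, op in TOKEN.findall(text.strip()):
        if num:
            tokens.append(("num", num))
        elif ident:
            tokens.append(("id", ident))
        else:
            tokens.append((op, op))
    tokens.append(("eof", ""))
    return tokens

class Parser:
    def __init__(self, text):
        self.tokens = tokenize(text)
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos][0]

    def advance(self):
        token = self.tokens[self.pos]
        self.pos += 1
        return token

    def expect(self, kind):
        if self.peek() != kind:
            raise SyntaxError(f"expected {kind!r}, found {self.peek()!r}")
        return self.advance()

    def exp(self):        # <exp> -> <term> { (+|-) <term> }
        node = self.term()
        while self.peek() in ("+", "-"):
            op = self.advance()[0]
            node = (op, node, self.term())
        return node

    def term(self):       # <term> -> <factor> { (*|/) <factor> }
        node = self.factor()
        while self.peek() in ("*", "/"):
            op = self.advance()[0]
            node = (op, node, self.factor())
        return node

    def factor(self):     # <factor> -> ( <exp> ) | id | num
        if self.peek() == "(":
            self.advance()
            node = self.exp()
            self.expect(")")
            return node
        if self.peek() in ("id", "num"):
            return self.advance()
        raise SyntaxError(f"unexpected {self.peek()!r}")

def parse(text):
    parser = Parser(text)
    tree = parser.exp()
    parser.expect("eof")
    return tree

print(parse("a + b * 2"))
# ('+', ('id', 'a'), ('*', ('id', 'b'), ('num', '2')))
```

The loops keep + - * / left-associative, as in the original grammar, and a malformed input such as "a + " raises SyntaxError.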
The Symbol Table has the required information for semantic
analysis
Code Generation and Optimization
• Possible intermediate code representations
– syntax trees
– directed acyclic graphs
– postfix notation
– three-address code
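Two of these representations can be derived from a syntax tree in a few lines. The sketch below assumes the tree is stored as nested tuples (op, left, right); the function names are made up for the example.

```python
# Syntax tree for  start_time + cycles * 60
tree = ("+", "start_time", ("*", "cycles", "60"))

def to_postfix(node):
    """Post-order walk: operands first, operator last."""
    if isinstance(node, str):
        return [node]
    op, left, right = node
    return to_postfix(left) + to_postfix(right) + [op]

def to_dag(node, nodes):
    """Return a node number, reusing the node of a repeated subexpression."""
    if isinstance(node, str):
        key = node
    else:
        op, left, right = node
        key = (op, to_dag(left, nodes), to_dag(right, nodes))
    return nodes.setdefault(key, len(nodes))

print(" ".join(to_postfix(tree)))   # start_time cycles 60 * +

# (a + b) * (a + b): the tree has 7 nodes, the DAG shares a + b and needs 4.
nodes = {}
to_dag(("*", ("+", "a", "b"), ("+", "a", "b")), nodes)
print(len(nodes))                   # 4
```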
• Possible optimizations
– remove redundant or unreachable code
– propagate constant values
– optimize loops
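As an illustration of propagating constant values, the sketch below works on straight-line three-address code stored as (dest, op, arg1, arg2) tuples. The tuple layout and the function name are assumptions; a real optimizer would also have to deal with jumps and loops.

```python
import operator

FOLD = {"+": operator.add, "-": operator.sub, "*": operator.mul}

def propagate_constants(code):
    """code: list of (dest, op, arg1, arg2); op is None for a plain copy.

    Straight-line code only (no jumps), so one forward pass is enough.
    """
    known = {}            # variable -> constant value known at this point
    out = []
    for dest, op, a, b in code:
        a = known.get(a, a)               # replace variables by known constants
        b = known.get(b, b)
        if op is None and isinstance(a, int):
            known[dest] = a
            out.append((dest, None, a, None))
        elif op in FOLD and isinstance(a, int) and isinstance(b, int):
            value = FOLD[op](a, b)        # both operands constant: fold now
            known[dest] = value
            out.append((dest, None, value, None))
        else:
            known.pop(dest, None)         # dest no longer holds a known constant
            out.append((dest, op, a, b))
    return out

code = [("n", None, 60, None), ("t1", "*", "n", 2), ("t2", "+", "x", "t1")]
for line in propagate_constants(code):
    print(line)
# ('n', None, 60, None)
# ('t1', None, 120, None)
# ('t2', '+', 'x', 120)
```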
Easy to implement
Requires large memory in order to store intermediate
representation
Produces relatively inefficient code
Example: the first Pascal compilers
2-Pass Structure
Source program:
mov a, R1
add #2, R1
mov R1, b
Language-independent programs
Symbol Table
A data structure with a record for each identifier used in the program
(variables, user-defined type names, functions, formal arguments,
etc.)
Possible structures:
Array
Linked List
Binary Search Tree
Hash Table
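Of the structures listed, the hash table is the easiest to try out in Python, since a dict is a hash table. The sketch below is only one possible layout: the record fields (kind, type) and the stack of nested scopes are assumptions added for the example.

```python
class SymbolTable:
    """Hash-table symbol table: one Python dict per scope."""

    def __init__(self):
        self.scopes = [{}]                # scopes[0] is the global scope

    def enter_scope(self):
        self.scopes.append({})

    def exit_scope(self):
        self.scopes.pop()

    def declare(self, name, kind, type_):
        scope = self.scopes[-1]
        if name in scope:
            raise KeyError(f"{name!r} is already declared in this scope")
        scope[name] = {"kind": kind, "type": type_}

    def lookup(self, name):
        for scope in reversed(self.scopes):   # innermost scope first
            if name in scope:
                return scope[name]
        return None                           # undeclared identifier

table = SymbolTable()
table.declare("cycles", "variable", "integer")
table.enter_scope()
table.declare("cycles", "formal argument", "real")
print(table.lookup("cycles")["type"])   # real
table.exit_scope()
print(table.lookup("cycles")["type"])   # integer
```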
Error Handling
Each analysis phase may produce errors
Error messages should be meaningful
Error messages should indicate the location in the source file
Ideally, the compiler should recover and report as many errors as
possible rather than stop at the first problem it encounters
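The points above can be illustrated at the lexical level: the sketch below records each illegal character with its line and column, skips it, and keeps scanning. The token pattern and the function name are assumptions, and skipping one character is only the simplest possible recovery strategy.

```python
import re

TOKEN = re.compile(r"[A-Za-z_]\w*|\d+|[-+*/=();]")

def lex_with_recovery(source):
    """Return (tokens, errors); scanning continues after every error."""
    tokens, errors = [], []
    for line_no, line in enumerate(source.splitlines(), start=1):
        col = 0
        while col < len(line):
            if line[col].isspace():
                col += 1
                continue
            m = TOKEN.match(line, col)
            if m:
                tokens.append(m.group())
                col = m.end()
            else:
                errors.append(
                    f"line {line_no}, column {col + 1}: "
                    f"illegal character {line[col]!r}")
                col += 1          # recovery: skip the bad character and go on
    return tokens, errors

tokens, errors = lex_with_recovery("x = 3 $ 4;\ny = x # 2;")
for message in errors:
    print(message)
# line 1, column 7: illegal character '$'
# line 2, column 7: illegal character '#'
```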