CH 1 Compiler Introduction

Compiler Construction Chapter 1

Uploaded by

abid khan

CHAPTER 1

INTRODUCTION

OUTLINE
• 1.1 OVERVIEW AND HISTORY
• 1.2 WHAT DO COMPILERS DO?
• 1.3 THE STRUCTURE OF A COMPILER
• 1.4 THE SYNTAX AND SEMANTICS OF PROGRAMMING LANGUAGES
• 1.5 COMPILER DESIGN AND PROGRAMMING LANGUAGE DESIGN
• 1.7 COMPUTER ARCHITECTURE AND COMPILER DESIGN
• 1.8 COMPILER DESIGN CONSIDERATIONS
OVERVIEW AND HISTORY (1)
• CAUSE
  • SOFTWARE FOR EARLY COMPUTERS WAS WRITTEN IN ASSEMBLY LANGUAGE
  • THE BENEFITS OF REUSING SOFTWARE ON DIFFERENT CPUS STARTED TO BECOME SIGNIFICANTLY GREATER THAN THE COST OF WRITING A COMPILER
• THE FIRST REAL COMPILER
  • FORTRAN COMPILERS OF THE LATE 1950S
  • 18 PERSON-YEARS TO BUILD
OVERVIEW AND HISTORY (2)

• COMPILER TECHNOLOGY
  • IS MORE BROADLY APPLICABLE AND HAS BEEN EMPLOYED IN RATHER UNEXPECTED AREAS
    • TEXT-FORMATTING LANGUAGES LIKE NROFF AND TROFF; PREPROCESSOR PACKAGES LIKE EQN, TBL, PIC
    • SILICON COMPILERS FOR THE CREATION OF VLSI CIRCUITS
    • COMMAND LANGUAGES OF OPERATING SYSTEMS
    • QUERY LANGUAGES OF DATABASE SYSTEMS
COMPILERS AND INTERPRETERS
• “COMPILATION”
  • TRANSLATION OF A PROGRAM WRITTEN IN A SOURCE LANGUAGE INTO A SEMANTICALLY EQUIVALENT PROGRAM WRITTEN IN A TARGET LANGUAGE

Source Program -> Compiler -> Target Program
Compiler -> Error messages
Input -> Target Program -> Output

COMPILERS AND INTERPRETERS (CONT’D)

• “INTERPRETATION”
  • PERFORMING THE OPERATIONS IMPLIED BY THE SOURCE PROGRAM

Source Program + Input -> Interpreter -> Output
Interpreter -> Error messages
COMPILER
• READ AND ANALYZE ENTIRE PROGRAM
• TRANSLATE TO SEMANTICALLY EQUIVALENT PROGRAM IN ANOTHER LANGUAGE
  • PRESUMABLY EASIER TO EXECUTE OR MORE EFFICIENT
  • SHOULD “IMPROVE” THE PROGRAM IN SOME FASHION
• OFFLINE PROCESS
  • TRADEOFF: COMPILE TIME OVERHEAD (PREPROCESSING STEP) VS EXECUTION PERFORMANCE
INTERPRETERS & COMPILERS
• INTERPRETER
  • A PROGRAM THAT READS A SOURCE PROGRAM AND PRODUCES THE RESULTS OF EXECUTING THAT PROGRAM
• COMPILER
  • A PROGRAM THAT TRANSLATES A PROGRAM FROM ONE LANGUAGE (THE SOURCE) TO ANOTHER (THE TARGET)
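The two definitions can be contrasted on a toy expression language. This is only a sketch: the tuple-based tree and the PUSH/ADD/MUL instruction names are assumptions made for illustration, not something the slides define.

```python
# Toy source program: a number, or a tuple (op, left, right).
# The tree shape and the instruction names are illustrative assumptions.

def interpret(tree):
    """Interpreter: walk the source tree and produce the result directly."""
    if isinstance(tree, (int, float)):
        return tree
    op, left, right = tree
    a, b = interpret(left), interpret(right)
    return a + b if op == "+" else a * b

def compile_expr(tree):
    """Compiler: translate the source tree into an equivalent target program."""
    if isinstance(tree, (int, float)):
        return [("PUSH", tree)]
    op, left, right = tree
    code = compile_expr(left) + compile_expr(right)
    code.append(("ADD",) if op == "+" else ("MUL",))
    return code

source = ("+", 1, ("*", 2, 3))   # 1 + 2 * 3
result = interpret(source)       # the interpreter yields a value
target = compile_expr(source)    # the compiler yields another program
```

The interpreter returns a value, while the compiler returns a list of instructions that still has to be executed by something else.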
WHAT DO COMPILERS DO (1)
• A COMPILER ACTS AS A TRANSLATOR, TRANSFORMING HUMAN-ORIENTED PROGRAMMING LANGUAGES INTO COMPUTER-ORIENTED MACHINE LANGUAGES.
• IT LETS THE PROGRAMMER IGNORE MACHINE-DEPENDENT DETAILS

Programming Language (Source) -> Compiler -> Machine Language (Target)
WHAT DO COMPILERS DO (2)
• COMPILERS MAY GENERATE THREE TYPES OF CODE:
  • PURE MACHINE CODE
    • MACHINE INSTRUCTION SET WITHOUT ASSUMING THE EXISTENCE OF ANY OPERATING SYSTEM OR LIBRARY.
    • MOSTLY USED FOR OPERATING SYSTEMS OR EMBEDDED APPLICATIONS.
  • AUGMENTED MACHINE CODE
    • CODE WITH OS ROUTINES AND RUNTIME SUPPORT ROUTINES.
    • THE MORE COMMON CASE
  • VIRTUAL MACHINE CODE
    • VIRTUAL INSTRUCTIONS THAT CAN BE RUN ON ANY ARCHITECTURE WITH A VIRTUAL MACHINE INTERPRETER OR A JUST-IN-TIME COMPILER
    • EX. JAVA
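To make the idea of virtual machine code concrete, here is a minimal sketch of a virtual machine interpreter for a stack machine. The three instructions (PUSH, ADD, MUL) are hypothetical and far simpler than real Java bytecode; the point is only that the same instruction list runs wherever this interpreter runs.

```python
def run(code):
    """Execute virtual instructions on a simple operand stack."""
    stack = []
    for instr in code:
        op = instr[0]
        if op == "PUSH":
            stack.append(instr[1])
        elif op == "ADD":
            right, left = stack.pop(), stack.pop()
            stack.append(left + right)
        elif op == "MUL":
            right, left = stack.pop(), stack.pop()
            stack.append(left * right)
        else:
            raise ValueError(f"unknown instruction: {op}")
    return stack.pop()

# Virtual code for 1 + 2 * 3 (hypothetical instruction set).
program = [("PUSH", 1), ("PUSH", 2), ("PUSH", 3), ("MUL",), ("ADD",)]
value = run(program)
```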
WHAT DO COMPILERS DO (3)
• ANOTHER WAY THAT COMPILERS DIFFER FROM ONE ANOTHER IS IN THE FORMAT OF THE TARGET MACHINE CODE THEY GENERATE:
  • ASSEMBLY OR OTHER SOURCE FORMAT
  • RELOCATABLE BINARY
    • RELATIVE ADDRESS
    • A LINKAGE STEP IS REQUIRED
  • ABSOLUTE BINARY
    • ABSOLUTE ADDRESS
    • CAN BE EXECUTED DIRECTLY
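The difference between relocatable and absolute binaries can be sketched as follows. The object-code layout (a list of instruction pairs plus a list of relocation indices) is an assumption made for illustration and does not correspond to any real object-file format.

```python
# Relocatable code: address operands are relative to the start of the module.
# `relocs` lists the instructions whose operand is an address to be adjusted.
# This layout is an illustrative assumption, not a real object-file format.

def link(code, relocs, base):
    """Turn relocatable code into absolute code loaded at address `base`."""
    absolute = list(code)
    for i in relocs:
        opcode, operand = absolute[i]
        absolute[i] = (opcode, operand + base)
    return absolute

relocatable = [("LOAD", 4), ("ADDI", 1), ("STORE", 4), ("JUMP", 0)]
relocs = [0, 2, 3]   # ADDI carries a constant, not an address
absolute = link(relocatable, relocs, base=0x1000)
```

Only after this linkage step do the address operands refer to real memory locations, which is why a relocatable binary cannot be executed directly.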
THE STRUCTURE OF A COMPILER (1)
• ANY COMPILER MUST PERFORM TWO MAJOR TASKS
  • ANALYSIS OF THE SOURCE PROGRAM
  • SYNTHESIS OF A MACHINE-LANGUAGE PROGRAM

Compiler = Analysis + Synthesis
STANDARD COMPILER STRUCTURE
Source code (character stream)
  -> Lexical analysis
Token stream
  -> Parsing
Abstract syntax tree
  -> Intermediate Code Generation
Intermediate code
  -> Optimization
Intermediate code
  -> Code generation
Assembly code

Front end (machine-independent): lexical analysis, parsing, intermediate code generation
Back end (machine-dependent): optimization, code generation
THE ANALYSIS-SYNTHESIS MODEL OF COMPILATION
THERE ARE TWO PARTS TO COMPILATION:

• ANALYSIS DETERMINES THE OPERATIONS IMPLIED BY THE SOURCE PROGRAM, WHICH ARE RECORDED IN A TREE STRUCTURE
• SYNTHESIS TAKES THE TREE STRUCTURE AND TRANSLATES THE OPERATIONS THEREIN INTO THE TARGET PROGRAM
OTHER TOOLS THAT USE THE ANALYSIS-SYNTHESIS MODEL

• EDITORS (SYNTAX HIGHLIGHTING)
• PRETTY PRINTERS (E.G. DOXYGEN)
• STATIC CHECKERS (E.G. LINT AND SPLINT)
• INTERPRETERS
• TEXT FORMATTERS (E.G. TEX AND LATEX)
• SILICON COMPILERS (E.G. VHDL)
• QUERY INTERPRETERS/COMPILERS (DATABASES)
PREPROCESSORS, COMPILERS, ASSEMBLERS, AND LINKERS

Skeletal Source Program
  -> Preprocessor
Source Program
  -> Compiler
Target Assembly Program
  -> Assembler
Relocatable Object Code
  -> Linker (also takes Libraries and Relocatable Object Files)
Absolute Machine Code
THE STRUCTURE OF A COMPILER (2)
Source Program (Character Stream)
  -> Scanner
Tokens
  -> Parser
Syntactic Structure
  -> Semantic Routines
Intermediate Representation
  -> Optimizer
  -> Code Generator
Target machine code

Symbol and Attribute Tables (Used by all Phases of The Compiler)


THE STRUCTURE OF A COMPILER (3)
Scanner
• The scanner begins the analysis of the source program by reading the input, character by character, and grouping characters into individual words and symbols (tokens)
• RE (Regular Expression)
• NFA (Non-deterministic Finite Automata)
• DFA (Deterministic Finite Automata)
• LEX
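A scanner driven by regular expressions can be sketched with Python's `re` module. The token classes and their names below are assumptions chosen for this example; a generator such as LEX would produce an equivalent automaton from a similar specification.

```python
import re

# Token classes as regular expressions; the names are illustrative assumptions.
TOKEN_SPEC = [
    ("NUMBER", r"\d+(?:\.\d+)?"),
    ("IDENT",  r"[A-Za-z_]\w*"),
    ("OP",     r"[=+\-*/;()]"),
    ("SKIP",   r"\s+"),
    ("ERROR",  r"."),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))

def scan(source):
    """Group characters into (kind, text) tokens, dropping white space."""
    tokens = []
    for match in MASTER.finditer(source):
        kind, text = match.lastgroup, match.group()
        if kind == "SKIP":
            continue
        if kind == "ERROR":
            raise SyntaxError(f"unexpected character {text!r}")
        tokens.append((kind, text))
    return tokens

tokens = scan("A = B + C;")
```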


THE STRUCTURE OF A COMPILER (4)
Parser
• Given a formal syntax specification (typically as a context-free grammar [CFG]), the parser reads tokens and groups them into units as specified by the productions of the CFG being used.
• As syntactic structure is recognized, the parser either calls corresponding semantic routines directly or builds a syntax tree.
• CFG (Context-Free Grammar)
• BNF (Backus-Naur Form)
• GAA (Grammar Analysis Algorithms)
• LL, LR, SLR, LALR Parsers
• YACC
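As an illustration of grouping tokens according to CFG productions, here is a small hand-written recursive-descent (LL-style) parser. The three-production grammar in the comment is an assumption for this sketch, not a grammar given in the slides, and the parser builds a syntax tree rather than calling semantic routines.

```python
# Grammar (an illustrative assumption):
#   stmt -> IDENT '=' expr ';'
#   expr -> term { '+' term }
#   term -> IDENT | NUMBER

def parse_stmt(tokens):
    """Parse a list of token strings into a syntax tree, or raise SyntaxError."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance(expected=None):
        nonlocal pos
        tok = peek()
        if tok is None or (expected is not None and tok != expected):
            raise SyntaxError(f"expected {expected or 'a token'}, found {tok!r}")
        pos += 1
        return tok

    def term():
        tok = advance()
        if not tok.isalnum():
            raise SyntaxError(f"expected identifier or number, found {tok!r}")
        return tok

    def expr():
        node = term()
        while peek() == "+":
            advance("+")
            node = ("+", node, term())
        return node

    target = advance()
    if not target.isidentifier():
        raise SyntaxError(f"cannot assign to {target!r}")
    advance("=")
    tree = ("=", target, expr())
    advance(";")
    if peek() is not None:
        raise SyntaxError(f"unexpected token {peek()!r}")
    return tree

tree = parse_stmt(["A", "=", "B", "+", "C", ";"])
```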
THE STRUCTURE OF A COMPILER (5)
Semantic Routines
• Perform two functions
  • Check the static semantics of each construct
  • Do the actual translation
• The heart of a compiler
• Syntax Directed Translation
• Semantic Processing Techniques
• IR (Intermediate Representation)
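The first function of the semantic routines, checking static semantics, can be sketched as a type checker over a syntax tree. The symbol table contents, the widening rule (int to float) and the `int2fp` node name are assumptions for this example.

```python
# Symbol table mapping names to declared types (hypothetical declarations).
SYMBOLS = {"A": "float", "B": "int", "C": "float"}

def check(node):
    """Return (annotated_node, type), raising on static-semantic errors."""
    if isinstance(node, str):
        if node not in SYMBOLS:
            raise NameError(f"undeclared identifier {node!r}")
        return node, SYMBOLS[node]
    op, left, right = node
    left, ltype = check(left)
    right, rtype = check(right)
    if op == "=":
        if ltype == "int" and rtype == "float":
            raise TypeError("cannot assign float to int")  # assumed rule
        if ltype == "float" and rtype == "int":
            right = ("int2fp", right)
        return (op, left, right), ltype
    if ltype != rtype:
        # Mixed arithmetic: widen the int operand (assumed rule).
        if ltype == "int":
            left = ("int2fp", left)
        else:
            right = ("int2fp", right)
        return (op, left, right), "float"
    return (op, left, right), ltype

typed, result_type = check(("=", "A", ("+", "B", "C")))
```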


THE STRUCTURE OF A COMPILER (6)
Optimizer
• The IR code generated by the semantic routines is analyzed and transformed into functionally equivalent but improved IR code
• This phase can be very complex and slow
• Peephole optimization
• Loop optimization, register allocation, code scheduling
• Register and Temporary Management
• Peephole Optimization
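One of the simplest transformations that yields functionally equivalent but improved IR is constant folding. It is not named on the slide and is used here only as an easy-to-follow example; the tuple-based IR is likewise an assumption.

```python
import operator

FOLDABLE = {"+": operator.add, "-": operator.sub, "*": operator.mul}

def fold(node):
    """Replace operator nodes whose operands are both constants by their value."""
    if not isinstance(node, tuple):
        return node  # a name or a constant
    op, left, right = node
    left, right = fold(left), fold(right)
    if op in FOLDABLE and isinstance(left, (int, float)) and isinstance(right, (int, float)):
        return FOLDABLE[op](left, right)
    return (op, left, right)

before = ("=", "a", ("+", "b", ("*", 2, 3)))   # a = b + 2 * 3
after = fold(before)                           # a = b + 6
```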


THE STRUCTURE OF A COMPILER (7)
Code Generator
• Interpretive Code Generation
• Generating Code from Tree/DAG
• Grammar-Based Code Generator


THE STRUCTURE OF A COMPILER (8)
Scanner [Lexical Analyzer] -> Tokens
Parser [Syntax Analyzer] -> Parse tree
Semantic Process [Semantic Analyzer] -> Abstract Syntax Tree w/ Attributes
Code Generator [Intermediate Code Generator] -> Non-optimized Intermediate Code
Code Optimizer -> Optimized Intermediate Code
Code Generator -> Target machine code
THE PHASES OF A COMPILER

Phase: Programmer (source code producer)
Output: Source string
Sample: A=B+C;

Phase: Scanner (performs lexical analysis)
Output: Token string and symbol table with names
Sample: ‘A’, ‘=’, ‘B’, ‘+’, ‘C’, ‘;’

Phase: Parser (performs syntax analysis based on the grammar of the programming language)
Output: Parse tree or abstract syntax tree
Sample:
      ;
      |
      =
     / \
    A   +
       / \
      B   C

Phase: Semantic analyzer (type checking, etc)
Output: Annotated parse tree or abstract syntax tree

Phase: Intermediate code generator
Output: Three-address code, quads, or RTL
Sample:
    int2fp B t1
    + t1 C t2
    := t2 A

Phase: Optimizer
Output: Three-address code, quads, or RTL
Sample:
    int2fp B t1
    + t1 #2.3 A

Phase: Code generator
Output: Assembly code
Sample:
    MOVF #2.3,r1
    ADDF2 r1,r2
    MOVF r2,A

Phase: Peephole optimizer
Output: Assembly code
Sample:
    ADDF2 #2.3,r2
    MOVF r2,A
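The intermediate code generator's sample output can be reproduced with a short tree walk. This sketch assumes that all arithmetic is done in floating point, that B is declared int while A and C are float, and that quads are written as (op, arg1, arg2, result); none of these details are stated in the table.

```python
import itertools

def gen_quads(tree, types):
    """Emit quads (op, arg1, arg2, result) for an assignment tree."""
    quads = []
    temps = (f"t{n}" for n in itertools.count(1))

    def as_float(name):
        # Convert int variables to floating point (assumed rule).
        if types[name] == "int":
            temp = next(temps)
            quads.append(("int2fp", name, None, temp))
            return temp
        return name

    def expr(node):
        if isinstance(node, str):
            return as_float(node)
        op, left, right = node
        a, b = expr(left), expr(right)
        temp = next(temps)
        quads.append((op, a, b, temp))
        return temp

    _, target, rhs = tree
    quads.append((":=", expr(rhs), None, target))
    return quads

types = {"A": "float", "B": "int", "C": "float"}   # hypothetical declarations
quads = gen_quads(("=", "A", ("+", "B", "C")), types)
```

With these assumptions the three quads correspond line by line to the intermediate code sample for A=B+C.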
THE GROUPING OF PHASES
• COMPILER FRONT AND BACK ENDS:
  • FRONT END: ANALYSIS (MACHINE INDEPENDENT)
  • BACK END: SYNTHESIS (MACHINE DEPENDENT)
• COMPILER PASSES:
  • A COLLECTION OF PHASES IS DONE ONLY ONCE (SINGLE PASS) OR MULTIPLE TIMES (MULTI PASS)
  • SINGLE PASS: USUALLY REQUIRES EVERYTHING TO BE DEFINED BEFORE BEING USED IN SOURCE PROGRAM
  • MULTI PASS: COMPILER MAY HAVE TO KEEP ENTIRE PROGRAM REPRESENTATION IN MEMORY
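The single-pass restriction can be illustrated with a check for undefined names. The program representation (a list of "def" and "use" statements) is a simplification assumed for this sketch.

```python
# A program as a list of ("def", name) and ("use", name) statements (assumed form).

def single_pass(program):
    """Check uses while reading; every name must be defined before it is used."""
    defined, errors = set(), []
    for kind, name in program:
        if kind == "def":
            defined.add(name)
        elif name not in defined:
            errors.append(name)
    return errors

def two_pass(program):
    """Pass 1 collects all definitions; pass 2 checks uses against them."""
    defined = {name for kind, name in program if kind == "def"}
    return [name for kind, name in program if kind == "use" and name not in defined]

program = [("use", "helper"), ("def", "helper"), ("use", "missing")]
errors_single = single_pass(program)
errors_multi = two_pass(program)
```

The single-pass version rejects the forward reference to `helper`, while the two-pass version accepts it but has to keep the whole program available to traverse it twice.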
FRONT-END

source -> Scanner -> tokens -> Parser -> IR

• SPLIT INTO TWO PARTS
  • SCANNER: RESPONSIBLE FOR CONVERTING CHARACTER STREAM TO TOKEN STREAM
    • ALSO STRIPS OUT WHITE SPACE, COMMENTS
  • PARSER: READS TOKEN STREAM; GENERATES IR
• BOTH OF THESE CAN BE GENERATED AUTOMATICALLY
  • SOURCE LANGUAGE SPECIFIED BY A FORMAL GRAMMAR
  • TOOLS READ THE GRAMMAR AND GENERATE SCANNER & PARSER (EITHER TABLE-DRIVEN OR


BACK END

• RESPONSIBILITIES
  • TRANSLATE IR INTO TARGET MACHINE CODE
  • SHOULD PRODUCE FAST, COMPACT CODE
  • SHOULD USE MACHINE RESOURCES EFFECTIVELY
    • REGISTERS
    • INSTRUCTIONS
    • MEMORY HIERARCHY
BACK END STRUCTURE
• TYPICALLY SPLIT INTO TWO MAJOR PARTS WITH SUB PHASES
  • “OPTIMIZATION” – CODE IMPROVEMENTS
    • MAY WELL TRANSLATE PARSER IR INTO ANOTHER IR
  • CODE GENERATION
    • INSTRUCTION SELECTION & SCHEDULING
    • REGISTER ALLOCATION
THE STRUCTURE OF A COMPILER (9)

COMPILER WRITING TOOLS

• COMPILER GENERATORS OR COMPILER-COMPILERS
  • E.G. SCANNER AND PARSER GENERATORS
  • EXAMPLES: YACC, LEX
THE SYNTAX AND SEMANTICS OF PROGRAMMING LANGUAGE (1)
• A PROGRAMMING LANGUAGE MUST INCLUDE THE SPECIFICATION OF SYNTAX (STRUCTURE) AND SEMANTICS (MEANING).
• SYNTAX TYPICALLY MEANS THE CONTEXT-FREE SYNTAX BECAUSE OF THE ALMOST UNIVERSAL USE OF CONTEXT-FREE GRAMMARS (CFGS)
• EX.
  • A = B + C IS SYNTACTICALLY LEGAL
  • B + C = A IS ILLEGAL
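The two example statements can be tried against a real parser. The slide does not name a language, so this sketch uses Python's own grammar through the standard `ast` module as a stand-in; other languages may classify the second statement differently.

```python
import ast

def is_syntactically_legal(source):
    """Report whether Python's own parser accepts the statement."""
    try:
        ast.parse(source)
        return True
    except SyntaxError:
        return False

legal = is_syntactically_legal("A = B + C")
illegal = is_syntactically_legal("B + C = A")
```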
THE SYNTAX AND SEMANTICS OF PROGRAMMING LANGUAGE (2)
• THE SEMANTICS OF A PROGRAMMING LANGUAGE ARE COMMONLY DIVIDED INTO TWO CLASSES:
  • STATIC SEMANTICS
    • SEMANTIC RULES THAT CAN BE CHECKED AT COMPILE TIME.
    • EX. THE TYPE AND NUMBER OF A FUNCTION’S ARGUMENTS
  • RUNTIME SEMANTICS
    • SEMANTIC RULES THAT CAN BE CHECKED ONLY AT RUN TIME
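The distinction can be sketched with one check of each kind. The function signatures in `SIGNATURES` are hypothetical, and the array-bounds check is just one common example of a rule that depends on run-time values; the slide itself gives no runtime example.

```python
# Declared parameter counts for known functions (hypothetical signatures).
SIGNATURES = {"max2": 2, "neg": 1}

def check_call(name, args):
    """Static check: done before execution, using only the program text."""
    expected = SIGNATURES[name]
    if len(args) != expected:
        raise TypeError(f"{name} expects {expected} argument(s), got {len(args)}")

def checked_index(values, i):
    """Runtime check: the index value is only known while the program runs."""
    if not 0 <= i < len(values):
        raise IndexError(f"index {i} out of range")
    return values[i]

check_call("max2", ["x", "y"])        # accepted without running anything
item = checked_index([10, 20], 1)     # accepted once the index value is known
```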
COMPILER DESIGN AND PROGRAMMING LANGUAGE DESIGN (1)

• AN INTERESTING ASPECT IS HOW PROGRAMMING LANGUAGE DESIGN AND COMPILER DESIGN INFLUENCE ONE ANOTHER.
• PROGRAMMING LANGUAGES THAT ARE EASY TO COMPILE HAVE MANY ADVANTAGES
COMPILER DESIGN AND PROGRAMMING LANGUAGE DESIGN (2)

• LANGUAGES SUCH AS SNOBOL AND APL ARE USUALLY CONSIDERED NONCOMPILABLE
• WHAT ATTRIBUTES MUST BE FOUND IN A PROGRAMMING LANGUAGE TO ALLOW COMPILATION?
  • CAN THE SCOPE AND BINDING OF EACH IDENTIFIER REFERENCE BE DETERMINED BEFORE EXECUTION BEGINS?
  • CAN THE TYPE OF OBJECT BE DETERMINED BEFORE EXECUTION BEGINS?
  • CAN EXISTING PROGRAM TEXT BE CHANGED OR ADDED TO DURING EXECUTION?
COMPUTER ARCHITECTURE AND COMPILER DESIGN
• COMPILERS SHOULD EXPLOIT HARDWARE-SPECIFIC FEATURES AND COMPUTING CAPABILITIES TO OPTIMIZE CODE.
• THE PROBLEMS ENCOUNTERED IN MODERN COMPUTING PLATFORMS:
  • INSTRUCTION SETS FOR SOME POPULAR ARCHITECTURES ARE HIGHLY NONUNIFORM.
  • HIGH-LEVEL PROGRAMMING LANGUAGE OPERATIONS ARE NOT ALWAYS EASY TO SUPPORT.
    • EX. EXCEPTIONS, THREADS, DYNAMIC HEAP ACCESS …
  • EXPLOITING ARCHITECTURAL FEATURES SUCH AS CACHE, DISTRIBUTED PROCESSORS AND MEMORY
  • EFFECTIVE USE OF A LARGE NUMBER OF PROCESSORS
COMPILER DESIGN CONSIDERATIONS
• DEBUGGING COMPILERS
  • DESIGNED TO AID IN THE DEVELOPMENT AND DEBUGGING OF PROGRAMS.
• OPTIMIZING COMPILERS
  • DESIGNED TO PRODUCE EFFICIENT TARGET CODE
• RETARGETABLE COMPILERS
  • A COMPILER WHOSE TARGET ARCHITECTURE CAN BE CHANGED WITHOUT ITS MACHINE-INDEPENDENT COMPONENTS HAVING TO BE REWRITTEN.
COMPILER-CONSTRUCTION TOOLS
• SOFTWARE DEVELOPMENT TOOLS ARE AVAILABLE TO IMPLEMENT ONE OR MORE COMPILER PHASES
  • SCANNER GENERATORS
  • PARSER GENERATORS
  • SYNTAX-DIRECTED TRANSLATION ENGINES
  • AUTOMATIC CODE GENERATORS
  • DATA-FLOW ENGINES
COMPILATION IN A NUTSHELL 1

Source code (character stream): if (b == 0) a = b;
  -> Lexical analysis
Token stream: if ( b == 0 ) a = b ;
  -> Parsing
Abstract syntax tree (AST):
    if
      ==
        b
        0
      =
        a
        b
      ;
  -> Semantic Analysis
Decorated AST:
    if
      == (boolean)
        b (int)
        0 (int)
      = (int)
        a (int, lvalue)
        b (int)
      ;
COMPILATION IN A NUTSHELL 2

Decorated AST:
    if
      == (boolean)
        b (int)
        0 (int)
      = (int)
        a (int, lvalue)
        b (int)
      ;
  -> Intermediate Code Generation
Intermediate code:
    CJUMP ==
      MEM
        +
          fp
          8
      CONST 0
      MOVE
        MEM
          +
            fp
            4
        MEM
          +
            fp
            8
      NOP
  -> Optimization
Intermediate code:
    CJUMP ==
      CX
      CONST 0
      MOVE
        DX
        CX
      NOP
  -> Code generation
Assembly code:
    CMP CX, 0
    CMOVZ DX,CX
