CH 1 Compiler Introduction
INTRODUCTION
OUTLINE
• 1.1 OVERVIEW AND HISTORY
• 1.2 WHAT DO COMPILERS DO?
• 1.3 THE STRUCTURE OF A COMPILER
• 1.4 THE SYNTAX AND SEMANTICS OF PROGRAMMING LANGUAGES
• 1.5 COMPILER DESIGN AND PROGRAMMING LANGUAGE DESIGN
• 1.7 COMPUTER ARCHITECTURE AND COMPILER DESIGN
• 1.8 COMPILER DESIGN CONSIDERATIONS
OVERVIEW AND HISTORY (1)
• MOTIVATION
• SOFTWARE FOR EARLY COMPUTERS WAS WRITTEN IN ASSEMBLY LANGUAGE
• THE BENEFITS OF REUSING SOFTWARE ON DIFFERENT CPUS BECAME SIGNIFICANTLY GREATER THAN THE COST OF WRITING A COMPILER
OVERVIEW AND HISTORY (2)
• COMPILER TECHNOLOGY
• IS MORE BROADLY APPLICABLE AND HAS BEEN EMPLOYED IN RATHER UNEXPECTED AREAS
• TEXT-FORMATTING LANGUAGES, LIKE NROFF AND TROFF; PREPROCESSOR PACKAGES LIKE EQN, TBL, PIC
• SILICON COMPILERS FOR THE CREATION OF VLSI CIRCUITS
• COMMAND LANGUAGES OF OPERATING SYSTEMS
• QUERY LANGUAGES OF DATABASE SYSTEMS
COMPILERS AND INTERPRETERS
• “COMPILATION”
• TRANSLATION OF A PROGRAM WRITTEN IN A SOURCE LANGUAGE INTO A SEMANTICALLY EQUIVALENT PROGRAM WRITTEN IN A TARGET LANGUAGE
[Figure: Source Program → Compiler → Target Program (which is then run on Input)]
• “INTERPRETATION”
• PERFORMING THE OPERATIONS IMPLIED BY THE SOURCE PROGRAM
[Figure: Source Program and Input → Interpreter → Output, Error messages]
COMPILER
• READ AND ANALYZE THE ENTIRE PROGRAM
• TRANSLATE IT INTO A SEMANTICALLY EQUIVALENT PROGRAM IN ANOTHER LANGUAGE
• PRESUMABLY EASIER TO EXECUTE OR MORE EFFICIENT
• SHOULD “IMPROVE” THE PROGRAM IN SOME FASHION
• OFFLINE PROCESS
• TRADEOFF: COMPILE-TIME OVERHEAD (PREPROCESSING STEP) VS. EXECUTION PERFORMANCE
INTERPRETERS & COMPILERS
• INTERPRETER
• A PROGRAM THAT READS A SOURCE PROGRAM AND PRODUCES THE RESULTS OF EXECUTING THAT PROGRAM
• COMPILER
• A PROGRAM THAT TRANSLATES A PROGRAM FROM ONE LANGUAGE (THE SOURCE) TO ANOTHER (THE TARGET)
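The difference between the two can be shown with a minimal Python sketch. It makes two assumptions. The source language is a hypothetical arithmetic AST of nested tuples. The target language is a made-up stack-machine code (`PUSH`, `ADD`, `MUL`).

- `interpret` walks the source program and produces the result directly.
- `compile_expr` only translates the program.
- `run` executes the resulting target program at some later time.

```python
from typing import List, Tuple, Union

# Hypothetical source AST: an int literal or a tuple (op, left, right), op in {"+", "*"}
Expr = Union[int, Tuple]
Instr = Tuple  # ("PUSH", n), ("ADD",) or ("MUL",) for a made-up stack machine


def interpret(e: Expr) -> int:
    """Interpreter: walks the source program and directly produces its result."""
    if isinstance(e, int):
        return e
    op, left, right = e
    a, b = interpret(left), interpret(right)
    return a + b if op == "+" else a * b


def compile_expr(e: Expr) -> List[Instr]:
    """Compiler: translates the source program into target code without running it."""
    if isinstance(e, int):
        return [("PUSH", e)]
    op, left, right = e
    # Operands first (postorder), then the operator
    return compile_expr(left) + compile_expr(right) + [("ADD",) if op == "+" else ("MUL",)]


def run(code: List[Instr]) -> int:
    """Executes the target program; this can happen any time after compilation."""
    stack: List[int] = []
    for instr in code:
        if instr[0] == "PUSH":
            stack.append(instr[1])
        else:
            b, a = stack.pop(), stack.pop()  # right operand is on top
            stack.append(a + b if instr[0] == "ADD" else a * b)
    return stack.pop()
```

For `("+", 2, ("*", 3, 4))`, both routes yield 14. The compiled route first produces the five-instruction program `PUSH 2, PUSH 3, PUSH 4, MUL, ADD`, and that program can then be run many times without re-analyzing the source.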
WHAT DO COMPILERS DO (1)
• A COMPILER ACTS AS A TRANSLATOR, TRANSFORMING HUMAN-ORIENTED PROGRAMMING LANGUAGES INTO COMPUTER-ORIENTED MACHINE LANGUAGES.
• LETS THE PROGRAMMER IGNORE MACHINE-DEPENDENT DETAILS
[Figure: Programming Language (Source) → Compiler → Machine Language (Target)]
WHAT DO COMPILERS DO (2)
• COMPILERS MAY GENERATE THREE TYPES OF CODE:
• PURE MACHINE CODE
• MACHINE INSTRUCTION SET WITHOUT ASSUMING THE EXISTENCE OF ANY OPERATING SYSTEM OR LIBRARY.
• MOSTLY USED FOR OPERATING SYSTEMS OR EMBEDDED APPLICATIONS.
• ABSOLUTE BINARY
• USES ABSOLUTE ADDRESSES
• CAN BE EXECUTED DIRECTLY
THE STRUCTURE OF A COMPILER (1)
• ANY COMPILER MUST PERFORM TWO MAJOR TASKS: ANALYSIS OF THE SOURCE PROGRAM AND SYNTHESIS OF THE TARGET PROGRAM
[Figure: Preprocessor → Source Program → Compiler → Target Assembly Program → Assembler → Relocatable Object Code → Linker (with Libraries and Relocatable Object Files) → Absolute Machine Code]
THE STRUCTURE OF A COMPILER (2)
[Figure: Source Program (Character Stream) → Scanner → Tokens → Parser → Syntactic Structure → Semantic Routines → Intermediate Representation → Code Generator]
[Figure: Scanner (Lexical Analyzer) → Tokens → Parser (Syntax Analyzer) → Parse Tree → Semantic Process (Semantic Analyzer) → Non-optimized Intermediate Code → Code Optimizer → Optimized Intermediate Code → Code Generator → Target Machine Code]
• COMPILER PASSES:
• THE PHASES MAY PROCESS THE PROGRAM IN ONE TRAVERSAL (SINGLE PASS) OR IN SEVERAL TRAVERSALS (MULTI PASS)
• SINGLE PASS: USUALLY REQUIRES EVERYTHING TO BE DEFINED BEFORE IT IS USED IN THE SOURCE PROGRAM
• MULTI PASS: THE COMPILER MAY HAVE TO KEEP THE ENTIRE PROGRAM REPRESENTATION IN MEMORY
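The define-before-use restriction can be sketched with a hypothetical assembly-like program. The program is a list of `label`, `jmp` and `op` entries; none of these names come from a real tool.

- The single-pass version rejects a jump to a label that is defined later.
- The two-pass version first records every label address and then emits code. The cost is that it keeps the whole program in memory between passes.

Real one-pass translators often work around the restriction with backpatching, which is not shown here.

```python
from typing import List, Tuple

Entry = Tuple[str, str]  # hypothetical: ("label", name), ("jmp", name) or ("op", mnemonic)


def single_pass(program: List[Entry]) -> List[tuple]:
    """One traversal: a jump may only use a label that was already defined."""
    labels, out = {}, []
    for kind, arg in program:
        if kind == "label":
            labels[arg] = len(out)  # address of the next emitted instruction
        elif kind == "jmp":
            if arg not in labels:
                raise NameError(f"label {arg!r} used before definition")
            out.append(("JMP", labels[arg]))
        else:
            out.append((arg,))
    return out


def two_pass(program: List[Entry]) -> List[tuple]:
    """Pass 1 records every label address; pass 2 emits code using them."""
    labels, addr = {}, 0
    for kind, arg in program:  # pass 1: collect symbols only
        if kind == "label":
            labels[arg] = addr
        else:
            addr += 1
    out = []
    for kind, arg in program:  # pass 2: emit code
        if kind == "jmp":
            out.append(("JMP", labels[arg]))
        elif kind == "op":
            out.append((arg,))
    return out
```

With `[("jmp", "end"), ("op", "NOP"), ("label", "end"), ("op", "HALT")]`, `single_pass` raises `NameError`. `two_pass` returns `[("JMP", 2), ("NOP",), ("HALT",)]`.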
FRONT END
[Figure: source → Scanner → tokens → Parser → IR]
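A front end for a tiny expression language can be sketched in a few dozen lines of Python. The token kinds, the grammar and the nested-tuple IR are assumptions chosen for illustration. They are not the IR of any particular compiler.

- `scan` turns the character stream into tokens.
- `parse` is a recursive-descent parser that turns the tokens into the IR.

```python
from typing import List, Tuple

Token = Tuple[str, str]  # (kind, text), e.g. ("NUM", "42")


def scan(src: str) -> List[Token]:
    """Scanner: groups the character stream into tokens, skipping whitespace."""
    tokens: List[Token] = []
    i = 0
    while i < len(src):
        c = src[i]
        if c.isspace():
            i += 1
        elif c.isdigit() or c.isalpha():
            kind = "NUM" if c.isdigit() else "ID"
            j = i
            while j < len(src) and (src[j].isdigit() if kind == "NUM" else src[j].isalnum()):
                j += 1
            tokens.append((kind, src[i:j]))
            i = j
        elif c in "+*()":
            tokens.append((c, c))
            i += 1
        else:
            raise SyntaxError(f"unexpected character {c!r}")
    return tokens


def parse(tokens: List[Token]):
    """Recursive-descent parser for the grammar
         expr   -> term ('+' term)*
         term   -> factor ('*' factor)*
         factor -> NUM | ID | '(' expr ')'
    Returns a nested-tuple IR such as ("+", ("ID", "x"), ("NUM", "1"))."""
    pos = 0

    def peek():
        return tokens[pos][0] if pos < len(tokens) else None

    def advance() -> Token:
        nonlocal pos
        pos += 1
        return tokens[pos - 1]

    def expr():
        node = term()
        while peek() == "+":
            advance()
            node = ("+", node, term())
        return node

    def term():
        node = factor()
        while peek() == "*":
            advance()
            node = ("*", node, factor())
        return node

    def factor():
        if peek() in ("NUM", "ID"):
            return advance()
        if peek() == "(":
            advance()
            node = expr()
            if peek() != ")":
                raise SyntaxError("expected ')'")
            advance()
            return node
        raise SyntaxError(f"unexpected token {peek()!r}")

    ir = expr()
    if pos != len(tokens):
        raise SyntaxError("unexpected trailing tokens")
    return ir
```

Here `parse(scan("x + 2 * y"))` yields `("+", ("ID", "x"), ("*", ("NUM", "2"), ("ID", "y")))`. The grammar layering is what makes `*` bind tighter than `+`.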
BACK END
• RESPONSIBILITIES
• TRANSLATE IR INTO TARGET MACHINE CODE
• SHOULD PRODUCE FAST, COMPACT CODE
• SHOULD USE MACHINE RESOURCES EFFECTIVELY
• REGISTERS
• INSTRUCTIONS
• MEMORY HIERARCHY
BACK END STRUCTURE
• TYPICALLY SPLIT INTO TWO MAJOR PARTS WITH SUB PHASES
• CODE GENERATION
• INSTRUCTION SELECTION & SCHEDULING
• REGISTER ALLOCATION
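Two of these sub-phases, instruction selection and register allocation, can be sketched for the expression IR used in the front-end example. The machine is a hypothetical RISC-like target: `LW`, `LI`, `ADD` and `MUL` are made-up mnemonics. Register allocation is a simple free list with no spilling, and instruction scheduling is not shown.

```python
from typing import List


class CodeGen:
    """Toy back end for a hypothetical RISC-like machine with `num_regs` registers.

    Instruction selection: one instruction per IR node (LW, LI, ADD, MUL are made-up
    mnemonics). Register allocation: a free list; no spilling, no scheduling.
    """

    def __init__(self, num_regs: int = 3) -> None:
        self.free = [f"r{i}" for i in reversed(range(num_regs))]  # pop() yields r0 first
        self.code: List[str] = []

    def alloc(self) -> str:
        if not self.free:
            raise RuntimeError("out of registers (a real allocator would spill to memory)")
        return self.free.pop()

    def release(self, reg: str) -> None:
        self.free.append(reg)

    def gen(self, node) -> str:
        """Emits code for `node` and returns the register holding its value."""
        kind = node[0]
        if kind == "NUM":
            reg = self.alloc()
            self.code.append(f"LI {reg}, {node[1]}")  # load immediate
            return reg
        if kind == "ID":
            reg = self.alloc()
            self.code.append(f"LW {reg}, {node[1]}")  # load variable from memory
            return reg
        op, left, right = node
        r1 = self.gen(left)
        r2 = self.gen(right)
        self.code.append(f"{'ADD' if op == '+' else 'MUL'} {r1}, {r1}, {r2}")
        self.release(r2)  # the right operand's value is dead after this instruction
        return r1
```

For `x + 2 * y`, `CodeGen().gen(...)` returns `"r0"`. It emits:

- `LW r0, x`
- `LI r1, 2`
- `LW r2, y`
- `MUL r1, r1, r2`
- `ADD r0, r0, r1`

With only two registers the same tree raises `RuntimeError`. That is the point at which a real allocator would spill values to memory.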
THE SYNTAX AND SEMANTICS OF PROGRAMMING LANGUAGES (1)
• A PROGRAMMING LANGUAGE MUST INCLUDE THE SPECIFICATION OF SYNTAX (STRUCTURE) AND SEMANTICS (MEANING).
• SYNTAX TYPICALLY MEANS THE CONTEXT-FREE SYNTAX, BECAUSE CONTEXT-FREE GRAMMARS (CFGS) ARE ALMOST UNIVERSALLY USED TO SPECIFY IT
• EX.
• A = B + C IS SYNTACTICALLY LEGAL
• B + C = A IS ILLEGAL
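The example above can be checked against a toy context-free grammar. The grammar is an assumption made for this sketch:

- `stmt → ID '=' expr`
- `expr → ID | expr '+' ID`

The second rule derives exactly the strings `ID ('+' ID)*`. The recognizer also assumes that tokens are separated by blanks, so it needs no real scanner.

```python
def is_assignment(src: str) -> bool:
    """Recognizer for the toy context-free grammar
         stmt -> ID '=' expr
         expr -> ID | expr '+' ID
    Assumes tokens are separated by blanks (no real scanner)."""
    toks = src.split()
    # stmt must begin with an identifier followed by '='
    if len(toks) < 3 or not toks[0].isidentifier() or toks[1] != "=":
        return False
    rest = toks[2:]
    # expr derives exactly ID ('+' ID)*: IDs at even positions, '+' at odd positions
    if len(rest) % 2 == 0:
        return False
    return all(
        tok.isidentifier() if i % 2 == 0 else tok == "+"
        for i, tok in enumerate(rest)
    )
```

`is_assignment("A = B + C")` is `True`. `is_assignment("B + C = A")` is `False`, because no derivation from `stmt` starts with `ID '+'`.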
THE SYNTAX AND SEMANTICS OF PROGRAMMING LANGUAGES (2)
• THE SEMANTICS OF A PROGRAMMING LANGUAGE ARE COMMONLY DIVIDED INTO TWO CLASSES:
• STATIC SEMANTICS
• SEMANTIC RULES THAT CAN BE CHECKED AT COMPILE TIME.
• EX. THE TYPE AND NUMBER OF A FUNCTION’S ARGUMENTS
• RUNTIME SEMANTICS
• SEMANTIC RULES THAT CAN BE CHECKED ONLY AT RUN TIME
• EX. DIVISION BY ZERO, ARRAY INDEX OUT OF BOUNDS
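The split between the two classes can be illustrated in Python.

- `check_call` is a static check. It compares a call's argument count and types with a declaration using only the program text.
- `run_div` is a runtime check. Whether the divisor is zero is generally unknown until the program runs.

The `SIGNATURES` table and both function names are hypothetical.

```python
from typing import Dict, List

# Hypothetical declarations: function name -> parameter types
SIGNATURES: Dict[str, List[str]] = {"div": ["int", "int"]}


def check_call(name: str, arg_types: List[str]) -> List[str]:
    """Static semantics: checks argument number and types before the program runs."""
    params = SIGNATURES.get(name)
    if params is None:
        return [f"undeclared function {name}"]
    errors = []
    if len(arg_types) != len(params):
        errors.append(f"{name}: expected {len(params)} arguments, got {len(arg_types)}")
    for i, (want, got) in enumerate(zip(params, arg_types)):
        if want != got:
            errors.append(f"{name}: argument {i + 1} should be {want}, got {got}")
    return errors


def run_div(a: int, b: int) -> int:
    """Runtime semantics: the zero-divisor rule can only be checked with actual values."""
    if b == 0:
        raise ZeroDivisionError("division by zero")
    return a // b
```

`check_call("div", ["int"])` reports `"div: expected 2 arguments, got 1"` without executing anything. `run_div(1, 0)` fails only when it is actually executed.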
COMPILER DESIGN AND PROGRAMMING LANGUAGE DESIGN
• OPTIMIZING COMPILERS
• DESIGNED TO PRODUCE EFFICIENT TARGET CODE
• RETARGETABLE COMPILERS
• A COMPILER WHOSE TARGET ARCHITECTURE CAN BE CHANGED WITHOUT ITS MACHINE-INDEPENDENT COMPONENTS HAVING TO BE REWRITTEN.
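Both ideas can be sketched on one expression IR.

- `fold` is a machine-independent optimization (constant folding) shared by every target.
- `emit` is the machine-dependent part. It reads mnemonics from a per-target table, so retargeting means adding a table entry rather than rewriting `fold`.

The targets `target_a` and `target_b` and their mnemonics are invented for illustration. Real retargetable compilers describe much more, such as instruction patterns, registers and calling conventions.

```python
from typing import Dict, List

IR = tuple  # ("NUM", n) | ("ID", name) | (op, left, right) with op in {"+", "*"}


def fold(node: IR) -> IR:
    """Machine-independent optimization: constant folding, shared by every target."""
    if node[0] in ("+", "*"):
        op, left, right = node[0], fold(node[1]), fold(node[2])
        if left[0] == "NUM" and right[0] == "NUM":
            return ("NUM", left[1] + right[1] if op == "+" else left[1] * right[1])
        return (op, left, right)
    return node


# Hypothetical target descriptions: retargeting adds a table, it does not touch fold().
TARGETS: Dict[str, Dict[str, str]] = {
    "target_a": {"NUM": "PUSH {}", "ID": "LOAD {}", "+": "ADD", "*": "MUL"},
    "target_b": {"NUM": "const.i {}", "ID": "get {}", "+": "plus", "*": "times"},
}


def emit(node: IR, target: str) -> List[str]:
    """Machine-dependent part: postorder walk that looks up mnemonics for one target."""
    table = TARGETS[target]
    if node[0] in ("NUM", "ID"):
        return [table[node[0]].format(node[1])]
    return emit(node[1], target) + emit(node[2], target) + [table[node[0]]]
```

For `x + 2 * 3`, `fold` produces `("+", ("ID", "x"), ("NUM", 6))`. The two back ends then emit different code from that same optimized IR:

- `target_a`: `["LOAD x", "PUSH 6", "ADD"]`
- `target_b`: `["get x", "const.i 6", "plus"]`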
COMPILER-CONSTRUCTION TOOLS
• SOFTWARE DEVELOPMENT TOOLS ARE AVAILABLE TO IMPLEMENT ONE OR MORE COMPILER PHASES
• SCANNER GENERATORS
• PARSER GENERATORS
• SYNTAX-DIRECTED TRANSLATION ENGINES
• AUTOMATIC CODE GENERATORS
• DATA-FLOW ENGINES
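The core idea of a scanner generator is to build a scanner from a table of token specifications rather than write it by hand. That idea can be approximated with Python's standard `re` module by using named groups and `Match.lastgroup`. The token names in `SPEC` are assumptions for this sketch.

One real difference from lex-style generators matters here. lex prefers the longest match, whereas Python's regex alternation takes the first alternative that matches. The order of `SPEC` therefore matters: `EQ` must come before `ASSIGN`, and `IF` must come before `ID` and use a word boundary.

```python
import re
from typing import List, Tuple

# Token specification in priority order (first matching alternative wins).
SPEC = [
    ("IF", r"if\b"),
    ("ID", r"[A-Za-z_]\w*"),
    ("NUM", r"\d+"),
    ("EQ", r"=="),
    ("ASSIGN", r"="),
    ("LPAREN", r"\("),
    ("RPAREN", r"\)"),
    ("SEMI", r";"),
    ("SKIP", r"\s+"),
]


def make_scanner(spec):
    """Builds a scanner function from a spec, in the spirit of a scanner generator."""
    master = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in spec))

    def scan(src: str) -> List[Tuple[str, str]]:
        tokens, pos = [], 0
        while pos < len(src):
            m = master.match(src, pos)
            if m is None:
                raise SyntaxError(f"unexpected character {src[pos]!r} at {pos}")
            if m.lastgroup != "SKIP":  # whitespace is matched but not returned
                tokens.append((m.lastgroup, m.group()))
            pos = m.end()
        return tokens

    return scan
```

`make_scanner(SPEC)("if (b == 0) a = b;")` produces the ten-token stream `if ( b == 0 ) a = b ;`, with `==` recognized as `EQ` and `=` as `ASSIGN`.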
COMPILATION IN A NUTSHELL 1
• SOURCE CODE (CHARACTER STREAM): if (b == 0) a = b;
• LEXICAL ANALYSIS → TOKEN STREAM: if ( b == 0 ) a = b ;
• PARSING → ABSTRACT SYNTAX TREE (AST): if( ==(b, 0), =(a, b), ; )
• SEMANTIC ANALYSIS → DECORATED AST: if( ==(int b, int 0) : boolean, =(int a [lvalue], int b) : int, ; )
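The semantic-analysis step above can be sketched as a post-order pass that fills in types and marks the assignment target as an lvalue. The sketch assumes three things:

- a hypothetical `Node` class;
- a symbol table in which `a` and `b` are both declared `int`, as the decorated AST shows;
- only the handful of rules this one statement needs.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Node:
    kind: str                                  # "if", "==", "=", ";", "id" or "num"
    children: List["Node"] = field(default_factory=list)
    value: Optional[str] = None                # identifier name or literal text
    type: Optional[str] = None                 # filled in by semantic analysis
    lvalue: bool = False                       # True if the node denotes a storage location


def decorate(node: Node, symbols: Dict[str, str]) -> Node:
    """Semantic analysis: annotates each node with its type and checks the rules."""
    for child in node.children:                # post-order: children first
        decorate(child, symbols)
    if node.kind == "num":
        node.type = "int"
    elif node.kind == "id":
        if node.value not in symbols:
            raise TypeError(f"undeclared variable {node.value}")
        node.type = symbols[node.value]
    elif node.kind == "==":
        left, right = node.children
        if left.type != right.type:
            raise TypeError("operands of == must have the same type")
        node.type = "boolean"
    elif node.kind == "=":
        target, source = node.children
        if target.kind != "id":
            raise TypeError("left side of = must be assignable")
        target.lvalue = True
        if target.type != source.type:
            raise TypeError("type mismatch in assignment")
        node.type = target.type
    elif node.kind == "if" and node.children[0].type != "boolean":
        raise TypeError("if condition must be boolean")
    return node
```

After `decorate` runs on the AST for `if (b == 0) a = b;` with `{"a": "int", "b": "int"}`, the `==` node has type `boolean`, the `=` node has type `int`, and `a` is marked as an lvalue. These are the same annotations as on the slide.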
COMPILATION IN A NUTSHELL 2
• DECORATED AST: if( ==(int b, int 0) : boolean, =(int a [lvalue], int b) : int, ; )
• INTERMEDIATE CODE GENERATION → IR TREE: CJUMP(==, MEM(+(fp, 8)), CONST 0, MOVE(MEM(+(fp, 4)), MEM(+(fp, 8))), NOP)
• OPTIMIZATION → CJUMP(==, CX, CONST 0, MOVE(DX, CX), NOP)
• CODE GENERATION → TARGET CODE:
CMP CX, 0
CMOVZ DX, CX
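The last step can be sketched as a small pattern-matching instruction selector over a hypothetical tuple encoding of the optimized IR. The encoding is `("CJUMP", relop, left, right, then_stmt, else_stmt)`, with `("MOVE", dst, src)` and `("NOP",)` as statements. The selector works as follows:

- When the then-branch is a register-to-register `MOVE` and the else-branch is a `NOP`, it emits the branch-free `CMP`/`CMOVZ` pair shown above. x86 `CMOVcc` has no immediate-source form, so the source must be a register.
- Otherwise it falls back to a conditional branch around the two arms. The labels `L_else` and `L_end` are made up.

```python
from typing import List


def operand(x) -> str:
    """Renders a register name or a ("CONST", n) node as an assembly operand."""
    return str(x[1]) if isinstance(x, tuple) and x[0] == "CONST" else x


def emit_stmt(stmt) -> List[str]:
    if stmt[0] == "MOVE":
        return [f"MOV {stmt[1]}, {operand(stmt[2])}"]
    return []  # NOP produces no instruction


def select(ir) -> List[str]:
    """Pattern-matching instruction selection for an == CJUMP (x86-style mnemonics)."""
    _, relop, left, right, then_stmt, else_stmt = ir
    if relop != "==":
        raise NotImplementedError("this sketch only handles ==")
    code = [f"CMP {left}, {operand(right)}"]
    if then_stmt[0] == "MOVE" and isinstance(then_stmt[2], str) and else_stmt[0] == "NOP":
        # Pattern: if (x == y) dst = src;  ->  one conditional move, no branch.
        code.append(f"CMOVZ {then_stmt[1]}, {then_stmt[2]}")
        return code
    # General case: jump to the else-arm when the operands differ.
    return (code + ["JNE L_else"] + emit_stmt(then_stmt) + ["JMP L_end", "L_else:"]
            + emit_stmt(else_stmt) + ["L_end:"])
```

Applied to `("CJUMP", "==", "CX", ("CONST", 0), ("MOVE", "DX", "CX"), ("NOP",))`, this returns `["CMP CX, 0", "CMOVZ DX, CX"]`, which matches the slide's target code.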