0% found this document useful (0 votes)
35 views40 pages

Unit 1 (A)

Uploaded by

bahes80516
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
35 views40 pages

Unit 1 (A)

Uploaded by

bahes80516
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 40

Unit – I

Introduction to Compilers
Analp Pathak
Assistant Professor
Computer Science Engineering Department
SRM IST, NCR Campus
2

Compiler Course Outline


(Unit – 1 to Unit 5)
• Ch. 1: Introduction
• Ch. 3: Lexical Analysis and Lex/Flex
• Ch. 4: Syntax Analysis
• Ch. 5: Syntax-Directed Translation
• Ch. 6: Type Checking
• Ch. 7: Run-Time Environments
• Ch. 8: Intermediate Code Generation
• Ch. 9: Code Generation
• Ch.10: Code Optimization
• Reference:
Aho, Ullman, and Sethi, “Compilers: Principles, Techniques, and Tools”
3

Textbook
• Compilers: Principles, Techniques, and Tools, 2/E.
– Alfred V. Aho, Columbia University
– Ravi Sethi, Avaya Labs
– Jeffrey D. Ullman, Stanford University

Commonly Known as
Dragon Book

2nd Edition with


Monica S. Lam,
Stanford University
4

Contents
• Overview and History
• What Do Compilers Do
• Compiler and Interpreters
• Phases of a compiler
• Analysis of source program / Translation
• Tex
• Features of Good Programming Language
• The Syntax and Semantics of Programming Language
• Computer Architecture and Compiler Design
• Compiler Construction Tools
5

Overview and History


• Cause
– Software for early computers was written in assembly language
– The benefits of reusing software on different CPUs started to
become significantly greater than the cost of writing a compiler

• The first real compiler


– FORTRAN compilers of the late 1950s
– 18 person-years to build
6

Overview and History (Contd…)


• Compiler technology
– is more broadly applicable and has been employed in
rather unexpected areas.
• Text-formatting languages, like nroff and troff;
preprocessor packages like eqn, tbl, pic
• Silicon compiler for the creation of VLSI circuits
• Command languages of OS
• Query languages of Database systems
7

What Do Compilers Do
• Compilers may generate three types of code:
– Pure Machine Code
• Machine instruction set without assuming the existence
of any operating system or library.
• Mostly being OS or embedded applications.
– Augmented Machine Code
• Code with OS routines and runtime support routines.
• More often
– Virtual Machine Code
• Virtual instructions, can be run on any architecture with
a virtual machine interpreter or a just-in-time compiler
• Ex. Java
8

What Do Compilers Do (Contd…)

• Another way that compilers differ from one


another is in the format of the target machine
code they generate:
– Assembly or other source format
– Relocatable binary
• Relative address
• A linkage step is required
– Absolute binary
• Absolute address
• Can be executed directly
9

Compilers and Interpreters


• “Compilation”
– Translation of a program written in a source
language into a semantically equivalent
program written in a target language
Input

Source Target
Compiler
Program Program

Error messages Output


10

Compilers

• Source languages: Fortran, Pascal, C, etc.


• Target languages: another PL, machine Lang
• Compilers:
– Single-pass
– Multi-pass
– Load-and-Go
– Debugging
– Optimizing
11

Compilers and Interpreters


(cont’d)
• “Interpretation”
– Performing the operations implied by the
source program

Source
Program
Interpreter Output
Input

Error messages
12

Other Tools that Use the


Analysis-Synthesis Model
• Editors (syntax highlighting)
• Pretty printers (e.g. doxygen)
• Static checkers (e.g. lint and splint)
• Interpreters
• Text formatters (e.g. TeX and LaTeX)
• Silicon compilers (e.g. VHDL)
• Query interpreters/compilers (Databases)
13

Preprocessors, Compilers,
Assemblers, and Linkers
Skeletal Source Program

Preprocessor
Source Program
Try for example:
Compiler gcc -v myprog.c
Target Assembly Program
Assembler
Relocatable Object Code
Linker Libraries and
Relocatable Object Files
Absolute Machine Code
14

The Analysis-Synthesis Model of


Compilation
• There are two parts to compilation:
– Analysis determines the operations implied by the
source program which are recorded in a tree
structure
– Synthesis takes the tree structure and translates the
operations therein into the target program
15

Phases of Compiler
16

The Structure of a Compiler


Source
Program Tokens Syntactic Semantic
Scanner Parser
(Character Stream) Structure Routines

Intermediate
Representation

Symbol and Optimizer


Attribute
Tables

(Used by all Phases of The Compiler)

Code
Generator

Target machine code


17

The Structure of a Compiler (Contd…)

Source
Program Tokens Syntactic Semantic
Scanner Parser
(Character Stream) Structure Routines

Intermediate
Representation
Scanner
 The scanner begins the analysis of the source program by
Symbol and Optimizer
reading the input, character by character, and grouping
Attribute
characters into individual words and symbols (tokens)
Tables
 RE ( Regular expression )
 (Used
NFA ( Non-deterministic Finite by all )
Automata
 DFA ( Deterministic FinitePhases
Automataof
)
 LEX The Compiler) Code
Generator

Target machine code


18

The Structure of a Compiler (Contd…)

Source
Program Tokens Syntactic Semantic
Scanner Parser
(Character Stream) Structure Routines

Intermediate
Parser Representation

 Given a formal syntax specification (typically as a context-


free grammar [CFG] ), the parse reads tokens and groups
Symbol and Optimizer
them into units as specified by the productions of the CFG
Attribute
being used.
Tables
 As syntactic structure is recognized, the parser either calls
corresponding semantic routines directly or builds a syntax
(Used by all
tree.
Phases
 CFG ( Context-Free Grammar ) of
 BNF ( Backus-Naur FormThe
) Compiler) Code
 GAA ( Grammar Analysis Algorithms ) Generator
 LL, LR, SLR, LALR Parsers
 YACC
Target machine code
19

The Structure of a Compiler (Contd…)

Source
Program Tokens Syntactic Semantic
Scanner Parser
(Character Stream) Structure Routines

Intermediate
Semantic Routines Representation

 Perform two functions


 Check the static semantics of each construct
Symbol and
 Do the actual translation Optimizer
Attribute
 The heart of a compiler
Tables
 Syntax Directed Translation
 Semantic Processing (Used by all
Techniques
Phases of
 IR (Intermediate Representation)
The Compiler) Code
Generator

Target machine code


20

The Structure of a Compiler (Contd…)

Source
Program Tokens Syntactic Semantic
Scanner Parser
(Character Stream) Structure Routines

Intermediate
Optimizer Representation

 The IR code generated by the semantic routines is


analyzed and transformed into functionally equivalent but
Symbol and Optimizer
improved IR code
Attribute
 This phase can be very complex and slow
Tables
 Peephole optimization
 loop optimization, register allocation, code scheduling
(Used by all
Phases
 Register and Temporary of
Management
 Peephole OptimizationThe Compiler) Code
Generator

Target machine code


21

The Structure of a Compiler (Contd...)

Source
Program Tokens Syntactic Semantic
Scanner Parser
(Character Stream) Structure Routines

Intermediate
Code Generator Representation

 Interpretive Code Generation


 Generating Code from Tree/Dag
 Grammar-Based Code Generator Optimizer

Code
Generator
Target machine code
22

Symbol-table Management
• To record the identifiers in source program
– Identifier is detected by lexical analysis and then is
stored in symbol table
• To collect the attributes of identifiers
(not by lexical analysis)
– Storage allocation : memory address
– Types
– Scope (where it is valid, local or global)
– Arguments (in case of procedure names)
• Arguments numbers and types
• Call by reference or address
• Return types
23

Symbol-table Management

• Semantic analysis uses type information check


the type consistence of identifiers
• Code generating uses storage allocation
information to generate proper relocation
address code
24

Error Detection and Reporting


• Syntax and semantic analysis handle a large fraction
of errors
• Some of the Errors may occur in compilation phase
– Lexical phase: could not form any token
• e.g. Misspelling or Juxtaposing of characters
– Syntax phase: tokens violate structure rules
• e.g. Unbalanced parenthesis, Missing punctuation
operators or Undeclared variables.
– Semantic phase: no meaning of operations
• Add an array name and a procedure name, Truncation
of results or Unreachable Code.
25

Translation of A Statement
26

Translation of A Statement
27

Translation of A Statement
28

Properties and Advantage of


Intermediate Code
• Properties:
– It should be easy to produce.
– It should be easy to translate into the target machine.
• Advantage (In current scenario, the internet era):
– Consistent and Continuous Programming Model
– Develop once and run anywhere
– Simplified deployment
– Wide platform reach
– Programming language integration
– Simplified code reuse
– Interoperability
29

Intermediate Code Representation


• Two kinds of representations
– Trees
• Parse Trees, Syntax Trees
– Linear representations
• Three-address code
30

Code Optimization
• It attempts to improve the intermediate code, to
produce more efficient code that will result a faster-
running machine code.
• Compiler Optimizations must meet the following
objectives:
– The optimization must be correct, that is, preserve the
meaning of the compiled program,
– The optimization must improve the performance of many
programs,
– The compilation time must be kept reasonable, and
– The engineering effort required must be manageable.
31

Factors influenced Code Generation

• Register Allocation
• Register Scheduling
• Code Selection
• Addressing Modes
• Instruction format
• Power of instructions
• Optimization at the machine code level
• Back patching
32

Analysis in Text Formatters:


• TeX is a typesetting system designed and mostly written by Donald
Knuth, and released in 1978.
• TeX was designed with two main goals:
– To allow anybody to produce high-quality books using a reasonably
minimal amount of effort .
– To provide a system that would give exactly the same results on all
computers, now and in the future.
• TeX is a popular means by which to typeset complex mathematical
formulae.
• It noted as one of the most sophisticated digital typographical systems in
the world.
• TeX is popular in academia, especially in mathematics, computer
science, economics, engineering, physics, statistics, and quantitative
psychology.
• Tex many other typesetting tasks, LaTeX, ConTeXt and other packages.
33

Analysis in Text Formatters (Contd…)


• \hbox {<list of boxes>}

• \hbox {\vbox{! 1} \vbox{@ 2}}


34

The Grouping of Phases


• Compiler front and back ends:
– Analysis (machine independent front end)
– Synthesis (machine dependent back end)
• Passes
– A collection of phases may be repeated only once
(single pass) or multiple times (multi pass)
– Single pass: usually requires everything to be defined
before being used in source program
– Multi pass: compiler may have to keep entire program
representation in memory
35

Features of a Good Programming Language


• Easy to understand, and Expressive Power
• Interoperability and Portability
• Good turnaround time
• Automatic error recovery and Good Error reporting
• Efficient memory usage and Garbage Collection
• Provision of good run-time environment
– Support for virtual machine
– Support for concurrent operation
– Support for unblock operations
• Ability to interface with foreign functions
• Ability to model real-world problems
• Ability to expose the functions for usage in other languages
36

The Syntax and Semantics of Programming Language


• A programming language must include the
specification of syntax (structure) and semantics
(meaning).
• Syntax typically means the context-free syntax
because of the almost universal use of context-free-
grammar (CFGs)
• Ex.
– a = b + c is syntactically legal
– b + c = a is illegal
37

The Syntax and Semantics of Programming


Language (Contd…)
• The semantics of a programming language are
commonly divided into two classes:
– Static semantics
• Semantics rules that can be checked at compiled time.
• Ex. The type and number of a function’s arguments
– Runtime semantics
• Semantics rules that can be checked only at run time
38

Computer Architecture and Compiler Design


• Compilers should exploit the hardware-specific feature and
computing capability to optimize code.
• The problems encountered in modern computing platforms:
– Instruction sets for some popular architectures are highly nonuniform.
– High-level programming language operations are not always easy to
support.
• Ex. exceptions, threads, dynamic heap access …
– Exploiting architectural features such as cache, distributed processors
and memory
– Effective use of a large number of processors
39

Compiler Design Considerations


• Debugging Compilers
– Designed to aid in the development and debugging of
programs.
• Optimizing Compilers
– Designed to produce efficient target code
• Retargetable Compilers
– A compiler whose target architecture can be changed
without its machine-independent components having to be
rewritten.
40

Compiler-Construction Tools
• Software development tools are available to
implement one or more compiler phases
– Scanner generators
– Parser generators
– Syntax-directed translation engines
– Automatic code generators
– Data-flow engines
– Compiler-construction toolkits

You might also like