CH 1


DDU IT

Department of Computer Science


Compiler Design (COMP 464)
Chapter 1: Introduction
Compiler and its various phases
What is a compiler?
A program that reads a program written in one
language (the source language) and translates it
into an equivalent program in another language
(the target language).
Source program (high-level language) → Compiler → Target program (assembly or machine language)
(The compiler also emits error messages.)
Input → Target program (.exe) → Output
Why do we design compilers?
Compilers provide an essential interface
between applications and architectures
To build a large, ambitious software system.
To learn how to build programming
languages.
To learn how programming languages work.
To learn tradeoffs in language design.
For new platforms
For new languages
The Structure of a Compiler
• There are two major parts of a compiler:
Analysis and Synthesis
Compiler
  Analysis / Front end: Lexical Analyzer, Syntax Analyzer, Semantic Analyzer
  Synthesis / Back end: Intermediate Code Generator, Code Optimizer, Code Generator
Analysis and Synthesis
In the analysis phase, an intermediate representation is created from the given source program.
The Lexical Analyzer, Syntax Analyzer, and Semantic Analyzer are the parts of this phase.
In the synthesis phase, the equivalent target program is created from this intermediate representation.
The Intermediate Code Generator, Code Optimizer, and Code Generator are the parts of this phase.
Analysis and Synthesis cont….
 Analysis: The analysis part breaks up the source program into constituent pieces and creates an intermediate representation of the source program.
 During analysis, the operations implied by the source
program are determined and recorded in a hierarchical
structure called a tree.
 Often a special kind of tree called a syntax tree is used, in
which each node represents an operation and the children of
a node represent the arguments of the operation.

 Synthesis: The synthesis part constructs the desired target program from the intermediate representation. Of the two parts, synthesis requires the most specialized techniques.
Phases of a Compiler

Source Program → Lexical Analyzer → Syntax Analyzer → Semantic Analyzer → Intermediate Code Generator → Code Optimizer → Code Generator → Target Program
(The symbol table and the error handlers sit alongside all phases.)
Each phase transforms the source program from one representation into another representation.
They communicate with the error handlers.
They communicate with the symbol table.
Lexical Analyzer
• Lexical Analyzer reads the source program character by character
and returns the tokens of the source program.
• A token describes a pattern of characters having the same meaning in the source program (such as identifiers, operators, keywords, numbers, delimiters, and so on).
Ex: newval := oldval + 12  =>  tokens:
    newval    identifier
    :=        assignment operator
    oldval    identifier
    +         add operator
    12        number

• Puts information about identifiers into the symbol table.
• Regular expressions are used to describe tokens (lexical constructs).
• A (deterministic) finite state automaton can be used in the implementation of a lexical analyzer (a minimal hand-written sketch follows below).
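As a concrete illustration (not from the slides), here is a minimal hand-written scanner in C for the statement above; the TokenKind names and the next_token() helper are invented for the example, and a lex-generated DFA would normally replace the hand-written character tests.

/* Minimal hand-written scanner sketch (illustrative only). */
#include <ctype.h>
#include <stdio.h>
#include <string.h>

typedef enum { TK_ID, TK_NUM, TK_ASSIGN, TK_PLUS, TK_EOF } TokenKind;
typedef struct { TokenKind kind; char text[64]; } Token;

static const char *src = "newval := oldval + 12";
static int pos = 0;

Token next_token(void) {
    Token t = { TK_EOF, "" };
    while (isspace((unsigned char)src[pos])) pos++;        /* skip blanks */
    if (src[pos] == '\0') return t;
    if (isalpha((unsigned char)src[pos])) {                /* identifier  */
        int n = 0;
        while (isalnum((unsigned char)src[pos])) t.text[n++] = src[pos++];
        t.text[n] = '\0'; t.kind = TK_ID;
    } else if (isdigit((unsigned char)src[pos])) {         /* number      */
        int n = 0;
        while (isdigit((unsigned char)src[pos])) t.text[n++] = src[pos++];
        t.text[n] = '\0'; t.kind = TK_NUM;
    } else if (src[pos] == ':' && src[pos + 1] == '=') {   /* :=          */
        strcpy(t.text, ":="); t.kind = TK_ASSIGN; pos += 2;
    } else if (src[pos] == '+') {
        strcpy(t.text, "+"); t.kind = TK_PLUS; pos++;
    }
    return t;
}

int main(void) {
    Token t;
    while ((t = next_token()).kind != TK_EOF)
        printf("%-8s kind=%d\n", t.text, t.kind);
    return 0;
}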
Syntax Analyzer
• A Syntax Analyzer creates the syntactic structure
(generally a parse tree) of the given program.
• A syntax analyzer is also called a parser.
• A parse tree describes a syntactic structure.
Ex: newval := oldval + 12

    assgstmt
     ├─ identifier (newval)
     ├─ :=
     └─ expression
         ├─ expression ─ identifier (oldval)
         ├─ +
         └─ expression ─ number (12)

• In a parse tree, all terminals are at the leaves.
• All inner nodes are non-terminals in a context-free grammar.
(A small recursive-descent parsing sketch follows below.)
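As referenced above, a small illustrative recursive-descent sketch in C; the grammar (assgstmt -> id := expr, expr -> term { + term }, term -> id | num) and the fixed token array standing in for the lexical analyzer are simplifying assumptions, not the slides' grammar.

/* Recursive-descent parser sketch for: newval := oldval + 12 (illustrative). */
#include <stdio.h>
#include <stdlib.h>

typedef enum { TK_ID, TK_NUM, TK_ASSIGN, TK_PLUS, TK_EOF } TokenKind;

/* Token stream produced by the lexical analyzer for: newval := oldval + 12 */
static TokenKind toks[] = { TK_ID, TK_ASSIGN, TK_ID, TK_PLUS, TK_NUM, TK_EOF };
static int cur = 0;

static void expect(TokenKind k, const char *what) {
    if (toks[cur] != k) { fprintf(stderr, "syntax error: expected %s\n", what); exit(1); }
    cur++;
}

static void term(void) {                      /* term -> id | num         */
    if (toks[cur] == TK_ID || toks[cur] == TK_NUM) cur++;
    else { fprintf(stderr, "syntax error in term\n"); exit(1); }
}

static void expr(void) {                      /* expr -> term { + term }  */
    term();
    while (toks[cur] == TK_PLUS) { cur++; term(); }
}

static void assgstmt(void) {                  /* assgstmt -> id := expr   */
    expect(TK_ID, "identifier");
    expect(TK_ASSIGN, "':='");
    expr();
}

int main(void) {
    assgstmt();
    expect(TK_EOF, "end of input");
    puts("parsed OK");
    return 0;
}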
Syntax Analyzer versus Lexical
Analyzer
• Which constructs of a program should be recognized by
the lexical analyzer, and which ones by the syntax
analyzer?
• Both of them do similar things, but the lexical analyzer deals with the simple, non-recursive constructs of the language.
• The syntax analyzer deals with recursive constructs of
the language.
• The lexical analyzer simplifies the job of the syntax
analyzer.
• The lexical analyzer recognizes the smallest meaningful
units (tokens) in a source program.
• The syntax analyzer works on the smallest meaningful
units (tokens) in a source program to recognize
meaningful structures in our programming language.
Semantic Analyzer
• A semantic analyzer checks the source program for
semantic errors and collects the type information for the
code generation.
• Type-checking is an important part of the semantic analyzer.
• Normally, semantic information cannot be represented by the context-free grammars used in syntax analysis.
• Context-free grammars used in syntax analysis are therefore augmented with attributes (semantic rules);
  • the result is a syntax-directed translation,
  • also called an attribute grammar.
• Ex: newval := oldval + 12
• The type of the identifier newval must match the type of the expression (oldval + 12). (A tiny type-checking sketch follows below.)
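A minimal sketch of this check, assuming a tiny hard-coded symbol table in which both identifiers have type real; the SymEntry structure and lookup() helper are illustrative only, not part of any real compiler.

/* Type-checking sketch (illustrative): the left-hand side's type must match the expression's type. */
#include <stdio.h>
#include <string.h>

typedef enum { T_INT, T_REAL } Type;
typedef struct { const char *name; Type type; } SymEntry;

/* Tiny symbol table, assumed to have been filled by earlier phases. */
static SymEntry symtab[] = { { "newval", T_REAL }, { "oldval", T_REAL } };

static Type lookup(const char *name) {
    for (unsigned i = 0; i < sizeof symtab / sizeof symtab[0]; i++)
        if (strcmp(symtab[i].name, name) == 0) return symtab[i].type;
    return T_INT;                       /* default for literals such as 12 */
}

int main(void) {
    /* For  newval := oldval + 12 : the literal 12 is widened to the type of oldval. */
    Type lhs = lookup("newval");
    Type rhs = lookup("oldval");
    if (lhs != rhs)
        printf("type error: cannot assign\n");
    else
        printf("assignment is type-correct\n");
    return 0;
}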
Intermediate Code Generation
• A compiler may produce an explicit intermediate code representing the source program.
• This intermediate code is generally machine (architecture) independent, but its level is close to the level of machine code.
• Ex: newval := oldval * fact + 1

  id1 := id2 * id3 + 1

  Intermediate code (quadruples):
  MULT id2,id3,temp1
  ADD  temp1,#1,temp2
  MOV  temp2,,id1

(A sketch of quadruples as a simple data structure follows below.)
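One simple way to hold such quadruples is one record per instruction; the Quad structure below is an illustrative assumption, and the program just stores and prints the three quadruples shown above.

/* Quadruple representation sketch (illustrative) for  newval := oldval * fact + 1 */
#include <stdio.h>

typedef struct {
    const char *op, *arg1, *arg2, *result;
} Quad;

int main(void) {
    Quad code[] = {
        { "MULT", "id2",   "id3", "temp1" },   /* temp1 := oldval * fact */
        { "ADD",  "temp1", "#1",  "temp2" },   /* temp2 := temp1 + 1     */
        { "MOV",  "temp2", "",    "id1"   },   /* newval := temp2        */
    };
    for (unsigned i = 0; i < sizeof code / sizeof code[0]; i++)
        printf("%-4s %-6s %-6s %-6s\n",
               code[i].op, code[i].arg1, code[i].arg2, code[i].result);
    return 0;
}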
Code Optimizer (for the intermediate code)

• The code optimizer optimizes the code produced by the intermediate code generator in terms of time and space.
• Ex: the two temporaries above can be reduced to one:
  MULT id2,id3,temp1
  ADD  temp1,#1,id1
(A small peephole sketch of this transformation follows below.)
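An illustrative peephole sketch of this transformation: when a quadruple's result is a temporary that the very next quadruple merely copies elsewhere, write the final name directly and delete the copy. This is a simplification; a real optimizer would also verify that the temporary is not used again (which holds here for temp2).

/* Peephole sketch (illustrative): fold  OP a,b,tN ; MOV tN,,x  into  OP a,b,x */
#include <stdio.h>
#include <string.h>

typedef struct { char op[8], arg1[8], arg2[8], result[8]; } Quad;

int main(void) {
    Quad code[] = {
        { "MULT", "id2",   "id3", "temp1" },
        { "ADD",  "temp1", "#1",  "temp2" },
        { "MOV",  "temp2", "",    "id1"   },
    };
    int n = 3;
    for (int i = 0; i + 1 < n; i++) {
        /* If the next quad only copies this quad's temporary, rename the result. */
        if (strcmp(code[i + 1].op, "MOV") == 0 &&
            strcmp(code[i + 1].arg1, code[i].result) == 0) {
            strcpy(code[i].result, code[i + 1].result);
            for (int j = i + 1; j + 1 < n; j++) code[j] = code[j + 1];  /* delete the MOV */
            n--;
        }
    }
    for (int i = 0; i < n; i++)
        printf("%-4s %-6s %-6s %-6s\n",
               code[i].op, code[i].arg1, code[i].arg2, code[i].result);
    return 0;
}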
Code Generator
• Produces the target code for a specific architecture.
• The target program is normally a relocatable object file containing the machine code.
• Ex (assume an architecture in which at least one operand of each instruction must be a machine register):
  MOVE id2,R1
  MULT id3,R1
  ADD  #1,R1
  MOVE R1,id1
(A toy emitter sketch follows below.)
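A toy emitter sketch for this example, assuming a single register R1 and the optimized quadruples from the previous slide; the selection logic is invented for illustration, and real code generators are considerably more involved.

/* Code-generation sketch (illustrative): lower the optimized quadruples to one-register code. */
#include <stdio.h>
#include <string.h>

typedef struct { const char *op, *arg1, *arg2, *result; } Quad;

int main(void) {
    Quad code[] = {
        { "MULT", "id2",   "id3", "temp1" },
        { "ADD",  "temp1", "#1",  "id1"   },   /* after the optimization shown earlier */
    };
    const char *in_r1 = "";                    /* name of the value currently held in R1 */
    for (unsigned i = 0; i < sizeof code / sizeof code[0]; i++) {
        if (strcmp(code[i].arg1, in_r1) != 0)  /* load the left operand if not already in R1 */
            printf("MOVE %s,R1\n", code[i].arg1);
        printf("%s %s,R1\n", code[i].op, code[i].arg2);
        in_r1 = code[i].result;                /* R1 now holds this quad's result */
    }
    printf("MOVE R1,%s\n", in_r1);             /* store the final result */
    return 0;
}

Running it prints exactly the four instructions shown above (MOVE id2,R1 / MULT id3,R1 / ADD #1,R1 / MOVE R1,id1).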
The Structure of a Compiler: Another Example

Scanner [Lexical Analyzer]            → Tokens
Parser [Syntax Analyzer]              → Parse tree
Semantic Process [Semantic Analyzer]  → Abstract Syntax Tree w/ Attributes
Intermediate Code Generator           → Non-optimized Intermediate Code
Code Optimizer                        → Optimized Intermediate Code
Code Generator                        → Target machine code
Compiler Construction Tools
• Programs to be discussed:
  • Lex/Flex – a programming utility that generates a lexical analyzer
  • Yacc/Bison – a parser generator
  • gcc – the GNU Compiler Collection C compiler
General Compiler Infrastructure
Lex Programming Utility
• General information:
  • Input is stored in a file with a *.l extension
  • The file consists of three main sections
  • lex generates a C function stored in lex.yy.c
• Using lex:
  1) Specify the words to be used as tokens (an extension of regular expressions)
  2) Run the lex utility on the source file to generate yylex(), a C function
  3) lex declares the global variables char *yytext and int yyleng
Lex Programming Utility
• Three sections of a lex input file:

%{
/* C declarations and #includes */
#include "header.c"
int i;
%}
/* lex definitions (the {INT} pattern used below) */
INT [0-9]+

%%
/* lex patterns and actions */
{INT}   { sscanf(yytext, "%d", &i);
          printf("INTEGER\n"); }
%%
/* C functions called by the above actions */
int main(void) { yylex(); return 0; }
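A typical way to build a scanner from such a specification (assuming it is saved as scanner.l; the lex support library is linked with -ll for classic lex or -lfl for flex, and details vary by system):

  lex scanner.l
  gcc lex.yy.c -o scanner -ll

The resulting scanner should then print INTEGER for each run of digits in its standard input, while other characters are echoed by lex's default rule.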
Yacc Parser Generator
General information:
  • Input is a specification of a language
  • Output is a compiler for that language
  • yacc generates a C function stored in y.tab.c
  • A public domain version is available: Bison
Using yacc:
  1) yacc generates a C function called yyparse()
  2) yyparse() may include calls to yylex()
  3) Compile this function to obtain the compiler (a minimal driver sketch follows below)
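As referenced in step 3, here is a minimal driver sketch showing how the generated yyparse() is usually invoked; the file name main.c and this yyerror() definition are conventional assumptions, and the driver only links once y.tab.c (and usually lex.yy.c) have been generated.

/* main.c - minimal driver for a yacc/bison-generated parser (illustrative). */
#include <stdio.h>

int yyparse(void);                 /* generated by yacc into y.tab.c        */

void yyerror(const char *msg) {    /* called by yyparse() on a syntax error */
    fprintf(stderr, "parse error: %s\n", msg);
}

int main(void) {
    /* yyparse() repeatedly calls yylex() (e.g., from lex.yy.c) for tokens. */
    return yyparse();              /* returns 0 if the input is accepted    */
}

A typical build would then be something like gcc y.tab.c lex.yy.c main.c -o mycompiler (exact flags vary by system).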
Yacc Parser Generator

• Input source file – similar to a lex input file
• Sections: Declarations, Rules, Support routines
• Four parts of an output atom: (Operation, Left Operand, Right Operand, Result)
Lex & Yacc
Gcc Compiler
General information:
• gcc is the GNU Project C compiler
• A command-line program
• gcc takes C source files as input
• Outputs an executable, named a.out by default
• You can specify a different output filename

To compile, simply type: gcc -o hello hello.c -g -Wall

• The '-o' option tells the compiler to name the executable 'hello'
• The '-g' option adds symbolic debugging information to the executable
• '-Wall' tells it to print out all warnings (very useful!)
• You can also give an '-O' option (e.g. '-O2') to turn on optimization
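For reference, a minimal hello.c that the gcc command above would compile:

/* hello.c - minimal program to try the gcc command above. */
#include <stdio.h>

int main(void) {
    printf("Hello, world!\n");
    return 0;
}

After compiling, run it with ./hello.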
