
PCD Notes - Unit - 1

The document discusses the phases of a compiler and related concepts. It describes the six main phases of a compiler as lexical analysis, syntax analysis, semantic analysis, intermediate code generation, code optimization, and code generation. It provides examples to illustrate each phase and explains key concepts like symbol table management and error handling. The overall goal of a compiler is to translate a program written in a source language into an equivalent program in a target language.
Copyright
© Attribution Non-Commercial (BY-NC)

Anna University B.E - VI Sem CSE CS2352 Principles of Compiler Design

D. Jagadeesan, B.E., M.Tech., (Ph.D)., MISTE.,


Asst. Professor in CSE,APCE

Unit I Syllabus: LEXICAL ANALYSIS. Introduction to Compiling - Compilers - Analysis of the source program - The phases - Cousins - The grouping of phases - Compiler construction tools. The role of the lexical analyzer - Input buffering - Specification of tokens - Recognition of tokens - A language for specifying lexical analyzer.

Compiler :
A compiler is a program that reads a program written in one language (the source language, such as C or C++) and translates it into an equivalent program in another language (the target language, such as machine language). The compiler also reports to its user the presence of errors in the source program.

[Figure: Source Program (High Level Language) → Compiler → Target Program (Low Level Language), with Error Messages as a side output]

Classification of Compiler:
1. Single Pass Compiler (narrow): traverses the source program only once. Faster, but with limited scope. Eg. Pascal
2. Multi-Pass Compiler (wide): processes the source program several times. Slower, but with wider scope. Eg. Java
3. Load and Go Compiler: generates machine code and then immediately executes it.
4. Debugging or Optimizing Compiler: an optimizing compiler tries to minimize or maximize some attributes of an executable computer program, such as its execution time or memory use.

Software Tools :
Many software tools that manipulate source programs first perform some kind of analysis. Some examples of such tools include:

Structure Editors: A structure editor takes as input a sequence of commands to build a source program. It not only performs the text-creation and modification functions of an ordinary text editor, but also analyzes the program text, putting an appropriate hierarchical structure on the source program. Example: when the user types while, the editor supplies the matching do; when the user types begin, it supplies the matching end.

Pretty printers: A pretty printer analyzes a program and prints it in such a way that the structure of the program becomes clearly visible.

Static Checkers: A static checker reads a program, analyzes it, and attempts to discover potential bugs without running the program.

Interpreters: An interpreter takes a program in a high level language (BASIC, FORTRAN, etc.) and, instead of producing a target program, performs the operations implied by the source program. Interpreters are frequently used to execute command languages, since each operator executed in a command language is usually an invocation of a complex routine such as an editor or a compiler.

The analysis portion of each of the following examples is similar to that of a conventional compiler:
1. Text formatters
2. Silicon compilers
3. Query interpreters

Analysis of Source Program :


The analysis phase breaks up the source program into constituent pieces and creates an intermediate representation of the source program. Analysis consists of three phases:

Linear analysis (Lexical analysis or Scanning): The lexical analysis phase reads the characters in the source program from left to right and groups them into tokens, which are sequences of characters having a collective meaning.
Example: position := initial + rate * 60
Identifiers: position, initial, rate
Assignment symbol: :=
Operators: +, *
Number: 60
Blanks are eliminated.

Hierarchical analysis (Syntax analysis or Parsing): It involves grouping the tokens of the source program into grammatical phrases that are used by the compiler to synthesize output. The grammatical phrases are usually represented by a parse tree.
Example: parse tree for position := initial + rate * 60

    assignment statement
        identifier: position
        :=
        expression
            expression
                identifier: initial
            +
            expression
                expression
                    identifier: rate
                *
                expression
                    number: 60

Semantic analysis: This phase checks the source program for semantic errors and gathers type information for the subsequent code generation phase. An important component of semantic analysis is type checking.
Example: int to real conversion. In position := initial + rate * 60, assuming rate is declared real, the integer constant 60 is converted to real before the multiplication:

    expression
        expression
            identifier: rate
        *
        expression
            inttoreal
                number: 60

Phases of Compiler:
A compiler operates in phases, each of which transforms the source program from one representation to another. Compilation has two parts with six phases in all:

Analysis Phase (three phases)
1. Lexical Analysis
2. Syntax Analysis
3. Semantic Analysis

Synthesis Phase (three phases)
4. Intermediate Code Generation
5. Code Optimization
6. Code Generation

Two other activities interact with all the phases:
1. Symbol Table Management
2. Error Handling

Lexical Analysis: The lexical analyzer is also called the scanner. It reads the characters in the source program and groups them into tokens, which are sequences of characters having a collective meaning, such as an identifier, a keyword, a punctuation character, or a multi-character operator like ++. The character sequence forming a token is called the lexeme for the token. Certain tokens are augmented by a lexical value; for example, an identifier token carries a pointer to its symbol table entry, so position, initial and rate become id1, id2 and id3.
Example: position := initial + rate * 60 becomes id1 := id2 + id3 * 60. Blanks are eliminated.
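The lexical analysis step above can be sketched in Python (our choice; the notes contain no code). This is a minimal illustration, not the course's scanner: the token names NUMBER, ID, ASSIGN, OP and the functions tokenize and show are hypothetical, and the symbol table is simply a list whose 1-based index supplies the N in idN.

```python
import re

# Hypothetical token specification for this one example: (name, regex).
# The catch-all ERROR pattern comes last so it only fires when nothing else matches.
TOKEN_SPEC = [
    ("NUMBER", r"\d+(?:\.\d+)?"),
    ("ID", r"[A-Za-z_][A-Za-z0-9_]*"),
    ("ASSIGN", r":="),
    ("OP", r"[+\-*/]"),
    ("SKIP", r"[ \t]+"),   # blanks are eliminated
    ("ERROR", r"."),       # any other character is a lexical error
]
MASTER_RE = re.compile("|".join(f"(?P<{name}>{regex})" for name, regex in TOKEN_SPEC))


def tokenize(source):
    """Return (tokens, symbol_table) for a single statement.

    An identifier is entered into the symbol table the first time it is
    seen; its token's lexical value is its 1-based position in the table.
    """
    symbol_table = []
    tokens = []
    for match in MASTER_RE.finditer(source):
        kind, lexeme = match.lastgroup, match.group()
        if kind == "SKIP":
            continue
        if kind == "ERROR":
            raise ValueError(f"illegal character {lexeme!r} at {match.start()}")
        if kind == "ID":
            if lexeme not in symbol_table:
                symbol_table.append(lexeme)
            tokens.append(("id", symbol_table.index(lexeme) + 1))
        else:
            tokens.append((kind.lower(), lexeme))
    return tokens, symbol_table


def show(tokens):
    """Render tokens in the id1 := id2 + id3 * 60 notation of the notes."""
    return " ".join(f"id{value}" if kind == "id" else value for kind, value in tokens)


tokens, table = tokenize("position := initial + rate * 60")
print(show(tokens))  # id1 := id2 + id3 * 60
print(table)         # ['position', 'initial', 'rate']
```

Putting the patterns in one ordered list mirrors the idea that each token is described by a pattern; a real scanner would also handle keywords and report line numbers.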

[Figure: Phases of a compiler. Source Program → Lexical Analyzer → Syntax Analyzer → Semantic Analyzer → Intermediate Code Generator → Code Optimizer → Code Generator → Target Program. Symbol Table Management and the Error Handler are connected to every phase.]

Syntax analysis: It processes the string of descriptors (tokens) produced by the lexical analyzer to determine the syntactic structure of an input statement. This process is known as parsing. The output of the parsing step is a representation of the syntactic structure of a statement; a convenient representation is a syntax tree.
Example: syntax tree for position := initial + rate * 60

            :=
           /  \
        id1    +
              / \
           id2   *
                / \
             id3   60

Semantic analysis: This phase checks the source program for semantic errors and gathers type information for the subsequent code generation phase.
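As an illustration of how a parser can build the syntax tree above, here is a minimal recursive-descent sketch. The grammar in the docstring, the function parse, and the tuple representation (operator, left, right) are all assumptions made for this example; tokens are taken as plain strings such as id1 or 60, and anything starting with "id" is treated as an identifier.

```python
class ParseError(Exception):
    pass


def parse(tokens):
    """Recursive-descent parser for the hypothetical grammar
        stmt   -> id ':=' expr
        expr   -> term { '+' term }
        term   -> factor { '*' factor }
        factor -> id | number
    Returns a syntax tree of nested (operator, left, right) tuples."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        tok = peek()
        if tok is None:
            raise ParseError("unexpected end of input")
        pos += 1
        return tok

    def is_id(tok):
        return tok.startswith("id")          # tokens look like id1, id2, ...

    def is_number(tok):
        return tok.replace(".", "", 1).isdigit()

    def factor():
        tok = advance()
        if is_id(tok) or is_number(tok):
            return tok
        raise ParseError(f"expected identifier or number, got {tok!r}")

    def term():
        node = factor()
        while peek() == "*":                 # '*' binds tighter than '+'
            advance()
            node = ("*", node, factor())
        return node

    def expr():
        node = term()
        while peek() == "+":
            advance()
            node = ("+", node, term())
        return node

    target = advance()
    if not is_id(target):
        raise ParseError("a statement must start with an identifier")
    if advance() != ":=":
        raise ParseError("expected ':='")
    tree = (":=", target, expr())
    if peek() is not None:
        raise ParseError(f"unexpected token {peek()!r}")
    return tree


print(parse(["id1", ":=", "id2", "+", "id3", "*", "60"]))
# (':=', 'id1', ('+', 'id2', ('*', 'id3', '60')))
```

Because term() is called from inside expr(), the * node ends up below the + node, which is exactly the shape of the tree in the notes.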

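An important component of semantic analysis is type checking. Example: int to real conversion. After semantic analysis, the syntax tree contains an explicit inttoreal node that converts the integer 60 to real:

            :=
           /  \
        id1    +
              / \
           id2   *
                / \
             id3   inttoreal
                       |
                       60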
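The type-checking step can be sketched as a recursive walk that inserts an inttoreal node wherever an integer operand meets a real one. This is a minimal illustration, assuming (as the example implies) that position, initial and rate are declared real; the SYMBOL_TYPES dictionary and the functions leaf_type and check are hypothetical.

```python
# Hypothetical symbol-table types: the example assumes position, initial
# and rate (id1, id2, id3) are declared real.
SYMBOL_TYPES = {"id1": "real", "id2": "real", "id3": "real"}


def leaf_type(leaf):
    if leaf in SYMBOL_TYPES:
        return SYMBOL_TYPES[leaf]
    return "real" if "." in leaf else "int"      # numeric constant


def check(node):
    """Type-check an (operator, left, right) tree and return (new_tree, type).
    An int operand combined with a real one is wrapped in ('inttoreal', x)."""
    if isinstance(node, str):
        return node, leaf_type(node)
    op, left, right = node
    left, ltype = check(left)
    right, rtype = check(right)
    if op == ":=":
        if ltype == "int" and rtype == "real":
            raise TypeError(f"cannot assign a real value to integer {left}")
        if ltype == "real" and rtype == "int":
            right = ("inttoreal", right)         # convert the value, not the target
        return (op, left, right), ltype
    if ltype == "real" and rtype == "int":
        right, rtype = ("inttoreal", right), "real"
    elif ltype == "int" and rtype == "real":
        left, ltype = ("inttoreal", left), "real"
    return (op, left, right), ltype


tree = (":=", "id1", ("+", "id2", ("*", "id3", "60")))
annotated, _ = check(tree)
print(annotated)
# (':=', 'id1', ('+', 'id2', ('*', 'id3', ('inttoreal', '60'))))
```

Assignment is handled separately because converting the target variable would be wrong; only the value being assigned may be widened.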

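Intermediate Code Generation: After analysis, the compiler generates an explicit intermediate representation of the source program. This representation should have two properties: it should be easy to produce, and it should be easy to translate into the target program. One common form is three-address code, which consists of a sequence of instructions, each of which has at most three operands.
Example: id1 := id2 + id3 * 60 as three-address code:
    temp1 := inttoreal(60)
    temp2 := id3 * temp1
    temp3 := id2 + temp2
    id1 := temp3

Code Optimization: This phase attempts to improve the intermediate code so that faster-running machine code will result.
Example: three-address code after optimization (the conversion of 60 is done once at compile time, and the copy through temp3 is removed):
    temp1 := id3 * 60.0
    id1 := id2 + temp1

Code Generation: The final phase of the compiler generates target code, consisting of relocatable machine code or assembly code.
Example: assembly code for a hypothetical machine with registers R1 and R2 (the F suffix means floating-point, and # marks an immediate constant):
    MOVF id3, R2
    MULF #60.0, R2
    MOVF id2, R1
    ADDF R2, R1
    MOVF R1, id1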
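The three-address code above can be produced by a post-order walk of the annotated syntax tree that allocates a fresh temporary for every interior node. This is a minimal sketch; the tuple tree format and the function gen_three_address are assumptions for illustration.

```python
def gen_three_address(tree):
    """Generate three-address code from an annotated syntax tree using a
    post-order walk; every interior node gets a fresh temporary."""
    code = []
    count = 0

    def new_temp():
        nonlocal count
        count += 1
        return f"temp{count}"

    def walk(node):
        if isinstance(node, str):            # identifier or constant
            return node
        if node[0] == "inttoreal":           # unary conversion node
            arg = walk(node[1])
            temp = new_temp()
            code.append(f"{temp} := inttoreal({arg})")
            return temp
        op, left, right = node
        if op == ":=":                       # assignment: no new temporary
            value = walk(right)
            code.append(f"{left} := {value}")
            return left
        lhs, rhs = walk(left), walk(right)
        temp = new_temp()
        code.append(f"{temp} := {lhs} {op} {rhs}")
        return temp

    walk(tree)
    return code


annotated = (":=", "id1", ("+", "id2", ("*", "id3", ("inttoreal", "60"))))
for instruction in gen_three_address(annotated):
    print(instruction)
```

Because children are processed before their parent, the instructions appear in the same order as in the notes: temp1 for the conversion, temp2 for the product, temp3 for the sum, then the final copy into id1.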

Symbol Table Management: A symbol table is a data structure containing a record for each identifier, with fields for the attributes of the identifier. When the lexical analyzer detects an identifier in the source program, it enters the identifier into the symbol table. However, the attributes of an identifier (such as its type) cannot normally be determined during lexical analysis. The remaining phases enter information about identifiers into the symbol table and then use this information in various ways.

Error Handler: Each phase can encounter errors.
1. The lexical phase can detect errors where the characters remaining in the input do not form any token of the language.
2. The syntax analysis phase can detect errors where the token stream violates the structure rules (syntax) of the language.
3. During semantic analysis, the compiler tries to detect constructs that have the right syntactic structure but no meaning to the operation involved.
4. The intermediate code generator may detect an operator whose operands have incompatible types.
5. The code optimizer, doing control flow analysis, may detect that certain statements can never be reached.
6. While entering information into the symbol table, the bookkeeping routine may discover an identifier that has been multiply declared with contradictory attributes.
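A symbol table with the division of labour described above (the lexical analyzer creates the record, later phases fill in attributes, and the bookkeeping routine reports contradictory declarations) might look like this minimal sketch. The classes SymbolEntry and SymbolTable and the methods insert, lookup and declare are hypothetical; a production compiler would typically use hashing plus scope handling.

```python
from dataclasses import dataclass, field


@dataclass
class SymbolEntry:
    name: str
    attributes: dict = field(default_factory=dict)   # e.g. {"type": "real"}


class SymbolTable:
    """A dictionary-backed symbol table: one record per identifier."""

    def __init__(self):
        self._entries = {}

    def insert(self, name):
        """Used by the lexical analyzer: create the record if it is new
        and return it; attributes stay empty because they are not known yet."""
        if name not in self._entries:
            self._entries[name] = SymbolEntry(name)
        return self._entries[name]

    def lookup(self, name):
        """Return the record for name, or None if it was never entered."""
        return self._entries.get(name)

    def declare(self, name, type_name):
        """Used by later phases to record a type; a contradictory
        redeclaration is reported as an error (the bookkeeping check)."""
        entry = self.insert(name)
        previous = entry.attributes.get("type")
        if previous is not None and previous != type_name:
            raise ValueError(f"{name} multiply declared as {previous} and {type_name}")
        entry.attributes["type"] = type_name
        return entry


table = SymbolTable()
for identifier in ("position", "initial", "rate"):
    table.insert(identifier)          # lexical analysis
table.declare("rate", "real")         # a later phase adds the attribute
print(table.lookup("rate"))           # SymbolEntry(name='rate', attributes={'type': 'real'})
```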

Write down the output of each phase for the expression position := initial + rate * 60.

Source Program:
    position := initial + rate * 60

Lexical Analyzer:
    id1 := id2 + id3 * 60

Syntax Analyzer:
            :=
           /  \
        id1    +
              / \
           id2   *
                / \
             id3   60

Semantic Analyzer:
            :=
           /  \
        id1    +
              / \
           id2   *
                / \
             id3   inttoreal
                       |
                       60

Intermediate Code Generator:
    temp1 := inttoreal(60)
    temp2 := id3 * temp1
    temp3 := id2 + temp2
    id1 := temp3

Code Optimizer:
    temp1 := id3 * 60.0
    id1 := id2 + temp1

Code Generator (Target Program):
    MOVF id3, R2
    MULF #60.0, R2
    MOVF id2, R1
    ADDF R2, R1
    MOVF R1, id1

Symbol Table Management and the Error Handler interact with every phase.
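The Code Optimizer step in this worked example can be reproduced by a small pass over quadruples (dest, op, arg1, arg2). This is a sketch under stated assumptions, not a general optimizer: it assumes every temporary is assigned exactly once (true for the generated code above), and the names CODE, optimize and fmt are hypothetical. It performs three transformations: doing inttoreal on a constant at compile time, removing the copy through a single-use temporary, and renumbering the temporaries.

```python
from collections import Counter

# Quadruples (dest, op, arg1, arg2); "copy" means dest := arg1.
CODE = [
    ("temp1", "inttoreal", "60", None),
    ("temp2", "*", "id3", "temp1"),
    ("temp3", "+", "id2", "temp2"),
    ("id1", "copy", "temp3", None),
]


def optimize(code):
    # Pass 1: evaluate inttoreal on integer constants at compile time and
    # substitute the real constant wherever that temporary is used.
    constants, folded = {}, []
    for dest, op, a1, a2 in code:
        a1, a2 = constants.get(a1, a1), constants.get(a2, a2)
        if op == "inttoreal" and a1.isdigit():
            constants[dest] = f"{int(a1)}.0"
            continue
        folded.append((dest, op, a1, a2))

    # Pass 2: "t := y op z; x := t" becomes "x := y op z" when t is a
    # temporary used nowhere else.
    uses = Counter(a for _, _, a1, a2 in folded for a in (a1, a2) if a is not None)
    result = []
    for dest, op, a1, a2 in folded:
        if (op == "copy" and result and result[-1][0] == a1
                and a1.startswith("temp") and uses[a1] == 1):
            _, prev_op, p1, p2 = result.pop()
            result.append((dest, prev_op, p1, p2))
        else:
            result.append((dest, op, a1, a2))

    # Pass 3: renumber the surviving temporaries as temp1, temp2, ...
    names = {}

    def rename(x):
        if x is not None and x.startswith("temp"):
            names.setdefault(x, f"temp{len(names) + 1}")
            return names[x]
        return x

    return [(rename(d), op, rename(a1), rename(a2)) for d, op, a1, a2 in result]


def fmt(quad):
    dest, op, a1, a2 = quad
    if op == "copy":
        return f"{dest} := {a1}"
    if a2 is None:
        return f"{dest} := {op}({a1})"
    return f"{dest} := {a1} {op} {a2}"


for quad in optimize(CODE):
    print(fmt(quad))
# temp1 := id3 * 60.0
# id1 := id2 + temp1
```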

Cousins of the Compiler (Language Processing System) :


[Figure: Language processing system. Skeletal Source Program → Preprocessor → Source Program → Compiler → Target Assembly Program → Assembler → Relocatable Machine Code → Loader/Link-editor (together with library and relocatable object files) → Absolute Machine Code]

Preprocessors: A preprocessor produces input to the compiler. It may perform the following functions:
1. Macro processing: A preprocessor may allow a user to define macros that are shorthands for longer constructs.
2. File inclusion: A preprocessor may include header files into the program text.
3. Rational preprocessors: These preprocessors augment older languages with more modern flow-of-control and data-structuring facilities.
4. Language extensions: These preprocessors attempt to add capabilities to the language by what amounts to built-in macros.

Compiler: It converts the source program (HLL) into a target program (LLL).

Assembler: It converts an assembly language program (LLL) into machine code.

Loader and Link Editors:
Loader: The process of loading consists of taking relocatable machine code, altering the relocatable addresses, and placing the altered instructions and data in memory at the proper locations.
Link Editor: It allows us to make a single program from several files of relocatable machine code.
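Macro processing, the first preprocessor function listed, can be illustrated with a tiny expander for C-style object-like macros. This is a simplified sketch, not how a real C preprocessor works: the function preprocess and the regexes are hypothetical, replacement text is not rescanned (so macros do not nest), and string literals are not treated specially.

```python
import re

DEFINE = re.compile(r"\s*#define\s+([A-Za-z_]\w*)\s+(.*)")
# A word is either a number-like run starting with a digit (never a macro
# name) or an identifier; matching numbers too keeps "1e5" from being split.
WORD = re.compile(r"\d[\w.]*|[A-Za-z_]\w*")


def preprocess(lines):
    """Expand object-like macros declared as '#define NAME replacement'."""
    macros = {}
    output = []
    for line in lines:
        m = DEFINE.match(line)
        if m:
            macros[m.group(1)] = m.group(2).strip()
            continue                       # the #define line itself is consumed
        # Whole words only, so a macro MAX does not alter MAXLEN.
        output.append(WORD.sub(lambda w: macros.get(w.group(0), w.group(0)), line))
    return output


source = ["#define MAX 100", "#define PI 3.14", "int a[MAX]; float r = PI * MAXLEN;"]
print(preprocess(source))   # ['int a[100]; float r = 3.14 * MAXLEN;']
```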

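Grouping of Phases:
The phases are often collected into a front end and a back end.
Front end: the phases that depend primarily on the source language and are largely independent of the target machine: lexical analysis, syntax analysis, semantic analysis, creation of the symbol table, intermediate code generation, and some code optimization, together with the error handling that goes along with each of these phases.
Back end: the phases that depend on the target machine rather than on the source language: code optimization and code generation, along with the necessary error handling and symbol table operations.
Passes: several phases of compilation are usually implemented in a single pass, which reads an input file and writes an output file.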

COMPILER CONSTRUCTION TOOLS:


The compiler writer, like any programmer, can profitably use software tools such as debuggers, version managers, profilers and so on. In addition, specialized compiler construction tools are available:
1. Parser generators
2. Scanner generators
3. Syntax-directed translation engines
4. Automatic code generators
5. Dataflow engines

Parser Generators: These produce syntax analyzers, normally from input that is based on a context-free grammar (CFG). In early compilers, syntax analysis consumed not only a large fraction of the running time of a compiler, but also a large fraction of the intellectual effort of writing one. Eg: small languages such as PIC and EQN were implemented using a parser generator.

Scanner Generators: These automatically generate lexical analyzers, normally from a specification based on regular expressions. The basic organization of the resulting lexical analyzer is in effect a finite automaton.

Syntax-Directed Translation Engines: These produce collections of routines that walk the parse tree and generate intermediate code, typically in three-address format.

Automatic Code Generators: Such a tool takes a collection of rules that define the translation of each operation of the intermediate language into the machine language for the target machine.

Dataflow Engines: Much of the information needed to perform good code optimization involves dataflow analysis, the gathering of information about how values are transmitted from one part of a program to each other part.

These systems have often been referred to as compiler-compilers, compiler-generators, or translator-writing systems. The input specification for these systems may contain:
1. A description of the lexical and syntactic structure of the source language.
2. A description of what output is to be generated for each source language construct.
3. A description of the target machine.
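To show what "in effect a finite automaton" means, here is a hand-written transition table of the kind a scanner generator might produce for two hypothetical patterns, identifiers letter (letter | digit)* and integers digit+. The state numbers, TRANSITIONS, ACCEPTING and longest_match are assumptions for illustration; real generators emit much larger, compressed tables.

```python
def char_class(c):
    if c.isalpha() or c == "_":
        return "letter"
    if c.isdigit():
        return "digit"
    return "other"


# Transition table: state -> {character class -> next state}.
# State 0 is the start state; a missing entry means "no transition".
TRANSITIONS = {
    0: {"letter": 1, "digit": 2},
    1: {"letter": 1, "digit": 1},   # inside an identifier
    2: {"digit": 2},                # inside an integer
}
ACCEPTING = {1: "id", 2: "num"}


def longest_match(text, start):
    """Run the DFA from text[start]; return (token, lexeme) for the longest
    accepted prefix, or None if no prefix is accepted."""
    state, pos = 0, start
    last_accept = None
    while pos < len(text):
        nxt = TRANSITIONS[state].get(char_class(text[pos]))
        if nxt is None:
            break
        state, pos = nxt, pos + 1
        if state in ACCEPTING:
            last_accept = (ACCEPTING[state], text[start:pos])
    return last_accept


print(longest_match("rate*60", 0))   # ('id', 'rate')
print(longest_match("rate*60", 5))   # ('num', '60')
```

Remembering the last accepting state lets the scanner return the longest lexeme, the usual rule followed by generated lexical analyzers.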

ROLE OF LEXICAL ANALYSER:


[Figure: Source Program → Lexical Analyzer; on each "get next token" request from the Parser, the Lexical Analyzer returns a token; both consult Symbol Table Management]

Its main task is to read the input characters and produce as output a sequence of tokens that the parser uses for syntax analysis. On receiving a "get next token" command from the parser, the lexical analyzer reads input characters until it can identify the next token.

Its secondary tasks are:
1. Stripping out comments and white space (blank, tab and newline characters) from the source program.
2. Correlating error messages from the compiler with the source program (for example, by keeping track of line numbers).

The lexical analyzer is sometimes divided into two phases:
1. Scanning
2. Lexical analysis
The scanner is responsible for doing simple tasks, while the lexical analyzer proper does the more complex operations.

FUNCTIONS:
1. It produces the stream of tokens.
2. It eliminates blanks and comments.
3. It generates the symbol table, which stores information about identifiers and constants encountered in the input.
4. It keeps track of line numbers.
5. It reports the errors encountered while identifying the tokens.

ISSUES IN LEXICAL ANALYSIS: There are several reasons for separating the analysis phase of compiling into lexical analysis and parsing:
1. Simpler design.
2. Compiler efficiency is improved.
3. Compiler portability is enhanced.
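The "get next token" interface and the secondary tasks (skipping blanks and comments, tracking line numbers for error messages) can be sketched as a small hand-written scanner. This is a minimal illustration under assumptions not in the notes: identifiers are letter (letter | digit)*, numbers are digit strings, comments are written { ... } as in Pascal, and the class Lexer with method get_next_token is hypothetical. Tokens are (kind, lexeme, line) triples.

```python
class Lexer:
    """Hypothetical scanner: the parser calls get_next_token() repeatedly."""

    def __init__(self, text):
        self.text = text
        self.pos = 0
        self.line = 1          # kept so errors can be reported by line

    def _skip_blanks_and_comments(self):
        while self.pos < len(self.text):
            c = self.text[self.pos]
            if c == "\n":
                self.line += 1
                self.pos += 1
            elif c in " \t":
                self.pos += 1
            elif c == "{":                           # comment runs to the next '}'
                end = self.text.find("}", self.pos)
                if end == -1:
                    raise ValueError(f"line {self.line}: unterminated comment")
                self.line += self.text.count("\n", self.pos, end)
                self.pos = end + 1
            else:
                break

    def get_next_token(self):
        self._skip_blanks_and_comments()
        if self.pos >= len(self.text):
            return ("eof", None, self.line)
        start = self.pos
        c = self.text[start]
        if c.isalpha():
            while self.pos < len(self.text) and self.text[self.pos].isalnum():
                self.pos += 1
            return ("id", self.text[start:self.pos], self.line)
        if c.isdigit():
            while self.pos < len(self.text) and self.text[self.pos].isdigit():
                self.pos += 1
            return ("num", self.text[start:self.pos], self.line)
        if self.text.startswith(":=", start):
            self.pos += 2
            return ("assign", ":=", self.line)
        if c in "+-*/;":
            self.pos += 1
            return ("op", c, self.line)
        raise ValueError(f"line {self.line}: illegal character {c!r}")


lexer = Lexer("position := initial { comment }\n  + rate * 60")
token = lexer.get_next_token()
while token[0] != "eof":
    print(token)
    token = lexer.get_next_token()
```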

TOKEN: A token is a sequence of characters that can be treated as a single logical entity. Typical tokens are:
1. Identifiers
2. Keywords
3. Operators
4. Special symbols
5. Constants

PATTERN: The set of strings in the input for which the same token is produced as output is described by a rule called a pattern associated with the token.

LEXEME: A lexeme is a sequence of characters in the source program that is matched by the pattern for a token. For example, in position := initial + rate * 60, the lexeme rate is matched by the pattern for identifiers.
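The token, pattern and lexeme relationship can be made concrete by writing one regular expression per token class and asking which pattern matches a whole lexeme. This is a minimal sketch; the keyword list, the exact patterns and the function classify are assumptions for illustration, not a full language definition. Keywords are checked before identifiers because every keyword also matches the identifier pattern.

```python
import re

# One pattern (a regular expression) per token; order resolves overlaps.
PATTERNS = [
    ("keyword", r"(?:if|then|else|while|do|begin|end)"),
    ("identifier", r"[A-Za-z][A-Za-z0-9]*"),
    ("constant", r"\d+(?:\.\d+)?"),
    ("operator", r"(?:<=|>=|<>|:=|[<>=+\-*/])"),
    ("special symbol", r"[;,()]"),
]


def classify(lexeme):
    """Return the token whose pattern matches the whole lexeme, or None."""
    for token, pattern in PATTERNS:
        if re.fullmatch(pattern, lexeme):
            return token
    return None


for lexeme in ["while", "rate", "60", ":=", ";"]:
    print(lexeme, "->", classify(lexeme))
```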

INPUT BUFFERING :
During analysis, the scanner scans the input string from left to right, one character at a time, to identify tokens. It uses two pointers:
1. Begin pointer (bp): keeps track of the first character of the current lexeme.
2. Forward pointer (fp): keeps track of the next character to be examined.

[Figure: buffer holding float a , b ; a = A + 2 ; with bp and fp both at the f of float]

Steps in Scanning the Input:
1. Initially, both the begin pointer and the forward pointer point to the first character of the lexeme.
2. The fp scans the buffer until a match for the described token is found.
3. Once the end of the lexeme is found (signalled by a blank or a delimiter), fp marks the right end of the lexeme.

[Figure: bp at the f of float, fp just past the t]
4. After the lexeme is processed, both pointers are set to the character immediately after the lexeme.
[Figure: bp and fp both at the a that follows float]
5. This procedure is repeated for the entire source program.

Input strings are usually stored in a buffer. There are two schemes:
1. One buffer scheme
2. Two buffer scheme

One Buffer Scheme: Only one buffer of size N is used. The first N characters of the input string are read into the buffer. When fp reaches the end of the buffer, the buffer is refilled with the next N characters.
Drawback: When a lexeme is longer than N, the buffer is refilled before the lexeme is complete, so the characters starting at bp are overwritten and this scheme fails to produce the token.

Two Buffer Scheme: Two buffers (halves) of N characters each are used.
[Figure: a buffer divided into a first half and a second half of N characters each, holding float a , b ; a = A + 2 ; with an eof marker at the end of each half]

The first N characters are read into the first half of the buffer. If fewer than N characters remain, a special character called EOF is inserted to mark the end of the input. When fp reaches the end of the first half, the second half is loaded with the next N characters of the program. When fp reaches the end of the second half, the first half is loaded with the next N characters of the input, and fp wraps around to the beginning of the first half.

Algorithm for advancing the fp:

    if fp is at the end of first half then
    begin
        load second half;
        fp := fp + 1
    end
    else if fp is at the end of second half then
    begin
        load first half;
        move fp to the first character of first half
    end
    else
        fp := fp + 1;

With this algorithm, every advance of fp needs two tests (one for the end of each half). To reduce the number of comparisons, a special character called a sentinel (usually eof) is placed at the end of each buffer half, so that normally only one test per character is needed.

Algorithm for advancing the fp using Sentinels:

    fp := fp + 1;
    if the character at fp = eof then
    begin
        if fp is at the end of first half then
        begin
            load second half;
            fp := fp + 1
        end
        else if fp is at the end of second half then
        begin
            load first half;
            move fp to the first character of first half
        end
        else  { eof inside a half marks the end of the input }
            terminate lexical analysis
    end
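The sentinel algorithm can be simulated in Python to check how the two halves are refilled and how end of input is detected. This is a sketch under assumptions: the class TwoBufferReader and its advance method are hypothetical, the NUL character stands in for eof (so the source must not contain NUL), the buffer is a Python list rather than raw memory, and the begin pointer bp is omitted so that only the movement of fp is shown.

```python
import io

EOF = "\0"  # sentinel character; assumes the source never contains NUL


class TwoBufferReader:
    """Two N-character buffer halves, each followed by an eof sentinel.
    Layout of buf: [0 .. N-1] first half, [N] sentinel,
                   [N+1 .. 2N] second half, [2N+1] sentinel."""

    def __init__(self, stream, n=4):
        self.stream = stream
        self.n = n
        self.buf = [EOF] * (2 * n + 2)
        self.fp = -1            # forward pointer; the first advance() moves it to 0
        self.done = False
        self._load(0)           # fill the first half

    def _load(self, start):
        chunk = self.stream.read(self.n)
        self.buf[start:start + len(chunk)] = list(chunk)
        # After a full half this lands on the sentinel slot; after a short
        # read it marks the real end of the input inside the half.
        self.buf[start + len(chunk)] = EOF

    def advance(self):
        """Move fp one character and return it, or None at end of input."""
        if self.done:
            return None
        self.fp += 1
        if self.buf[self.fp] == EOF:              # the only test on the common path
            if self.fp == self.n:                 # end of first half
                self._load(self.n + 1)
                self.fp += 1
            elif self.fp == 2 * self.n + 1:       # end of second half
                self._load(0)
                self.fp = 0
            else:                                 # eof inside a half: input exhausted
                self.done = True
                return None
            if self.buf[self.fp] == EOF:          # the half just loaded was empty
                self.done = True
                return None
        return self.buf[self.fp]


reader = TwoBufferReader(io.StringIO("position := initial + rate * 60"), n=4)
chars = []
while (ch := reader.advance()) is not None:
    chars.append(ch)
print("".join(chars))
```

Using a tiny N such as 4 forces many reloads, which makes it easy to check that no character is lost or duplicated when fp crosses from one half to the other.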

Refer to the following topics from Theory of Computation:
1. Finite Automata
2. DFA
3. NFA
4. Regular Expressions
5. Converting a R.E. into an NFA
6. Converting an NFA with ε-moves into an NFA and a DFA
7. Minimization of DFA
