
Compiler Design

CSE - 353

UNIT-1
Syllabus

Unit 1: Introduction
Introduction to Compiler, Phases and passes, Bootstrapping,
Cross-Compiler
Finite state machines and regular expressions and their
applications to lexical analysis

lexical-analyzer generator, Lexical Phase errors


Definition of Compiler

A compiler is software that converts a program written in a high-level
language (the source language) into a low-level language (the object,
target, or machine language).
Types of Compiler

 Cross Compiler: runs on a machine 'A' and produces code for another
machine 'B'. It is capable of creating code for a platform other than
the one on which the compiler is running.

 Source-to-source Compiler (transcompiler or transpiler): a compiler
that translates source code written in one programming language into
source code of another programming language.
Language processing systems
(using Compiler)
Continued..
 High Level Language
• A program that contains pre-processor directives such as #include or
#define is written in a high-level language (HLL).
• HLLs are closer to humans but far from machines.
• These (#) tags are called pre-processor directives. They direct the
pre-processor about what to do.
 Pre-Processor
 The pre-processor removes all the #include directives by including the
referenced files (file inclusion) and expands all the #define directives
using macro expansion.
 It performs file inclusion, augmentation, macro-processing, etc.
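The macro-expansion step above can be sketched in miniature. The following is a hypothetical Python sketch handling object-like #define macros only; a real C pre-processor also performs file inclusion, function-like macros, and conditional compilation:

```python
import re

# Minimal sketch of #define macro expansion (object-like macros only).
# Real pre-processors also handle #include, function-like macros, etc.
def expand_macros(source: str) -> str:
    macros = {}
    out_lines = []
    for line in source.splitlines():
        m = re.match(r"\s*#define\s+(\w+)\s+(.+)", line)
        if m:
            macros[m.group(1)] = m.group(2).strip()  # record the macro
            continue  # the directive itself does not appear in the output
        for name, body in macros.items():
            line = re.sub(rf"\b{name}\b", body, line)  # substitute uses
        out_lines.append(line)
    return "\n".join(out_lines)

print(expand_macros("#define PI 3.14\narea = PI * r * r"))
# → area = 3.14 * r * r
```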
Continued..

 Assembly Language
 It is neither in binary form nor high level.
 It is an intermediate form that combines machine instructions with
some other useful data needed for execution.
 Assembler
 For every platform (hardware + OS) we have an assembler.
 Assemblers are not universal, since each platform has its own.
 The output of an assembler is called an object file. It translates
assembly language to machine code.
Continued..
 Interpreter
 An interpreter converts high level language into low level machine
language, just like a compiler.
 But they are different in the way they read the input.
 The Compiler in one go reads the inputs, does the processing and executes
the source code whereas the interpreter does the same line by line.
 Compiler scans the entire program and translates it as a whole into
machine code whereas an interpreter translates the program one statement
at a time.
 Interpreted programs are usually slower than compiled ones.
Continued..

 Relocatable Machine Code

 It can be loaded at any memory address and run from there.
 The addresses within the program are arranged so that the code can be
moved around in memory.
 Loader/Linker
 The linker combines a variety of object files into a single executable
file. The loader then loads it into memory and executes it.
Phases of Compiler
Continued..

1. Lexical analyser
 It is also called scanner.
 It takes the output of preprocessor (which performs file inclusion and
macro expansion) as the input which is in pure high level language.
 It reads the characters from source program and groups them into
lexemes (sequence of characters that “go together”).
 Each lexeme corresponds to a token. Tokens are defined by regular
expressions which are understood by the lexical analyzer.
 It also handles lexical errors (e.g., erroneous characters) and
removes comments and white space.
Continued..

2. Syntax Analyser
 It is sometimes called the parser.
 It constructs the parse tree.
 It takes the tokens one by one and uses a Context-Free Grammar to
construct the parse tree.
 Syntax errors can be detected at this level if the input is not in
accordance with the grammar.
Continued..

3. Semantic Analyser
 It verifies the parse tree, checking whether it is meaningful or not.
 It performs type checking, label checking and flow-control checking.
Continued..
4. Intermediate Code Generator
 It generates intermediate code, that is a form which can be readily
executed by machine
 Example – Three address code
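Three-address code breaks an expression into instructions with at most one operator each, using compiler-generated temporaries. A hypothetical Python sketch for the statement a = b + c * d (the emit helper and the temporary names t1, t2 are illustrative):

```python
# Minimal sketch: three-address code for the statement a = b + c * d.
code = []

def emit(op, left, right):
    """Emit one three-address instruction and return its temporary."""
    temp = f"t{len(code) + 1}"
    code.append(f"{temp} = {left} {op} {right}")
    return temp

t1 = emit("*", "c", "d")  # multiplication binds tighter, so it is emitted first
t2 = emit("+", "b", t1)
code.append(f"a = {t2}")

print("\n".join(code))
# → t1 = c * d
#   t2 = b + t1
#   a = t2
```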
5. Code Optimizer
 It transforms the code so that it consumes fewer resources and runs
faster.
6. Target Code Generator
 The main purpose is to write a code that the machine can understand and
also register allocation, instruction selection etc.
 The optimized code is converted into relocatable machine code which
then forms the input to the linker and loader.
Compiler Construction Tools

 The compiler writer can use some specialized tools that help in
implementing various phases of a compiler.

 These tools assist in the creation of an entire compiler or its parts.
Continued..
1. Parser Generator
 It produces syntax analyzers (parsers) from the input that is based on a
grammatical description of programming language or on a context-free
grammar.
 It is useful because the syntax-analysis phase is highly complex and
consumes considerable manual effort and time when written by hand.
Continued..
2. Scanner Generator
 It generates lexical analyzers from the input that consists of regular
expression description based on tokens of a language.
 It generates a finite automaton to recognize the regular expression.
Difference between Compiler and Interpreter

Compiler:
• Converts the entire source code of a programming language into
executable machine code for a CPU.
• Takes a large amount of time to analyze the entire source code, but
the overall execution time of the program is comparatively faster.
• Generates error messages only after scanning the whole program, so
debugging is comparatively hard.
• Generates intermediate object code.
• Examples: C, C++, Java.

Interpreter:
• Takes a source program and runs it line by line, translating each line
as it comes to it.
• Takes less time to analyze the source code, but the overall execution
time of the program is slower.
• Debugging is easier, as it continues translating the program until the
error is met.
• No intermediate object code is generated.
• Examples: Python, Perl.


Bootstrapping and Cross Compiler

 Bootstrapping is a process in which a simple language is used to
translate a more complicated program, which in turn may handle a far
more complicated program, and so on.

 A cross compiler is a compiler capable of creating executable code for
a platform other than the one on which the compiler is running. For
example, a compiler that runs on a Windows 7 PC but generates code that
runs on an Android smartphone is a cross compiler.
Dr. Bharat Bhushan, School of Engineering and Technology, Sharda University,
Greater Noida, India.
Continued..
 Suppose we want to write a cross
compiler for new language X.
 The implementation language of this
compiler is say Y and the target
code being generated is in language
Z. That is, we create XYZ.
 Now if existing compiler Y runs on
machine M and generates code for
M then it is denoted as YMM.
 Now if we run XYZ using YMM
then we get a compiler XMZ.
 That means a compiler for source language X that generates target
code in language Z and which runs on machine M.
Difference between Native Compiler and Cross Compiler

Native Compiler:
• Translates a program for the same hardware/platform/machine on which
it is running.
• Used to build programs for the same system/machine & OS on which it is
installed.
• Dependent on the system/machine and OS.
• Generates an executable file, e.g., .exe.
• TurboC or GCC is a native compiler.

Cross Compiler:
• Translates a program for a different hardware/platform/machine other
than the platform on which it is running.
• Used to build programs for other systems/machines, like AVR/ARM.
• Independent of the system/machine and OS.
• Generates raw code, e.g., .hex.
• Keil is a cross compiler.


Finite state machine

 A finite state machine is used to recognize patterns.

 A finite automaton takes a string of symbols as input and changes its
state accordingly. When a desired symbol is found in the input, the
transition occurs.

 On a transition, the automaton can either move to the next state or
stay in the same state.

 A finite automaton has two possible verdicts: accept or reject. When
the input string has been fully processed and the automaton has reached
a final (accepting) state, the string is accepted.
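The behaviour described above can be made concrete with a small deterministic automaton. A hypothetical Python sketch of a DFA over the alphabet {0, 1} that accepts strings ending in 0 (the state names and transition table are illustrative):

```python
# DFA over alphabet {0, 1} accepting strings that end in '0'.
# States: 'S' (start / last symbol was 1), 'A' (last symbol was 0, accepting).
TRANSITIONS = {
    ("S", "0"): "A", ("S", "1"): "S",
    ("A", "0"): "A", ("A", "1"): "S",
}
ACCEPTING = {"A"}

def accepts(string: str) -> bool:
    state = "S"
    for symbol in string:
        state = TRANSITIONS[(state, symbol)]  # take the transition
    return state in ACCEPTING                 # accept iff in a final state

print(accepts("1010"))  # True: ends in 0
print(accepts("101"))   # False: ends in 1
```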
Continued..

 DFA
 NFA
 Regular Expression

Lexical Analysis

 Lexical analysis is the process of converting a sequence of characters
from the source program into a sequence of tokens.

 A program that performs lexical analysis is termed a lexical analyzer
(lexer), tokenizer or scanner.

 Lexical analysis consists of two stages of processing, which are as
follows:

• Scanning

• Tokenization
Continued..

Token:
A token is a valid sequence of characters, represented by a lexeme.
In a programming language,
• keywords,
• constants,
• identifiers,
• numbers,
• operators and
• punctuation symbols
are possible tokens to be identified.
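The token classes listed above can be illustrated with a tiny regex-based classifier. This is a hypothetical sketch; the keyword set and patterns are illustrative, not taken from any particular language specification:

```python
import re

KEYWORDS = {"if", "else", "while", "return"}  # illustrative keyword set

# Order matters: multi-character operators must come before single ones.
TOKEN_SPEC = [
    ("NUMBER",      r"\d+"),
    ("IDENTIFIER",  r"[A-Za-z_]\w*"),
    ("OPERATOR",    r"==|[+\-*/=<>]"),
    ("PUNCTUATION", r"[(),;{}]"),
    ("SKIP",        r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{n}>{p})" for n, p in TOKEN_SPEC))

def tokenize(text):
    tokens = []
    for m in MASTER.finditer(text):
        kind, lexeme = m.lastgroup, m.group()
        if kind == "SKIP":
            continue  # white space separates tokens but is not one
        if kind == "IDENTIFIER" and lexeme in KEYWORDS:
            kind = "KEYWORD"  # keywords are identifiers reserved by the language
        tokens.append((kind, lexeme))
    return tokens

print(tokenize("if (count == 10) return count;"))
```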
Continued..

Pattern

 A pattern describes a rule that must be matched by a sequence of
characters (a lexeme) to form a token. It can be defined by regular
expressions or grammar rules.

Lexeme

 A lexeme is a sequence of characters that matches the pattern for a
token.

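For example, a typical identifier pattern is the regular expression [A-Za-z_][A-Za-z0-9_]*; any character sequence matching it is a lexeme of the identifier token. A quick check in Python (this pattern is a common convention, not universal):

```python
import re

# Typical identifier pattern: a letter or underscore,
# followed by letters, digits or underscores.
IDENT = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")

def is_identifier(lexeme: str) -> bool:
    return IDENT.fullmatch(lexeme) is not None

print(is_identifier("count_1"))  # True: matches the pattern
print(is_identifier("9pins"))    # False: may not start with a digit
```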
Role of Lexical Analyzer

Continued..

The lexical analyzer performs the following tasks:

• Reads the source program, scans the input characters, groups them into
lexemes and produces tokens as output.

• Enters the identified tokens into the symbol table.

• Strips out white space and comments from the source program.

• Correlates error messages with the source program, i.e., displays each
error message with its occurrence by specifying the line number.

• Expands macros if they are found in the source program.

Issues in Lexical Analysis

Lexical analysis is the process of producing tokens from the source
program. It has the following issues:

• Lookahead

• Ambiguities
Continued..
Lookahead

Lookahead is required to decide where one token ends and the next token
begins. Simple examples with lookahead issues are i vs. if and = vs. ==.
Therefore, a way to describe the lexemes of each token is required.

A way is needed to resolve ambiguities:

• Is if two variables i and f, or the keyword if?

• Is == two equal signs =, =, or one operator ==?

Hence, the number of lookahead characters to be considered, and a way to
describe the lexemes of each token, are also needed.
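One character of lookahead resolves the = vs. == case: after reading =, the scanner peeks at the next character before deciding which token to emit. A hypothetical sketch (token names ASSIGN and EQ are illustrative):

```python
# One-character lookahead to distinguish '=' (assignment) from '==' (equality).
def scan_operators(text: str):
    tokens, i = [], 0
    while i < len(text):
        if text[i] == "=":
            # Peek one character ahead before committing to a token.
            if i + 1 < len(text) and text[i + 1] == "=":
                tokens.append(("EQ", "=="))
                i += 2
            else:
                tokens.append(("ASSIGN", "="))
                i += 1
        else:
            i += 1  # this sketch ignores all other characters
    return tokens

print(scan_operators("a = b == c"))  # [('ASSIGN', '='), ('EQ', '==')]
```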
Continued..
Ambiguities

Lexical-analysis programs written with lex accept ambiguous
specifications. When more than one expression can match the current
input, lex chooses as follows:

• The longest match is preferred.

• Among rules which matched the same number of characters, the rule
given first is preferred.

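The two lex disambiguation rules above (longest match first, then earliest rule) can be demonstrated directly. In the sketch below, both a keyword rule and an identifier rule match the input if; the keyword rule wins only because it is listed first among equal-length matches. The rule names and ordering are hypothetical:

```python
import re

# Rules in "lex order": across rules, the longest match wins;
# among equal-length matches, the earlier rule wins.
RULES = [
    ("IF",         r"if"),
    ("IDENTIFIER", r"[a-z][a-z0-9]*"),
]

def match_one(text: str):
    best = None  # comparable tuple: (length, -rule_index, name, lexeme)
    for index, (name, pattern) in enumerate(RULES):
        m = re.match(pattern, text)
        if m:
            candidate = (len(m.group()), -index, name, m.group())
            if best is None or candidate > best:
                best = candidate
    return (best[2], best[3]) if best else None

print(match_one("if"))   # ('IF', 'if'): equal length, first rule wins
print(match_one("ifx"))  # ('IDENTIFIER', 'ifx'): longest match wins
```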
Example of Lexical Analysis, Tokens, Non-Tokens

Lexical Errors

 A character sequence that cannot be scanned into any valid token is a
lexical error. Important facts about lexical errors:

 Lexical errors are not very common, but they should be managed by the
scanner.

 Misspellings of identifiers, operators and keywords are considered
lexical errors.

 Generally, a lexical error is caused by the appearance of some illegal
character, mostly at the beginning of a token.

Error Recovery in Lexical Analyzer

The most common error-recovery techniques:

 Remove one character from the remaining input.

 In panic mode, successive characters are ignored until we reach a
well-formed token.

 Insert the missing character into the remaining input.

 Replace a character with another character.

 Transpose two adjacent characters.

Lexical Analyzer vs. Parser

Lexical Analyzer:
• Scans the input program.
• Identifies tokens.
• Inserts tokens into the symbol table.
• It generates lexical errors.

Parser:
• Performs syntax analysis.
• Creates an abstract representation of the code.
• Updates symbol-table entries.
• It generates a parse tree of the source code.
THANK YOU
