Compiler Design CSE - 353: UNIT-1
Syllabus
Unit 1: Introduction
Introduction to Compiler, Phases and passes, Bootstrapping,
Cross-Compiler
Finite state machines and regular expressions and their
applications to lexical analysis
Assembly Language
It is neither in binary (machine) form nor a high-level language.
It is an intermediate form: a combination of machine instructions
and other useful data needed for execution.
Assembler
For every platform (hardware + OS) there is a separate assembler, so
assemblers are not universal.
An assembler translates assembly language into machine code; its
output is called an object file.
Continued..
Interpreter
An interpreter converts high level language into low level machine
language, just like a compiler.
They differ in how they read the input: a compiler scans the entire
program and translates it as a whole into machine code, whereas an
interpreter translates and executes the program one statement at a
time.
Interpreted programs are therefore usually slower than compiled ones.
Continued..
1. Lexical analyser
It is also called scanner.
It takes the output of the preprocessor (which performs file inclusion
and macro expansion) as its input, which is in pure high-level language.
It reads the characters from source program and groups them into
lexemes (sequence of characters that “go together”).
Each lexeme corresponds to a token. Tokens are defined by regular
expressions which are understood by the lexical analyzer.
It also reports lexical errors (e.g., erroneous characters) and
removes comments and white space.
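The regular-expression-driven grouping of characters into lexemes can be sketched as follows; the token names and patterns here are illustrative assumptions, not a fixed standard:

```python
import re

# Illustrative token specification: each token name is paired with a
# regular expression describing its lexemes (names are hypothetical).
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("ID",     r"[A-Za-z_]\w*"),
    ("OP",     r"[+\-*/=]"),
    ("SKIP",   r"\s+"),        # white space is removed, not tokenized
]

MASTER_RE = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))

def tokenize(source):
    """Group characters into lexemes and emit (token, lexeme) pairs."""
    tokens = []
    for match in MASTER_RE.finditer(source):
        kind = match.lastgroup
        if kind != "SKIP":     # discard white space, as the scanner does
            tokens.append((kind, match.group()))
    return tokens

print(tokenize("x = 42 + y"))
# → [('ID', 'x'), ('OP', '='), ('NUMBER', '42'), ('OP', '+'), ('ID', 'y')]
```

Each named group corresponds to one token class, so the regular expressions that define the tokens drive the scanner directly.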
Continued..
2. Syntax Analyser
It is sometimes called the parser.
It constructs the parse tree.
It takes the tokens one by one and uses a context-free
grammar to construct the parse tree.
Syntax error can be detected at this level if the input is not in
accordance with the grammar.
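A minimal recursive-descent sketch shows both ideas: the parser consumes tokens one by one according to a grammar, and raises a syntax error when the input does not follow it. The toy grammar (expr -> term ('+' term)*, term -> NUMBER) and the tuple tree shape are assumptions for illustration:

```python
# Toy grammar (an assumption for this sketch):
#   expr -> term ('+' term)*
#   term -> NUMBER
def parse(tokens):
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def expect(kind):
        nonlocal pos
        tok = peek()
        if tok is None or tok[0] != kind:
            # input not in accordance with the grammar
            raise SyntaxError(f"expected {kind}, got {tok}")
        pos += 1
        return tok

    def term():
        return ("term", expect("NUMBER")[1])

    def expr():
        node = term()
        while peek() and peek()[1] == "+":
            expect("OP")                       # consume the '+'
            node = ("expr", node, "+", term())
        return node

    tree = expr()
    if peek() is not None:
        raise SyntaxError(f"unexpected token {peek()}")
    return tree

print(parse([("NUMBER", "1"), ("OP", "+"), ("NUMBER", "2")]))
# → ('expr', ('term', '1'), '+', ('term', '2'))
```

Feeding it an input such as `[("OP", "+")]` raises SyntaxError, which is exactly the error detection this phase performs.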
Continued..
3. Semantic Analyser
It verifies the parse tree, checking whether it is meaningful or not.
It does type checking, Label checking and Flow control
checking.
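Type checking, one of these semantic checks, can be sketched over a tiny AST of nested tuples; the node shapes and the symbol table here are illustrative assumptions:

```python
# A toy semantic check (type checking only). Node shapes and the
# symbol table are assumptions made for this sketch.
SYMBOLS = {"x": "int", "y": "float"}   # declared variable types

def type_of(node):
    kind = node[0]
    if kind == "num":
        return "int"
    if kind == "var":
        return SYMBOLS[node[1]]
    if kind == "add":
        left, right = type_of(node[1]), type_of(node[2])
        if left != right:
            # a semantically meaningless tree is rejected here
            raise TypeError(f"type mismatch: {left} + {right}")
        return left
    raise ValueError(f"unknown node {kind}")

print(type_of(("add", ("num", 1), ("var", "x"))))   # → int
# type_of(("add", ("num", 1), ("var", "y")))        # would raise TypeError
```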
Continued..
4. Intermediate Code Generator
It generates intermediate code: a machine-independent form that can be
easily translated into target machine code.
Example – Three address code
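In three-address code every instruction has at most three addresses (two operands and one result), with fresh temporaries holding intermediate values. A minimal generator sketch, assuming expressions are (op, left, right) tuples:

```python
from itertools import count

_temps = count(1)   # fresh temporary names t1, t2, ...

def gen_tac(node, code):
    """Return the address holding node's value, appending TAC to `code`."""
    if isinstance(node, str):   # a variable name is its own address
        return node
    op, left, right = node      # nodes are (op, left, right) tuples
    l_addr = gen_tac(left, code)
    r_addr = gen_tac(right, code)
    t = f"t{next(_temps)}"
    code.append(f"{t} = {l_addr} {op} {r_addr}")
    return t

code = []
result = gen_tac(("+", ("*", "b", "c"), "d"), code)   # a = b * c + d
code.append(f"a = {result}")
print("\n".join(code))
# → t1 = b * c
#   t2 = t1 + d
#   a = t2
```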
5. Code Optimizer
It transforms the code so that it consumes fewer resources and runs
faster.
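One classic such transformation is constant folding: constant sub-expressions are computed at compile time so the generated code does less work at run time. A sketch over the same (op, left, right) tuple form, which is an assumption of this example:

```python
import operator

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}

def fold(node):
    """Recursively fold (op, left, right) tuples whose operands are numbers."""
    if not isinstance(node, tuple):
        return node
    op, left, right = node
    left, right = fold(left), fold(right)
    if isinstance(left, int) and isinstance(right, int):
        return OPS[op](left, right)          # evaluate at compile time
    return (op, left, right)                 # leave non-constant parts alone

print(fold(("+", ("*", 2, 3), "x")))   # → ('+', 6, 'x')
```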
6. Target Code Generator
Its main purpose is to produce code that the machine can understand,
performing register allocation, instruction selection, etc.
The optimized code is converted into relocatable machine code which
then forms the input to the linker and loader.
Compiler Construction Tools
Compiler vs. Interpreter:
• A compiler generates error messages only after scanning the whole
program, so debugging is comparatively hard.
• With an interpreter, debugging is easier, as it continues translating
the program until the error is met.
Native Compiler vs. Cross-Compiler:
• A native compiler is used to build programs for the same
system/machine and OS on which it is installed; it can generate an
executable file such as .exe.
• A cross-compiler is used to build programs for other systems/machines
such as AVR/ARM; it can generate raw code such as .hex.
Finite automata machine takes the string of symbol as input and changes
its state accordingly. In the input, when a desired symbol is found then
the transition occurs.
While transition, the automata can either move to the next state or stay in
the same state.
An FA's states are of two kinds: accepting and non-accepting. When the
input string has been completely processed and the automaton is in a
final (accepting) state, the string is accepted; otherwise it is
rejected.
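The transition behavior described above can be sketched as a table-driven DFA simulation; the particular machine chosen here (accepting binary strings with an even number of 1s) is an illustrative assumption:

```python
# Transition table for a DFA accepting binary strings with an
# even number of 1s (an illustrative machine for this sketch).
TRANSITIONS = {
    ("even", "0"): "even", ("even", "1"): "odd",
    ("odd",  "0"): "odd",  ("odd",  "1"): "even",
}
START, ACCEPTING = "even", {"even"}

def accepts(s):
    state = START
    for symbol in s:
        # on each desired symbol, a transition occurs
        state = TRANSITIONS[(state, symbol)]
    # accept only if the final state reached is an accepting state
    return state in ACCEPTING

print(accepts("1001"))   # → True  (two 1s)
print(accepts("10"))     # → False (one 1)
```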
Dr. Bharat Bhushan, School of Engineering and Technology, Sharda University,
Greater Noida, India.
Continued..
DFA: Deterministic Finite Automaton; for every state and input symbol
there is exactly one transition.
NFA: Nondeterministic Finite Automaton; a state may have zero, one, or
several transitions on a symbol, including ε-moves.
Regular Expression: a notation describing the same class of languages
that finite automata recognize; token patterns are specified with
regular expressions.
• Scanning: reads the input characters and removes non-token elements
such as white space and comments.
• Tokenization: groups the scanned characters into lexemes and emits
tokens.
Continued..
Token:
A token is a class of valid character sequences, represented in the
source program by lexemes. In a programming language,
• keywords,
• constants,
• identifiers,
• numbers,
• operators and
• punctuation symbols
are possible tokens to be identified.
Continued..
Pattern
A pattern is the rule, typically a regular expression, that describes
the set of lexemes that can represent a token.
Lexeme
A lexeme is the actual sequence of characters in the source program
that matches the pattern for a token.
• Reads the source program, scans the input characters, groups them
into lexemes, and produces tokens as output.
• Correlates error messages with the source program, i.e., displays the
error message with its occurrence by specifying the line number.
• Lookahead
• Ambiguities
Lookahead is required to decide where one token ends and the next
begins. Simple examples with lookahead issues are i vs. if and
= vs. ==. Therefore a way to describe the lexemes of each token is required.
Lex can handle ambiguous specifications. When more than one expression
can match the current input, lex chooses as follows:
• The longest match is preferred.
• Among rules that matched the same number of characters, the rule
given first is preferred.
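These two disambiguation rules can be sketched in a few lines; the rule names and patterns below are illustrative (note that IF is listed before ID, and that == and = overlap):

```python
import re

# Rule order matters: IF before ID, EQ before ASSIGN (illustrative rules).
RULES = [("IF", r"if"), ("ID", r"[a-z]+"), ("EQ", r"=="), ("ASSIGN", r"=")]

def next_token(text):
    best = None
    for name, pattern in RULES:
        m = re.match(pattern, text)
        # strict '>' plus iteration order: the longest match wins, and
        # among equal-length matches the rule listed first wins
        if m and (best is None or len(m.group()) > len(best[1])):
            best = (name, m.group())
    return best

print(next_token("if"))    # → ('IF', 'if')   equal length; IF listed first
print(next_token("ifx"))   # → ('ID', 'ifx')  longest match: ID takes 3 chars
print(next_token("=="))    # → ('EQ', '==')   longest match beats ASSIGN's '='
```

This is how lex resolves the i vs. if and = vs. == ambiguities mentioned above.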
THANK YOU