Compiler Construction Tools & Introduction To LA
B∈V*(Any string).
Example –
S –> AB
A –> a
B –> b
2. Regular Grammar :
It is accepted by Finite State Automata.
It is a subset of Type 0, Type 1, and Type 2 grammars.
The language it generates is called a Regular Language.
Regular languages are closed under operations such as union, intersection, and complement.
Regular grammars are the most restricted form of grammar.
Productions are in the form –
V –> VT / T (left-linear grammar) (or)
V –> TV / T (right-linear grammar)
Example –
1. S –> ab.
2. S –> aS | bS | ∊
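As a quick check on the second example, the sketch below enumerates the strings derivable from S –> aS | bS | ∊ up to a small length and compares them with the regular expression `[ab]*`. The function name `derive` and the length bound are assumptions made for illustration only.

```python
import re
from itertools import product

def derive(max_len):
    """Enumerate every string of length <= max_len derivable from S -> aS | bS | epsilon."""
    results = set()
    frontier = [""]  # terminal prefixes that are still followed by the non-terminal S
    while frontier:
        prefix = frontier.pop()
        results.add(prefix)                 # apply S -> epsilon
        if len(prefix) < max_len:
            frontier.append(prefix + "a")   # apply S -> aS
            frontier.append(prefix + "b")   # apply S -> bS
    return results

generated = derive(3)
pattern = re.compile(r"[ab]*")

# Every string over {a, b} of length 0..3, built independently of the grammar.
all_strings = {"".join(p) for n in range(4) for p in product("ab", repeat=n)}

print(generated == all_strings)
print(all(pattern.fullmatch(s) for s in generated))
```

Both checks agree for this bound, which is consistent with the grammar describing the same language as `(a|b)*`; a bounded enumeration is evidence, not a proof.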
Difference Between Context Free Grammar and Regular Grammar:
Parameter | Context Free Grammar | Regular Grammar
Restriction | Less restrictive than Regular Grammar | More restrictive than any other grammar type
Set Property | Superset of Regular Grammar | Subset of Context Free Grammar
Range | The range of languages that come under CFG is wide. | The range of languages that come under RG is narrower than that of CFG.
1. Parser Generator –
It produces syntax analyzers (parsers) from input that is based on a grammatical
description of a programming language, typically a context-free grammar. It is useful because
the syntax analysis phase is highly complex, and building it by hand consumes considerable
manual effort and compilation time.
Example: PIC, EQM
2. Scanner Generator –
It generates lexical analyzers from input that consists of regular expressions describing
the tokens of a language. It builds a finite automaton that recognizes those regular
expressions.
Example: Lex
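The idea can be sketched in a few lines: a table of (token name, regular expression) pairs is turned into a scanner function. This is only an illustration, not how Lex works internally. Python's `re` module is a backtracking matcher rather than a generated finite automaton, and it picks the first alternative that matches rather than the longest match. The names `TOKEN_SPEC` and `build_scanner` and the token set are assumptions.

```python
import re

# Hypothetical token specification: (token name, regular expression).
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("ID",     r"[A-Za-z_]\w*"),
    ("OP",     r"[+\-*/=]"),
    ("PUNCT",  r"[(){},;]"),
    ("SKIP",   r"\s+"),
]

def build_scanner(spec):
    """Combine the specification into one pattern and return a scan function."""
    master = re.compile("|".join(f"(?P<{name}>{regex})" for name, regex in spec))

    def scan(text):
        tokens = []
        pos = 0
        while pos < len(text):
            m = master.match(text, pos)
            if m is None:
                raise SyntaxError(f"unexpected character {text[pos]!r} at offset {pos}")
            if m.lastgroup != "SKIP":       # white space produces no token
                tokens.append((m.lastgroup, m.group()))
            pos = m.end()
        return tokens

    return scan

scan = build_scanner(TOKEN_SPEC)
print(scan("a = b + 10;"))
```

Changing the language's tokens then only means editing the table, which is the main convenience a scanner generator offers.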
3. Syntax directed translation engines –
They generate intermediate code in three-address format from input that consists of a
parse tree. These engines have routines that traverse the parse tree and produce the
intermediate code. Each node of the parse tree is associated with one or more
translations.
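A minimal sketch of such a traversal, assuming an expression tree encoded as nested tuples `(operator, left, right)` with plain strings as leaves. The encoding, the function name `to_three_address`, and the temporary names `t1`, `t2`, ... are illustrative choices, not part of any particular tool.

```python
from itertools import count

def to_three_address(tree):
    """Walk the tree bottom-up and emit one three-address statement per operator node."""
    code = []
    temps = count(1)

    def visit(node):
        if isinstance(node, str):       # leaf: a variable name or a constant
            return node
        op, left, right = node
        left_name = visit(left)         # translation attached to the left child
        right_name = visit(right)       # translation attached to the right child
        temp = f"t{next(temps)}"
        code.append(f"{temp} = {left_name} {op} {right_name}")
        return temp

    result = visit(tree)
    return code, result

# Tree for the expression a + b * c
tree = ("+", "a", ("*", "b", "c"))
code, result = to_three_address(tree)
for line in code:
    print(line)
```

Here the "translation" of each node is simply the name that holds its value, and the emitted statements never have more than one operator on the right-hand side.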
4. Automatic code generators –
They generate the machine language for a target machine. The code generator takes the
intermediate language as input and translates each of its operations using a collection of
rules. A template matching process is used: each intermediate language statement is
replaced by its equivalent machine language statement using templates.
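The template idea can be sketched as a lookup table keyed by operator. The instruction mnemonics (`LOAD`, `ADD`, `MUL`, `STORE`), the single register `R0`, and the statement format `dst = a op b` are all assumptions for a hypothetical target machine.

```python
# Hypothetical templates: one instruction sequence per intermediate operator.
TEMPLATES = {
    "+": ["LOAD R0, {a}", "ADD R0, {b}", "STORE {dst}, R0"],
    "-": ["LOAD R0, {a}", "SUB R0, {b}", "STORE {dst}, R0"],
    "*": ["LOAD R0, {a}", "MUL R0, {b}", "STORE {dst}, R0"],
}

def generate(tac_lines):
    """Replace each three-address statement 'dst = a op b' by its template expansion."""
    asm = []
    for line in tac_lines:
        dst, expr = [part.strip() for part in line.split("=", 1)]
        a, op, b = expr.split()
        for template in TEMPLATES[op]:
            asm.append(template.format(a=a, b=b, dst=dst))
    return asm

for instruction in generate(["t1 = b * c", "t2 = a + t1"]):
    print(instruction)
```

This naive expansion stores and reloads every temporary; real code generators add register allocation and instruction selection on top of the basic matching step.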
5. Data-flow analysis engines –
They are used in code optimization. Data-flow analysis is a key part of code optimization:
it gathers information about how values flow from one part of a program to another.
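One common instance of data-flow analysis is live-variable analysis. The sketch below runs it backward over straight-line code, where each statement is a hypothetical pair `(target, operands)`. The representation and the function name `live_variables` are assumptions, and branches and loops are deliberately left out.

```python
def live_variables(statements, live_at_exit):
    """Return the set of live variables after each statement, plus those live on entry."""
    live = set(live_at_exit)
    live_after = []
    for target, operands in reversed(statements):
        live_after.append(set(live))
        # The assigned target is killed; the operands it reads become live.
        live = (live - {target}) | set(operands)
    live_after.reverse()
    return live_after, live

program = [
    ("t1", ["a", "b"]),   # t1 = a + b
    ("t2", ["a", "c"]),   # t2 = a * c   (t2 is never read afterwards)
    ("x",  ["t1"]),       # x = t1
]
live_after, live_in = live_variables(program, {"x"})

# A statement whose target is not live afterwards is a candidate for removal.
dead = [i for i, (target, _) in enumerate(program) if target not in live_after[i]]
print(dead)
```

An optimizer could use the result to drop the assignment to `t2`, since its value never reaches a later use.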
6. Compiler construction toolkits –
They provide an integrated set of routines that aid in building compiler components or in
constructing the various phases of a compiler.
Introduction to Lexical Analysis
Lexical analysis is the first phase of the compiler; the component that performs it is also
known as a scanner. It converts the high-level input program into a sequence of tokens.
Lexical analysis can be implemented with a deterministic finite automaton (DFA).
The output is a sequence of tokens that is sent to the parser for syntax analysis.
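A minimal table-driven sketch of such an automaton, recognizing only identifiers and unsigned integers. The state names, the character classes, and the function `classify` are assumptions chosen for illustration.

```python
def char_class(ch):
    """Map a character to the input symbol used by the transition table."""
    if ch.isalpha() or ch == "_":
        return "letter"
    if ch.isdigit():
        return "digit"
    return "other"

# Transition table: state -> {character class -> next state}.
TRANSITIONS = {
    "start":  {"letter": "ident", "digit": "number"},
    "ident":  {"letter": "ident", "digit": "ident"},
    "number": {"digit": "number"},
}

# Accepting states and the token type each one reports.
ACCEPTING = {"ident": "ID", "number": "NUMBER"}

def classify(lexeme):
    """Run the DFA over the lexeme; return its token type, or None if it is rejected."""
    state = "start"
    for ch in lexeme:
        state = TRANSITIONS[state].get(char_class(ch))
        if state is None:        # no transition defined: the DFA rejects
            return None
    return ACCEPTING.get(state)

print(classify("count1"), classify("42"), classify("4x"))
```

Because each state has at most one transition per character class, the machine never needs to backtrack, which is what makes a DFA attractive for scanning.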
What is a token?
A lexical token is a sequence of characters that can be treated as a unit in the grammar of
a programming language.
Example of tokens:
Type tokens (id, number, real, . . . )
Punctuation tokens ( ; , ( ) { } . . . )
Alphabetic tokens (keywords); examples: if, void, return, for, while, etc.
Example of Non-Tokens:
Comments, preprocessor directives, macros, blanks, tabs, newlines, etc.
Lexeme: A lexeme is the sequence of input characters that is matched by a pattern and
comprises a single token.
How the Lexical Analyzer Functions
1. Tokenization, i.e., dividing the program into valid tokens.
2. Removes white space characters.
3. Removes comments.
4. Helps in generating error messages by providing row numbers and column
numbers.
The lexical analyzer identifies errors with the help of the automaton and the
grammar of the language on which it is based (such as C or C++), and reports the row
number and column number of each error.
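The row and column bookkeeping can be sketched as follows. The set of illegal characters and the function name `lexical_errors` are assumptions; a real scanner would also have to treat string literals and comments specially, which this sketch does not.

```python
# Assumption: characters that cannot start any token in the language being scanned.
ILLEGAL = set("@$`")

def lexical_errors(source):
    """Return (row, column, character) for every illegal character, both 1-based."""
    errors = []
    row, col = 1, 1
    for ch in source:
        if ch in ILLEGAL:
            errors.append((row, col, ch))
        if ch == "\n":       # a newline starts the next row at column 1
            row += 1
            col = 1
        else:
            col += 1
    return errors

source = "int a;\na = 1 @ 2;\n"
for row, col, ch in lexical_errors(source):
    print(f"line {row}, column {col}: illegal character {ch!r}")
```

Keeping the counters inside the scanning loop costs almost nothing and lets later phases attach precise positions to their own diagnostics.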
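Suppose we pass the following program through the lexical analyzer:
int main()
{ // 2 variables
int a, b;
a = 10;
return 0;
}
The comment and white space are discarded, and the valid tokens are:
'int' 'main' '(' ')' '{' 'int' 'a' ',' 'b' ';'
'a' '=' '10' ';' 'return' '0' ';' '}'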
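The same result can be reproduced with a small sketch that tokenizes the sample program (written here with its closing brace) while discarding comments and white space. The pattern covers only the symbols this program uses, and the names `TOKEN_RE` and `tokenize` are assumptions.

```python
import re

TOKEN_RE = re.compile(r"""
    (?P<COMMENT>//[^\n]*|/\*.*?\*/)   # line and block comments, discarded
  | (?P<SPACE>\s+)                    # blanks, tabs, newlines, discarded
  | (?P<NUMBER>\d+)
  | (?P<WORD>[A-Za-z_]\w*)            # keywords and identifiers
  | (?P<SYMBOL>[(){},;=])
""", re.VERBOSE | re.DOTALL)

def tokenize(source):
    """Return the lexemes of all tokens in source, in order."""
    tokens = []
    pos = 0
    while pos < len(source):
        m = TOKEN_RE.match(source, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {source[pos]!r} at offset {pos}")
        if m.lastgroup not in ("COMMENT", "SPACE"):
            tokens.append(m.group())
        pos = m.end()
    return tokens

SOURCE = """int main()
{ // 2 variables
int a, b;
a = 10;
return 0;
}
"""

tokens = tokenize(SOURCE)
print(" ".join(repr(t) for t in tokens))
print(len(tokens))
```

The comment `// 2 variables` contributes nothing to the output, which matches its classification as a non-token.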
Exercise 1:
Count the number of tokens:
int main()
{
printf("sum is :%d",a+b);
return 0;
}