
Noida Institute of Engineering and Technology, Greater Noida

The document discusses a course on compiler design and lexical analysis being taught by Nishant Kumar Hind. It includes details about the course content such as an introduction to compilers and their structure, phases of compilers, language processing systems, and lexical analysis. The objectives are to introduce concepts underlying language processor design and implementation. Topics will include regular expressions, finite automata, scanners, and parsers.


Noida Institute of Engineering and Technology, Greater Noida

Introduction of Compiler Design & Lexical Analysis

Unit: 1

Subject: Compiler Design (RCS-602)
Course Details: B Tech CSE 6th Sem

Nishant Kumar Hind
Assistant Professor
CSE Department

Nishant Kumar Hind KCS-502 CD Unit -1


1
08/11/2021
Content
• Introduction
– Translator
– Compiler
• Simple structure of compiler
• Phases of Compiler
– The structure of compiler
– Analogy
– An example
• Language Processing System
• Pass of Compiler
• Front end and back end of compiler
• Bootstrapping and Cross Compiler
• Finite Automata
• Regular Expression
• Thompson’s Method
• Subset Construction Method
• Lexical Analysis
• Context Free Grammar

Course Objectives

Introduce students to the concepts underlying the design and implementation of language processors. More specifically, by the end of the course, students will be able to answer these questions:
• What are language processors, and what functionality do they provide to their users?
• What core mechanisms are used to provide such functionality?
• How are these mechanisms implemented?
Apart from providing a theoretical background, the course places special emphasis on practical issues in designing language processors.

Course Outcomes

• CO1: To have knowledge of patterns, tokens, regular expressions and finite automata in order to develop a scanner or lexical analyzer.
• CO2: To design and develop various parsers, such as LL and LR parsers.
• CO3: To apply various designs and conduct experiments for intermediate code generation in a compiler.
• CO4: To design and develop various data structures for symbol tables, and error detection and recovery at every phase.
• CO5: To apply various code optimization techniques to improve the performance of a program in terms of speed and space.

Unit Objective(CO1)

• Introduce students to the simple structure of a compiler.
• Introduce students to the concept of tokens through finite automata.
• Introduce students to the concept of a scanner.

Program Outcomes (PO)
• PO1: Engineering Knowledge
• PO2: Problem Analysis
• PO3: Design/Development of solutions
• PO4: Conduct Investigations of complex problems
• PO5: Modern tool usage
• PO6: The engineer and society
• PO7: Environment and sustainability
• PO8: Ethics
• PO9: Individual and team work
• PO10: Communication
• PO11: Project management and finance
• PO12: Life-long learning

CO-PO Mapping

CO          PO1  PO2  PO3  PO4  PO5  PO6  PO7  PO8  PO9  PO10  PO11  PO12
RCS-602.1   3    3    3    3    3    1    1    1    3    1     2     2
RCS-602.2   3    3    3    3    3    1    1    1    3    1     2     2
RCS-602.3   3    3    3    3    3    1    1    1    3    1     2     2
RCS-602.4   3    2    3    3    3    1    1    1    3    1     2     2
RCS-602.5   3    2    3    3    3    1    1    1    3    1     2     2
AVG         3    2.6  3    3    3    1    1    1    3    1     2     2

Program Specific Outcomes (PSO)
• PSO1: The ability to identify and analyze real world problems and design their ethical solutions using artificial intelligence, robotics, virtual/augmented reality, data analytics, blockchain technology, and cloud computing.
• PSO2: The ability to design and develop hardware sensor devices and related interfacing software systems for solving complex engineering problems.
• PSO3: The ability to understand interdisciplinary computing techniques and to apply them in the design of advanced computing.
• PSO4: The ability to conduct investigation of complex problems with the help of technical, managerial and leadership qualities, and modern engineering tools provided by industry sponsored laboratories.

CO-PSO Mapping

CO         PSO1  PSO2  PSO3  PSO4
RCS602.1   3     3     3     1
RCS602.2   3     3     3     1
RCS602.3   3     3     3     1
RCS602.4   3     3     3     1
RCS602.5   3     3     3     1
AVG        3     3     3     1

Prerequisite and Recap

• Theory of Automata
• Algorithms
• Languages and machines
• Operating systems
• Computer architectures

Topics Mapping with Course Outcome
S.No. Topic Course Outcome
1 Introduction of Compiler CO1
2 Simple structure of compiler CO1
3 Phases of Compiler CO1
4 Language Processing System CO1
5 Pass of Compiler CO1
6 Front end and back end of compiler CO1
7 Bootstrapping and Cross Compiler CO1
8 Finite Automata CO1
9 Regular Expression CO1
10 Thompson’s Method CO1
11 Subset Construction Method CO1
12 Lexical Analysis CO1
13 Context Free Grammar CO2

Introduction

• Translator:
• A program that translates a source program written in one language into an executable target program in another language.

Source language Program → Translator → Target language Program

Introduction

• The target program is a running process on a computer which accepts some input and, in response, generates the desired output.

Input → Target Program → Output

Introduction

• Compiler:
– A program that translates a source program written in a high level language into a target/executable program written in a low level language is called a compiler.
• High Level Language:
– A language that is closer to human understanding. Ex- C, C++, Java, Pascal etc.
• Low Level Language:
– A language that is closer to computer understanding. Ex- machine language and assembly language.

High level language → Compiler → Low level language

Simple Structure of a Compiler

• The process of compilation involves two major phases.
– Analysis Phase: where the source program is analyzed for errors.
– Synthesis Phase: where the source program is converted into the target program.

Source program → Compiler (Analysis, then Synthesis) → Target program



Phases of Compiler: Structure of a Compiler
Source Program
→ Lexical Analysis
→ Syntax Analysis
→ Semantic Analysis
→ Intermediate Code Generation
→ Code Optimization
→ Target Code Generation
→ Target Program

Symbol Table management and the Error Handler are connected to every phase.
Phases of Compiler : Analogy of Natural language

• "Si a htis omipcerl aslsc." → Tokens (words) are not correct: lexical error.
• "Is a class this compiler." → Valid tokens, but the sentence is not correct: syntax error.
• "This is a compiler class." → Correct syntax, but semantic error.
• "This is a compiler design class." → Correct statement.
• Converting each word individually into its meaning is:
– Intermediate code generation.
• Code optimization is an optional phase.
• Complete translation of the sentence by using the individual meaning of each word is:
– Target code generation.



Phases of Compiler: An Example
Statement: P = I + R * 40

Lexical Analysis
    id1 = id2 + id3 * 40

Syntax Analysis
        =
       / \
    id1   +
         / \
      id2   *
           / \
        id3   40

Semantic Analysis
        =
       / \
    id1   +
         / \
      id2   *
           / \
        id3   int to float
                  |
                  40

Intermediate Code Generation
    t1 = inttofloat (40)
    t2 = id3 * t1
    t3 = id2 + t2
    id1 = t3

Code Optimization
    t1 = id3 * 40.0
    id1 = id2 + t1

Target Code Generation
    LDF R2, id3
    MULF R2, #40.0
    LDF R1, id2
    ADDF R1, R2
    STF id1, R1
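The intermediate code generation step of this example can be sketched in Python. This is only an illustration under stated assumptions: it borrows Python's own `ast` parser in place of the lexical and syntax analysis phases, the function name `three_address_code` is hypothetical, it keeps the original variable names instead of id1, id2, id3, and it skips the inttofloat conversion that semantic analysis inserts.

```python
import ast
from itertools import count

# Operators supported by this sketch (an assumption, not a full language).
OPS = {ast.Add: "+", ast.Sub: "-", ast.Mult: "*", ast.Div: "/"}


def three_address_code(statement):
    """Return three-address code lines for one assignment statement."""
    assign = ast.parse(statement).body[0]
    temps = count(1)          # numbering for temporaries t1, t2, ...
    code = []

    def emit(node):
        # Leaves are returned as-is; each operator gets a fresh temporary.
        if isinstance(node, ast.Name):
            return node.id
        if isinstance(node, ast.Constant):
            return str(node.value)
        if isinstance(node, ast.BinOp) and type(node.op) in OPS:
            left = emit(node.left)
            right = emit(node.right)
            temp = f"t{next(temps)}"
            code.append(f"{temp} = {left} {OPS[type(node.op)]} {right}")
            return temp
        raise ValueError("unsupported construct in this sketch")

    result = emit(assign.value)
    code.append(f"{assign.targets[0].id} = {result}")
    return code


for line in three_address_code("P = I + R * 40"):
    print(line)
```

Because the multiplication sits deeper in the tree than the addition, its temporary is emitted first, which mirrors the order of t2 and t3 in the slide.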
Language Processing System
Source Program
→ Pre-Processor (handles #include, #define etc.)
→ Pre-Processed Code
→ Compiler
→ Target Assembly Code
→ Assembler
→ Re-locatable Machine Code
→ Linker (combines more object files)
→ Executable Machine Code
→ Loader
→ Memory
→ Processor
Pass of Compiler
• Pass: A compiler pass is one traversal of the compiler through the entire program.
• Pass also refers to the grouping of phases into modules.
• By number of passes, compilers are of two types:
– Single Pass Compiler
– Two Pass Compiler or Multi Pass Compiler
• Single Pass: If all the phases of the compiler are grouped into a single module, it is known as a single pass compiler.
• Multi Pass: A two pass/multi-pass compiler processes the source code or abstract syntax tree of a program multiple times. In a multi pass compiler the phases are divided into two or more modules.

Single Pass Compiler

Single Pass: All the units are in one module:
– Lexical Analysis
– Syntax Analysis
– Semantic Analysis
– Intermediate Code Generation
– Code Optimization
– Target Code Generation

Two Pass or Multi-Pass Compiler

First Pass: Front End
– Lexical Analysis
– Syntax Analysis
– Semantic Analysis
– Intermediate Code Generation

Second Pass: Back End
– Code Optimization
– Target Code Generation

Front-End & Back- End of Compiler

Front End, Related to Source Language (Machine Independent)
– Lexical Analysis
– Syntax Analysis
– Semantic Analysis
– Intermediate Code Generation

Back End, Related to Target Language (Machine Dependent)
– Code Optimization
– Target Code Generation

Bootstrapping
• Bootstrapping is a process in which a simple language is used to translate a more complicated program, which in turn may handle an even more complicated program.
• This complicated program can further handle even more complicated programs, and so on.
• Using the facilities provided by a compiler to compile itself is the essential feature of the bootstrapping concept.



Bootstrapping…

• Languages Associated with a Compiler:
– Source Language (S): Language for which the compiler is designed.
– Target Language (T): Language in which the compiler generates the final code.
– Implementation Language (I): Language in which the compiler itself is written.



Bootstrapping…
• T Diagram Representation of Compiler:
– Source Language: S
– Target Language: T
– Implementation Language: I

    S ────→ T
        I
(top bar of the T: source S and target T; stem of the T: implementation language I)



Bootstrapping
Example:
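• Design a compiler for language L on machine M.
• Use the Python programming language (P) for designing the compiler.

Chain of T diagrams (source → target, with the implementation language below):
– L → M, implemented in P
– P → M, implemented in C
– C → M, implemented in A
– A → M, implemented in M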



Cross Compiler
• A compiler which runs on one machine and generates target code for another machine is called a cross compiler.

T diagram: S → N, implemented in M (Cross Compiler)

• The above compiler runs on machine M and compiles programs written in language S, but generates target code for machine N.



Bootstrapping and Cross Compiler
• Question:
• Create a compiler for language S on machine N by using an existing compiler for the same language on machine M.
• Solution:
– Given: S → M, implemented in M
– Desired: S → N, implemented in N



Bootstrapping and Cross Compiler
• Solution:
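• Step 1: On machine M, write a compiler in language S itself that compiles S and generates target code for N.
    S → N, implemented in S
• Step 2: Run the designed compiler on the existing one.
    (S → N, implemented in S) compiled by (S → M, implemented in M) gives (S → N, implemented in M)
    A cross compiler is created.
• Step 3: Run the designed compiler on the new cross compiler.
    (S → N, implemented in S) compiled by (S → N, implemented in M) gives (S → N, implemented in N)
    The desired compiler is created.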



Bootstrapping and Cross Compiler
• Question:
• Create a compiler for language P on machine N by using an existing compiler for language S on machine M.
• Solution:
– Given: S → M, implemented in M
– Desired: P → N, implemented in N



Bootstrapping and Cross Compiler
• Solution:
• Step 1: On machine M, write a compiler in language S for source language P that generates target code for N.
    P → N, implemented in S
• Step 2: Run the designed compiler on the existing one.
    (P → N, implemented in S) compiled by (S → M, implemented in M) gives (P → N, implemented in M)
    A cross compiler is created.
• Step 3: On machine M, write a compiler in language P for source language P that generates target code for N.
    P → N, implemented in P



Bootstrapping and Cross Compiler
• Solution:
• Step 4: Run the compiler designed in Step 3 on the new cross compiler.
    (P → N, implemented in P) compiled by (P → N, implemented in M) gives (P → N, implemented in N)
    The desired compiler is created.
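The T-diagram bookkeeping in these bootstrapping steps can be checked mechanically. The sketch below is a hypothetical model, not part of the course material: the `Compiler` record and the `compile_with` helper are names invented for illustration, and the model only tracks the three languages of a T diagram, not any real translation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Compiler:
    source: str  # language the compiler accepts (S in the T diagram)
    target: str  # language the compiler produces (T in the T diagram)
    impl: str    # language the compiler itself is written in (I)


def compile_with(program, host):
    """Translate the compiler `program` by running it through `host`.

    The result accepts and produces the same languages as `program`,
    but is now expressed in the host's target language.
    """
    if program.impl != host.source:
        raise ValueError("host cannot read the language the program is written in")
    return Compiler(program.source, program.target, host.target)


# First question: S compiler on M is given, S compiler on N is desired.
existing = Compiler("S", "M", "M")
written = Compiler("S", "N", "S")          # Step 1
cross = compile_with(written, existing)    # Step 2: S -> N running on M
native = compile_with(written, cross)      # Step 3: S -> N running on N
print(cross)
print(native)
```

The second question follows the same pattern: compile (P → N, implemented in S) with the existing compiler, then compile (P → N, implemented in P) with the resulting cross compiler.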



Finite Automata
• A finite automaton is a machine used to recognize patterns within input taken from some character set (or alphabet).
• The job of an FA is to accept or reject an input depending on whether the pattern defined by the FA occurs in the input.
• A finite automaton (M) is defined by a 5-tuple.
– M = (Q, ∑, q0, δ, F)
• Q: Finite, non-empty set of states.
• ∑: Finite set of input characters (the alphabet).
• q0: Initial state.
• δ: State transition function.
• F: Set of final states (a subset of Q).



Types of Finite Automata

• There are two types of finite automata.
– Deterministic Finite Automata (DFA)
• One transition per input per state
• No ε-moves (null moves)
– Nondeterministic Finite Automata (NFA)
• Can have multiple transitions for one input in a given state
• Can have ε-moves



Deterministic Finite Automata (DFA)

• State transition function of DFA is:
δ: Q × ∑ → Q
– Example: Let Q = {q0, q1}, ∑ = {a, b} then
– Q × ∑ = { (q0, a), (q0, b), (q1, a), (q1, b) }
– δ maps each pair in Q × ∑ to exactly one state of Q (q0 or q1): one transition per input per state and no ε-moves.
Nondeterministic Finite Automata (NFA)

• State transition function of NFA is:
δ: Q × ∑ → P(Q)
– Example: Let Q = {q0, q1}, ∑ = {a, b} then
– Q × ∑ = { (q0, a), (q0, b), (q1, a), (q1, b) }
– P(Q) (Power set of Q) = { {}, {q0}, {q1}, {q0, q1} }
– δ maps each pair in Q × ∑ to one element of P(Q), that is, to a set of states: multiple transitions for one input in a given state are allowed.
Representation of Finite Automata

• State Transition Diagram

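– States are represented by: a circle labelled with the state name (q)
– Inputs are represented by: labelled arrows between states
– Initial state: a circle with an incoming arrow that has no source state
– Final states: a double circle

DFA of string ending with 0 over input set {0, 1}:

[Figure: q0 is the initial state and q1 the final state. q0 loops on 1 and moves to q1 on 0; q1 loops on 0 and moves back to q0 on 1.]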



Representation of Finite Automata

• State Transition Table: A 2-D array is used
– Rows are represented by states (Q)
– Columns are represented by input characters (∑)

DFA of string ending with 0 over input set {0, 1}:

Q \ ∑   0    1
q0      q1   q0
q1      q1   q0
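A transition table like this one translates directly into a table-driven recognizer. The following Python sketch is an illustration, assuming q0 is the initial state and q1 the only final state (as the "ending with 0" description implies); the names `TRANSITIONS` and `accepts` are hypothetical.

```python
# Rows are states, columns are input characters, as in the table above.
TRANSITIONS = {
    "q0": {"0": "q1", "1": "q0"},
    "q1": {"0": "q1", "1": "q0"},
}
START = "q0"
FINALS = {"q1"}   # assumption: q1 is the only final state


def accepts(text):
    """Run the DFA over `text` and report whether it ends in a final state."""
    state = START
    for ch in text:
        if ch not in TRANSITIONS[state]:
            return False          # character outside the alphabet {0, 1}
        state = TRANSITIONS[state][ch]
    return state in FINALS


for sample in ["0", "10", "101", ""]:
    print(repr(sample), accepts(sample))
```

The loop performs exactly one table lookup per input character, which is the property that makes a DFA attractive for scanners.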



Regular Expression

• A regular expression is a mathematical representation of a pattern, used to specify the tokens of a high level programming language.
• Regular expressions are built as follows:
– ε is a regular expression (null string).
– Every ‘a’ belonging to ∑ is a regular expression.
– If R is a regular expression, then
• R* (Kleene closure): zero or more occurrences of R is a regular expression.
• R+: one or more occurrences of R is a regular expression.
• R?: at most one occurrence of R is a regular expression.
– If R and S are regular expressions, then
• R/S or R+S: R or S is a regular expression.
• R.S: concatenation is a regular expression.
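These operators can be tried out with Python's standard `re` module. This is only an illustration of the notation: Python writes the OR operator as `|` (not `/` or `+`) and writes concatenation by juxtaposition, and the helper name `matches` is hypothetical.

```python
import re


def matches(pattern, text):
    """Return True when the whole of `text` matches `pattern`."""
    return re.fullmatch(pattern, text) is not None


# (operator in the notation above, Python pattern, sample text)
cases = [
    ("R*", r"a*", ""),       # zero or more occurrences
    ("R+", r"a+", "aaa"),    # one or more occurrences
    ("R?", r"a?", "a"),      # at most one occurrence
    ("R/S", r"a|b", "b"),    # R or S
    ("R.S", r"ab", "ab"),    # concatenation
]
for name, pattern, text in cases:
    print(name, repr(pattern), repr(text), matches(pattern, text))
```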



Regular Expression to NFA Conversion
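• Thompson’s Method: each rule of a regular expression has a corresponding NFA fragment with one start state and one final state.
• ε is a regular expression (null move): start state → ε → final state.
• Every ‘a’ belonging to ∑ is a regular expression: start state → a → final state.
• R.S is a regular expression: the NFA of R is followed by the NFA of S.
• R+S is a regular expression: a new start state has ε-moves to the NFA of R and to the NFA of S; both have ε-moves to a new final state.
• R* is a regular expression: a new start state has an ε-move into the NFA of R, which has an ε-move to a new final state; further ε-moves allow R to be repeated or skipped.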


Precedence of Operators in Regular Expression

S.No.  Operator Symbol  Operator Name    Precedence Priority
1      ()               Parenthesis      1
2      *, +, ?          Unary Operator   2
3      .                Concatenation    3
4      /                OR               4



Example of Regular Expression to NFA

• Consider the regular expression (1 + 0)*.1 and construct the equivalent NFA.
• By precedence:
– (1+0)
– (1+0)*
– 1
– (1+0)*.1

[Figure: the resulting NFA, in which ε-moves join the fragments for 1 and 0, the closure, and the final 1]
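The construction order in this example can be mirrored by composing small NFA fragments in code. The sketch below is one common textbook formulation of Thompson's method, written under assumptions: the class `NFA`, the helpers `symbol`, `concat`, `union`, `star` and `accepts`, and the state numbering are all hypothetical and will not match the numbering in the slide's figure.

```python
from itertools import count

EPS = None          # label used for ε-moves
_ids = count()      # source of fresh state numbers


class NFA:
    """An NFA fragment with one start state and one final state."""

    def __init__(self, start, final, edges):
        self.start, self.final, self.edges = start, final, edges


def symbol(a):
    s, f = next(_ids), next(_ids)
    return NFA(s, f, [(s, a, f)])


def concat(r, s):       # R.S
    return NFA(r.start, s.final, r.edges + s.edges + [(r.final, EPS, s.start)])


def union(r, s):        # R+S
    start, final = next(_ids), next(_ids)
    extra = [(start, EPS, r.start), (start, EPS, s.start),
             (r.final, EPS, final), (s.final, EPS, final)]
    return NFA(start, final, r.edges + s.edges + extra)


def star(r):            # R*
    start, final = next(_ids), next(_ids)
    extra = [(start, EPS, r.start), (start, EPS, final),
             (r.final, EPS, r.start), (r.final, EPS, final)]
    return NFA(start, final, r.edges + extra)


def closure(nfa, states):
    """All states reachable from `states` using ε-moves only."""
    stack, seen = list(states), set(states)
    while stack:
        q = stack.pop()
        for src, label, dst in nfa.edges:
            if src == q and label is EPS and dst not in seen:
                seen.add(dst)
                stack.append(dst)
    return seen


def accepts(nfa, text):
    current = closure(nfa, {nfa.start})
    for ch in text:
        moved = {dst for src, label, dst in nfa.edges
                 if src in current and label == ch}
        current = closure(nfa, moved)
    return nfa.final in current


# (1+0)*.1 built in precedence order: union, closure, then concatenation.
nfa = concat(star(union(symbol("1"), symbol("0"))), symbol("1"))
print(accepts(nfa, "0101"), accepts(nfa, "10"))
```

Simulating the NFA directly, as `accepts` does, tracks a set of states at each step; converting to a DFA precomputes those sets once.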



NFA With Null Moves to DFA Conversion

Some definitions used in the method:
• ε-Closure(s): Set of states reachable from state s via ε-moves alone (including s itself).
• ε-Closure(T): Set of states reachable from any state in set T via ε-moves alone.
• move(T, a): Set of states to which there is an NFA transition from states in T on a symbol a.
• D_states: Set of states in the equivalent DFA.
• D_Tran: Transition table of the equivalent DFA.



NFA With Null Moves to DFA Conversion

• Subset Construction Method: (ε-Closure Method)
– Input: NFA with ε moves.
– Output: Equivalent DFA
– Algorithm:
Begin
   Initially, add ε-Closure(S0) to D_states as an unmarked state { where S0 is the initial state of the NFA }
   while there is an unmarked state T in D_states do
      mark T
      for each input symbol 'a' belonging to ∑ do
         U = ε-Closure(move(T, a))
         if U is not in D_states
            then add U to D_states as an unmarked state
         D_Tran[T, a] = U
End
Example of NFA with ε moves to DFA Conversion
[Figure: NFA with ε moves for (1+0)*.1 with states 0 to 9; the transitions on input symbols are 2 → 3 on 1, 4 → 5 on 0 and 8 → 9 on 1, and all other transitions are ε-moves]

ε-Closure(0) = {0,1,2,4,7,8} = A
ε-Closure(5) = {1,2,4,5,6,7,8} = B
ε-Closure(3,9) = {1,2,3,4,6,7,8,9} = C

Move(A,0) = { 5 }      Move(A,1) = { 3,9 }
Move(B,0) = { 5 }      Move(B,1) = { 3,9 }
Move(C,0) = { 5 }      Move(C,1) = { 3,9 }

D_Tran:

D_States \ ∑   0   1
A              B   C
B              B   C
C              B   C
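The subset construction can be replayed on this example in a few lines of Python. This is a sketch under stated assumptions: the ε-edges in the `NFA` table are reconstructed from the ε-closures listed on the slide (the original figure is not available), and the names `eps_closure`, `move` and `subset_construction` are hypothetical.

```python
EPS = ""  # label used for ε-moves

# (state, label) -> set of next states; ε-edges are an assumption
# reconstructed from the closures A, B and C given above.
NFA = {
    (0, EPS): {1, 7}, (1, EPS): {2, 4}, (2, "1"): {3}, (4, "0"): {5},
    (3, EPS): {6}, (5, EPS): {6}, (6, EPS): {1, 7}, (7, EPS): {8},
    (8, "1"): {9},
}


def eps_closure(states):
    """States reachable from `states` via ε-moves alone."""
    stack, closure = list(states), set(states)
    while stack:
        for nxt in NFA.get((stack.pop(), EPS), ()):
            if nxt not in closure:
                closure.add(nxt)
                stack.append(nxt)
    return frozenset(closure)


def move(states, symbol):
    """States reachable from `states` on one `symbol` transition."""
    return {nxt for s in states for nxt in NFA.get((s, symbol), ())}


def subset_construction(start, alphabet):
    start_set = eps_closure({start})
    d_states, d_tran = [start_set], {}
    unmarked = [start_set]
    while unmarked:
        t = unmarked.pop()                  # mark T
        for a in alphabet:
            u = eps_closure(move(t, a))
            if u not in d_states:
                d_states.append(u)
                unmarked.append(u)
            d_tran[(t, a)] = u
    return d_states, d_tran


d_states, d_tran = subset_construction(0, "01")
names = {s: "ABC"[i] for i, s in enumerate(d_states)}
for s in d_states:
    print(names[s], sorted(s), names[d_tran[(s, "0")]], names[d_tran[(s, "1")]])
```

In this example every move is non-empty; for an NFA where some `move(T, a)` is empty, the same code would produce the empty set as an extra dead state.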



Regular Expressions to Finite Automata

Lexical Specification → Regular expressions → NFA → DFA → Table-driven Implementation of DFA



Lexical Analysis

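• Interface diagram of Lexical Analysis

Source program → Lexical Analyzer → token → Syntax Analysis → To semantic analysis
– Syntax Analysis requests each token from the Lexical Analyzer with getNextToken.
– Both the Lexical Analyzer and Syntax Analysis are connected to the Symbol table.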



Tokens, Patterns and Lexemes

• Token: A sequence of characters that has a collective meaning.
• Pattern: There is a set of strings in the input for which the same token is produced as output. This set of strings is described by a rule called a pattern associated with the token.
• Lexeme: A sequence of characters in the source program that is matched by the pattern for a token.



Example of Tokens, Patterns and Lexemes



Designing of Lexical Analyzer

• List all the alphabets, characters and tokens allowed in the language, with their patterns:
digit -> [0-9]
digits -> digit+
number -> digits (.digits)? (E [+-]? digits)?
letter -> [A-Za-z_]
id -> letter (letter | digit)*
if -> if
then -> then
else -> else
relop -> < | > | <= | >= | = | <>
white_space -> (blank | tab | newline)+
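These patterns can be turned into a small scanner with Python's `re` module. The sketch below makes several assumptions: the three keyword tokens are folded into a single hypothetical `keyword` token, longer relational operators are listed before shorter ones so that `<=` is not split, white space is discarded, and the names `TOKEN_SPEC` and `tokenize` are invented for illustration.

```python
import re

# Order matters: keywords are tried before identifiers.
TOKEN_SPEC = [
    ("number", r"\d+(?:\.\d+)?(?:E[+-]?\d+)?"),
    ("keyword", r"\b(?:if|then|else)\b"),
    ("id", r"[A-Za-z_][A-Za-z_0-9]*"),
    ("relop", r"<=|>=|<>|<|>|="),
    ("white_space", r"[ \t\n]+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))


def tokenize(text):
    """Return a list of (token, lexeme) pairs, skipping white space."""
    tokens, pos = [], 0
    while pos < len(text):
        m = MASTER.match(text, pos)
        if m is None:
            # No pattern matches here: this is a lexical error.
            raise SyntaxError(f"lexical error at position {pos}: {text[pos]!r}")
        if m.lastgroup != "white_space":
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens


for token in tokenize("if x1 <= 40 then y = 3.5E+2"):
    print(token)
```

Each pair printed is a token name together with its lexeme, for example the token `relop` with the lexeme `<=`.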



Designing of Lexical Analyzer
• Construction of state diagram for every token according to
pattern:
• Transition diagrams of relop



Designing of Lexical Analyzer

• Transition diagram for identifiers



Designing of Lexical Analyzer

• Transition diagram for unsigned numbers



Designing of Lexical Analyzer
• Implementation of relop transition Diagram:



Lexical Analyzer Generator - Lex

• Lex source program (lex.l) → LEX Compiler → lex.yy.c
• lex.yy.c → C compiler → a.out
• Input stream → a.out → Sequence of tokens



Lexical Analyzer Generator - Lex

Structure of Lex programs

declarations
%%
translation rules        (each rule has the form: Pattern {Action})
%%
auxiliary functions



Context free grammars

• Terminals
• Non terminals
• Start symbol
• Productions

expression -> expression + term
expression -> expression - term
expression -> term
term -> term * factor
term -> term / factor
term -> factor
factor -> (expression)
factor -> id

Derivation

• Productions are treated as rewriting rules to generate a string
• Rightmost and leftmost derivations
– E -> E + E | E * E | -E | (E) | id
– Derivations for -(id+id)
• E => -E => -(E) => -(E+E) => -(id+E) => -(id+id)
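The derivation above always rewrites the leftmost E, so it is a leftmost derivation. A short Python sketch can replay it; the function name `leftmost_derivation` and the list-of-symbols representation are assumptions made for illustration.

```python
def leftmost_derivation(start, choices, nonterminal="E"):
    """Apply each chosen right-hand side to the leftmost nonterminal."""
    form = [start]
    steps = ["".join(form)]
    for rhs in choices:
        i = form.index(nonterminal)   # position of the leftmost E
        form[i:i + 1] = rhs           # rewrite it with the chosen production
        steps.append("".join(form))
    return steps


# Productions used, in order: E -> -E, E -> (E), E -> E+E, E -> id, E -> id
steps = leftmost_derivation(
    "E", [["-", "E"], ["(", "E", ")"], ["E", "+", "E"], ["id"], ["id"]]
)
print(" => ".join(steps))
```

A rightmost derivation would differ only in which E is chosen at each step, for example by searching the sentential form from the right.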

Parse trees

• -(id+id)
• E => -E => -(E) => -(E+E) => -(id+E)=>-(id+id)

Ambiguity

• For some strings there exists more than one parse tree
• Or more than one leftmost derivation
• Or more than one rightmost derivation
• Example: id+id*id
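The ambiguity of this example can be made concrete by counting parse trees. The sketch below assumes the reduced grammar E -> E + E | E * E | id (the -E and (E) productions are left out for brevity), and the function name `count_parse_trees` is hypothetical.

```python
from functools import lru_cache


def count_parse_trees(tokens):
    """Count parse trees of `tokens` under E -> E + E | E * E | id."""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def count(lo, hi):
        # Number of ways to derive tokens[lo:hi] from E.
        if hi - lo == 1:
            return 1 if tokens[lo] == "id" else 0
        total = 0
        for k in range(lo + 1, hi - 1):
            if tokens[k] in ("+", "*"):
                # The operator at position k is the root of the tree.
                total += count(lo, k) * count(k + 1, hi)
        return total

    return count(0, len(tokens))


print(count_parse_trees(["id", "+", "id", "*", "id"]))
```

For id+id*id the two trees correspond to choosing + or * as the root operator, which is exactly the choice that operator precedence rules are meant to settle.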

BNF

• Backus–Naur form or Backus normal form (BNF) is a notation technique for context-free grammars.
• It is often used to describe the syntax of computer programming languages.
• When describing languages, Backus-Naur form (BNF) is a formal notation for encoding grammars intended for human consumption.
• Many programming languages, protocols or formats have a BNF description in their specification.
• Every rule in Backus-Naur form has the following structure:
name ::= expansion
• The symbol ::= means "may expand into" and "may be replaced with."

BNF

• In some texts, a name is also called a non-terminal symbol.
• Every name in Backus-Naur form is surrounded by angle brackets, < >, whether it appears on the left- or right-hand side of the rule.
• An expansion is an expression containing terminal symbols and non-terminal symbols, joined together by sequencing and choice.
• A terminal symbol is a literal (like "+" or "function") or a class of literals (like integer).
• Simply juxtaposing expressions indicates sequencing.
• A vertical bar | indicates choice.

BNF
• For example, in BNF, the classic expression grammar is:
    <expr> ::= <term> "+" <expr> | <term>
    <term> ::= <factor> "*" <term> | <factor>
    <factor> ::= "(" <expr> ")" | <const>
    <const> ::= integer
• Naturally, we can define a grammar for rules in BNF:
    rule → name ::= expansion
    name → < identifier >
    expansion → expansion expansion
    expansion → expansion | expansion
    expansion → name
    expansion → terminal
• We might define identifiers using the regular expression [-A-Za-z_0-9]+.
• A terminal could be a quoted literal (like "+", "switch" or "<<=") or the name of a class of literals (like integer).
• The name of a class of literals is usually defined by other means, such as a regular expression or even prose.
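Because every rule of the expression grammar above starts with a different non-terminal on its right-hand side and is right-recursive, it maps naturally onto a recursive-descent parser with one function per rule. The Python sketch below is an illustration under assumptions: the name `evaluate` is hypothetical, `integer` is taken to mean a run of decimal digits, characters outside the grammar are silently skipped by the simple tokenizer, and the parser computes the value of the expression instead of building a tree.

```python
import re


def evaluate(text):
    """Parse `text` with the BNF expression grammar and return its value."""
    tokens = re.findall(r"\d+|[+*()]", text)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take(expected=None):
        nonlocal pos
        tok = peek()
        if tok is None or (expected is not None and tok != expected):
            raise SyntaxError(f"expected {expected!r}, found {tok!r}")
        pos += 1
        return tok

    def expr():      # <expr> ::= <term> "+" <expr> | <term>
        value = term()
        if peek() == "+":
            take("+")
            value += expr()
        return value

    def term():      # <term> ::= <factor> "*" <term> | <factor>
        value = factor()
        if peek() == "*":
            take("*")
            value *= term()
        return value

    def factor():    # <factor> ::= "(" <expr> ")" | <const>
        if peek() == "(":
            take("(")
            value = expr()
            take(")")
            return value
        tok = take()
        if not tok.isdigit():
            raise SyntaxError(f"expected an integer, found {tok!r}")
        return int(tok)

    result = expr()
    if peek() is not None:
        raise SyntaxError(f"unexpected token {peek()!r}")
    return result


print(evaluate("2+3*(4+1)"))
```

Since `<term>` sits below `<expr>` in the grammar, multiplication binds tighter than addition without any extra precedence table.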

Syntax Analyzer Generator (YACC)

YACC Specifications

Faculty Video Links, Youtube & NPTEL Video Links and Online
Courses Details

• Youtube/other Video Links
• https://www.youtube.com/watch?v=WccZQSERfCM
• https://www.youtube.com/watch?v=e-WJJl1Wzc4
• https://www.youtube.com/watch?v=ZMgiwh_Aimw
• https://www.youtube.com/watch?v=jN8zvENdjBg

• NPTEL
• https://youtu.be/trocRZqxZFM
• https://youtu.be/-Ut1b1xEbCo
• https://youtu.be/UMnllso8znw



Daily Quiz

• What is the output of lexical analyzer?


a. A list of tokens
b. A parse tree
c. Intermediate code
d. Machine code
• Which is the permanent database in the general model of a compiler?
a. identifier table
b. literal table
c. terminal table
d. source code
• A _________ is a software utility that translates code written in a high level language into a low level language.
a. Text editor
b. Compiler
c. Converter
d. Code optimizer
Daily Quiz

• Compiler can check ________ error.


a. Syntax
b. Content
c. Logical
d. Both a and b
• Compiler translates the source code to
a. Machine code
b. Binary code
c. Executable code
d. Both a and b



Weekly Assignment



MCQ s
• In a compiler, keywords of a language are recognized during -
a. the code generation
b. parsing of the program
c. the lexical analysis of the program
d. dataflow analysis
• Which of the following is used for grouping of characters into tokens?
a. Parser
b. Code generator
c. Lexical analyser
d. Code generator
• Given the language L = {ab, aa, baa}, which of the following strings are in L*?
1)abaabaaabaa 2) aaaabaaaa 3) baaaaabaaaab 4) baaaaabaa
a. 1,2,3
b. 2,3,4
c. 1,2,4
d. 1,3,4



MCQ s

• Which one of the following is FALSE?


a. Every NFA can be converted to DFA
b. Every subset of a recursively enumerable set is recursive
c. NFA is a machine.
d. DFA is also a type of NFA.
e. All of the mentioned
f. None of the mentioned
• Number of states of an FSM required to simulate the behaviour of a computer with a memory capable of storing ‘m’ words, each of length ‘n’:
a. m x 2^n
b. 2^(mn)
c. 2^(m+n)
d. All of the mentioned



Old Question Papers

Expected Questions for University Exam

• Create a new compiler for machine B which accepts a language L by using an existing compiler on machine A for a language other than L.
• Convert the regular expression (a,b)*aba into an NFA by using Thompson's rule and reduce it to an equivalent DFA using the ε-closure function.
• Explain the implementation of a lexical analyzer by taking the example of an identifier.
• What do you mean by translators? Discuss the structure of a compiler.
• What is the LEX compiler? Explain the working and advantages of the LEX compiler.
• Write short notes on the following.
(i) Formal Grammar (ii) Left Recursion (iii) Left factoring



Summary

• Lexical analysis turns input characters into tokens.
• Lexical syntax is described by regular expressions.
• Lexical analysis is the very first phase of a compiler.
• A lexeme is a sequence of characters in the source program that matches the pattern of a token.
• The lexical analyzer is implemented to scan the entire source code of the program.
• The lexical analyzer helps to enter identified tokens into the symbol table.
• A character sequence which cannot be scanned into any valid token is a lexical error.
• Removing one character from the remaining input is a useful error recovery method.
• The lexical analyzer eases the process of syntax analysis by eliminating unwanted tokens.

References

• Principles of Compiler Design, textbook by Alfred Aho and Jeffrey Ullman
• Principle of Compiler Design, A.V. Aho, Ravi Sethi, J.D. Ullman
• Compilers: Principles, Techniques and Tools, A.V. Aho, Monica S. Lam, Ravi Sethi, J.D. Ullman
• https://www.geeksforgeeks.org/compiler-design-tutorials/
• https://www.javatpoint.com/compiler-tutorial
• https://www.tutorialspoint.com/compiler_design/index.htm
• https://nptel.ac.in/courses/106105190/

Noida Institute of Engineering and Technology, Greater Noida

Thank You
