Lexical and Syntax Analysis in Compiler Design by Vishal Trivedi

This document discusses lexical and syntax analysis in compiler design. It provides details on how the source code is evaluated in two phases: lexical analysis and syntax analysis. Lexical analysis involves scanning the code and dividing it into tokens. Syntax analysis then analyzes the tokens to determine the syntactic structure of the code according to the rules of the language. The roles of the lexical analyzer or scanner and use of finite automata in lexical analysis are also summarized.


www.ijcrt.org © 2018 IJCRT | Volume 6, Issue 1 January 2018 | ISSN: 2320-2882

Lexical and Syntax Analysis in Compiler Design

Vishal Trivedi
Gandhinagar Institute of Technology, Gandhinagar, Gujarat, India

Abstract: This research paper gives brief information on how the source program is evaluated in the lexical analysis and syntax analysis phases of a compiler. In addition, it explains the concept of a compiler and the phases of a compiler. The paper concentrates mainly on lexical analysis and syntax analysis.

Keywords: Token, Lexeme, Identifier, Operator, Operand, Sentinel, Prefix, Derivation, Kleene closure, Positive closure, Terminal, Production rule, Non-terminal, Sentential.

I. INTRODUCTION

Whenever we write source code and start evaluating it, the computer shows only the output and the errors (if any occurred). We do not see the actual process behind it. This research paper explains, step by step, how source code is evaluated in lexical and syntax analysis. The topics covered are Index Terms, Compilers, Phases of Compiler, Operations on grammar, Lexical analysis, Role of Scanner, Finite automata, Syntax analysis, Types of Derivation, Ambiguous grammar, Left recursion, Left factoring, Types of Parsing, Top Down Parsing, Bottom Up Parsing, and Error Handling.

II. INDEX TERMS

A token is a sequence of characters having a collective meaning. A token describes the class or category of an input string. Typical tokens are identifiers, operators, special symbols, constants, etc. A pattern is the set of rules associated with a token. A lexeme is a sequence of characters in the source code that matches the pattern of a token. Examples: int, i, num. A sentinel marks the end of the buffer or the end of a token. Regular expressions are used to construct finite automata, which are used for token recognition.

III. COMPILERS

A compiler reads the whole program at once and reports the errors (if any occurred). It generates intermediate code in order to generate target code. Errors are displayed once the whole program has been checked. Examples of compilers are the Borland Compiler and the Turbo C Compiler. After compilation, the generated target code is easy for the machine to understand. The process of compilation must be done efficiently. There are mainly two parts of the compilation process.

[1] Analysis Phase: This phase of the compilation process is machine independent. The main objective of the analysis phase is to divide the source code into parts and rearrange these parts into a meaningful structure. The meaning of the source code is determined and then intermediate code is created from the source program. The analysis phase contains mainly three sub-phases: lexical analysis, syntax analysis and semantic analysis.
[2] Synthesis Phase: This phase of the compilation process is machine dependent. The intermediate code is taken and converted into equivalent target code. The synthesis phase contains mainly three sub-phases: intermediate code, code optimization and code generation.

Fig. 1 Compilers

IV. PHASES OF COMPILER

As mentioned above, a compiler contains lexical analysis, syntax analysis, semantic analysis, intermediate code, code optimization and code generation phases.
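
As a rough illustration of the lexical analysis phase and of the index terms token, pattern and lexeme, the sketch below splits a small statement into (token, lexeme) pairs. It is only a minimal sketch: the token class names, the `PATTERNS` table and the `tokenize` function are assumptions made for this example, and Python's `re` module stands in for the finite automata that a real scanner would construct from the regular expressions.

```python
import re

# Hypothetical token classes and their patterns (regular expressions).
# Order matters: keywords are tried before identifiers.
PATTERNS = [
    ("KEYWORD",    r"\b(?:int|float|if|else|while|return)\b"),
    ("IDENTIFIER", r"[A-Za-z_][A-Za-z0-9_]*"),
    ("CONSTANT",   r"\d+"),
    ("OPERATOR",   r"[=+\-*/]"),
    ("SPECIAL",    r"[;(){}]"),
    ("SKIP",       r"\s+"),
]

# Combine the patterns into one regex with a named group per token class.
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in PATTERNS))


def tokenize(source):
    """Return a list of (token, lexeme) pairs for the given source text."""
    tokens = []
    pos = 0
    while pos < len(source):
        match = MASTER.match(source, pos)
        if match is None:
            # Lexical error: no pattern matches at this position.
            raise ValueError(f"illegal character {source[pos]!r} at position {pos}")
        if match.lastgroup != "SKIP":
            # lastgroup is the name of the token class whose pattern matched;
            # the matched text itself is the lexeme.
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens


if __name__ == "__main__":
    for token, lexeme in tokenize("a=a+b*c*2;"):
        print(f"{token:<10} {lexeme}")
```

For the statement `a=a+b*c*2;` this yields ten pairs: four identifiers (a, a, b, c), four operators (=, +, *, *), the constant 2 and the special symbol `;`. An input such as `a@b` raises an error, which corresponds to the "illegal character" kind of lexical error.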

IJCRT1801085 International Journal of Creative Research Thoughts (IJCRT) www.ijcrt.org 634

Fig. 2 Phases of Compiler

V. OPERATIONS

• ε refers to the empty string.
• Λ or ∅ refers to the empty set of strings.
• |s| refers to the length of a string s.
• Union of L and M, written L ∪ M or L + M, refers to {s | s is in L or s is in M}.
• Concatenation of L and M, written LM, refers to {st | s is in L and t is in M}.
• Kleene closure of L, written L*, refers to zero or more occurrences of L.
• Positive closure of L, written L+, refers to one or more occurrences of L.

VI. LEXICAL ANALYSIS

• Lexical analysis is the first phase of a compiler.
• Lexical analysis is also known as linear analysis or scanning.
• First, the lexical analyzer scans the whole program and divides it into tokens. A token is a string with a meaning; it describes the class or category of an input string. Examples: identifiers, keywords, constants, etc.
• A sentinel marks the end of the buffer or the end of a token.
• A pattern is the set of rules that describes a token.
• A lexeme is a sequence of characters in the source code that matches the pattern of a token. Examples: int, i, num.
• Lexical analysis uses two pointers, the lexeme pointer and the forward pointer.
• To perform token recognition, regular expressions are used to construct finite automata, which is a separate topic in itself.
• The input is source code and the output is tokens.
• Consider an example:
  Input: a=a+b*c*2;
  Output: tokens or table of tokens

  = a
  + b
  * c
  2

VII. ROLE OF SCANNER

The lexical analyzer is the first phase of a compiler. Its main task is to read the input characters and produce a sequence of tokens as output, which the parser uses for syntax analysis.

Fig. 3 Role of Lexical Analyzer

VIII. FINITE AUTOMATA

We compile a regular expression into a recognizer by constructing a generalized transition diagram called a finite automaton. A finite automaton or finite state machine is a 5-tuple (S, ∑, S0, F, δ) where S is a finite set of states, ∑ is a finite alphabet of input symbols, S0 is the initial state, F is the set of accepting states, and δ is a transition function. There are two types of finite automata.

[1] Deterministic finite automata (DFA):
For each state, a DFA has exactly one edge leaving it for each symbol. In the theory of computation, a branch of theoretical computer science, a deterministic finite automaton, also known as a deterministic finite acceptor or deterministic finite state machine (DFSM), is a finite-state machine that accepts or rejects strings of symbols and produces a unique computation of the automaton for each input string. Deterministic refers to the
uniqueness of the computation.

[2] Nondeterministic finite automata (NFA):
There are no restrictions on the edges leaving a state. There can be several edges with the same symbol as label, and some edges can be labeled with ε. A nondeterministic finite automaton (NFA) or nondeterministic finite state machine does not need to obey the restrictions of a DFA. In particular, every DFA is also an NFA. Sometimes the term NFA is used in a narrower sense, referring to an NFA that is not a DFA.

IX. SYNTAX ANALYSIS

• Syntax analysis is also known as syntactical analysis, parsing or hierarchical analysis.
• Syntax refers to the arrangement of words and phrases to create well-formed sentences in a language.
• Tokens generated by the lexical analyzer are grouped together to form a hierarchical structure known as a syntax tree, a less detailed form of the parse tree.

Fig. 4 Lexical and Syntax Analyzer

• The input is tokens and the output is a syntax tree.
• Grammatical errors are checked during this phase. Examples: missing parenthesis, missing semicolon, other syntax errors.
• For the example given above:
  Input: tokens or table of tokens

  = A
  + B
  * C
  2

  Output:

Fig. 5 Syntax Tree

X. TYPES OF DERIVATION

There are mainly two types of derivation: leftmost derivation and rightmost derivation. Consider the grammar with the productions S -> S+S | S-S | S*S | S/S | (S) | a.

[1] Leftmost derivation:

• A derivation of a string W in a grammar G is a leftmost derivation if at every step the leftmost non-terminal is replaced.
• Consider the string a*a-a:
  S => S-S
    => S*S-S
    => a*S-S
    => a*a-S
    => a*a-a
• Equivalent leftmost derivation tree (figure)

[2] Rightmost derivation:

• A derivation of a string W in a grammar G is a rightmost derivation if at every step the rightmost non-terminal is replaced.
• Consider the string a-a/a:
  S => S-S
    => S-S/S
    => S-S/a
    => S-a/a
    => a-a/a
• Equivalent rightmost derivation tree (figure)
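
The leftmost and rightmost derivations above can be replayed mechanically. The following is a minimal sketch, assuming sentential forms are plain strings in which `S` is the only non-terminal; the `derive` helper, the `PRODUCTIONS` set and the argument names are hypothetical and serve only this illustration.

```python
# Right-hand sides of the grammar S -> S+S | S-S | S*S | S/S | (S) | a.
PRODUCTIONS = {"S+S", "S-S", "S*S", "S/S", "(S)", "a"}


def derive(start, choices, leftmost=True):
    """Apply the given right-hand sides in order, each time replacing the
    leftmost (or rightmost) occurrence of the non-terminal S.

    Returns the list of sentential forms, starting with `start`.
    """
    forms = [start]
    current = start
    for rhs in choices:
        if rhs not in PRODUCTIONS:
            raise ValueError(f"S -> {rhs} is not a production of the grammar")
        # Position of the non-terminal to rewrite in this step.
        index = current.find("S") if leftmost else current.rfind("S")
        if index == -1:
            raise ValueError("no non-terminal left to replace")
        current = current[:index] + rhs + current[index + 1:]
        forms.append(current)
    return forms


if __name__ == "__main__":
    # Leftmost derivation of a*a-a.
    print(" => ".join(derive("S", ["S-S", "S*S", "a", "a", "a"], leftmost=True)))
    # Rightmost derivation of a-a/a.
    print(" => ".join(derive("S", ["S-S", "S/S", "a", "a", "a"], leftmost=False)))
```

The only difference between the two modes is which occurrence of `S` is rewritten at each step, which is exactly the distinction the two definitions draw.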


XI. AMBIGUOUS GRAMMAR

An ambiguous grammar is one that produces more than one leftmost or more than one rightmost derivation for the same sentence. In general, an ambiguous grammar can generate more than one parse tree for the same sentence. For example, with the productions S -> S+S | a, the sentence a+a+a has two different leftmost derivations:

  S => S+S          S => S+S
    => a+S            => S+S+S
    => a+S+S          => a+S+S
    => a+a+S          => a+a+S
    => a+a+a          => a+a+a

XII. LEFT RECURSION

A production is left recursive when the leftmost symbol on its right-hand side is the same non-terminal as the one on its left-hand side, e.g. A -> Aa | b. Left recursion should not be present in a grammar intended for top-down parsing. To remove it, convert the left recursion into right recursion:

A -> bA'
A' -> aA' | ε

XIII. LEFT FACTORING

Left factoring deals with productions that share a common prefix, e.g. A -> aB1 | aB2 | aB3. Common prefixes should not be present in the grammar. To remove them, factor out the prefix:

A -> aE
E -> B1 | B2 | B3

XIV. TYPES OF PARSING

There are mainly two types of parsing techniques.
[1] Top Down Parsing
[2] Bottom Up Parsing

Fig. 6 Types of Parsing Techniques

XV. TOP DOWN PARSING

• Root to leaves
• LL parser
• Leftmost derivation
• Derivation process (sentential)
• Less complex
• Simple to implement
• Doesn't work with NFA
• Doesn't support left recursion
• Common prefix not supported
• Applicable to small languages
• Example: E (root) is expanded to id + id + id (leaves)

XVI. BOTTOM UP PARSING

• Leaves to root
• LR parser
• Rightmost derivation (in reverse)
• Reduction process
• Highly complex
• Complex to implement
• Works with NFA
• Supports left recursion
• Common prefix supported
• Applicable to a broad class of languages
• Example: id + id + id (leaves) is reduced to E (root)
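
To make the bottom-up idea (leaves to root, by reduction) concrete, here is a minimal shift-reduce sketch for the input id + id + id. It assumes the small left-recursive grammar E -> E + id | id, which is chosen for this illustration and does not appear in the paper; the function name `shift_reduce` and the trace format are likewise hypothetical. A real LR parser would drive the same shift and reduce actions from a parsing table rather than from hard-coded handle checks.

```python
def shift_reduce(tokens):
    """Parse `tokens` with the assumed grammar E -> E + id | id.

    Returns the list of actions taken; raises ValueError on a syntax error.
    """
    stack = []
    actions = []
    remaining = list(tokens)

    while True:
        # Reduce whenever a handle is on top of the stack.
        if stack[-3:] == ["E", "+", "id"]:
            stack[-3:] = ["E"]
            actions.append("reduce E -> E + id")
        elif stack == ["id"]:
            stack[:] = ["E"]
            actions.append("reduce E -> id")
        elif remaining:
            # No handle available: shift the next input token.
            token = remaining.pop(0)
            if token not in ("id", "+"):
                raise ValueError(f"unknown token {token!r}")
            stack.append(token)
            actions.append(f"shift {token}")
        else:
            break

    # The input is accepted only if everything was reduced to the start symbol.
    if stack != ["E"]:
        raise ValueError(f"syntax error, stack at end of input: {stack}")
    actions.append("accept")
    return actions


if __name__ == "__main__":
    for step in shift_reduce(["id", "+", "id", "+", "id"]):
        print(step)
```

Read in reverse order, the three reductions in the trace give the rightmost derivation E => E + id => E + id + id => id + id + id. The grammar is left recursive, which this bottom-up procedure handles without any rewriting.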


XVII. ERROR HANDLING

Each phase of the compiler detects errors, which must be reported to the error handler, whose task is to handle the errors so that compilation can proceed. Lexical errors include spelling errors, identifiers or numeric constants that exceed the permitted length, the appearance of illegal characters, etc. Syntax errors include errors in structure, missing operators, missing parentheses, etc. Semantic errors include incompatible operand types, undeclared variables, actual arguments that do not match the formal arguments, etc. There are various error recovery strategies that the analyzers can implement.

Fig. 7 Error Handler

XVIII. CONCLUSION

To conclude, a source program has to pass through, and be parsed by, all phases of the compiler to be converted into the expected target program. After studying this research paper, one can understand the step-by-step evaluation of source code in lexical and syntax analysis, covering Index Terms, Compilers, Phases of Compiler, Operations on grammar, Lexical analysis, Role of Scanner, Finite automata, Syntax analysis, Types of Derivation, Ambiguous grammar, Left recursion, Left factoring, Types of Parsing, Top Down Parsing, Bottom Up Parsing, and Error Handling.

ACKNOWLEDGMENT

I am using this opportunity to express my gratitude to everyone who supported me in this research. I am thankful for their aspiring guidance, invaluably constructive criticism and friendly advice during the research. I am sincerely grateful to them for sharing their truthful and illuminating views on a number of issues related to the research work.
