
Lexical and Syntax Analysis in Compiler Design by Vishal Trivedi

This document discusses lexical and syntax analysis in compiler design. It provides details on how the source code is evaluated in two phases: lexical analysis and syntax analysis. Lexical analysis involves scanning the code and dividing it into tokens. Syntax analysis then analyzes the tokens to determine the syntactic structure of the code according to the rules of the language. The roles of the lexical analyzer or scanner and use of finite automata in lexical analysis are also summarized.


www.ijcrt.org © 2018 IJCRT | Volume 6, Issue 1 January 2018 | ISSN: 2320-2882

Lexical and Syntax Analysis in Compiler Design


Vishal Trivedi
Gandhinagar Institute of Technology, Gandhinagar, Gujarat, India

Abstract: This research paper gives brief information on how the source program is evaluated in the lexical analysis phase and the syntax analysis phase of a compiler. In addition, it explains the concept of a compiler and the phases of a compiler. The paper mainly concentrates on lexical analysis and syntax analysis.

Keywords: Token, Lexeme, Identifier, Operator, Operand, Sentinel, Prefix, Derivation, Kleene closure, Positive closure, Terminal, Production rule, Non-terminal, Sentential.

I. INTRODUCTION

Whenever we write source code and start the process of evaluating it, the computer only shows the output and the errors (if any occurred). We don't see the actual process behind it. This research paper explains the exact procedure and the step-by-step evaluation of source code in lexical and syntax analysis. The topics covered are Index Terms, Compilers, Phases of Compiler, Operations on grammar, Lexical analysis, Role of Scanner, Finite automata, Syntax analysis, Types of Derivation, Ambiguous grammar, Left recursion, Left factoring, Types of Parsing, Top Down Parsing, Bottom Up Parsing and Error Handling.

II. INDEX TERMS

Token refers to a sequence of characters having a collective meaning. A token describes the class or category of an input string. Typical tokens are identifiers, operators, special symbols, constants etc. Pattern refers to the set of rules associated with a token. Lexeme refers to a sequence of characters in the source code that matches the pattern of a token. Example: int, i, num etc. Sentinel refers to a special character that marks the end of the buffer or the end of a token. Regular expressions are used to construct finite automata, which are used for token recognition.

III. COMPILERS

A compiler reads the whole program at once and reports errors (if any occurred). It generates intermediate code in order to generate target code. Errors are displayed once the whole program has been checked. Examples of compilers are Borland Compiler and Turbo C Compiler. After compilation, the generated target code is in a form that the machine can execute. The process of compilation must be done efficiently. There are mainly two parts of the compilation process.
[1] Analysis Phase: This phase of the compilation process is machine independent. The main objective of the analysis phase is to divide the source code into parts and rearrange these parts into a meaningful structure. The meaning of the source code is determined and then intermediate code is created from the source program. The analysis phase mainly contains three sub-phases named lexical analysis, syntax analysis and semantic analysis.
[2] Synthesis Phase: This phase of the compilation process is machine dependent. The intermediate code is taken and converted into an equivalent target code. The synthesis phase mainly contains three sub-phases named intermediate code, code optimization and code generation.

Fig. 1 Compilers

IV. PHASES OF COMPILER

As mentioned above, a compiler contains the lexical analysis, syntax analysis, semantic analysis, intermediate code, code optimization and code generation phases.
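
As a rough illustration of the first of these phases and of the index terms from Section II, the sketch below tokenizes a small statement with Python's `re` module. The token class names, the keyword list and the `tokenize` helper are assumptions made for this example and are not specified by the paper. Each regular expression plays the role of a pattern, each matched substring is a lexeme, and the class name is the token.

```python
import re

# Hypothetical token classes and patterns, chosen only for illustration.
# Order matters: keywords are tried before general identifiers.
TOKEN_PATTERNS = [
    ("KEYWORD",    r"\b(?:int|float|char|return)\b"),
    ("IDENTIFIER", r"[A-Za-z_][A-Za-z0-9_]*"),
    ("CONSTANT",   r"\d+"),
    ("OPERATOR",   r"[=+\-*/]"),
    ("SPECIAL",    r"[;(),{}]"),
    ("SKIP",       r"\s+"),
]

# One combined regular expression with a named group per token class.
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_PATTERNS))


def tokenize(source):
    """Return a list of (token, lexeme) pairs for the given source string."""
    tokens = []
    pos = 0
    while pos < len(source):
        match = MASTER.match(source, pos)
        if match is None:
            # Lexical error: no pattern matches at this position.
            raise SyntaxError(f"illegal character {source[pos]!r} at position {pos}")
        if match.lastgroup != "SKIP":          # whitespace produces no token
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens


if __name__ == "__main__":
    for token, lexeme in tokenize("int num = i + 42;"):
        print(f"{token:<10} {lexeme}")
```

On the input `int num = i + 42;` this sketch tags the lexemes `int`, `num`, `=`, `i`, `+`, `42` and `;` as KEYWORD, IDENTIFIER, OPERATOR, IDENTIFIER, OPERATOR, CONSTANT and SPECIAL respectively. An unexpected character such as `@` raises an error, which corresponds to the lexical error "appearance of illegal characters".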

IJCRT1801085 International Journal of Creative Research Thoughts (IJCRT) www.ijcrt.org 634


Fig. 2 Phases of Compiler

V. OPERATIONS

• ε refers to the empty string.
• Λ or ∅ refers to the empty set of strings.
• |s| refers to the length of a string s.
• Union of L and M, written as L ∪ M or L + M, refers to {s | s is in L or s is in M}.
• Concatenation of L and M, written as LM, refers to {st | s is in L and t is in M}.
• Kleene closure of L, written as L*, refers to zero or more occurrences of L.
• Positive closure of L, written as L+, refers to one or more occurrences of L.

VI. LEXICAL ANALYSIS

• Lexical analysis is the first phase of a compiler.
• Lexical analysis is also known as linear analysis or scanning.
• First of all, the lexical analyzer scans the whole program and divides it into tokens. A token is a string with a collective meaning and describes the class or category of an input string. Example: identifiers, keywords, constants etc.
• Sentinel refers to a special character that marks the end of the buffer or the end of a token.
• Pattern refers to the set of rules that describes a token.
• Lexeme refers to a sequence of characters in the source code that matches the pattern of a token. Example: int, i, num etc.
• There are two pointers in lexical analysis, named the lexeme pointer and the forward pointer.
• In order to perform token recognition, regular expressions are used to construct finite automata, which is a separate topic in itself.
• The input is source code and the output is tokens.
• Consider an example:
  Input: a=a+b*c*2;
  Output: Tokens or table of tokens

  =   a
  +   b
  *   c
      2

VII. ROLE OF SCANNER

The lexical analyzer is the first phase of a compiler. Its main task is to read the input characters and produce a sequence of tokens as output, which the parser uses for syntax analysis.

Fig. 3 Role of Lexical Analyzer

VIII. FINITE AUTOMATA

We compile a regular expression into a recognizer by constructing a generalized transition diagram called a finite automaton. A finite automaton or finite state machine is a 5-tuple (S, Σ, S0, F, δ) where S is a finite set of states, Σ is a finite alphabet of input symbols, S0 is the initial state, F is the set of accepting states and δ is the transition function. There are two types of finite automata.

[1] Deterministic finite automata (DFA): For each state, a DFA has exactly one edge leaving it for each symbol. In the theory of computation, a branch of theoretical computer science, a deterministic finite automaton, also known as a deterministic finite acceptor or deterministic finite state machine (DFSM), is a finite-state machine that accepts or rejects strings of symbols and produces a unique computation of the automaton for each input string. Deterministic refers to the uniqueness of the computation.

[2] Nondeterministic finite automata (NFA): There are no restrictions on the edges leaving a state. There can be several edges with the same symbol as label, and some edges can be labeled with ε. A nondeterministic finite automaton (NFA) or nondeterministic finite state machine does not need to obey the restrictions of a DFA. In particular, every DFA is also an NFA. Sometimes the term NFA is used in a narrower sense, referring to an NFA that is not a DFA.

IX. SYNTAX ANALYSIS

• Syntax analysis is also known as syntactical analysis, parsing or hierarchical analysis.
• Syntax refers to the arrangement of words and phrases to create well-formed sentences in a language.
• Tokens generated by the lexical analyzer are grouped together to form a hierarchical structure known as a syntax tree, which is less detailed than a parse tree.

Fig. 4 Lexical and Syntax Analyzer

• The input is tokens and the output is a syntax tree.
• Grammatical errors are checked during this phase. Example: missing parenthesis, missing semicolon, other syntax errors etc.
• For the example given above:
  Input: tokens or table of tokens

  =   a
  +   b
  *   c
      2

  Output:

Fig. 5 Syntax Tree

X. TYPES OF DERIVATION

There are mainly two types of derivation: leftmost derivation and rightmost derivation. Let's consider the grammar with the production S -> S+S | S-S | S*S | S/S | (S) | a

[1] Leftmost derivation:
• A derivation of a string W in a grammar G is a leftmost derivation if at every step the leftmost non-terminal is replaced.
• Consider the string: a*a-a
  S => S-S
    => S*S-S
    => a*S-S
    => a*a-S
    => a*a-a
• [Figure: equivalent leftmost derivation tree, rooted at S]

[2] Rightmost derivation:
• A derivation of a string W in a grammar G is a rightmost derivation if at every step the rightmost non-terminal is replaced.
• Consider the string: a-a/a
  S => S-S
    => S-S/S
    => S-S/a
    => S-a/a
    => a-a/a
• [Figure: equivalent rightmost derivation tree]
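
The leftmost and rightmost derivations of Section X can be replayed mechanically. The following is a minimal sketch in Python, assuming single-character grammar symbols. The `derive` helper and the `PRODUCTIONS` table are names introduced here for illustration only.

```python
# Grammar from Section X: S -> S+S | S-S | S*S | S/S | (S) | a
PRODUCTIONS = {"S": ["S+S", "S-S", "S*S", "S/S", "(S)", "a"]}


def derive(start, choices, leftmost=True):
    """Apply one production per step to the leftmost (or rightmost) non-terminal.

    `choices` lists the right-hand side used at each step. The function
    returns every sentential form, beginning with `start`.
    """
    form = start
    forms = [form]
    for rhs in choices:
        # Positions of all non-terminals in the current sentential form.
        positions = [i for i, sym in enumerate(form) if sym in PRODUCTIONS]
        if not positions:
            raise ValueError("no non-terminal left to replace")
        i = positions[0] if leftmost else positions[-1]
        if rhs not in PRODUCTIONS[form[i]]:
            raise ValueError(f"{form[i]} -> {rhs} is not a production")
        # Replace the chosen non-terminal by the right-hand side.
        form = form[:i] + rhs + form[i + 1:]
        forms.append(form)
    return forms


if __name__ == "__main__":
    left = derive("S", ["S-S", "S*S", "a", "a", "a"], leftmost=True)
    right = derive("S", ["S-S", "S/S", "a", "a", "a"], leftmost=False)
    print(" => ".join(left))
    print(" => ".join(right))
```

With these choices the first call reproduces the derivation of a*a-a and the second the derivation of a-a/a, step for step. The only difference between the two modes is which non-terminal position is picked in each sentential form.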


XI. AMBIGUOUS GRAMMAR

An ambiguous grammar is one that produces more than one leftmost or more than one rightmost derivation for the same sentence. In general, an ambiguous grammar can generate more than one parse tree for the same sentence. For example, with the production S -> S+S | a, the string a+a+a has two different leftmost derivations:

S => S+S            S => S+S
  => a+S              => S+S+S
  => a+S+S            => a+S+S
  => a+a+S            => a+a+S
  => a+a+a            => a+a+a

XII. LEFT RECURSION

A production rule is left recursive when the leftmost symbol on its right-hand side is the same non-terminal as the one on its left-hand side, e.g. A -> Aa | b. Left recursion should not be present in a grammar used for top-down parsing, because the parser could keep expanding the same non-terminal without consuming any input. In order to remove left recursion, convert it into right recursion:

A -> bA'
A' -> aA' | ε

XIII. LEFT FACTORING

Left factoring deals with common prefixes, e.g. A -> aB1 | aB2 | aB3. A common prefix should not be present in a grammar or production rule. To remove it, the grammar is left factored:

A -> aE
E -> B1 | B2 | B3

XIV. TYPES OF PARSING

There are mainly two types of parsing techniques.
[1] Top Down Parsing
[2] Bottom Up Parsing

Fig. 6 Types of Parsing Techniques

XV. TOP DOWN PARSING

• Root to leaves
• LL parser
• Leftmost derivation
• Derivation process (sentential)
• Less complex
• Simple to implement
• Doesn't work with NFA
• Doesn't support left recursion
• Common prefix not supported
• Applicable to a small class of languages
• e.g. E is expanded down to id + id + id

XVI. BOTTOM UP PARSING

• Leaves to root
• LR parser
• Rightmost derivation (in reverse)
• Reduction process
• Highly complex
• Complex to implement
• Works with NFA
• Supports left recursion
• Common prefix supported
• Applicable to a broad class of languages
• e.g. id + id + id is reduced up to E
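
To make the reduction process of bottom-up parsing concrete, here is a minimal shift-reduce sketch in Python for the input id + id + id. The grammar E -> E + id | id is an assumption (the paper only shows E and id + id + id in its figure), and `shift_reduce` is a hypothetical helper, not a full LR parser: it hard-codes the two reductions instead of consulting a parsing table.

```python
def shift_reduce(tokens):
    """Parse `tokens` with the assumed grammar E -> E + id | id.

    Returns a list of (action, stack) steps and raises SyntaxError
    if the input cannot be reduced to the start symbol E.
    """
    stack, trace = [], []
    for tok in tokens:
        stack.append(tok)                          # shift the next token
        trace.append(("shift", list(stack)))
        if stack[-3:] == ["E", "+", "id"]:         # handle for E -> E + id
            stack[-3:] = ["E"]
            trace.append(("reduce E -> E + id", list(stack)))
        elif stack == ["id"]:                      # handle for E -> id (first operand only)
            stack[:] = ["E"]
            trace.append(("reduce E -> id", list(stack)))
    if stack != ["E"]:
        raise SyntaxError(f"cannot reduce {stack} to E")
    return trace


if __name__ == "__main__":
    for action, stack in shift_reduce(["id", "+", "id", "+", "id"]):
        print(f"{action:<20} {' '.join(stack)}")
```

The reductions occur in the order E -> id, E -> E + id, E -> E + id, which is the rightmost derivation E => E + id => E + id + id => id + id + id read in reverse. The assumed grammar is left recursive, and the sketch handles it without any rewriting, in line with the claim above that bottom-up parsing supports left recursion.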


XVII. ERROR HANDLING

Each and every phase of the compiler detects errors, which must be reported to the error handler, whose task is to handle the errors so that compilation can proceed. Lexical errors include spelling errors, identifiers or numeric constants that exceed the allowed length, the appearance of illegal characters etc. Syntax errors include errors in structure, missing operators, missing parentheses etc. Semantic errors include incompatible types of operands, undeclared variables, actual arguments that do not match the formal arguments etc. There are various strategies for recovering from errors, which can be implemented by the analyzers.

Fig. 7 Error Handler

XVIII. CONCLUSION

To conclude, a source program has to pass through all phases of the compiler to be converted into the expected target program. After studying this research paper, one can understand the exact procedure and the step-by-step evaluation of source code in lexical and syntax analysis, covering Index Terms, Compilers, Phases of Compiler, Operations on grammar, Lexical analysis, Role of Scanner, Finite automata, Syntax analysis, Types of Derivation, Ambiguous grammar, Left recursion, Left factoring, Types of Parsing, Top Down Parsing, Bottom Up Parsing and Error Handling.

ACKNOWLEDGMENT

I am using this opportunity to express my gratitude to everyone who supported me in this research. I am thankful for their inspiring guidance, invaluably constructive criticism and friendly advice during the research. I am sincerely grateful to them for sharing their truthful and illuminating views on a number of issues related to the research work.
