Syntax Analysis Introduction: Distributed Computing, M. L. Liu 1
Syntax Analysis phase
• Belongs to the analysis portion of the compiler
• Groups tokens into grammatical phrases
• Performs hierarchical analysis
• Also called parsing
• Uses rules that prescribe the syntactic structure
of well-formed programs
Syntax Analysis
• Done by the syntax analyzer of the compiler
• The syntax analyzer is also known as the parser
• Creates the syntactic structure of the source program
• The syntactic structure is a parse tree
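As a minimal sketch of what such a parse tree might look like in memory, the Python snippet below builds one by hand for the token string id + id. The Node class, the leaves helper, and the production expr → expr op expr used here are illustrative assumptions, not the data structures of any particular compiler.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One parse-tree node: a grammar symbol plus its ordered children."""
    symbol: str
    children: list = field(default_factory=list)


def leaves(node):
    """Read the leaves left to right; this recovers the token string."""
    if not node.children:
        return [node.symbol]
    result = []
    for child in node.children:
        result.extend(leaves(child))
    return result


# Hand-built tree for "id + id", assuming the production expr -> expr op expr.
tree = Node("expr", [
    Node("expr", [Node("id")]),
    Node("op", [Node("+")]),
    Node("expr", [Node("id")]),
])

print(leaves(tree))  # ['id', '+', 'id']
```

Interior nodes are nonterminals and leaves are tokens, so reading the leaves from left to right gives back the input.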
Syntax Analysis
Advantages of Context-Free Grammars (CFG)
• A grammar gives a precise, easy-to-understand syntactic
specification of a programming language
• An efficient parser can be constructed from a properly
designed grammar
• The structure imposed by the grammar helps translate
source programs into correct object code
• New constructs can be added to the
language
Syntax Analyzer
Role of the Parser
• Obtains string of tokens from lexical analyzer
• Verifies the string
• Reports errors
• Recovers from common errors
• Collects information about tokens into symbol tables
• Performs type checking and other semantic analysis
• Constructs a syntax tree in preparation for
intermediate code generation
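The first few duties above (take tokens, verify them, report errors, build a tree) can be sketched with a small recursive-descent parser. This is only an illustration under stated assumptions: the grammar expr → term { + term }, term → id | ( expr ), the token names, and the names parse and ParseError are all hypothetical, and error recovery, symbol tables, and type checking are left out.

```python
class ParseError(Exception):
    """Raised when the token string does not match the assumed grammar."""


def parse(tokens):
    """Verify a token list and return a nested-tuple syntax tree."""
    pos = 0

    def peek():
        # "$" stands for end of input.
        return tokens[pos] if pos < len(tokens) else "$"

    def expect(kind):
        nonlocal pos
        if peek() != kind:
            raise ParseError(
                f"position {pos}: expected {kind!r}, found {peek()!r}")
        pos += 1

    def term():
        # term -> id | ( expr )
        if peek() == "(":
            expect("(")
            node = expr()
            expect(")")
            return node
        expect("id")
        return "id"

    def expr():
        # expr -> term { + term }
        node = term()
        while peek() == "+":
            expect("+")
            node = ("+", node, term())
        return node

    tree = expr()
    expect("$")  # reject trailing tokens
    return tree


print(parse(["id", "+", "(", "id", "+", "id", ")"]))
# ('+', 'id', ('+', 'id', 'id'))
```

Calling parse(["id", "+"]) raises ParseError with the message "position 2: expected 'id', found '$'", which is the "reports errors" duty in its simplest form.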
Parser in Compiler model
Types of Parsers
• Universal parsing (inefficient)
• Top-down parsing
• Bottom-up parsing
Syntax Error Handling
• Programming errors occur at different levels
– Lexical Errors
• misspelling of identifiers, keywords, or
operators, and
• missing quotes around text intended as a string
– Syntactic Errors
• misplaced semicolons or extra or missing
braces;
• in C or Java, the appearance of a case
statement without an enclosing switch
– Semantic Errors
• type mismatches between operators and operands
• the return of a value in a Java method with result type
void
– Logical Errors
• anything from incorrect reasoning on the part of the
programmer to the use in a C program of the
assignment operator = instead of the comparison
operator ==
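The difference between these levels can be seen with Python's own compiler, used here only as a convenient stand-in for the C and Java cases above. The helper name classify is an assumption, and unlike a statically typed compiler, Python reports the type mismatch only when the code runs.

```python
def classify(source):
    """Roughly classify the first problem found in a Python snippet."""
    try:
        # Lexical and syntax analysis happen here.
        code = compile(source, "<example>", "exec")
    except SyntaxError:
        return "syntactic error"
    try:
        # Python finds type mismatches only at run time.
        exec(code, {})
    except TypeError:
        return "semantic error (type mismatch)"
    return "no error detected"


print(classify("x = (1 + 2"))    # syntactic error
print(classify('x = 1 + "a"'))   # semantic error (type mismatch)
print(classify("x = 1 + 2"))     # no error detected
```

A logical error, such as writing a correct-looking but wrong formula, falls into the last case: no phase of the translator can detect it.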
Syntax Error Handler
The error handler in a parser has goals that are
simple to state but challenging to realize:
– Report the presence of errors clearly and
accurately.
– Recover from each error quickly enough to
detect subsequent errors.
– Add minimal overhead to the processing of
correct programs.
Error recovery strategies
• Panic-mode recovery
• Phrase-level recovery
• Error productions
• Global correction
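Of these strategies, panic mode is the simplest to sketch: on an error, the parser discards input tokens until it reaches a synchronizing token, then resumes. The example below is a minimal illustration that assumes a toy statement form id = id ; with ";" as the only synchronizing token. The function name check_statements is hypothetical.

```python
def check_statements(tokens):
    """Check tokens against repeated 'id = id ;' and return error messages."""
    errors = []
    pos = 0
    pattern = ["id", "=", "id", ";"]
    while pos < len(tokens):
        failed = False
        for kind in pattern:
            if pos < len(tokens) and tokens[pos] == kind:
                pos += 1
            else:
                found = tokens[pos] if pos < len(tokens) else "end of input"
                errors.append(
                    f"token {pos}: expected {kind!r}, found {found!r}")
                failed = True
                break
        if failed:
            # Panic mode: skip ahead to the next ';' (the synchronizing token).
            while pos < len(tokens) and tokens[pos] != ";":
                pos += 1
            pos += 1  # consume the ';' so parsing resumes at the next statement
    return errors


tokens = ["id", "=", "id", ";",
          "id", "id", ";",          # missing '='
          "id", "=", ";",           # missing right-hand side
          "id", "=", "id", ";"]
for message in check_statements(tokens):
    print(message)
# token 5: expected '=', found 'id'
# token 9: expected 'id', found ';'
```

Because the parser resynchronizes after the first bad statement, it still finds the second error in the same pass, which is the goal of recovering quickly enough to detect subsequent errors.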
Context-Free Grammar
• Programming language constructs have inherently
recursive structures.
• These recursive structures can be defined by
context-free grammars.
• A context-free grammar (CFG) is a set of recursive
rewriting rules (or productions) used to generate
patterns of strings.
• A CFG consists of
– Terminals
– Nonterminals
– Start symbol
– Productions
Context-Free Grammars
• Terminals are the basic symbols from which strings
are formed.
– A token is a terminal (if, then, else are terminals)
• Nonterminals are syntactic variables that denote
sets of strings.
– For example, expr and stmt
• One nonterminal is distinguished as the start symbol.
• Productions specify the manner in which the
terminals and nonterminals can be combined to
form strings.
Context-Free Grammars
• Each production consists of a nonterminal,
followed by an arrow, followed by a string of
nonterminals and terminals
• A production rule has the following form
– A → α
– where A is a nonterminal and
– α is a string of terminals and nonterminals
(including the empty string)
Context-Free Grammars
• A context-free grammar G is a 4-tuple
G = (V, T, P, S), where:
– V is a finite set of variables (or nonterminals).
These describe sets of “related” strings.
– T is a finite set of terminals (i.e., tokens).
– P is a finite set of productions, each of the form A
→ α where A ∈ V is a variable, and α ∈ (V ∪
T)* is a sequence of terminals and nonterminals.
– S ∈ V is the start symbol.
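The 4-tuple maps directly onto a small data structure. The sketch below is one possible encoding, not a standard API: the Grammar class, the is_well_formed check, and the balanced-parentheses example grammar are assumptions made for illustration. The check also requires V and T to be disjoint, which the definition above leaves implicit.

```python
from typing import NamedTuple


class Grammar(NamedTuple):
    V: frozenset  # variables (nonterminals)
    T: frozenset  # terminals
    P: tuple      # productions as (head, body) pairs; body is a tuple of symbols
    S: str        # start symbol


def is_well_formed(g):
    """Check the side conditions of the definition G = (V, T, P, S)."""
    if g.V & g.T:            # assumed: V and T share no symbols
        return False
    if g.S not in g.V:       # S must be a variable
        return False
    symbols = g.V | g.T
    for head, body in g.P:
        if head not in g.V:  # A must be a variable
            return False
        if any(sym not in symbols for sym in body):  # body is in (V U T)*
            return False
    return True


# Hypothetical example: S -> ( S ) S | empty, for balanced parentheses.
balanced = Grammar(
    V=frozenset({"S"}),
    T=frozenset({"(", ")"}),
    P=(("S", ("(", "S", ")", "S")), ("S", ())),
    S="S",
)

print(is_well_formed(balanced))  # True
```

The empty tuple in the second production plays the role of the empty string, which is a legal body because (V ∪ T)* includes it.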
CFG for simple arithmetic expressions
expr → expr op expr
expr → ( expr )
expr → - expr
expr → id
op → +
op → -
op → *
op → /
op → ^
Terminals: id, +, -, *, /, ^, (, )
Nonterminals: expr, op
Start symbol: expr
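To see how such a grammar generates strings, the sketch below replays a leftmost derivation of id + id * id. The dictionary encoding, the function name leftmost_derivation, and the list of production choices are assumptions for illustration. The unary and binary minus productions are included on the assumption that the grammar is the usual arithmetic-expression grammar.

```python
# Each nonterminal maps to its list of production bodies.
GRAMMAR = {
    "expr": [("expr", "op", "expr"), ("(", "expr", ")"), ("-", "expr"), ("id",)],
    "op": [("+",), ("-",), ("*",), ("/",), ("^",)],
}


def leftmost_derivation(start, choices):
    """Rewrite the leftmost nonterminal once per choice; return every step."""
    form = [start]
    steps = [" ".join(form)]
    for index in choices:
        # Position of the leftmost nonterminal in the current sentential form.
        i = next(k for k, sym in enumerate(form) if sym in GRAMMAR)
        form[i:i + 1] = GRAMMAR[form[i]][index]
        steps.append(" ".join(form))
    return steps


# Each number selects a production for the current leftmost nonterminal.
for step in leftmost_derivation("expr", [0, 3, 0, 0, 3, 2, 3]):
    print(step)
# expr
# expr op expr
# id op expr
# id + expr
# id + expr op expr
# id + id op expr
# id + id * expr
# id + id * id
```

A different sequence of choices can derive the same string with a different tree shape, which is why this particular grammar is ambiguous.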
Notational Conventions
E → E op E
E → ( E )
E → - E
E → id
op → +
op → -
op → *
op → /
op → ^