
UNIT-2

Compiler Design

Syntax Analysis:- Syntax analysis, also known as parsing, is a process in compiler design where the compiler checks if the source code follows the grammatical rules of the programming language. This is typically the second stage of the compilation process, following lexical analysis.

When an input string (source code or a program in some language) is given to a compiler, the compiler processes it in several phases, starting from lexical analysis (which scans the input and divides it into tokens) through to target code generation.
Syntax analysis or parsing is the second phase, i.e. after lexical analysis. It checks the syntactical structure of the given input, i.e. whether the given input is in the correct syntax of the language in which it has been written. It does so by building a data structure called a parse tree or syntax tree. The parse tree is constructed using the predefined grammar of the language and the input string. If the given input string can be produced from the grammar (in the derivation process), the input string is in the correct syntax; if not, an error is reported by the syntax analyzer.
The main goal of syntax analysis is to create a parse tree
or abstract syntax tree (AST) of the source code, which is
a hierarchical representation of the source code that
reflects the grammatical structure of the program.
There are several types of parsing algorithms
used in syntax analysis, including:
 LL parsing: This is a top-down parsing algorithm that starts with the root of the parse tree and constructs the tree by successively expanding non-terminals. LL parsing is known for its simplicity and ease of implementation.

 LR parsing: This is a bottom-up parsing algorithm that starts with the leaves of the parse tree and constructs the tree by successively reducing substrings of the input (handles) to non-terminals. LR parsing is more powerful than LL parsing and can handle a larger class of grammars.

 LR(1) parsing: This is a variant of LR parsing that uses one symbol of lookahead to disambiguate the grammar.
 LALR parsing: This is a variant of LR parsing that
uses a reduced set of lookahead symbols to
reduce the number of states in the LR parser.

 Once the parse tree is constructed, the compiler can perform semantic analysis to check if the source code makes sense and follows the semantics of the programming language.

 The parse tree or AST can also be used in the code generation phase of the compiler design to generate intermediate code or machine code.
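As a rough illustration (the class and function names below are my own, not taken from any particular compiler), a tiny AST for the expression 1 + 2 * 3 and a simple walk over it might look like this in Python:

from dataclasses import dataclass

@dataclass
class Num:
    value: int

@dataclass
class BinOp:
    op: str
    left: object
    right: object

# AST for the expression 1 + 2 * 3, with the usual precedence of * over +:
tree = BinOp("+", Num(1), BinOp("*", Num(2), Num(3)))

def evaluate(node):
    """Walk the tree; semantic analysis and code generation do similar traversals."""
    if isinstance(node, Num):
        return node.value
    left, right = evaluate(node.left), evaluate(node.right)
    return left + right if node.op == "+" else left * right

print(evaluate(tree))   # 7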
Features of syntax analysis:

Syntax Trees: Syntax analysis creates a syntax tree, which is a hierarchical representation of the code's structure. The tree shows the relationship between the various parts of the code, including statements, expressions, and operators.
Context-Free Grammar: Syntax analysis uses
context-free grammar to define the syntax of the
programming language. A context-free grammar is a type of formal grammar used to describe the structure of programming languages.
Top-Down and Bottom-Up Parsing: Syntax analysis can be
performed using two main approaches: top-down parsing and
bottom-up parsing. Top-down parsing starts from the highest
level of the syntax tree and works its way down, while bottom-up
parsing starts from the lowest level and works its way up.

Error Detection: Syntax analysis is responsible for detecting syntax errors in the code. If the code does not conform to the rules of the programming language, the parser will report an error and halt the compilation process.

Intermediate Code Generation: Syntax analysis generates an intermediate representation of the code, which is used by the subsequent phases of the compiler. The intermediate representation is usually a more abstract form of the code, which is easier to work with than the original source code.
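For example (an illustrative lowering, not a prescribed format), the assignment a = b + c * d might be translated into three-address code such as:
t1 = c * d
t2 = b + t1
a = t2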
Optimization: Syntax analysis can perform basic
optimizations on the code, such as removing
redundant code and simplifying expressions.
Pushdown automata (PDA) are used to design the syntax analysis phase.
The Grammar for a Language consists of
Production rules.
Example: Suppose Production rules for the
Grammar of a language are:
S -> cAd
A -> bc|a
And the input string is “cad”.
Now the parser attempts to construct a syntax tree from this grammar for the given input string. It uses the given production rules and applies them as needed to generate the string. To generate the string "cad" it applies the rules in the following steps:
(i) Start with S.
(ii) Apply S -> cAd, giving the sentential form cAd.
(iii) Apply A -> bc, giving cbcd (which does not match the input).
(iv) Backtrack and apply A -> a instead, giving cad (which matches the input).

In step (iii) above, the production rule A -> bc was not a suitable one to apply, because the string produced is "cbcd", not "cad". Here the parser needs to backtrack and apply the next production rule available for A, as shown in step (iv), and the string "cad" is produced.
Thus, the given input can be produced by the given grammar, so the input is correct in syntax. But backtracking was needed to get the correct syntax tree, which is a complex process to implement.
There is an easier way to solve this, which we shall see in the next article, "Concepts of FIRST and FOLLOW sets in Compiler Design".
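As a rough sketch (assuming a simple backtracking strategy; the function names are my own, not from any particular compiler), the behaviour described above could be coded in Python as follows:

def parse_A(s, pos):
    """Try each alternative of A in order; return the new position or None."""
    # Alternative 1: A -> bc
    if s[pos:pos + 2] == "bc":
        return pos + 2
    # Alternative 2: A -> a  (tried after "backtracking" from the first one)
    if s[pos:pos + 1] == "a":
        return pos + 1
    return None

def parse_S(s):
    """S -> cAd; succeed only if the whole string is consumed."""
    if not s.startswith("c"):
        return False
    pos = parse_A(s, 1)
    if pos is None:
        return False
    return s[pos:] == "d"

print(parse_S("cad"))   # True (accepted after falling back to A -> a)
print(parse_S("cbd"))   # False (no production of A matches)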

Advantages:
 Advantages of using syntax analysis in compiler design include:
 Structural validation: Syntax analysis allows the compiler to check if the
source code follows the grammatical rules of the programming language,
which helps to detect and report errors in the source code.
 Improved code generation: Syntax analysis can generate a parse tree or
abstract syntax tree (AST) of the source code, which can be used in the
code generation phase of the compiler design to generate more efficient and
optimized code.
 Easier semantic analysis: Once the parse tree or AST is constructed, the
compiler can perform semantic analysis more easily, as it can rely on the
structural information provided by the parse tree or AST.

Disadvantages:

 Disadvantages of using syntax analysis in compiler design include:


 Complexity: Parsing is a complex process, and the quality of the parser can
greatly impact the performance of the resulting code. Implementing a parser
for a complex programming language can be a challenging task, especially
for languages with ambiguous grammars.
 Reduced performance: Syntax analysis can add overhead to the compilation
process, which can reduce the performance of the compiler.
 Limited error recovery: Syntax analysis algorithms may not be able to
recover from errors in the source code, which can lead to incomplete or
incorrect parse trees and make it difficult for the compiler to continue the
compilation process.
 Inability to handle all languages: Not all languages have formal grammars,
and some languages may not be easily parseable.
Overall, syntax analysis is an important stage in the compiler design process, but its cost should be balanced against the goals and requirements of the overall compiler design.

Syntax analysis, also known as parsing, is a crucial stage in the process of compiling a program. Its primary task is to analyze the structure of the input program and check whether it conforms to the grammar rules of the programming language. This process involves breaking down the input program into a series of tokens and then constructing a parse tree or abstract syntax tree (AST) that represents the hierarchical structure of the program.

The syntax analysis phase typically involves the following steps (a short illustrative code sketch of steps 1 and 4 follows the list):

1. Tokenization: The input program is divided into a sequence of tokens, which
are basic building blocks of the programming language, such as identifiers,
keywords, operators, and literals.
2. Parsing: The tokens are analyzed according to the grammar rules of the
programming language, and a parse tree or AST is constructed that
represents the hierarchical structure of the program.
3. Error handling: If the input program contains syntax errors, the syntax
analyzer detects and reports them to the user, along with an indication of
where the error occurred.
4. Symbol table creation: The syntax analyzer creates a symbol table, which is
a data structure that stores information about the identifiers used in the
program, such as their type, scope, and location.
The syntax analysis phase is essential for the subsequent stages of the compiler, such as semantic analysis, code generation, and optimization. If the syntax analysis is not performed correctly, the compiler may generate incorrect code or fail to compile the program altogether.
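As a small illustrative sketch of steps 1 and 4 above (the token set, regular expressions, and symbol-table layout are my own assumptions, not a fixed standard), a toy tokenizer and symbol-table builder might look like this:

import re

# Toy token classes for a tiny declaration language (illustrative only).
TOKEN_RE = re.compile(
    r"\s*(?:(?P<KEYWORD>int|float)|(?P<IDENT>[A-Za-z_]\w*)"
    r"|(?P<NUMBER>\d+)|(?P<PUNCT>[=;,]))"
)

def tokenize(code):
    """Step 1: split the input into (kind, text) tokens."""
    tokens, pos = [], 0
    while pos < len(code):
        m = TOKEN_RE.match(code, pos)
        if not m:
            raise SyntaxError(f"unexpected character at position {pos}")
        tokens.append((m.lastgroup, m.group(m.lastgroup)))
        pos = m.end()
    return tokens

def build_symbol_table(tokens):
    """Step 4: record each declared identifier with its type (a toy scheme)."""
    table, current_type = {}, None
    for kind, text in tokens:
        if kind == "KEYWORD":
            current_type = text
        elif kind == "IDENT" and current_type is not None:
            table[text] = {"type": current_type}
        elif kind == "PUNCT" and text == ";":
            current_type = None
    return table

tokens = tokenize("int count = 0; float x;")
print(build_symbol_table(tokens))
# {'count': {'type': 'int'}, 'x': {'type': 'float'}}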

Syntax Directed Translation in Syntax Analysis:- A technique of compiler execution, where the source code translation is totally conducted by the parser, is known as syntax-directed translation. The parser primarily uses a context-free grammar to check the input sequence and deliver output for the compiler's next stage.

What is analysis of syntax directed translation?


Syntax-directed translation refers to a method of compiler implementation where the
source language translation is completely driven by the parser. In other words, the
parsing process and parse trees are used to direct semantic analysis and the translation
of the source program.

What are the two types of SDT schemes?

Semantic actions can be placed at the right end of a production (a postfix SDT, typical of S-attributed definitions), or anywhere within the right side of the production (an SDT with actions inside the production); in the latter case each action is performed as soon as the symbol to its left has been processed. Together, these two schemes cover both S-attributed and L-attributed SDTs.
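As a hedged illustration (the grammar and code below are my own minimal sketch, not a scheme prescribed by any particular tool), semantic actions can be attached to productions and executed as the corresponding parts of the input are recognized; here the actions compute the value of a simple sum while parsing:

# Productions with embedded actions (illustrative notation):
#   E -> T            { E.val = T.val }
#   E -> E '+' T      { E.val = E.val + T.val }
#   T -> NUMBER       { T.val = numeric value of the token }

def translate(tokens):
    """Parse a token list like ['2', '+', '3', '+', '4'] and return its value."""
    pos = 0

    def T():
        nonlocal pos
        value = int(tokens[pos])        # action for T -> NUMBER
        pos += 1
        return value

    value = T()                         # action for E -> T
    while pos < len(tokens) and tokens[pos] == "+":
        pos += 1
        value = value + T()             # action for E -> E '+' T
    return value

print(translate(["2", "+", "3", "+", "4"]))   # 9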

What is known as syntax analysis?


Analyzing a sequence of tokens to determine if they form a sentence in the grammar of
the programming language is called syntax analysis. Syntax analysis is often
called parsing.

What are the 4 types of syntax?

Types of Syntax:
 Simple sentences.
 Complex sentences.
 Compound sentences.
 Compound-Complex sentences
Context Free Grammar in Compiler Design:- A context-free grammar is a set of recursive rules used to generate patterns of strings. A context-free grammar can describe all regular languages and more, but it cannot describe all possible languages. Context-free grammars are studied in the fields of theoretical computer science, compiler design, and linguistics.
A context-free grammar (CFG) is a type of formal grammar that can describe the syntax or structure of a formal language. The grammar is a 4-tuple: (V, T, P, S).
V - It is the collection of variables or nonterminal symbols.
T - It is a set of terminals.
P - It is the production rules that consist of both terminals
and nonterminals.
S - It is the Starting symbol.

A grammar is said to be a context-free grammar if every production is of the form:

G -> (V ∪ T)*, where G ∈ V

 The left-hand side (G here) can only be a variable; it cannot be a terminal.
 The right-hand side can be a variable, a terminal, or any combination of variables and terminals.
This form states that every production with a single variable on the left-hand side, producing any combination of variables from V and terminals from T on the right-hand side, is context-free.
For example, consider the grammar A = ({S}, {a, b}, P, S) having the productions:
 Here S is the starting symbol.
 {a, b} are the terminals, generally represented by lowercase characters.
 P is the set of production rules over the variable S.
S -> aS
S -> bSa
but
a -> bSa, or
a -> ba is not a CFG, as the left-hand side is a terminal, which violates the CFG rule.
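To make the 4-tuple (V, T, P, S) concrete, here is a small illustrative sketch (the data layout is my own choice, not a standard library) that stores the earlier example grammar S -> cAd, A -> bc | a and derives the string "cad" by repeatedly expanding the leftmost variable:

V = {"S", "A"}                       # variables (non-terminals)
T = {"a", "b", "c", "d"}             # terminals
P = {                                # production rules
    "S": [["c", "A", "d"]],
    "A": [["b", "c"], ["a"]],
}
S = "S"                              # start symbol

def derive(form, choices):
    """Expand the leftmost variable using the alternative index from `choices`."""
    for choice in choices:
        i = next(idx for idx, sym in enumerate(form) if sym in V)
        form = form[:i] + P[form[i]][choice] + form[i + 1:]
    return "".join(form)

# S => cAd => cad  (alternative 0 for S, then alternative 1 for A)
print(derive([S], [0, 1]))   # cad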
In the computer science field, context-free grammars are frequently used, especially in
the areas of formal language theory, compiler development, and natural language
processing. It is also used for explaining the syntax of programming languages and other
formal languages.
Limitations of Context-Free Grammar
Apart from all the uses and importance of context-free grammar in compiler design and computer science, there are some limitations. CFGs are limited in expressiveness: neither full English nor every feature of a programming language can be expressed using a context-free grammar. A context-free grammar can be ambiguous, meaning that multiple parse trees can be generated for the same input. For some grammars, parsing can be inefficient; naive backtracking parsers can take exponential time. And error reporting based purely on a CFG is less precise, so it cannot give very detailed error messages and information.

Type Name

0 Unrestricted Grammar

1 Context Sensitive Grammar

2 Context Free Grammar

3 Regular Grammar
1. Context Free Grammar:
 Language generated by Context Free Grammar is accepted by Pushdown
Automata
 It is a subset of Type 0 and Type 1 grammar and a superset of Type 3
grammar.
 Also called phrase-structured grammar.
 Different context-free grammars can generate the same context-free
language.
 Classification of Context Free Grammar is done on the basis of the number of parse trees:
 Only one parse tree -> Unambiguous.
 More than one parse tree -> Ambiguous (an illustration follows the example productions below).

Productions are in the form –
A -> B;
A ∈ N, i.e. A is a non-terminal.
B ∈ V* (any string).

Example –
S –> AB
A –> a
B –> b
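For instance, the grammar E -> E + E | E * E | id is ambiguous: the string id + id * id has more than one parse tree, one grouping the expression as (id + id) * id and another as id + (id * id), because either production for E can be applied first.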
2. Regular Grammar:
 It is accepted by Finite State Automata.
 It is a subset of Type 0, Type 1 and Type 2 grammars.
 The language it generates is called Regular Language.
 Regular languages are closed under operations like Union, Intersection,
Complement etc.
 They are the most restricted form of grammar.
Productions are in the form –
V –> VT / T (left-linear grammar)
(or)
V –> TV /T (right-linear grammar)
Example –
1. S –> ab.
2. S -> aS | bS | ∊
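As a small hedged sketch (the function below is my own illustration, not part of any library), the second example grammar S -> aS | bS | ∊ generates the regular language (a|b)*, which a single-state finite automaton can recognize:

def accepts(string):
    """Simulate the one-state DFA for (a|b)*: the start state is accepting (S -> ∊)."""
    state = "S"
    for ch in string:
        if ch in ("a", "b"):
            state = "S"     # S -> aS and S -> bS both loop back to S
        else:
            return False    # no transition on any other symbol: reject
    return True

print(accepts("abba"))   # True
print(accepts("abc"))    # False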

Difference Between Context Free Grammar and Regular Grammar:

Type: CFG is Type-2; RG is Type-3.
Recognizer: CFG is recognized by push-down automata; RG by finite state automata.
Rules: CFG productions are of the form A -> B, where A ∈ N (a non-terminal) and B ∈ V* (any string); RG productions are of the form V -> VT / T (left-linear grammar) or V -> TV / T (right-linear grammar).
Restriction: CFG is less restricted than Regular Grammar; RG is more restricted than any other grammar.
Right-hand side: In CFG the right-hand side of a production has no restrictions; in RG it must be either left-linear or right-linear.
Set property: CFG is a superset of Regular Grammar; RG is a subset of Context Free Grammar.
Intersection: The intersection of two CFLs need not be a CFL; the intersection of two RGs is a RG.
Complement: CFLs are not closed under complement; RGs are closed under complement.
Range: The range of languages that come under CFG is wide; the range under RG is smaller than that of CFG.
Examples: CFG: S -> AB; A -> a; B -> b. RG: S -> aS | bS | ∊.
