Syntax analysis in CC[1]
By
Attia Agha
Overview
Phases of CC
Lexical Analysis: The first phase of a compiler is lexical analysis, also known as scanning. This
phase reads the source code and breaks it into a stream of tokens, which are the basic
units of the programming language. The tokens are then passed on to the next phase for
further processing.
Syntax Analysis: The second phase of a compiler is syntax analysis, also known as parsing.
This phase takes the stream of tokens generated by the lexical analysis phase and checks
whether they conform to the grammar of the programming language. The output of this
phase is usually an Abstract Syntax Tree (AST).
Semantic Analysis: The third phase of a compiler is semantic analysis. This phase checks
whether the code is semantically correct, i.e., whether it conforms to the language’s type
system and other semantic rules. The compiler performs type checking, which ensures that
variables are used correctly and that operations are performed on compatible data types.
It also checks for other semantic errors, such as undeclared variables and incorrect
function calls.
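The first and third phases described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the token categories in TOKEN_SPEC and the helper names tokenize and undeclared_identifiers are hypothetical and belong to a toy language, not to any particular compiler. The sketch skips the parsing phase and runs one semantic check directly on the token stream.

```python
import re

# Hypothetical token categories for a toy language; not taken from the slides.
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("IDENT", r"[A-Za-z_]\w*"),
    ("OP", r"[+\-*/=]"),
    ("SKIP", r"\s+"),
]
TOKEN_RE = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(source):
    """Lexical analysis: split source text into (kind, text) tokens."""
    tokens = []
    pos = 0
    while pos < len(source):
        match = TOKEN_RE.match(source, pos)
        if match is None:
            raise ValueError(f"unexpected character {source[pos]!r} at position {pos}")
        if match.lastgroup != "SKIP":          # whitespace is recognised but not emitted
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens


def undeclared_identifiers(tokens, declared):
    """One semantic check: list identifiers that do not appear in the declared set."""
    return [text for kind, text in tokens if kind == "IDENT" and text not in declared]
```

For example, tokenize("x = y + 42") yields five tokens, and passing them to undeclared_identifiers with {"x"} as the declared set reports y. A real compiler would run such checks on the syntax tree rather than on raw tokens.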
CONT.
Now the parser attempts to construct a syntax tree from this grammar for the
given input string. It uses the given production rules and applies them as
needed to generate the string. To generate the string “cad”, it uses the rules
as shown in the given diagram:
In step (iii) above, the production rule A->bc was not a suitable one to apply,
because the string produced is “cbcd”, not “cad”. Here the parser needs to
backtrack and apply the next production rule available for A, which is shown in
step (iv), and the string “cad” is produced.
Thus, the given input can be produced by the given grammar, and the input is
therefore syntactically correct. But backtracking was needed to get the correct
syntax tree, which is a complex process to implement.
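The backtracking described above can be sketched as a small Python function. The grammar used here (S -> c A d, A -> b c | a) is an assumption inferred from the rule A->bc and the strings “cbcd” and “cad” mentioned in the text, since the diagram is not reproduced. The names GRAMMAR, derive and accepts are hypothetical.

```python
# Grammar assumed from the description in the text (the diagram is not reproduced):
#   S -> c A d
#   A -> b c | a
GRAMMAR = {
    "S": [["c", "A", "d"]],
    "A": [["b", "c"], ["a"]],
}


def derive(symbols, text, trace):
    """Match the symbol list against all of text, backtracking over alternatives."""
    if not symbols:
        return text == ""
    head, rest = symbols[0], symbols[1:]
    if head in GRAMMAR:
        # Nonterminal: try each production in order, undoing a choice that fails.
        for production in GRAMMAR[head]:
            trace.append(f"{head} -> {' '.join(production)}")
            if derive(production + rest, text, trace):
                return True
            trace.append(f"backtrack from {head} -> {' '.join(production)}")
        return False
    # Terminal: it must match the next input character.
    return text.startswith(head) and derive(rest, text[len(head):], trace)


def accepts(text):
    """Return (accepted, trace) for a parse starting from S."""
    trace = []
    return derive(["S"], text, trace), trace
```

Calling accepts("cad") returns True together with a trace in which A -> b c is tried first, abandoned, and replaced by A -> a. Every failed alternative costs a re-scan of the same input, which is the overhead the text refers to. This simple scheme would also loop forever on a left-recursive grammar.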
There is an easier way to solve this, using the concepts of FIRST and FOLLOW
sets in CC.
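As a rough illustration of how FIRST sets remove the need to backtrack, the sketch below computes FIRST for each nonterminal and uses one symbol of lookahead to choose a production. It assumes the grammar S -> c A d, A -> b c | a (inferred from the rule A->bc and the strings mentioned in the text) and assumes there are no empty (epsilon) productions. FOLLOW sets, which become necessary once empty productions are allowed, are not computed here. The names first_sets and choose_production are hypothetical.

```python
# Grammar assumed from the text: S -> c A d, A -> b c | a.
# Assumption: no epsilon (empty) productions, which keeps FIRST simple.
GRAMMAR = {
    "S": [["c", "A", "d"]],
    "A": [["b", "c"], ["a"]],
}


def first_sets(grammar):
    """Compute FIRST for every nonterminal by iterating until nothing changes."""
    first = {nonterminal: set() for nonterminal in grammar}
    changed = True
    while changed:
        changed = False
        for nonterminal, productions in grammar.items():
            for production in productions:
                leading = production[0]
                # A terminal contributes itself; a nonterminal contributes its FIRST set.
                new = first[leading] if leading in grammar else {leading}
                if not new <= first[nonterminal]:
                    first[nonterminal] |= new
                    changed = True
    return first


def choose_production(grammar, nonterminal, lookahead):
    """Pick the production that can begin with the lookahead symbol, or None."""
    first = first_sets(grammar)
    for production in grammar[nonterminal]:
        leading = production[0]
        starts = first[leading] if leading in grammar else {leading}
        if lookahead in starts:
            return production
    return None
```

With the lookahead symbol a, choose_production selects A -> a directly, so the wrong alternative A -> b c is never tried. This works here because the two alternatives for A begin with different terminals.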
Advantages
Syntax analysis, also known as parsing, is a crucial stage in the process of compiling a
program. Its primary task is to analyze the structure of the input program and check
whether it conforms to the grammar rules of the programming language. It works on the
series of tokens into which the input program has been broken down and constructs a
parse tree or abstract syntax tree (AST) that represents the hierarchical structure of the
program.
The syntax analysis phase typically involves the following steps:
Tokenization: The input program is divided into a sequence of tokens, which are basic
building blocks of the programming language, such as identifiers, keywords, operators,
and literals.
Parsing: The tokens are analyzed according to the grammar rules of the programming
language, and a parse tree or AST is constructed that represents the hierarchical structure
of the program.
Error handling: If the input program contains syntax errors, the syntax analyzer detects and
reports them to the user, along with an indication of where the error occurred.
Symbol table creation: The syntax analyzer creates a symbol table, which is
a data structure that stores information about the identifiers used in the
program, such as their type, scope, and location.
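The four steps above can be sketched together for a toy language of single assignments such as total = price + 2. Everything in this sketch is an assumption made for illustration: the grammar, the nested-tuple AST shape, and the names tokenize and parse_assignment are hypothetical. The symbol table records only the position of each identifier's first occurrence, whereas a real compiler would also store type and scope.

```python
import re

TOKEN_RE = re.compile(r"(?P<NUMBER>\d+)|(?P<IDENT>[A-Za-z_]\w*)|(?P<OP>[=+\-])")


def tokenize(source):
    """Tokenization: return (kind, text, position) triples, skipping whitespace."""
    tokens, pos = [], 0
    while pos < len(source):
        if source[pos].isspace():
            pos += 1
            continue
        match = TOKEN_RE.match(source, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {source[pos]!r} at position {pos}")
        tokens.append((match.lastgroup, match.group(), pos))
        pos = match.end()
    return tokens


def parse_assignment(source):
    """Parse IDENT '=' term (('+' | '-') term)* into an AST and a symbol table."""
    tokens = tokenize(source)
    symbols = {}   # symbol table: identifier -> position of first occurrence
    index = 0

    def peek():
        return tokens[index] if index < len(tokens) else ("EOF", "", len(source))

    def expect(kind, text=None):
        # Error handling: report what was expected and where parsing stopped.
        nonlocal index
        found_kind, found_text, position = peek()
        if found_kind != kind or (text is not None and found_text != text):
            wanted = text or kind
            shown = found_text or "end of input"
            raise SyntaxError(f"expected {wanted} at position {position}, found {shown!r}")
        index += 1
        return found_text, position

    def term():
        kind, text, _ = peek()
        if kind == "NUMBER":
            expect("NUMBER")
            return ("num", int(text))
        name, position = expect("IDENT")
        symbols.setdefault(name, position)
        return ("var", name)

    target, position = expect("IDENT")
    symbols.setdefault(target, position)
    expect("OP", "=")
    tree = term()
    while peek()[0] == "OP" and peek()[1] in "+-":
        operator, _ = expect("OP")
        tree = (operator, tree, term())
    expect("EOF")
    return ("assign", target, tree), symbols
```

Parsing total = price + 2 returns an AST rooted at an assign node plus a symbol table containing total and price. Parsing total = + 2 raises a SyntaxError that names position 8, which is where the unexpected + appears.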
The syntax analysis phase is essential for the subsequent stages of the
compiler, such as semantic analysis, code generation, and optimization. If
the syntax analysis is not performed correctly, the compiler may generate
incorrect code or fail to compile the program altogether.
Overall, syntax analysis is a critical stage in the process of compiling a
program, as it ensures that the program is syntactically correct and ready
for further processing by the compiler.