Parsing PDF

The parser checks a stream of tokens for grammatical correctness based on the syntax rules of the programming language. It determines if the code is syntactically valid and builds an intermediate representation. The parser detects syntax errors like missing semicolons or incorrect expressions. Context-free grammars are commonly used to represent the syntax of programming languages, with rules to derive sentences from non-terminals. Parsing involves discovering a derivation through grammar productions to check if the token stream matches the language syntax.

Uploaded by Awais
Copyright © All Rights Reserved

Parsing

We now move to the second module of the front end: the parser. Recall that the front end consists of a scanner, which turns the source text into a stream of tokens, followed by the parser.

The parser checks the stream of words (tokens) and their parts of speech for grammatical
correctness. It determines whether the input is syntactically well formed. It guides context-sensitive
("semantic") analysis such as type checking. Finally, it builds an IR for the source program.
Syntactic Analysis
Consider the sentence “He wrote the program”. The structure of the sentence can be described
using the grammar of the English language.

The analogy carries over to the syntax of sentences in a programming language. For
example, in C++ an if-statement has the syntax if ( expression ) statement, optionally followed by else statement.

The parser ensures that the sentences of a programming language that make up a program abide by
the syntax of the language. If there are errors, the parser detects and reports them.
Consider the following code segment, which contains a number of syntax errors:

int* foo(int i, int j))
{
    for(k=0; i j; )
        fi( i > j )
    return j;
}
A scanner based on regular expressions cannot detect such syntax errors: every individual token in this code is valid on its own.
Syntactic Analysis
Consider the C++ function above. There are a number of syntax errors present.
Line 1 has an extra parenthesis at the end. The boolean expression in the for loop in line 3 is
incorrect. Line 4 has a missing semicolon at the end. All such errors arise because the function
does not abide by the grammar of the C++ language.
Semantic Analysis
Consider the English language sentence “He wrote the computer”. The sentence is syntactically
correct but semantically wrong. The meaning of the sentence is incorrect; one does not “write” a
computer. Issues related to meaning fall under the heading of semantic analysis. The following
C++ function has semantic errors: the type of the local variable sum has not been declared, and the
returned value does not match the function's declared return type (int*). The function is nevertheless
syntactically correct.

Role of the Parser


Not every sequence of tokens is a program. The parser must distinguish between valid and invalid
sequences of tokens. What we need is an expressive way to describe the syntax of programs and
an acceptor mechanism that determines whether the input token stream satisfies the syntax of the
programming language.
Parsing is the process of discovering a derivation for some sentence of a language. The
mathematical model of syntax is a grammar G, and the language generated by the
grammar is denoted L(G). The syntax of most programming languages can be represented by
context-free grammars (CFGs).
A CFG is a four-tuple G = (S, N, T, P), where:
1. S is the start symbol
2. N is a set of non-terminals
3. T is a set of terminals
4. P is a set of productions
Why aren’t regular expressions used to represent syntax? The reason is that regular languages
are not powerful enough to express the syntax of programming languages: a finite
automaton cannot remember the number of times it has visited a particular state, so it cannot, for
example, check that arbitrarily nested parentheses are balanced.
Consider the following example of a CFG:

This CFG defines the set of noises that sheep make. We can use the SheepNoise grammar to create
sentences of the language, using the productions as rewriting rules.

While it is cute, this example quickly runs out of intellectual steam. To explore uses of CFGs, we need
a more complex grammar. Consider the grammar for arithmetic expressions:

Grammar rules in a similar form were first used in the description of the Algol-60 programming
language. The syntax of C, C++ and Java is derived heavily from Algol-60. The notation was
developed by John Backus and adapted by Peter Naur for the Algol-60 language report; hence the
term Backus-Naur Form (BNF). Let us use the expression grammar to derive the sentence
x – 2 * y.
Such a sequence of rewrites is called a derivation, and the process of discovering a derivation is
called parsing. At each step, we choose a non-terminal to replace. Different choices can lead to
different derivations.
Two derivations are of interest:
1. Leftmost: replace the leftmost non-terminal (NT) at each step
2. Rightmost: replace the rightmost NT at each step
The example derivation of x – 2 * y was a leftmost derivation. There is also a rightmost derivation.
In both cases we have

$\mathit{expr} \Rightarrow^{*} \mathit{id} - \mathit{num} * \mathit{id}$
The two derivations produce different parse trees. The parse trees imply different evaluation
orders!
Parse Trees
The derivations can be represented in a tree-like fashion. The interior nodes contain the
non-terminals used during the derivation, and the leaves contain the terminals.
Precedence
These two derivations point out a problem with the grammar. It has no notion of precedence, or
implied order of evaluation. The normal arithmetic rules say that multiplication has higher
precedence than subtraction. To add precedence, create a non-terminal for each level of
precedence and isolate the corresponding part of the grammar, forcing the parser to recognize
high-precedence subexpressions first. Here is the revised grammar:

This grammar is larger and requires more rewriting to reach some of the terminal symbols, but
it encodes the expected precedence. Let’s see how it parses x – 2 * y:

This produces the same parse tree under both leftmost and rightmost derivations, because the grammar
directly encodes the desired precedence.
