
Compiler Design

Prepared by:
Dr. Heba El Hadidi

2024-2025

Contents

Chapter One - Introduction
Chapter Two - Language Processing System
Chapter Three - Structure of Compiler Phases
Chapter Four - Error Types - Symbol Table
Chapter Five - Parsing
Chapter Six - LL(1) Predictive Parser
Chapter Seven - LR Parsers
    SLR(1)
    LR(1)
    LALR
References


Chapter One - Introduction

A compiler is a program that takes a program written in a source language and translates it into an equivalent program in a target language. The source language is often a high-level language and the target language is machine language.

Source program -> COMPILER -> Target program


Necessity of compiler:

• Techniques used in a lexical analyzer can be used in text editors, information retrieval systems, and pattern recognition programs.

• Techniques used in a parser can be used in a query processing system such as SQL.

• Much software with a complex front end may need techniques used in compiler design.

• A symbolic equation solver takes an equation as input; such a program must parse the given input equation.

• Most of the techniques used in compiler design can be used in Natural Language Processing
(NLP) systems.


Properties of Compiler:

a) Correctness

i) Correct output on execution. ii) It should report errors. iii) It should correctly report when the programmer does not follow the language syntax.

b) Efficiency

c) Compile time and execution speed.

d) Debugging / Usability.

e) Good error reporting/handling

Interpreter:

An interpreter is another kind of language processor.

Instead of producing a target program as a translation, an interpreter appears to directly execute the operations specified in the source program on inputs supplied by the user.


Types of Compiler:

One pass: the compiler processes the source program exactly once, generating target code as it reads (early Pascal compilers are classic examples).

Two pass: the compiler makes one pass over the source to build an intermediate representation and a second pass to generate the target code.

Multi pass: the compiler processes the source program (or its intermediate forms) several times, each pass transforming the output of the previous one; this reduces memory requirements and allows more optimization at the cost of compilation speed.


Source-to-source compiler: a type of compiler that takes a high-level language as input and produces its output in a high-level language. Example: OpenMP source-to-source translators.


List of compilers

1. Ada compiler

2. ALGOL compiler

3. BASIC compiler

4. C# compiler

5. C compiler

6. C++ compiler

7. COBOL compiler

8. Smalltalk compiler

9. Java compiler

THE ASSISTANTS OF THE COMPILER:

The assistants of a compiler are:

1. Preprocessor.

2. Assembler.

3. Loader and link-editor (linker).


Chapter Two - Language Processing System


A source program may be divided into modules stored in separate files.

The preprocessor collects all the separate files into a single source program.

A preprocessor produces input to the compiler.

A preprocessor is a program that processes its input data to produce output that is used as
input to another program. The output is said to be a preprocessed form of the input data,
which is often used by some subsequent programs like compilers. The preprocessor is
executed before the actual compilation of code begins; therefore the preprocessor digests
all these directives before any code is generated by the statements.

A preprocessor may perform the following functions:

1. Macro processing: A preprocessor may allow a user to define macros that are shorthands for longer constructs.

A macro is a rule or pattern that specifies how a certain input sequence (often a sequence of
characters) should be mapped to an output sequence (also often a sequence of characters)
according to a defined procedure. The mapping process that instantiates (transforms) a macro into a specific output sequence is known as macro expansion.

macro definitions (#define, #undef)
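As a short illustration, here is a complete C program using a hypothetical SQUARE macro (the macro name is only an example); the preprocessor expands it textually before compilation:

#include <stdio.h>

/* A shorthand for a longer construct: before compilation the preprocessor
   replaces every use of SQUARE(e) with ((e) * (e)). */
#define SQUARE(e) ((e) * (e))

int main(void) {
    int n = 5;
    /* After macro expansion the compiler sees: printf("%d\n", ((n) * (n))); */
    printf("%d\n", SQUARE(n));   /* prints 25 */
    return 0;
}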

2. File inclusion: A preprocessor may include header files into the program text.

Preprocessor includes header files into the program text. When the preprocessor finds
#include directive it replaces it by the entire content of the specified file. There are two ways
to specify a file to be included:


#include "file"

#include <file>

The only difference between both expressions is the places (directories) where the compiler
is going to look for the file. In the first case where the file name is specified between double-
quotes, the file is searched first in the same directory that includes the file containing the
directive. If it is not there, the compiler searches the file in the default directories
where it is configured to look for the standard header files.
If the file name is enclosed between angle-brackets <> the file is searched directly where the
compiler is configured to look for the standard header files. Therefore, standard header files
are usually included in angle-brackets, while other specific header files are included using
quotes.

3. Rational preprocessor: these preprocessors augment older languages with more


modern flow-of-control and data structuring facilities.

For example, such a preprocessor might provide the user with built-in macros for constructs
like while-statements or if-statements, where none exist in the programming language
itself.

4. Language extensions: these preprocessors attempt to add capabilities to the language in the form of built-in macros.


For example, the language Equel is a database query language embedded in C. Statements beginning with ## are taken by the preprocessor to be database access statements unrelated to C and are translated into procedure calls on routines that perform the database access.

The behavior of the compiler with respect to extensions is declared with the #extension directive:

#extension extension_name : behavior
#extension all : behavior

extension_name is the name of an extension. The token all means that the specified behavior should apply to all extensions supported by the compiler.

A Language Processing System


ASSEMBLER

Programs known as assemblers were written to automate the translation of assembly language into machine language. The input to an assembler program is called the source program; the output is a machine language translation (object program).

Typically a modern assembler creates object code by translating assembly instruction mnemonics into opcodes, and by resolving symbolic names for memory locations and other entities. The use of symbolic references is a key feature of assemblers, saving tedious calculations and manual address updates after program modifications. Most assemblers also include macro facilities for performing textual substitution, e.g., to generate common short sequences of instructions inline, instead of as called subroutines, or even to generate entire programs or program suites.

There are two types of assemblers based on how many passes through the source are needed
to produce the executable program.

One-pass assemblers go through the source code once and assume that all symbols will be
defined before any instruction that references them.

Two-pass assemblers create a table with all symbols and their values in the first pass, and
then use the table in a second pass to generate code. The assembler must at least be able to
determine the length of each instruction on the first pass so that the addresses of symbols
can be calculated.

The advantage of a one-pass assembler is speed, which is not as important as it once was with
advances in computer speed and capabilities. The advantage of the two-pass assembler is
that symbols can be defined anywhere in the program source. As a result, the program can be


defined in a more logical and meaningful way. This makes two-pass assembler programs easier to read and maintain.

Interpreter

Languages such as BASIC, Python, and LISP can be translated using interpreters. Java also uses an interpreter. The process of interpretation can be carried out in the following phases.

1. Lexical analysis

2. Syntax analysis

3. Semantic analysis

4. Direct Execution

Advantages

- Modifications to the user program can easily be made and applied as execution proceeds.
- The type of object that a variable denotes may change dynamically.
- Debugging a program and finding errors is simpler for an interpreted program.
- The interpreter for the language makes it machine independent.

Disadvantages

- The execution of the program is slower.

- Memory consumption is higher.


Loader and Linker

Once the assembler produces an object program, that program must be placed into memory and executed. The assembler could place the object program directly in memory and transfer control to it, thereby causing the machine language program to be executed. However, this would waste core by leaving the assembler in memory while the user's program was being executed. Also, the programmer would have to retranslate his program with each execution, thus wasting translation time. Linkers and loaders were developed to overcome this waste of translation time and memory.

So,

A linker or link editor is a program that takes one or more objects generated by a compiler
and combines them into a single executable program.

A linker performs three tasks:

1. Searches the program to find library routines used by the program, e.g. printf(), math routines.
2. Determines the memory locations that code from each module will occupy and relocates its instructions by adjusting absolute references.
3. Resolves references among files.

Loader


System programmers developed another component called Loader.

“A loader is a program that places programs into memory and prepares them for execution.”

It would be more efficient if subroutines could be translated into an object form that the loader could "relocate" directly behind the user's program. The task of adjusting programs so that they may be placed in arbitrary core locations is called relocation. Relocating loaders perform four functions.

A loader is the part of an operating system that is responsible for loading programs, one of the essential stages in the process of starting a program. Loading a program involves reading the contents of the executable file, the file containing the program text, into memory, and then carrying out other required preparatory tasks to prepare the executable for running. Once loading is complete, the operating system starts the program by passing control to the loaded program code.


All operating systems that support program loading have loaders, apart from systems where
code executes directly from ROM or in the case of highly specialized computer systems that
only have a fixed set of specialized programs.

In many operating systems the loader is permanently resident in memory, although some OSs that support virtual memory may allow the loader to be located in a region of memory that is pageable. In the case of operating systems that support virtual memory, the loader may not actually copy the contents of executable files into memory, but rather may simply declare to the virtual memory subsystem that there is a mapping between a region of memory allocated to contain the running program's code and the contents of the associated executable file. The virtual memory subsystem is then made aware that pages within that region of memory need to be filled on demand if and when program execution actually hits those areas of unfilled memory. This may mean parts of a program's code are not actually copied into memory until they are actually used, and unused code may never be loaded into memory at all.

Steps for loaders:

- Read the executable file's header to determine the size of the text and data segments
- Create a new address space for the program
- Copy instructions and data into the address space
- Copy arguments passed to the program onto the stack
- Initialize the machine registers, including the stack pointer
- Jump to a startup routine that copies the program's arguments from the stack to registers and calls the program's main routine


Chapter Three - Structure of Compiler Phases

There are two major parts of a compiler:


Analysis and Synthesis

• In the analysis phase, an intermediate representation is created from the given source program. Lexical Analyzer, Syntax Analyzer and Semantic Analyzer are the phases in this part.

• In the synthesis phase, the equivalent target program is created from this intermediate representation. Intermediate Code Generator, Code Optimizer, and Code Generator are the phases in this part.


Compiler Phases:

Each phase transforms the source program from one representation into another
representation. They communicate with error handlers and the symbol table.

Lexical Analyzer

• Lexical Analyzer reads the source program character by character and returns the tokens of
the source program.

• A token describes a pattern of characters having the same meaning in the source program.
(such as identifiers, operators, keywords, numbers, delimiters and so on)

Example:

In the code line: newval := oldval + 12,

tokens are:

newval (identifier)

:= (assignment operator)

oldval (identifier)

+ (add operator)

12 (a number)


• Puts information about identifiers into the symbol table.

• Regular expressions are used to describe tokens (lexical constructs).

• A (Deterministic) Finite State Automaton can be used in the implementation of a lexical analyzer.

So, the role of the Lexical Analyzer:


As the first phase of a compiler, the main task of the lexical analyzer is to read the input
characters of the source program, group them into lexemes, and produce as output a
sequence of tokens for each lexeme in the source program. The stream of tokens is sent to
the parser for syntax analysis. It is common for the lexical analyzer to interact with the symbol
table as well. When the lexical analyzer discovers a lexeme constituting an identifier, it needs
to enter that lexeme into the symbol table. In some cases, information regarding the kind of identifier may be read from the symbol table by the lexical analyzer to assist it in determining the proper token to pass to the parser. These interactions are suggested in Fig. 3.1. Commonly, the interaction is implemented by having
the parser call the lexical analyzer. The call, suggested by the getNextToken command,
causes the lexical analyzer to read characters from its input until it can identify the next
lexeme and produce for it the next token, which it returns to the parser.
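A minimal C sketch of this interaction follows; the Token type, the token names, and getNextToken are illustrative assumptions, with getNextToken standing for the lexical analyzer's entry point:

typedef enum { ID, ASSIGN_OP, ADD_OP, NUMBER, END_OF_INPUT } TokenName;

typedef struct {
    TokenName name;      /* abstract symbol, e.g. ID or NUMBER          */
    int       attribute; /* e.g. a symbol-table index for an identifier */
} Token;

Token getNextToken(void);  /* implemented by the lexical analyzer */

void parse(void) {
    /* The parser drives the scanner, consuming one token at a time. */
    for (Token t = getNextToken(); t.name != END_OF_INPUT; t = getNextToken()) {
        /* ... use t.name (and t.attribute) to make parsing decisions ... */
    }
}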


Since the lexical analyzer is the part of the compiler that reads the source text, it may perform
certain other tasks besides identification of lexemes.
One such task is stripping out comments and whitespace (blank, newline, tab, and perhaps
other characters that are used to separate tokens in the input).
Another task is correlating error messages generated by the compiler with the source
program. For instance, the lexical analyzer may keep track of the number of newline
characters seen, so it can associate a line number with each error message.
In some compilers, the lexical analyzer makes a copy of the source program with the error messages inserted at the appropriate positions. If the source program uses a macro-preprocessor, the expansion of macros may also be performed by the lexical analyzer.

TOKENS, PATTERNS, AND LEXEMES:


When discussing lexical analysis, we use three related but distinct terms:
Token:


A token is a pair consisting of a token name and an optional attribute value. The token name
is an abstract symbol representing a kind of lexical unit, e.g., a particular keyword, or a
sequence of input characters denoting an identifier. The token names are the input symbols
that the parser processes. In what follows, we shall generally write the name of a token in
boldface. We will often refer to a token by its token name.
Pattern:
A pattern is a description of the form that the lexemes of a token may take. In the case of a keyword as a token, the pattern is just the sequence of characters that form the keyword. For identifiers and some other tokens, the pattern is a more complex structure that is matched by many strings.
Lexeme:
A lexeme is a sequence of characters in the source program that matches the pattern for a
token and is identified by the lexical analyzer as an instance of that token.
Examples of Tokens:

1. Tokens are treated as terminal symbols in the grammar for the source language, using boldface names to represent tokens.
2. The lexemes matched by the pattern for the tokens represent the strings of characters in the source program that can be treated together as a lexical unit.


3. In most programming languages, keywords, operators, identifiers, constants, literals and punctuation symbols are treated as tokens.
4. A pattern is a rule describing the set of lexemes that can represent a particular token in the source program.
5. In many languages certain strings are reserved, i.e., their meanings are predefined and cannot be changed by the users.
6. If the keywords are not reserved, then the lexical analyzer must distinguish between a keyword and a user-defined identifier.
ATTRIBUTES FOR TOKENS:
When more than one lexeme can match a pattern, the lexical analyzer must provide the
subsequent compiler phases additional information about the particular lexeme that
matched. For example, the pattern for token number matches both 0 and 1, but it is extremely
important for the code generator to know which lexeme was found in the source program.
Thus, in many cases the lexical analyzer returns to the parser not only a token name, but an
attribute value that describes the lexeme represented by the token; the token name
influences parsing decisions, while the attribute value influences translation of tokens after
the parse. We shall assume that tokens have at most one associated attribute, although this
attribute may have a structure that combines several pieces of information. The most
important example is the token id, where we need to associate with
the token a great deal of information. Normally, information about an identifier - e.g., its
lexeme, its type, and the location at which it is first found (in case an error message about that
identifier must be issued) - is kept in the symbol table. Thus, the appropriate attribute value
for an identifier is a pointer to the symbol-table entry for that identifier.
Example: The token names and associated attribute values for the Fortran statement E = M * C ** 2 are written below as a sequence of pairs.


<id, pointer to symbol-table entry for E>
<assign_op>
<id, pointer to symbol-table entry for M>
<mult_op>
<id, pointer to symbol-table entry for C>
<exp_op>
<number, integer value 2>
Note that in certain pairs, especially operators, punctuation, and keywords, there is no need
for an attribute value. In this example, the token number has been given an integer-valued
attribute. In practice, a typical compiler would instead store a character string representing
the constant and use as an attribute value for number a pointer to that string.
ERRORS IN LEXICAL ANALYSIS:
It is hard for a lexical analyzer to tell, without the aid of other components, that there is a source-code error. For instance, if the string fi is encountered for the first time in a C program in the context:

fi ( a == f(x) ) ...

a lexical analyzer cannot tell whether fi is a misspelling of the keyword if or an undeclared function identifier. Since fi is a valid lexeme for the token id, the lexical analyzer must return the token id to the parser and let some other phase of the compiler - probably the parser in this case - handle the error due to the transposition of the letters. However, suppose a situation arises in which the lexical analyzer is unable to proceed because none of the patterns for tokens matches any prefix of the remaining input.
Panic mode :
The simplest recovery strategy is "panic mode" recovery. We delete successive characters
from the remaining input, until the lexical analyzer can find a well-formed token at the
beginning of what input is left. This recovery technique may confuse the parser, but in an
interactive computing environment it may be quite adequate.


Other possible error-recovery actions are:


1. Delete one character from the remaining input.
2. Insert a missing character into the remaining input.
3. Replace a character by another character.
4. Transpose two adjacent characters.
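A sketch of panic-mode recovery in C, under the assumption of a hypothetical predicate isTokenStart that reports whether some token pattern matches at the current position:

int isTokenStart(const char *p);   /* assumed: does any token pattern match here? */

/* Delete successive characters from the remaining input until a
   well-formed token can begin at the current position. */
const char *panicModeRecover(const char *input) {
    while (*input != '\0' && !isTokenStart(input))
        input++;                   /* skip one offending character */
    return input;
}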

INPUT BUFFERING:
During lexical analysis, to identify a lexeme, it is important to look ahead at least one additional character. Specialized buffering techniques have been developed to reduce the amount of overhead required to process a single input character.
An important scheme involves two buffers that are alternately reloaded.

Each buffer is of the same size N, and N is usually the size of a disk block, e.g., 4096 bytes.
Using one system read command we can read N characters into a buffer, rather than using
one system call per character. If fewer than N characters remain in the input file, then a special
character, represented by eof marks the end of the source file and is different from any
possible character of the source program.
Two pointers to the input are maintained:
1. Pointer lexemeBegin marks the beginning of the current lexeme, whose extent we are attempting to determine.
2. Pointer forward scans ahead until a pattern match is found; the exact strategy whereby this determination is made will be covered later in this chapter.


Once the next lexeme is determined, forward is set to the character at its right end. Then, after the lexeme is recorded as an attribute value of a token returned to the parser, lexemeBegin is set to the character immediately after the lexeme just found. In Fig. 3.3, we see forward has passed the end of the next lexeme, ** (the Fortran exponentiation operator), and must be retracted one position to its left.
Advancing forward requires that we first test whether we have reached the end of one of the
buffers, and if so, we must reload the other buffer from the input, and move forward to the
beginning of the newly loaded buffer. As long as we never need to look so far ahead of the
actual lexeme that the sum of the lexeme's length plus the distance we look ahead is greater
than N, we shall never overwrite the lexeme in its buffer before determining it.
Sentinels
If we use the scheme of Section 3.2.1 as described, we must check, each time we advance
forward, that we have not moved off one of the buffers; if we do, then we must also reload the
other buffer. Thus, for each character read, we make two tests: one for the end of the buffer,
and one to determine what character
is read (the latter may be a multiway branch). We can combine the buffer-end test with the
test for the current character if we extend each buffer to hold a sentinel character at the end.
The sentinel is a special character that cannot be part of the source program, and a natural
choice is the character eof. Figure 3.4 shows the same arrangement as Fig. 3.3, but with the
sentinels added. Note that eof retains its use as a marker for the end of the entire input. Any
eof that appears other than at the end of a buffer means that the input is at an end. Figure 3.5
summarizes the algorithm for advancing forward. Notice how the first test, which can be part
of a multiway branch based on the character pointed to by forward, is the only test we make,


except in the case where we actually are at the end of a buffer or the end of the input.
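The forward-advancing algorithm can be sketched in C as follows; the buffer layout, the SENTINEL value, and the reload helper are assumptions made for illustration:

enum { N = 4096 };            /* size of each buffer half, e.g. one disk block */
#define SENTINEL '\x04'       /* stands in for the eof character               */

void reload(char *half);      /* assumed: reads up to N characters into half   */

static char buf[2 * N + 2];   /* two halves, each followed by a sentinel       */
static char *forward = buf;   /* the scanning pointer                          */

char advance(void) {
    char c = *forward++;
    if (c == SENTINEL) {
        if (forward == buf + N + 1) {            /* end of first half          */
            reload(buf + N + 1);                 /* refill the second half     */
            c = *forward++;                      /* continue in second half    */
        } else if (forward == buf + 2 * N + 2) { /* end of second half         */
            reload(buf);                         /* refill the first half      */
            forward = buf;
            c = *forward++;
        }
        /* otherwise this sentinel marks the true end of the input */
    }
    return c;
}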

SPECIFICATION OF TOKENS:
Regular expressions are an important notation for specifying lexeme patterns.
Strings and Languages:
• An alphabet is any finite set of symbols, e.g., letters, digits and punctuation.
• The set {0, 1} is the binary alphabet.
• A string over an alphabet is a finite sequence of symbols drawn from that alphabet.
• The length of the string s, represented as |s|, is the number of occurrences of symbols in s.
• The empty string, denoted ε, is the string of length 0.
• A language is any countable set of strings over some fixed alphabet, e.g., abstract languages.
• If x and y are strings, then the concatenation of x and y, denoted by xy, is the string formed by appending y to x. For example, if x = cse and y = department, then xy = csedepartment.

REGULAR EXPRESSIONS:
Suppose we wanted to describe the set of valid C identifiers.
We are able to describe identifiers by giving names to sets of letters and digits and using the
language operators union, concatenation, and closure. This process is so useful that a
notation called regular expressions has come into common use for describing all the
languages that can be built from these operators applied to the symbols of some alphabet. In


this notation, if letter_ is established to stand for any letter or the underscore, and digit is established to stand for any digit, then we could describe the language of C identifiers by:

letter_ ( letter_ | digit )*
Example: Let Σ = {a, b}.
1. The regular expression a|b denotes the language {a, b}.
2. (a|b)(a|b) denotes {aa, ab, ba, bb}, the language of all strings of length two over the alphabet Σ. Another regular expression for the same language is aa|ab|ba|bb.
3. a* denotes the language consisting of all strings of zero or more a's, that is, {ε, a, aa, aaa, ...}.
4. (a|b)* denotes the set of all strings consisting of zero or more instances of a or b, that is, all strings of a's and b's: {ε, a, b, aa, ab, ba, bb, aaa, ...}. Another regular expression for the same language is (a*b*)*.
5. a|a*b denotes the language {a, b, ab, aab, aaab, ...}, that is, the string a and all strings consisting of zero or more a's and ending in b.

A language that can be defined by a regular expression is called a regular set. If two regular expressions r and s denote the same regular set, we say they are equivalent and write r = s. For instance, (a|b) = (b|a).

Example: C identifiers are strings of letters, digits, and underscores. Here is a regular definition for the language of C identifiers. We shall conventionally use italics for the symbols defined in regular definitions.

letter_ → A | B | ... | Z | a | b | ... | z | _
digit → 0 | 1 | ... | 9
id → letter_ ( letter_ | digit )*
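The pattern id → letter_ ( letter_ | digit )* translates directly into a small C checker (an illustrative helper, not part of any particular compiler):

#include <ctype.h>

/* Returns 1 if s matches letter_ ( letter_ | digit )*, else 0. */
int isIdentifier(const char *s) {
    if (!(isalpha((unsigned char)*s) || *s == '_'))
        return 0;                                /* must start with letter_ */
    for (s++; *s != '\0'; s++)
        if (!(isalnum((unsigned char)*s) || *s == '_'))
            return 0;                            /* body: letter_ or digit  */
    return 1;
}

For example, isIdentifier("newval") returns 1, while isIdentifier("12x") returns 0.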


For this language, the lexical analyzer will recognize the keywords if, then, and else, as well
as lexemes that match the patterns for relop, id, and number. To simplify matters, we make
the common assumption that keywords are also reserved words: that is, they are not
identifiers, even though their lexemes match the pattern for identifiers. In addition, we assign
the lexical analyzer the job of stripping out whitespace, by recognizing the "token" ws defined
by:
ws → ( blank | tab | newline )+
Here, blank, tab, and newline are abstract symbols that we use to express the ASCII characters
of the same names. Token ws is different from the other tokens in that, when we recognize it,
we do not return it to the parser, but rather restart the lexical analysis from the character that
follows the whitespace. It is the following token that gets returned to the parser.


Syntax Analyzer

• A Syntax Analyzer creates the syntactic structure (generally a parse tree) of the given
program.

• A syntax analyzer is also called a parser.

• A parse tree describes a syntactic structure.

Example:

For the line of code newval := oldval + 12, parse tree will be:


•The syntax of a language is specified by a context free grammar (CFG).

• The rules in a CFG are mostly recursive.

• A syntax analyzer checks whether a given program satisfies the rules implied by a CFG or not.

• If it satisfies, the syntax analyzer creates a parse tree for the given program

Example:

CFG used for the above parse tree is:

assignment-> identifier := expression

expression -> identifier

expression -> number


expression -> expression + expression

• Depending on how the parse tree is created, there are different parsing techniques.

• These parsing techniques are categorized into two groups:

– Top-Down Parsing,

– Bottom-Up Parsing

• Top-Down Parsing:

– Construction of the parse tree starts at the root, and proceeds towards the leaves.

– Efficient top-down parsers can be easily constructed by hand.

– Recursive Predictive Parsing, Non-Recursive Predictive Parsing (LL Parsing).

• Bottom-Up Parsing:

– Construction of the parse tree starts at the leaves and proceeds towards the root.

– Normally efficient bottom-up parsers are created with the help of some software tools.

– Bottom-up parsing is also known as shift-reduce parsing.

– Operator-Precedence Parsing – simple, restrictive, easy to implement

– LR Parsing – much general form of shift-reduce parsing, LR, SLR, LALR


Semantic Analyzer

• A semantic analyzer checks the source program for semantic errors and collects the type
information for the code generation.

• Type-checking is an important part of semantic analyzer.

• Normally, semantic information cannot be represented by the context-free language used in syntax analyzers.

• The context-free grammar used in syntax analysis is integrated with attributes (semantic rules). The result is syntax-directed translation and attribute grammars.

Example:

In the line of code:

newval := oldval + 12,

the type of the identifier newval must match with type of the expression (oldval+12)


Intermediate Code Generation

Code Optimizer

The code optimizer optimizes the code produced by the intermediate code generator in the
terms of time and space.

Example:

The above piece of intermediate code can be reduced as follows:

MULT id2, id3, temp1

ADD temp1, #1, id1

Code Generator

• Produces the target language in a specific architecture.

• The target program is normally a relocatable object file containing the machine codes.


Example:

Assuming an architecture in which instructions have at least one operand in a machine register, the final code for our line of code will be:

MOVE id2, R1

MULT id3, R1

ADD #1, R1

MOVE R1, id1


Exercise:

1- What is the difference between a compiler and an interpreter?


2- What are the advantages of (a) a compiler over an interpreter (b) an interpreter over a
compiler.
3- What advantages are there to a language-processing system in which the compiler
produces assembly language rather than machine language?
4- A compiler that translates a high-level language into another high-level language is
called a source-to-source translator. What advantages are there to using C as a target
language for a compiler?
5- Describe some of the tasks that an assembler needs to perform.


Chapter Four - Error Types - Symbol Table

Errors

1- Lexical Errors

Inserting characters that do not correspond to any existing token in the language:

int $, b=0..5;
a=2x (2*x)
2+x

2- Syntax Errors

The sentence is a sequence of tokens that does not match the grammar rules of the language:

for ) int I; i<10;i+++), cin<<x

3- Semantic Errors

The types involved do not match, such as adding data of different types:

char x;
float y, z;
z=x+y;

or

c=f(3)+arr[n]

or

def f(n,m):
    return n+m
z=(5,4,6)

4- Run-Time Errors

Such as division by zero:

int x=5, y=0;
x=1/y;


Scopes and Symbol table


Introduction

• An important concept in PLs is the ability to name objects such as variables, functions and
types. Each such named object will have a declaration, where the name is defined as a
synonym for the object. This is called binding.
• Each name will also have a number of uses, where the name is used as a reference to the
object to which it is bound. Often, the declaration of a name has a limited scope: a portion of
the program where the name will be visible. Such declarations are called local declarations,
whereas a declaration that makes the declared name visible in the entire program is called
global.
• It may happen that the same name is declared in several nested scopes. In this case, it is
normal that the declaration closest to a use of the name will be the one that defines that
particular use. In this context closest is related to the syntax tree of the program: The scope
of a declaration will be a sub-tree of the syntax tree and nested declarations will give rise to
scopes that are nested sub-trees. The closest declaration of a name is hence the declaration
corresponding to the smallest sub-tree that encloses the use of the name.
• As an example, look at this C statement block:
{
  int x = 1;  // declare integer variable x with scope until the closing brace in the last line
  int y = 2;  // declare integer variable y with scope until the closing brace in the last line
  {
    // A new scope is started by the second opening brace, and a floating-point variable x with
    // an initial value close to pi is declared. It has scope until the matching closing brace,
    // so the original x is not visible until the inner scope ends.
    double x = 3.14159265358979;
    y += (int)x;  // this assignment adds 3 to y, so its new value is 5
  }
  y += x;  // here we have exited the inner scope, so the original x is restored;
           // 1 is added to y, which gets the final value 6
}

• Scoping based on the structure of the syntax tree, as shown in the example, is called static
or lexical binding and is the most common scoping rule in modern PLs. We will in the rest of
this book assume that static binding is used.
• A few PLs have dynamic binding, where the declaration that was most recently
encountered during execution of the program defines the current use of the name. By its
nature, dynamic binding can not be resolved at compile-time, so the techniques that in the
rest of this book are described as being used in a compiler will have to be used at run-time if
the language uses dynamic binding.
• A compiler will need to keep track of names and the objects these are bound to, so that any
use of a name will be attributed correctly to its declaration. This is typically done using a
symbol table.

Symbol Table

• Symbol Table is an important data structure created and maintained by the compiler in
order to keep track of semantics of variables i.e. it stores information about the scope and
binding information about names, information about instances of various entities such as
variable and function names, classes, objects, etc.


• A symbol table is a table that binds names to information. We need a number of operations
on symbol tables to accomplish this:
- We need an empty symbol table, in which no name is defined.
- We need to be able to bind a name to a piece of information. In case the name is already defined in
the symbol table, the new binding takes precedence over the old.
- We need to be able to look up a name in a symbol table to find the information the name is bound to.
If the name is not defined in the symbol table, we need to be told that.
- We need to be able to enter a new scope.
- We need to be able to exit a scope, reestablishing the symbol table to what it was before the scope
was entered.

• The symbol table is built during the lexical and syntax analysis phases.


• The information is collected by the analysis phases of the compiler and is used by the
synthesis phases of the compiler to generate code.
• It is used by the compiler to achieve compile-time efficiency.
• It is used by various phases of the compiler as follows:-
1. Lexical Analysis: Creates new table entries in the table, for example entries about
tokens.
2. Syntax Analysis: Adds information regarding attribute type, scope, dimension, line of
reference, use, etc in the table.
3. Semantic Analysis: Uses available information in the table to check for semantics i.e.
to verify that expressions and assignments are semantically correct (type checking) and
update it accordingly.
4. Intermediate Code generation: Refers to the symbol table to know how much and what type of run-time storage is allocated; the table also helps in adding temporary variable information.


5. Code Optimization: Uses information present in the symbol table for machine-dependent optimization.
6. Target Code generation: Generates code by using address information of identifier
present in the table.

Symbol Table attributes

The compiler needs information about the names that appear in the source code. This information is entered into the symbol table. These pieces of information are called attributes, and they are collected during the analysis phase.

1- Variable name: the search key for the variable; the information is entered into the table during the lexical analysis phase.
2- Object code address: determines the location of the variable's value at execution time.
3- Data type: used to determine the memory space required for the variable's values (int 2B, char 1B, float 4B, double 8B, ...).
4- Number of dimensions / number of parameters: a simple variable has 0, a vector (1-D array) 1, a 2-D array (matrix) 2; for a function, the number of parameters (the variables the function receives).
5- Line declaration: the line number at which the variable is declared in the program (one line).
6- Signal (reference) line: the number of the line(s) on which the variable is used, other than the line on which it is declared. When more than one such line is recorded, the table is called a cross-reference table.

Exercise:

Draw a cross-reference symbol table that would result when compiling the following C
source code segment:

1-  void main()
2-  {
3-    int i, j[5];
4-    char C, index[5][6], block[5];
5-    float f;
6-    i=0;
7-    i=i+k;
8-    f=f+i;
9-    C='x';
10-   block[4]=C;
11- }
Solution:
counter  Variable name  Object address             Type   Dim  Line declared  Line reference
1        i              0                          int    0    3              6, 7, 8
2        j              2  (i needs 2 bytes)       int    1    3              -
3        C              12 (j needs 2x5 bytes)     char   0    4              9, 10
4        index          13 (C needs 1 byte)        char   2    4              -
5        block          43 (index needs 5x6 bytes) char   1    4              10
6        f              48 (block needs 5 bytes)   float  0    5              8
7        k              -                          -      0    -              7

So,

Error line 7: k is undefined


Warning line 3: j is not used
Warning line 4: index is not used
Warning line 8: f is used before initialization


Symbol Table Types:

1- Unordered Symbol Table

- Simple.
- Attributes are added to the table sequentially, in the order in which the variables are defined in the program.
- Adding attributes does not require any comparison.
- Average Search Length (ASL) = (n+1)/2, where n is the number of variables.
2- Ordered Symbol Table

- Alphabetically ordered.
- Entering a variable and its attributes into the table requires a search to determine the proper place to store the variable.

3- Binary Tree Symbol Table


• All names are stored in a tree that always maintains the binary-search-tree property: at each node, names that compare smaller than the node's name go into the left subtree and larger names into the right subtree.
• Insertion and lookup are O(log2 n) on average.
Ex: Add i, j, k, m, l to a tree-structured symbol table (L = left child, R = right child):

i
└─ R: j
      └─ R: k
            └─ R: m
                  └─ L: l
Ex:

frog, tree, hill, bird, bad, cat, z1, och

frog
├─ L: bird
│     ├─ L: bad
│     └─ R: cat
└─ R: tree
      ├─ L: hill
      │     └─ R: och
      └─ R: z1

The comparison proceeds from the root downward.

4- Hash Symbol Table


• In the hashing scheme, two tables are maintained – a hash table and a symbol table; this is the most commonly used method to implement symbol tables.
• A hash table is an array with an index range of 0 to table size – 1. Its entries are pointers to the names in the symbol table.
• To search for a name we use a hash function that yields an integer between 0 and table size – 1.
• Insertion and lookup can be made very fast – O(1).
• The advantage is that quick search is possible; the disadvantage is that hashing is complicated to implement.

It depends on the equation:

hash(name) = (length of name + ASCII code of the first letter of name) mod hash_max

where hash_max is the table size. ASCII codes of the letters:

A 65 B 66 C 67
D 68 E 69 F 70
G 71 H 72 I 73
J 74 K 75 L 76
M 77 N 78 O 79
P 80 Q 81 R 82
S 83 T 84 U 85


V 86 W 87 X 88
Y 89 Z 90 a 97
b 98 c 99 d 100
e 101 f 102 g 103
h 104 i 105 j 106
k 107 l 108 m 109
n 110 o 111 p 112
q 113 r 114 s 115
t 116 u 117 v 118
w 119 x 120 y 121
z 122

Ex:

frog, tree, hill, bird, bad, cat

hash(frog) = (4 + 102) % 6 = 106 % 6 = 4
hash(tree) = (4 + 116) % 6 = 120 % 6 = 0
hash(hill) = (4 + 104) % 6 = 108 % 6 = 0
hash(bird) = (4 + 98)  % 6 = 102 % 6 = 0
hash(bad)  = (3 + 98)  % 6 = 101 % 6 = 5
hash(cat)  = (3 + 99)  % 6 = 102 % 6 = 0

0: tree → hill → bird → cat
1:
2:
3:
4: frog
5: bad
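The same hash function written as a C sketch (hash_max is the table size, 6 in the example above):

#include <string.h>

/* hash(name) = (length of name + ASCII code of its first letter) mod hash_max */
int hash(const char *name, int hash_max) {
    return (int)(strlen(name) + (unsigned char)name[0]) % hash_max;
}

For instance, hash("frog", 6) evaluates to 4, matching the table above.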


Advantages of Symbol Table

1. Increased efficiency: the efficiency of a program can be increased by using symbol tables, which give quick and simple access to crucial data such as variable and function names, data types, and memory locations.
2. Better code structure: symbol tables can be used to organize and simplify code, making it simpler to comprehend and making problems easier to discover and correct.
3. Faster code execution: By offering quick access to information like memory
addresses, symbol tables can be utilized to optimize code execution by lowering the
number of memory accesses required during execution.
4. Symbol tables can be used to increase the portability of code by offering a
standardized method of storing and retrieving data, which can make it simpler to
migrate code between other systems or programming languages.
5. Improved code reuse: By offering a standardized method of storing and accessing
information, symbol tables can be utilized to increase the reuse of code across
multiple projects.
6. Symbol tables can be used to facilitate easy access to and examination of a program’s
state during execution, enhancing debugging by making it simpler to identify and
correct mistakes.

Disadvantages of Symbol Table

1. Increased memory consumption: Systems with low memory resources may suffer from
symbol tables’ high memory requirements.
2. Increased processing time: The creation and processing of symbol tables can take a
long time, which can be problematic in systems with constrained processing power.
3. Complexity: Developers who are not familiar with compiler design may find symbol
tables difficult to construct and maintain.
4. Limited scalability: Symbol tables may not be appropriate for large-scale projects or applications that require the management of enormous amounts of data due to their limited scalability.


5. Upkeep: Maintaining and updating symbol tables on a regular basis can be time- and
resource-consuming.
6. Limited functionality: It’s possible that symbol tables don’t offer all the features a
developer needs, and therefore more tools or libraries will be needed to round out their
capabilities.

Symbol Table implementation

There are many ways to implement symbol tables, but the most important distinction
between these is how scopes are handled. This may be done using a persistent (or functional)
data structure, or it may be done using an imperative (or destructively-updated) data
structure. A persistent data structure has the property that no operation on the structure will
destroy it. Conceptually, a new modified copy is made of the data structure whenever an
operation updates it, hence preserving the old structure unchanged. This means that it is
trivial to reestablish the old symbol table when exiting a scope, as it has been preserved by
the persistent nature of the data structure. In practice, only a small portion of the data
structure is copied when a symbol table is updated, most is shared with the previous version.
In the imperative approach, only one copy of the symbol table exists, so explicit actions are
required to store the information needed to restore the symbol table to a previous state. This
can be done by using an auxiliary stack. When an update is made, the old binding of a name
that is overwritten is recorded (pushed) on the auxiliary stack. When a new scope is entered,
a marker is pushed on the auxiliary stack. When the scope is exited, the bindings on the
auxiliary stack (down to the marker) are used to reestablish the old symbol table. The
bindings and the marker are popped off the auxiliary stack in the process, returning the
auxiliary stack to the state it was in before the scope was entered.
1- Simple persistent symbol tables
2- A simple imperative symbol table
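A compact C sketch of a simple imperative symbol table; here the table itself is kept as a stack of bindings with scope markers, so exiting a scope just pops back to the last marker (all names and sizes are illustrative assumptions):

#include <string.h>

#define MAX 1024

typedef struct { const char *name; int info; } Binding;

static Binding table[MAX];  /* the single, destructively updated table */
static int     top = 0;

void enterScope(void) { table[top++].name = NULL; }   /* push a scope marker */

void bind(const char *name, int info) {               /* add a new binding   */
    table[top].name = name;
    table[top].info = info;
    top++;
}

int lookup(const char *name, int *info) {
    for (int i = top - 1; i >= 0; i--)                /* the newest binding wins */
        if (table[i].name != NULL && strcmp(table[i].name, name) == 0) {
            *info = table[i].info;
            return 1;
        }
    return 0;                                         /* name is not defined */
}

void exitScope(void) {                                /* pop bindings down to the marker */
    while (top > 0 && table[--top].name != NULL)
        ;
}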


Efficiency issues:

While all of the above implementations are simple, they all share the same efficiency problem:
Lookup is done by linear search, so the worst-case time for lookup is proportional to the size
of the symbol table. This is mostly a problem in relation to libraries: It is quite common for a
program to use libraries that define literally hundreds of names. A common solution to this
problem is hashing: Names are hashed (processed) into integers, which are used to index an
array. Each array element is then a linear list of the bindings of names that share the same
hash code. Given a large enough hash table, these lists will typically be very short, so lookup
time is basically constant.

Using hash tables complicates entering and exiting scopes somewhat. While each element of the hash table is a list that can be handled like in the simple cases, doing this for all the array elements at every entry and exit imposes a major overhead. Instead, it is typical for imperative implementations to use a single auxiliary stack (as described above) to record all updates to the table so they can be undone in time proportional to the number of updates that were done in the local scope. Functional implementations typically use persistent hash tables, which eliminates the problem.

Shared or separate name spaces

In some languages (like Pascal) a variable and a function in the same scope may have the
same name, as the context of use will make it clear whether a variable or a function is used.
We say that functions and variables have separate name spaces, which means that defining a
name in one space doesn’t affect the same name in the other space. In other languages (e.g.
C or SML) the context can not (easily) distinguish variables from functions. Hence, declaring
a local variable might hide a function declared in an outer scope or vice versa. These


languages have a shared name space for variables and functions. Name spaces may be shared
or separate for all the kinds of names that can appear in a program, e.g., variables, functions,
types, exceptions, constructors, classes, field selectors etc. Which name spaces are shared is
language-dependent. Separate name spaces are easily implemented using one symbol table
per name space, whereas shared name spaces naturally share a single symbol table.
However, it is sometimes convenient to use a single symbol table even if there are separate
name spaces. This can be done fairly easily by adding name-space indicators to the names. A
name-space indicator can be a textual prefix to the name or it may be a tag that is paired with
the name. In either case, a lookup in the symbol table must match both the name and the
name-space indicator of the symbol that is looked up with the name and the name-space
indicator of the entry in the table.

Questions:

1- In some programming languages, identifiers are case-insensitive so, e.g., size and SiZe refer to the same identifier. Describe how symbol tables can be made case-insensitive.
2- What is the difference between compilation time and runtime.
3- Draw a figure to illustrate the compilation process.
4- What is the dimension attribute? What type of error can be determined based on the
dimension attribute?
5- Draw tree structured symbol table to store the following variables: name, age, degree,
average.
6- How one pass compiler work?
7- What is the input and output for semantic phase in compiler?
8- Where the variables (floor, door, window, apple, frog) will be stored in hash symbol table
if the hash table size=100 record?
9- Define parser and lexeme
10- Draw syntax tree for the expression a=d*c/2-5


11- What are the languages described by the following regular expressions (REs) defined on the alphabet {a, b}?
(i) a(a|b)*b   (ii) aab(aa|bb)+   (iii) (aa)*a
1- The language in which every word starts with the letter a and ends with the letter b.
2- The language in which every word starts with the string aab, followed by at least one repetition of aa or bb.
3- The language whose words contain only the letter a and have odd length.
12- Write RE to define the following languages:
(i) If the language alphabet is {a, b, c} and all its words start with letter a
a(a|b|c)*
(ii) Integers that are multiples of 5
[0-9]*(0|5)
13- Write the leftmost derivation of a0+ab*(a+b1) for the G with the productions:
E→I| E+E| E*E| (E)
I→ a|b|Ia|Ib|I0|I1
Solution:
E→ E+E→ I+E→ I0+E→a0+E→a0+E*E→ a0+I*E→ a0+Ib*E→ a0+ab*E→a0+ab*(E)→
a0+ab*(E+E)→ a0+ab*(I+E)→ a0+ab*(a+E)→ a0+ab*(a+I)→ a0+ab*(a+I1)→ a0+ab*(a+b1)
14- Write the rightmost derivation of a0+ab*(a+b1) for the G with the productions:
E→I| E+E| E*E| (E)
I→ a|b|Ia|Ib|I0|I1
15- Consider G:
state → type list terminator
type→int|float|char
list→list,id|id
terminator→;
consider the input w=int id,id,id,id;
build the parse tree.


Solution:

state
├─ type ── int
├─ list
│   ├─ list
│   │   ├─ list
│   │   │   ├─ list ── id
│   │   │   ├─ ,
│   │   │   └─ id
│   │   ├─ ,
│   │   └─ id
│   ├─ ,
│   └─ id
└─ terminator ── ;

16- Draw parse trees for the following expressions:

(i) a=b+3*d   (ii) a=3*d+b

The operator with the lowest precedence (+ here) appears highest in the expression tree.

(i)
        =
      /   \
   id1     +
         /   \
      id2     *
            /   \
        const    id3

(ii)
        =
      /   \
   id1     +
         /   \
        *     id3
      /   \
  const    id2


Chapter Five - Parsing


♫ What is Parsing/Syntax analysis? It is the second phase of a compiler.
♫ A parser is a program that is part of the compiler, and parsing is part of the compiling
process. Parsing happens during the analysis stage of compilation.
♫ As previously seen, a lexical analyzer can identify tokens with the help of REs and
production rules.
♫ But a lexical analyzer cannot check the syntax of a given sentence, due to the limitations of REs.
♫ Some authors view the parser more broadly:
The parser consists of three components, each of which handles a different stage of the
parsing process. The three stages are:
1- Lexical analysis
2- Syntactic analysis
3- Semantic analysis

So, in this view:

In parsing, code is taken from the preprocessor, broken into smaller pieces and analyzed
so other software can understand it. The parser does this by building a data structure out
of the pieces of input.
More specifically, a person writes code in a human-readable language like C++ or Java and
saves it as a series of text files. The parser takes those text files as input and breaks them
down so they can be translated on the target platform.
And:
Stage 1: Lexical analysis
A lexical analyzer -- or scanner -- takes code from the preprocessor and breaks it into smaller
pieces. It groups the input code into sequences of characters called lexemes, each of which
corresponds to a token. Tokens are units of grammar in the programming language that the
compiler understands.
Lexical analyzers also remove white space characters and comments from the input, and
report malformed tokens as lexical errors.
Ex:


Given the string: x+z=11


The lexical analyzer would separate it into a series of lexemes and classify each one with a
token, as:

Lexeme   Token
x        identifier
+        addition operator
z        identifier
=        assignment operator
11       number/constant
Stage 2: Syntactic analysis
This stage of parsing checks the syntactical structure of the input, using a data structure
called a parse tree or derivation tree. A syntax analyzer uses tokens to construct a parse tree
that combines the predefined grammar of the programming language with the tokens of the
input string. The syntactic analyzer reports a syntax error if the syntax is incorrect.

Ex:

The syntactic analyzer takes (x+y)*3 as input and returns a parse tree, which enables
the parser to understand the equation.

Stage 3: Semantic analysis


Semantic analysis verifies the parse tree against a symbol table and determines whether it
is semantically consistent. This process is also known as context-sensitive analysis. It
includes data type checking, label checking and flow-control checking.

If the code provided is this:


float a = 30.2; float b = a * 20;


then the analyzer will treat 20 as 20.0 before performing the operation.

Some sources refer only to the syntactic analysis stage as parsing because it generates the
parse tree. They leave out lexical and semantic analysis.

♫ REs cannot check balanced tokens, such as matching parentheses.


♫ Therefore, this phase uses a context-free grammar (CFG).
♫ A CFG is a helpful tool for describing the syntax of PLs; the languages REs describe are a proper subset of the context-free languages.
♫ A CFG can be defined by a 4-tuple as: G = (V, T, P, S)
♫ V = finite, non-empty set of variables/non-terminal symbols
♫ T = finite set of terminal symbols
♫ P = finite set of production rules
♫ S = start symbol
♫ Grammar Constituents:
A grammar is mainly composed of two basic elements-
- 1- Terminal symbols

- Terminal symbols are those which are the constituents of the sentence generated
using grammar.
- Terminal symbols are denoted by using small case letters such as a, b, c etc.
- 2- Non-Terminal symbols

- Non-Terminal symbols are those which take part in the generation of the sentence but
are not part of it.
- Non-Terminal symbols are also called as auxiliary symbols or variables.
- Non-Terminal symbols are denoted by using capital letters such as A, B, C etc.
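For illustration, such a 4-tuple can be written down directly as data; the following minimal Python sketch (our own notation, not from the text) encodes the grammar of the first example below:

# A CFG G = (V, T, P, S) as plain Python data.
# Productions map each non-terminal to a list of right-hand sides;
# the empty string "" stands for ∈.
V = {"S"}
T = {"a", "b"}
P = {"S": ["aSbS", "bSaS", ""]}
S = "S"

for lhs, alts in P.items():
    for rhs in alts:
        print(lhs, "->", rhs if rhs else "∈")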


Example:
Consider a grammar G = (V , T , P , S) where-
• V = { S } // Set of non-terminal symbols
• T = { a, b } // Set of terminal symbols
• P = { S → aSbS , S → bSaS , S → ∈ } // Set of production rules

• S = S // Start symbol

This grammar generates the strings having equal number of a’s and b’s

So, Language of this grammar is-

L(G) = { ∈ , ab , ba , aabb , bbaa , abab , baba , …… }

This language consists of an infinite number of strings.

Therefore, the language of the grammar is infinite.

Example:

Consider a grammar G = (V , T , P , S) where-


• V = { S, A, B, C }

• T = { a, b, c }

• P = { S → ABC , A → a , B → b , C → c }

• S = S

This grammar generates only one string “abc”.


So, Language of this grammar is-

L(G) = { abc }

• This language consists of a finite number of strings.

• Therefore, the language of the grammar is finite.


♫ A syntax analyzer or parser takes the input from a lexical analyzer in the form of token
streams. The parser analyzes the source code (token stream) against the production
rules to detect any errors in the code.
♫ The output of this phase is a parse tree.

♫ The parser has two tasks:

1- parsing the code, looking for errors and

2-generating a parse tree as the output of the phase.

Derivation

♫ A derivation is basically a sequence of production-rule applications that derives the
input string from the start symbol.
♫ During parsing, we take two decisions for some sentential form of input:
1- Deciding the non-terminal which is to be replaced.
2- Deciding the production rule, by which, the non-terminal will be replaced.
♫ To decide which non-terminal to replace with a production rule, we have two
options:
1- Left-most Derivation: if the sentential form of an input is scanned and replaced
from left to right, it is called left-most derivation.
Example: production rules:
E→ E+E
E→E*E


E→id input string: id+id*id


The left-most derivation is:
E→E*E
E→E+E*E
E→id+ E*E
E→id+ id*E
E→id+ id*id

Notice that the left-most side non-terminal is always processed first.

2- Right-most Derivation: if the sentential form of an input is scanned and replaced


from right to left, it is called right-most derivation.
Example: production rules:
E→ E+E
E→E*E
E→id input string: id+id*id
The right-most derivation is:

E→E+ E
E→E+E*E
E→E+ E*id
E→E+ id*id
E→id+ id*id

♫ Remember: Parse Tree


♫ A parse tree is a graphical depiction of a derivation.
♫ It is convenient to see how strings are derived from the start symbol.
♫ The start symbol of the derivation becomes the root of the parse tree.
♫ In a parse tree:
➢ All leaf nodes are terminals.
➢ All interior nodes are non-terminals.


➢ In-order traversal gives the original input string.


♫ A parse tree depicts associativity and precedence of operators.
♫ The deepest sub-tree is traversed first, therefore the operator in that sub-tree gets
precedence over the operator which is in the parent nodes.
♫ Let us see this with the last example.
♫ We take the left-most derivation of id+id*id.
♫ The left-most derivation is:

E→E*E

E→E+E*E

E→id+ E*E

E→id+ id*E

E→id+ id*id

▪ Step-1: E→E*E

▪ Step-2: E→E+E*E

▪ Step-3: E→id+ E*E

▪ Step-4: E→id+ id*E

▪ Step-5: E→ id+ id*id

(parse-tree snapshots for each step omitted)

♫ A grammar G is said to be ambiguous if it has more than one parse tree (left or right
derivation) for at least one string.
♫ Example:
E→E+ E
E→E- E
E→id
For the string id+id-id, G will generate two parse trees (figures omitted):

♫ A language is said to be inherently ambiguous if every grammar that generates it is
ambiguous. Ambiguity in a grammar is not good for compiler construction.
♫ No general method can detect and remove ambiguity automatically.
♫ But it can often be removed by either:
➢ Re-writing the grammar without ambiguity, or
➢ By setting and following associativity and precedence constraints.

Types of Parsing
❖ Syntax analysers follow production rules defined by means of context-free grammar.
❖ The way the production rules are implemented (derivation) divides parsing into two
types:


1- Top-down parsing and

2- Bottom-up parsing

When a software language is created, its creators must specify a set of rules. These rules
provide the grammar needed to construct valid statements in the language.

Consider the following set of grammatical rules for a simple fictional language that only
contains a few words:

<sentence> ::= <subject> <verb> <object>


<subject> ::= <article> <noun>
<article> ::= the | a
<noun> ::= dog | cat | person
<verb> ::= pets | fed
<object> ::= <article> <noun>

In this language, a sentence must contain a subject, a verb and an object, in that order, and
specific words are matched to the parts of speech.

A subject is an article followed by a noun.

A noun can be one of the following three words: dog, cat or person.

And a verb can only be pets or fed.

Parsing checks a statement that a user provides as input against these rules to prove that the
statement is valid.

Different parsing algorithms check in different orders. There are two main types of parsers:

• Top-down parsers. These start with a rule at the top, such as <sentence> ::= <subject>
<verb> <object>.
Given the input string "The person fed a cat," the parser would look at the first rule, and
work its way down all the rules checking to make sure they are correct.


In this case, "The person" matches the <subject> rule, and the parser continues reading the
sentence looking for a <verb>.
i.e., top-down parsers begin their work at the start symbol of the grammar, at the top of the
parse tree. They then work their way down from the rule to the sentence.
• Bottom-up parsers. These start with the rule at the bottom.
In this case, the parser would look for an <object> first, then look for a <verb> next and so
on. i.e., bottom-up parsers work their way up from the sentence to the rule.

Beyond these types, it's important to know the two types of derivation. Derivation is the
order in which the grammar reconciles the input string. They are:

• LL parsers. These parse input from left to right using leftmost derivation to match the
rules in the grammar to the input. This process derives a string that validates the input
by expanding the leftmost element of the parse tree.
• LR parsers. These parse input from left to right using rightmost derivation in reverse. This
process derives a string by expanding the rightmost element of the parse tree.

In addition, there are other types of parsers, including the following:

• Recursive descent parsers. Recursive descent parsers may backtrack at a decision
point to try another alternative. Recursive descent parsers use top-down parsing.
• Earley parsers. These parse all context-free grammars, unlike LL and LR parsers, which
accept only restricted subclasses; most real-world programming languages use grammars
that do not require this generality.
• Shift-reduce parsers. These shift and reduce an input string. At each stage, they
reduce a substring matching a grammar rule's right-hand side to its left-hand side. This
approach reduces the string until it has been completely checked.


Parser
├── Top-down Parser
│     ├── Recursive descent Parser
│     └── LL(1) Parser
└── Bottom-up Parser
      ├── LR Parser
      │     ├── LR(0)
      │     ├── SLR(1)
      │     ├── LALR(1)
      │     └── CLR(1)
      └── Operator precedence Parser

Parsers are used when there is a need to represent input data from source code abstractly as
a data structure so that it can be checked for the correct syntax. Coding languages and other
technologies use parsing of some type for this purpose.

Technologies that use parsing to check code inputs include the following:

➢ Programming languages. Parsers are used in all high-level programming languages,


including the following:

• C++
• Extensible Markup Language or XML
• Hypertext Markup Language or HTML
• Hypertext Preprocessor or PHP
• Java
• JavaScript
• JavaScript Object Notation or JSON
• Perl
• Python

➢ Database languages. Database languages such as Structured Query Language also use
parsers.


➢ Protocols. Protocols like the Hypertext Transfer Protocol and internet remote function
calls use parsers.

Parser generator. Parser generators take a grammar as input and generate parser source code,
which is parsing in reverse. Lexer generators similarly construct scanners from regular
expressions, which are special strings used to describe and match patterns in text.

1- Top down Parsing

 Top-down parsing is a method of parsing the input string provided by the lexical
analyzer.
 The top-down parser parses the input string and then generates the parse tree for it.
 In the top-down approach construction of the parse tree starts from the root node and
ends up creating the leaf nodes.
 It uses leftmost derivation to derive a string that matches the input string.
 Here the leaf nodes represent the terminals that match the terminals of the input string.

Problems of top-Down Parser:

1- Left Recursion
✓ A grammar G(V, T, P, S) is left recursive if it has a production rule of the form
A→Aα|ß, where the non-terminal A on the left of the production also occurs as the first
symbol on the right side, and ß does not begin with A.
✓ With the left-recursion problem, it becomes hard for a top-down parser to judge
when to stop expanding the left non-terminal, and it goes into an infinite loop.

Eliminate left recursion:

Left recursion can be eliminated by introducing a new non-terminal A’ such that:

A→Aα|ß ➔ A→ ßA’

A’→αA’|Ɛ

(left-recursive grammar ➔ grammar with left recursion removed)


✓ In a left-recursive grammar, expansion of A generates Aα, Aαα, Aααα, … at each step,
causing the parser to enter an infinite loop. A code sketch of the elimination follows.
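A sketch of this transformation in Python (our own illustration; right-hand sides are tuples of symbols and the empty tuple stands for Ɛ):

# Remove immediate left recursion from one non-terminal.
# prods: list of right-hand sides (tuples of symbols) for A.
# Returns new productions for A and for a fresh non-terminal A'.
def eliminate_immediate_left_recursion(A, prods):
    A1 = A + "'"                                          # fresh non-terminal A'
    alphas = [p[1:] for p in prods if p and p[0] == A]    # from A -> A alpha
    betas = [p for p in prods if not p or p[0] != A]      # from A -> beta
    new_A = [beta + (A1,) for beta in betas]              # A  -> beta A'
    new_A1 = [alpha + (A1,) for alpha in alphas] + [()]   # A' -> alpha A' | Ɛ
    return {A: new_A, A1: new_A1}

# E -> E + T | T   becomes   E -> T E',  E' -> + T E' | Ɛ
print(eliminate_immediate_left_recursion("E", [("E", "+", "T"), ("T",)]))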

Example:

Consider the left recursion in the grammar:

E→E+T|T

T→T*F|F

F→(E)|id

Eliminate left recursion from the grammar.

Solution:

E→ TE’

E’→ +TE’|Ɛ

T→FT’

T’→*FT’|Ɛ

F→(E)|id


2- Left Factoring

If two or more production rules of a non-terminal have a common prefix string, then the
top-down parser cannot decide which of the productions to apply.

Example:

If a top-down parser encounters productions like: A→αß | αϕ| …

Then it cannot determine which production to follow to parse the string, as both
productions start with the same prefix.

- To remove this confusion, we use a technique called left factoring.

- In this technique, we make one production for each common prefix, and the rest
of the derivation is added by new productions.
- Example:
- The grammar A→αß | αϕ|…
- Can be written as:
- A→αA’
- A’→ ß | ϕ|…
- Now the parser has only one production per prefix, which makes it easier to take
decisions (see the sketch below).
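A sketch of left factoring in the same style (our own illustration; the fresh non-terminal names A’, A’’, … are generated mechanically, and the empty tuple stands for Ɛ):

# Left-factor the alternatives of one non-terminal.
def common_prefix(xs, ys):
    i = 0
    while i < len(xs) and i < len(ys) and xs[i] == ys[i]:
        i += 1
    return xs[:i]

def left_factor(A, alts):
    # Group alternatives by their first symbol; groups of two or more
    # get their longest common prefix factored out.
    groups = {}
    for alt in alts:
        groups.setdefault(alt[:1], []).append(alt)
    fresh = 0
    result = {A: []}
    for group in groups.values():
        if len(group) < 2:
            result[A].extend(group)          # nothing to factor here
            continue
        p = group[0]
        for g in group[1:]:
            p = common_prefix(p, g)
        fresh += 1
        A1 = A + "'" * fresh                 # fresh non-terminal A'
        result[A].append(p + (A1,))          # A  -> p A'
        result[A1] = [g[len(p):] for g in group]  # A' -> tails (() is Ɛ)
    return result

# A -> a b | a c   becomes   A -> a A',  A' -> b | c
print(left_factor("A", [("a", "b"), ("a", "c")]))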

2- Bottom up parsing

Recursion-

Recursion can be classified into the following three types-


1. Left Recursion-

Earlier discussed.

Example-

S → Sa | ∈

(Left Recursive Grammar)

Left recursion is considered to be a problematic situation for Top down parsers.

Therefore, left recursion has to be eliminated from the grammar.

2. Right Recursion-

• A production of a grammar is said to have right recursion if the rightmost variable

of its RHS is the same as the variable of its LHS.
• A grammar containing a production having right recursion is called a Right
Recursive Grammar.

Example-

S → aS | ∈
(Right Recursive Grammar)
• Right recursion does not create any problem for the Top down parsers.
• Therefore, there is no need of eliminating right recursion from the grammar.

3. General Recursion-

• The recursion which is neither left recursion nor right recursion is called general
recursion.

Example-

S → aSb | ∈

Problem:

Consider the following grammar and eliminate left recursion-


A → ABd | Aa | a

B → Be | b

Solution-

• The grammar after eliminating left recursion is-

A → aA’

A’ → BdA’ | aA’ | ∈

B → bB’

B’ → eB’ | ∈

Problem:

Consider the following grammar and eliminate left recursion-

E→E+E|ExE|a

Solution-

The grammar after eliminating left recursion is-

E → aA

A → +EA | xEA | ∈

Problem:

Consider the following grammar and eliminate left recursion-

E→E+T|T

T→TxF|F

F → id

Solution-


The grammar after eliminating left recursion is-

E → TE’

E’ → +TE’ | ∈

T → FT’

T’ → xFT’ | ∈

F → id

Problem:

Consider the following grammar and eliminate left recursion-

S → (L) | a

L→L,S|S

Solution-

The grammar after eliminating left recursion is-

S → (L) | a

L → SL’

L’ → ,SL’ | ∈

Problem:

Consider the following grammar and eliminate left recursion-

S → S0S1S | 01

Solution-

The grammar after eliminating left recursion is-

S → 01A

A → 0S1SA | ∈


Problem:

Consider the following grammar and eliminate left recursion-

S→A

A → Ad | Ae | aB | ac

B → bBc | f

Solution-

The grammar after eliminating left recursion is-

S→A

A → aBA’ | acA’

A’ → dA’ | eA’ | ∈

B → bBc | f

Problem:

Consider the following grammar and eliminate left recursion-

A → AAα | β

Solution-

The grammar after eliminating left recursion is-

A → βA’
A’ → AαA’ | ∈
Problem:
Consider the following grammar and eliminate left recursion-

A → Ba | Aa | c

B → Bb | Ab | d


Solution-

This is a case of indirect left recursion.

Step-01:

First let us eliminate left recursion from A → Ba | Aa | c

Eliminating left recursion from here, we get-

A → BaA’ | cA’

A’ → aA’ | ∈

Now, given grammar becomes-

A → BaA’ | cA’

A’ → aA’ | ∈

B → Bb | Ab | d

Step-02:

Substituting the productions of A in B → Ab, we get the following grammar-

A → BaA’ | cA’

A’ → aA’ | ∈

B → Bb | BaA’b | cA’b | d

Step-03:

Now, eliminating left recursion from the productions of B, we get the following grammar-

A → BaA’ | cA’

A’ → aA’ | ∈

B → cA’bB’ | dB’


B’ → bB’ | aA’bB’ | ∈

This is the final grammar after eliminating left recursion.

Problem:

Consider the following grammar and eliminate left recursion-

X → XSb | Sa | b

S → Sb | Xa | a

Solution-

This is a case of indirect left recursion.

Step-01:

First let us eliminate left recursion from X → XSb | Sa | b

Eliminating left recursion from here, we get-

X → SaX’ | bX’

X’ → SbX’ | ∈

Now, given grammar becomes-

X → SaX’ | bX’

X’ → SbX’ | ∈

S → Sb | Xa | a

Step-02:

Substituting the productions of X in S → Xa, we get the following grammar-

X → SaX’ | bX’


X’ → SbX’ | ∈

S → Sb | SaX’a | bX’a | a

Step-03:

Now, eliminating left recursion from the productions of S, we get the following grammar-

X → SaX’ | bX’

X’ → SbX’ | ∈

S → bX’aS’ | aS’

S’ → bS’ | aX’aS’ | ∈

This is the final grammar after eliminating left recursion.

Problem:

Consider the following grammar and eliminate left recursion-

S → Aa |b

A → Ac | Sd | ∈

Solution-

This is a case of indirect left recursion.

Step-01:

First let us eliminate left recursion from S → Aa | b

This is already free from left recursion.

Step-02:

Substituting the productions of S in A → Sd, we get the following grammar-


S → Aa | b

A → Ac | Aad | bd | ∈

Step-03:

Now, eliminating left recursion from the productions of A, we get the following grammar-

S → Aa | b

A → bdA’ | A’

A’ → cA’ | adA’ | ∈

This is the final grammar after eliminating left recursion.

FIRST & FOLLOW


First and Follow sets are needed so that the parser can properly apply the needed
production rule at the correct position.

An important part of parser table construction is to create first and follow sets.

FIRST Function:

First(α) is the set of terminal symbols that begin the strings derived from α.

Example:

Consider the production rule: A→abc| def| ghi

➔First(A)= {a, d, g}

FIRST Calculation Rules:


Rule 1: For a production rule X → ∈, First(X) = {∈ }

Rule 2: For any terminal symbol ‘a’, First(a)={a}


Rule 3: For a production rule X → Y1Y2Y3:

- if ∈ ∉ First(Y1), then First(X) = First(Y1)

- if ∈ ∈ First(Y1), then First(X) = { First(Y1) – ∈ } ∪ First(Y2Y3)

Apply the same expansion to any production rule X → Y1Y2Y3…Yn.

FOLLOW Function:
Follow(α) is the set of terminal symbols that can appear immediately to the right of α in some sentential form.

FOLLOW Calculation Rules:

Rule 1: For the start symbol S, place $ in Follow(S).

Rule 2: For any production rule A → αB, Follow(B) = Follow(A)

Rule 3: For any production rule A → αBβ,

- If ∈ ∉ First(β), then Follow(B) = First(β)


- If ∈ ∈ First(β), then Follow(B) = { First(β) – ∈ } ∪ Follow(A)

Note (worked in the same order as the non-terminals):
The start symbol S (the first non-terminal of the grammar) always has $ in its Follow set. In the example below, the start symbol is E, and Follow(E) is built from three cases:
1- E is followed by a terminal symbol such as +: we take that terminal, so Follow(E) gets {+}.
2- E stands at the end of a production, e.g. F → αE: we go back and compute Follow of the LHS, so Follow(E) = Follow(F).
3- E is followed by another non-terminal, e.g. A → ET: here we look at First(T), and there are two cases:
- First(T) contains ∈: then Follow(E) = { First(T) – ∈ } ∪ Follow(T)
or
- First(T) has no ∈: then Follow(E) = First(T)


Note:

1- ∈ may appear in the first function of a non-terminal.


- But ∈ will never appear in the follow function of a non-terminal.
2- Before calculating the first and follow functions, eliminate Left Recursion from the
grammar, if present.
3- We calculate the follow function of a non-terminal by looking where it is present on
the RHS of a production rule.
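These rules can be run to a fixed point. The following Python sketch is our own illustration (EPS stands for ∈ and an empty tuple for an ∈-production); it computes FIRST and FOLLOW for the grammar of the example that follows:

# Fixed-point computation of FIRST and FOLLOW.
# Grammar: non-terminal -> list of alternatives (tuples of symbols).
# Symbols that are not keys of GRAMMAR are terminals; () is an ∈-production.
GRAMMAR = {
    "E":  [("T", "E'")],
    "E'": [("+", "T", "E'"), ()],
    "T":  [("F", "T'")],
    "T'": [("*", "F", "T'"), ()],
    "F":  [("(", "E", ")"), ("id",)],
}
EPS = "∈"

def first_of_seq(seq, first):
    # FIRST of a sequence of symbols; includes EPS only if all are nullable.
    out = set()
    for sym in seq:
        f = first[sym] if sym in GRAMMAR else {sym}
        out |= f - {EPS}
        if EPS not in f:
            return out
    out.add(EPS)
    return out

first = {nt: set() for nt in GRAMMAR}
changed = True
while changed:                               # iterate until nothing changes
    changed = False
    for nt, alts in GRAMMAR.items():
        for alt in alts:
            f = first_of_seq(alt, first)
            if not f <= first[nt]:
                first[nt] |= f
                changed = True

follow = {nt: set() for nt in GRAMMAR}
follow["E"].add("$")                         # Rule 1: $ follows the start symbol
changed = True
while changed:
    changed = False
    for nt, alts in GRAMMAR.items():
        for alt in alts:
            for i, sym in enumerate(alt):
                if sym not in GRAMMAR:
                    continue                 # terminals have no Follow set
                rest = first_of_seq(alt[i + 1:], first)
                add = (rest - {EPS}) | (follow[nt] if EPS in rest else set())
                if not add <= follow[sym]:
                    follow[sym] |= add
                    changed = True

print(first)    # e.g. First(E) = {'(', 'id'}
print(follow)   # e.g. Follow(E) = {'$', ')'}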

Example:

Consider the grammar:

E → TE’
E’ → +TE’| ∈
T→ FT’
T’→*FT’ | ∈
F→ (E) | id

Follow(E) = {$, …}: we search for E on the right-hand side of the grammar rules. Its first occurrence on the right is in the fifth rule, F→(E)| id, where E is followed by the terminal ) (case 1), so we take it.

So, Follow(E) = {$, )}

To calculate Follow(E’):

E’ first appears on the right in the first grammar rule, E→TE’, followed by nothing (case 2), so we compute Follow of the LHS, i.e. FOLLOW(E).

So, Follow(E’) = Follow(E) = {$, )}

To calculate Follow(T): the first occurrence of T on the right is in the first rule, E→TE’, followed by the non-terminal E’ (case 3), so we take First(E’).

First(E’) = {+, ∈}; since it contains ∈, we apply:

Follow(T) = {First(E’) – ∈} ⋃ Follow(E’) = {+} ⋃ {$, )} = {+, $, )}

To calculate Follow(T’):

We use the rule T→FT’, the first occurrence of T’ on the right.

T’ is followed by nothing (case 2), so we compute Follow(LHS), i.e. Follow(T), and therefore:

Follow(T’) = Follow(T) = {+, ), $}

But T’ also appears on the right in another rule of the same grammar, T’→*FT’| ∈, again followed by nothing (case 2), so we take Follow(LHS) = Follow(T’), which is the very set being computed, so it adds nothing new.

That is: Follow(T’) = {+, ), $} only.

To calculate Follow(F):

The first occurrence of F on the right is in the rule T→FT’, where F is followed by the non-terminal T’ (case 3), so we compute FIRST(T’) = {*, ∈}; it contains ∈, so we remove ∈ and add Follow(T’) as follows:

Follow(F) = {First(T’) – ∈} ⋃ Follow(T’) = {*} ⋃ {+, ), $} = {+, *, ), $}

Is there another F on the right-hand side? No, so we stop.

Now we construct the parse table.

‫‪Example:‬‬

‫‪Consider the grammar:‬‬

‫‪S → iEtSS1 | a‬‬

‫∈ |‪S1 → eS‬‬

‫‪E→b‬‬

‫‪Solution:‬‬

First(S)={i, a} (from the rule S → iEtSS1 | a)

‫}∈ ‪First(S1)={e,‬‬

‫}‪First(E)={b‬‬


Example:

Consider the production rules:

S → aBDh

B → cC

C → bC | ∈

D → EF

E→g|∈

F → f | ∈

Calculate the First and Follow functions.

Solution:

➢ First(S) = { a }
➢ First(B) = { c }
➢ First(C) = { b , ∈ }
➢ First(D) = { First(E) – ∈ } ∪ First(F) = { g , f , ∈ }
➢ First(E) = { g , ∈ }
➢ First(F) = { f , ∈ }

➢ Follow(S) = { $ }
➢ Follow(B) = { First(D) – ∈ } ∪ First(h) = { g , f , h }
➢ Follow(C) = Follow(B) = { g , f , h }
➢ Follow(D) = First(h) = { h }
➢ Follow(E) = { First(F) – ∈ } ∪ Follow(D) = { f , h }
➢ Follow(F) = Follow(D) = { h }

Example:

Consider the following grammar:

S → bXY


X→b|c

Y → b | ∈

Calculate the First and Follow functions.

Solution:

Non-Terminal   First    Follow
S              b        $
X              b, c     b, $
Y              b, ∈     $
Example:

Consider the following grammar:

S → aXb|X

X→cXb|b

X→bXZ

Z→n

Non-Terminal   First     Follow
S              a, b, c   $
X              b, c      b, n, $
Z              n         b, n, $
Example:

Consider the following grammar:

S → ABb|bc

A→abAB |∈

B→ bc | cBS

Calculate the First and Follow functions.

Non-Terminal   First     Follow
S              a, b, c   $, a, b, c
A              a, ∈      b, c
B              b, c      a, b, c
Example: Find First and Follow for the following grammar:


E → E+E|E*E|(E)|id

This grammar is left-recursive and ambiguous, so it needs to be modified before we build a
predictive parser.

1- Remove ambiguity:
E→E+T | T
T→T*F | F
F→(E)|id
2- Remove left-Recursion
E→TE’
E’→+TE’| ∈
T→FT’
T’→*FT’| ∈
F→(E)|id
Now calculate First and Follow:
First(E)=First(T)=First(F)={(, id}
First(E’)={+, ∈}
First(T’)={*, ∈}
Follow(E)=Follow(E’)={$, )}
Follow(T)=Follow(T’)={+, $, )}
Follow(F)={*, +, $, )}

Example: Find First and Follow for the following grammar:

S→bSX|Y

X→XC|bb

𝑋̅→C𝑋̅| ∈

Y→b| bY

C→ccC| CX| cc


Non-Terminal   First    Follow
S              b        $, b
X              b        $, b, c
𝑋̅              c, ∈     $, b, c
Y              b        $, b
C              c        $, b, c
Example: Consider the grammar:

S→ (L)|a

L→ L, S| S

Compute the function First for all non-terminals.

First(S) = {(, a}
First(L) = First(S) = {(, a}

Example: Consider the grammar:

S→ cAd

A→aA’

A’→ b | ∈

Compute the function First for all non-terminals.

First(S) = {c}

First(A) = {a}
First(A’) = {b, ∈}

Example: Consider the grammar:

S→ L=R| R

L→*R| id

R→ L

Compute the function First for all non-terminals.

First(S) = First(L) = First(R) = {*, id}

Question:

Consider the following grammar and a partial LL(1) parsing table:


S→aAbB| bAaB| ∈

A→S

B→S

        a        b        $
S       E1       E2       S→∈
A       A→S      A→S      Error
B       B→S      B→S      E3
Entries E1, E2 and E3 are needed to be filled. ∈ is the empty string, $ indicates end of
input.

The FIRST and FOLLOW sets for the non-terminals A and B.

FIRST(A)=FIRST(S)=FIRST(B)={a, b, ∈}

FOLLOW(A)={FIRST(bB), FIRST(aB)}={a, b}

FOLLOW(B)=FOLLOW(S)={FOLLOW(A), $}={a, b, $}

Example:

Calculate the first and follow functions for the given grammar:

S→A

A → aB | Ad

B→b

C→g

• The given grammar is left recursive.


• So, we first remove left recursion from the given grammar.

After eliminating left recursion, we get the following grammar-

S→A

A → aBA’


A’ → dA’ | ∈

B→b

C→g

Then:

• First(S) = First(A) = { a }
• First(A) = { a }
• First(A’) = { d , ∈ }
• First(B) = { b }
• First(C) = { g }
• Follow(S) = { $ }
• Follow(A) = Follow(S) = { $ }
• Follow(A’) = Follow(A) = { $ }
• Follow(B) = { First(A’) – ∈ } ∪ Follow(A) = { d , $ }
• Follow(C) = ∅ (C does not appear on the right-hand side of any production)

Example:

Calculate the first and follow functions for the following grammar:
S → (L) | a

L → SL’

L’ → ,SL’ | ∈

Solution-

The first and follow functions are as follows-

First(S) = { ( , a }

First(L) = First(S) = { ( , a }

First(L’) = { , , ∈ }


Follow(S) = { $ } ∪ { First(L’) – ∈ } ∪ Follow(L) ∪ Follow(L’) = { $ , , , ) }

Follow(L) = { ) }

Follow(L’) = Follow(L) = { ) }

Example:

Calculate the first and follow functions for the following grammar:
S → AaAb |BbBa

A→∈

B→∈

Solution-

The first and follow functions are as follows-

First(S) = { First(A) – ∈ } ∪ First(a) ∪ { First(B) – ∈ } ∪ First(b) = { a , b }

First(A) = { ∈ }

First(B) = { ∈ }

Follow(S) = { $ }

Follow(A) = First(a) ∪ First(b) = { a , b }

Follow(B) = First(b) ∪ First(a) = { a , b }

Problem:

Calculate the first and follow functions for the given grammar-

E→E+T|T

T→TxF|F


F → (E) | id

Solution-

The given grammar is left recursive.

So, we first remove left recursion from the given grammar.

After eliminating left recursion, we get the following grammar-

E → TE’

E’ → + TE’ | ∈

T → FT’

T’ → x FT’ | ∈

F → (E) | id

Now, the first and follow functions are as follows-

First(E) = First(T) = First(F) = { ( , id }

First(E’) = { + , ∈ }

First(T) = First(F) = { ( , id }

First(T’) = { x , ∈ }

First(F) = { ( , id }

Follow(E) = { $ , ) }

Follow(E’) = Follow(E) = { $ , ) }

Follow(T) = { First(E’) – ∈ } ∪ Follow(E) ∪ Follow(E’) = { + , $ , ) }

Follow(T’) = Follow(T) = { + , $ , ) }

Follow(F) = { First(T’) – ∈ } ∪ Follow(T) ∪ Follow(T’) = { x , + , $ , ) }


Example:

Calculate the first and follow for the following grammar:

S→ ACB| Cbb|Ba

A→da| BC

B→g|∈

C→h|∈

Solution:

FIRST(S) = {First(A) – ∈} ∪ {First(C) – ∈} ∪ First(B) ∪ First(b) ∪ {First(B) – ∈} ∪ First(a)
= {d, g, h, ∈, b, a}

FIRST(A) = {d} ∪ {FIRST(B) – ∈} ∪ FIRST(C) = {d, g, h, ∈}

FIRST(B)= {g, ∈}

FIRST(C)= {h, ∈}

FOLLOW(S)={$}

FOLLOW(A)={h, g, $)

FOLLOW(B)={a, $, h, g}

FOLLOW(C)= {b, g,$, h}

Example:

Calculate the first for the following grammar:

S→ aSb| ba|∈

Solution:

FIRST(S)= {a,b, ∈}

Example:


Consider the following grammar:

S→ TabS| X

T→cT|∈

X→b| bX

Solution:

FIRST(S)= {c, a, b}

FIRST(T)={c, ∈}

FIRST(X)={b}

Example:

Consider the following grammar:

S→ AB| bS

A→aB| BB

B→b| cB

Solution:

FIRST(S)= {a, b, c}

FIRST(A)={a, b, c}

FIRST(B)={b, c}

Example:

Consider the following grammar:

S→ XYB| ccb

X→xX| ∈

Y→yY| Xy|∈


B→bbc|b

Solution:

FIRST(S)= {c, x, y, b}

FIRST(X)={x, ∈}

FIRST(Y)={y, x, ∈}

FIRST(B)={b}

Example:

Consider the following grammar:

S→ abS| bX

X→∈| cN

N→Nb| c

Solution:

There is left recursion problem, eliminate it before finding FIRST function.

S→ abS| bX

X→∈| cN

N→cN’

N’→ bN’| ∈

FIRST(S)= {a, b}

FIRST(X)={c, ∈}

FIRST(N)={c}

FIRST(N’)={b, ∈}


Example:

Calculate FIRST and FOLLOW sets:

S→ XYZ

X→x | λ

Y→y | λ

Z→z

Solution:

S has the shape X Y Z, so First(S) = {First(X) – λ} ∪ First(YZ).
With First(X) = {x, λ} and First(Y) = {y, λ}:
First(S) = { {x, λ} – λ } ∪ { {y, λ} – λ } ∪ First(Z) = {x, y, z}

So,

First(S) = {x, y, z} First(X) = {x, λ} First(Y) = {y, λ} First(Z) = {z}

Example:

Calculate FIRST and FOLLOW sets:

S→ XYZ

X→x | λ

Y→y | λ

Z→z| λ

Solution:

S has the shape X Y Z with First(X) = {x, λ}, First(Y) = {y, λ} and First(Z) = {z, λ}, so:
First(S) = { {x, λ} – λ } ∪ { {y, λ} – λ } ∪ First(Z) = {x, y, z, λ}

So,

First(S) = {x, y, z, λ} First(X) = {x, λ} First(Y) = {y, λ} First(Z) = {z, λ}

Example:

Calculate FIRST and FOLLOW sets:

S→ XYZa

X→Y

Y→y | λ

Z→z| λ

Solution:

First(X)= {y, λ } First(Y)={ y, λ} First(Z)={ z, λ}

First(S) = {First(X) – λ} ⋃ First(YZa) = {y} ⋃ {First(Y) – λ} ⋃ First(Za)
= {y} ⋃ { {z, λ} – λ } ⋃ First(a) = {y, z, a}

Example:

Calculate FIRST and FOLLOW sets:

S→ XaYZ

X→Y

Y→y | λ

Z→z| λ

Solution:

First(X)= {y, λ} First(Y)={ y, λ} First(Z)={z, λ}

First(S) = {First(X) – λ} ⋃ First(aYZ) = {y} ⋃ First(a) = {y, a}, since a is a terminal, First(a) = {a} and λ ∉ First(a)


Exercise:

Calculate FIRST and FOLLOW sets:

S→ aBDh

B→cC

C→bC|∈

D→EF

E→ g|∈

F→ f|∈


Types of Top-Down parser:


Recursive Descent parsing:

▪ Recursive descent is a top-down technique that constructs the parse tree from the
top and input is read from left to right.
▪ It uses procedures for every terminal and non-terminal entity.
▪ This parsing technique recursively parses the input to make a parse tree, which may
or may not require back-tracking.
1- Back-tracking
▪ In top-down parsing with backtracking, the parser attempts multiple rules (productions)
to find a match for the input string, backtracking at any step of the derivation if needed.
▪ So, if the applied production does not yield the needed input string, i.e. it does not
match the required string, the parser can undo that expansion.
▪ Example:
Consider the grammar:
S→aAd
A→bc| b
Make the parse tree for the string ‘abd’.
Also show the parse tree when backtracking is required because the wrong alternative was
chosen.
Solution:
- The derivation for the string abd will be:
- S→ aAd → abd (required string)
- If bc is substituted in place of non-terminal A then the string obtained will be abcd.
- S→aAd → abcd (Wrong Alternative (needs backtrack))
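A minimal Python recogniser for this grammar, illustrating the backtracking (the function name and structure are our own):

# Backtracking recursive-descent recogniser for:
#   S -> a A d
#   A -> b c | b
def parse_S(s):
    if not s.startswith("a"):
        return False
    for alt in ("bc", "b"):                  # try A's alternatives in order
        if s[1:1 + len(alt)] == alt and s[1 + len(alt):] == "d":
            return True
        # mismatch: backtrack (undo the choice) and try the next alternative
    return False

print(parse_S("abd"))   # True: A -> bc fails, backtrack, then A -> b succeeds
print(parse_S("abcd"))  # True: A -> bc
print(parse_S("axd"))   # False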


Limitations of top-down parsing with backtracking:

 Backtracking looks very simple and easy to implement, but choosing a different
alternative causes a lot of problems:
o Undoing semantic actions requires a lot of overhead.
o Entries made in the symbol table during parsing have to be removed.
 Due to these reasons, backtracking is not used for practical compilers.

2- Predictive Parser (without backtracking)


• A form of recursive-descent parsing that does not require any back-tracking is
known as predictive parsing.
• It has the capability to predict which production is to be used to replace the input
string.
• The predictive parser uses a look-ahead pointer, which points to the next input
symbols.
• To make the parser back-tracking free, the predictive parser puts some
constraints on the grammar and accepts only a class of grammar known as
LL(k) grammar.
• Predictive parsing uses stack, input buffer, and a parsing table to parse the input
and generate a parse tree. Both the stack and the input contain an end symbol $ to
denote that the stack is empty and the input is consumed.
• The parser refers to the parsing table to take any decision on the input and stack
element combination.


LL Parser
• An LL parser accepts LL grammar and used to implement predictive parser.
➢ LL grammar is a subset of context-free grammar but with some restrictions to
get the simplified version in order to achieve easy implementation.
➢ LL grammar can be implemented by means of both algorithms namely
recursive-descent or table driven.
• LL parser is denoted as LL(k).
➢ The first L in LL(k) is parsing the input from left to right.
➢ The second L in LL(k) stands for left-most derivation
➢ k represents the number of lookahead symbols. Generally k = 1, so LL(k) may also be
written as LL(1).

Algorithm to implement Predictive parsing -LL (1)


Step 1: Elimination of left recursion.
Step 2: Left factoring.
Step 3: First and Follow function.


Step 4: Predictive parsing table.


Step 5: Parse the input string.
So,
Step 1: First check for left recursion in the grammar, if there is left recursion in the
grammar remove that and go step 2.

Step 2: Calculate First() and Follow() for all non-terminals.

Step 3: For each production A→α (A tends to alpha)

- Find First(α) and for each terminal in First(α), make entry A→α in the table.
- If First(α) contains Є as terminal then find Follow(A) and for each terminal in
Follow(A) make entry A→α in the table.
- If First(α) contains Є and Follow(A) contains $ as terminal, then make entry A→α in
the table for the $.
- To construct the parsing table, we use the two functions First and Follow:
♫ In the table, the rows contain the non-terminals and the columns contain the terminal
symbols. All the null (Є) productions of the grammar go under the Follow elements,
and the remaining productions lie under the elements of the First set, as the sketch
below shows.
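As an illustration of these steps, here is a minimal Python sketch (our own, not part of the text) that fills an LL(1) table for the expression grammar of the next example, taking its FIRST and FOLLOW sets, as computed there, as given data; two entries landing in one cell would mean the grammar is not LL(1):

# LL(1) table for:  E -> TE'   E' -> +TE' | ∈   T -> FT'
#                   T' -> *FT' | ∈   F -> id | (E)
EPS = "∈"
FIRST = {"E": {"id", "("}, "E'": {"+", EPS}, "T": {"id", "("},
         "T'": {"*", EPS}, "F": {"id", "("}}
FOLLOW = {"E": {"$", ")"}, "E'": {"$", ")"}, "T": {"+", "$", ")"},
          "T'": {"+", "$", ")"}, "F": {"*", "+", "$", ")"}}
GRAMMAR = {
    "E":  [("T", "E'")],
    "E'": [("+", "T", "E'"), ()],
    "T":  [("F", "T'")],
    "T'": [("*", "F", "T'"), ()],
    "F":  [("id",), ("(", "E", ")")],
}

def first_of_seq(seq):
    # FIRST of a right-hand side; a terminal's FIRST is itself.
    out = set()
    for sym in seq:
        f = FIRST.get(sym, {sym})
        out |= f - {EPS}
        if EPS not in f:
            return out
    out.add(EPS)
    return out

table = {}
for A, alts in GRAMMAR.items():
    for alpha in alts:
        f = first_of_seq(alpha)
        # entries go under First(alpha); for a nullable alpha, under Follow(A)
        for a in (f - {EPS}) | (FOLLOW[A] if EPS in f else set()):
            assert (A, a) not in table, "conflict: grammar is not LL(1)"
            table[(A, a)] = alpha

print(table[("E", "id")])   # ('T', "E'")
print(table[("T'", "+")])   # ()  i.e. T' -> ∈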


Example on Predictive parsing -LL (1)


Check the LL(1) parser for the following grammar, and check the validity of the input string
W = id*id+id.
E→ E+T| T

T→ T*F| F

F→id| (E)

Solution:

1- There is left recursion; eliminate it:

E→TE’
E’→+TE’| Ɛ
T→FT’
T’→*FT’| Ɛ
F→ id| (E)
2- No left factoring
3- Calculate First and Follow sets:
Non-Terminals First Follow
E→TE’ {id, ( } {$, )}
E’→+TE’| Ɛ {+, Ɛ} {$, )}
T→FT’ {id, (} {+, $, )}
T’→*FT’| Ɛ {*, Ɛ} {+, $, )}
F→ id| (E) {id, (} {*, +, $, )}
4- Predictive parsing table:

        id        +          *          (         )        $
E       E→TE’                           E→TE’
E’                E’→+TE’                         E’→Ɛ     E’→Ɛ
T       T→FT’                           T→FT’
T’                T’→Ɛ       T’→*FT’              T’→Ɛ     T’→Ɛ
F       F→id                            F→(E)


5- Parse the input string W = id*id+id

Stack       Input        Output
$E          id*id+id$    E→TE’
$E’T        id*id+id$    T→FT’
$E’T’F      id*id+id$    F→id
$E’T’id     id*id+id$    match id
$E’T’       *id+id$      T’→*FT’
$E’T’F*     *id+id$      match *
$E’T’F      id+id$       F→id
$E’T’id     id+id$       match id
$E’T’       +id$         T’→Ɛ
$E’         +id$         E’→+TE’
$E’T+       +id$         match +
$E’T        id$          T→FT’
$E’T’F      id$          F→id
$E’T’id     id$          match id
$E’T’       $            T’→Ɛ
$E’         $            E’→Ɛ
$           $            accept

After parsing, the parse tree for W = id*id+id is built from the output column.
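The trace above can be produced by a short table-driven loop; the following Python sketch is our own illustration, with the table entries transcribed from the parse table above and 'id' treated as a single token:

# Table-driven LL(1) parsing loop (predictive parser); () means Ɛ.
TABLE = {
    ("E", "id"): ("T", "E'"), ("E", "("): ("T", "E'"),
    ("E'", "+"): ("+", "T", "E'"), ("E'", ")"): (), ("E'", "$"): (),
    ("T", "id"): ("F", "T'"), ("T", "("): ("F", "T'"),
    ("T'", "+"): (), ("T'", "*"): ("*", "F", "T'"),
    ("T'", ")"): (), ("T'", "$"): (),
    ("F", "id"): ("id",), ("F", "("): ("(", "E", ")"),
}
NONTERMS = {"E", "E'", "T", "T'", "F"}

def parse(tokens):
    tokens = tokens + ["$"]
    stack = ["$", "E"]                   # start symbol above the end marker
    i = 0
    while stack:
        top = stack.pop()
        if top in NONTERMS:
            rhs = TABLE.get((top, tokens[i]))
            if rhs is None:
                return False             # empty table cell: syntax error
            stack.extend(reversed(rhs))  # push the RHS, leftmost symbol on top
        elif top == tokens[i]:
            i += 1                       # match a terminal (including final $)
        else:
            return False
    return i == len(tokens)

print(parse(["id", "*", "id", "+", "id"]))  # True
print(parse(["id", "+", "*", "id"]))        # False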
Example2 on Predictive parsing -LL (1)
Construct a predictive parsing table for the given grammar, i.e. check whether the given
grammar is LL(1) or not.

S→ iEtSS’| a
S’→eS| Ɛ
E→ b
Solution:
First(E)={b}
First(S’)={e, Ɛ}
First(S)={i, a}


Follow(S)={e, $}
Follow(S’)={e, $}
Follow(E)={t}
The parsing table for this grammar is:

        a       b       e                i            t       $
S       S→a                              S→iEtSS’
S’                      S’→Ɛ / S’→eS                          S’→Ɛ
E               E→b

As the table has a multiply defined entry (the cell [S’, e]), the given grammar is not LL(1).

Example3 on Predictive parsing -LL (1)


Construct the First and Follow sets and the predictive parse table for the given grammar,
and check the validity of the input string abdcdc$.
S→ AC$
C→c| Ɛ
A→aBCd| BQ| Ɛ
B→bB| d
Q→ q
Solution:
First(Q)={q}
First(B)={b, d}
First(C)={c, Ɛ}
First(A)= {a} ∪ First(BQ) ∪ {Ɛ} = {a, Ɛ} ∪ First(B) = {a, Ɛ, b, d}
First(S)= {First(A) – Ɛ} ∪ {First(C) – Ɛ} ∪ {$} = {a, b, d, c, $} (A and C are both nullable)
Follow(S)={$}
Follow(A)={First(C) – Ɛ} ∪ {$} = {c, $}
Follow(Q)=Follow(A)={c, $}
Follow(C)={d, $}
Follow(B)={c, d, q}
The parse table for this grammar is:

       a          b         c        d        q        $
S      S→AC$      S→AC$     S→AC$    S→AC$             S→AC$
A      A→aBCd     A→BQ      A→Ɛ      A→BQ              A→Ɛ
B                 B→bB               B→d
C                           C→c      C→Ɛ               C→Ɛ
Q                                             Q→q
For the input string abdcdc$:

Stack      Input      Output
$S         abdcdc$    S→AC$
$CA        abdcdc$    A→aBCd
$CdCBa     abdcdc$    Pop a
$CdCB      bdcdc$     B→bB
$CdCBb     bdcdc$     Pop b
$CdCB      dcdc$      B→d
$CdCd      dcdc$      Pop d
$CdC       cdc$       C→c
$Cdc       cdc$       Pop c
$Cd        dc$        Pop d
$C         c$         C→c
$c         c$         Pop c
$          $          Pop $
Accepted


Reductions
Bottom-up parsing
- Reducing a string w to the start symbol.
- At each reduction step, a particular substring matching the RHS of a production is replaced
by the LHS.
- A right-most derivation is traced out in reverse.
- E.g., for the grammar:
S→ aABe
A→Abc | b
B→ d
the string abbcde is reduced as: abbcde → aAbcde → aAde → aABe → S
so abbcde can be reduced to S.

Syntax tree
Syntax trees are abstract, compact representations of parse trees.
They are also called Abstract Syntax Trees (ASTs).
Example-

Consider the following grammar-


E→E+T|T
T→TxF|F
F → ( E ) | id
Generate the following for the string id + id x id


1. Parse tree
2. Syntax tree
3. Directed Acyclic Graph (DAG)
Solution-

Parse tree, syntax tree and directed acyclic graph (figures omitted).

Construct a syntax tree for the following arithmetic expression-


( a + b ) * ( c – d ) + ( ( e / f ) * ( a + b ))
We convert the given arithmetic expression into a postfix expression as-


(a+b)*(c–d)+((e/f)*(a+b))
ab+ * ( c – d ) + ( ( e / f ) * ( a + b ) )
ab+ * cd- + ( ( e / f ) * ( a + b ) )
ab+ * cd- + ( ef/ * ( a + b ) )
ab+ * cd- + ( ef/ * ab+ )
ab+ * cd- + ef/ab+*
ab+cd-* + ef/ab+*
ab+cd-*ef/ab+*+

We draw a syntax tree for the above postfix expression.


Start pushing the symbols of the postfix expression into the stack one by one.
When an operand is encountered,
 Push it into the stack.
When an operator is encountered
 Push it into the stack.
 Pop the operator and the two symbols below it from the stack.
 Perform the operation on the two operands using the operator you have in hand.
 Push the result back into the stack.
Continue in the similar manner and draw the syntax tree simultaneously.
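As a sketch of this stack procedure in Python (the Node class and function names are our own):

# Build a syntax tree from a postfix expression using a stack.
class Node:
    def __init__(self, op, left=None, right=None):
        self.op, self.left, self.right = op, left, right
    def __repr__(self):
        if self.left is None:
            return self.op                   # a leaf (operand)
        return f"({self.left} {self.op} {self.right})"

def syntax_tree(postfix_tokens):
    stack = []
    for tok in postfix_tokens:
        if tok in "+-*/":
            right = stack.pop()              # pop the two operands
            left = stack.pop()
            stack.append(Node(tok, left, right))  # push the new subtree
        else:
            stack.append(Node(tok))          # operand: push a leaf
    return stack.pop()                       # the root of the syntax tree

# ab+cd-*ef/ab+*+  for  (a+b)*(c-d) + ((e/f)*(a+b))
print(syntax_tree(list("ab+cd-*ef/ab+*+")))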

The required syntax tree is-


 There may exist derivations for a string which are neither leftmost nor rightmost.

Example:
Consider the following grammar:
S→ABC
A→a
B→b
C→c
Consider a string w = abc.
Total 6 derivations exist for string w.
The following 4 derivations are neither leftmost nor rightmost.
Derivation 1:
S → ABC
→ aBC (Using A → a)
→ aBc (Using C → c)
→ abc (Using B → b)
Derivation 2:
S → ABC
→ AbC (Using B → b)
→ abC (Using A → a)
→ abc (Using C → c)
Derivation 3:
S → ABC
→ AbC (Using B → b)
→ Abc (Using C → c)
→ abc (Using A → a)
The other 2 derivations are leftmost derivation and rightmost derivation.


• Leftmost derivation and rightmost derivation of a string may be exactly same.

• In fact, there may exist a grammar in which leftmost derivation and rightmost
derivation is exactly same for all the strings.

Example
Consider the following grammar-
S → aS | ∈
The language generated by this grammar is-
L = { a^n | n ≥ 0 }, i.e. a*
All the strings generated from this grammar have their leftmost derivation and rightmost
derivation exactly same.
Let us consider a string w = aaa.

Leftmost Derivation:
S → aS
→ aaS (Using S → aS)
→ aaaS (Using S → aS)
→ aaa∈
→ aaa

Rightmost Derivation:
S → aS
→ aaS (Using S → aS)
→ aaaS (Using S → aS)
→ aaa∈
→ aaa


Clearly,
Leftmost derivation = Rightmost derivation
Similar is the case for all other strings.


Chapter Six- LL(1) Predictive Parser

An LL(1) parser uses the following data structures:

1- Input buffer
2- Stack
3- Parsing table


Example:
Construct the predictive parsing table for the following grammar:

S→ iEtSS1 | a

S1→ eS1| λ

E→ b

Solution:

Calculate First and Follow, then fill the table.

Because more than one production appears in a single cell of the table, this grammar is not LL(1).


Chapter Seven- LR Parsers

The LR parser is an efficient bottom-up syntax analysis technique that can be used for a large
class of context-free grammars. In its simplest form, with no lookahead, it is called LR(0) parsing.
L stands for the left-to-right scanning of the input,
R stands for a rightmost derivation constructed in reverse, and
0 stands for the number of lookahead input symbols.
LR(0) is the simplest technique in the LR family. Although that makes it the easiest to learn,
these parsers are too weak to be of practical use for anything but a very limited set of
grammars.
Augmented grammar:
If G is a grammar with start symbol S, then G’ (the augmented grammar for G) is the grammar
with a new start symbol S’ and the added production S’ → S. The purpose of this new start
production is to indicate to the parser when it should stop parsing and announce acceptance.
In an item such as S’ → .S, everything to the left of the ‘ . ‘ has already been read by the
compiler and everything to the right of the ‘ . ‘ is yet to be read.
Steps for constructing the LR parsing table:
1. Write the augmented grammar.
2. Find the LR(0) collection of items.
3. Define the two functions in the parsing table: goto (columns for the non-terminals) and
action (columns for the terminals).
Q. Construct an LR parsing table for the given context-free grammar –
S–>AA
A–>aA|b
Solution :
STEP 1- Find augmented grammar –
The augmented grammar of the given grammar is:-
S’–>.S [0th production]
S–>.AA [1st production]
A–>.aA [2nd production]
A–>.b [3rd production]
STEP 2 – Find LR(0) collection of items
Below is the figure showing the LR(0) collection of items. We will understand everything one
by one.


The terminals of this grammar are {a, b}.

The non-terminals of this grammar are {S, A}.
RULE – if the ‘ . ‘ immediately precedes a non-terminal, we must add all of that
non-terminal’s productions, with ‘ . ‘ preceding each right-hand side (the closure step).
RULE – from each state to the next state, the ‘ . ‘ shifts one place to the right.
• In the figure, I0 consists of the closure of the augmented grammar’s start item.
• I0 goes to I1 when the ‘ . ‘ of the 0th production is shifted to the right of S (S’→S.).
This state is the accepting state; S has been seen by the compiler.
• I0 goes to I2 when the ‘ . ‘ of the 1st production is shifted to the right (S→A.A); A has
been seen by the compiler.


• I0 goes to I3 when ‘ . ‘ of the 2nd production is shifted towards the right (A->a.A) .
a is seen by the compiler.
• I0 goes to I4 when ‘ . ‘ of the 3rd production is shifted towards the right (A->b.) . b
is seen by the compiler.
• I2 goes to I5 when ‘ . ‘ of 1st production is shifted towards the right (S->AA.) . A is
seen by the compiler
• I2 goes to I4 when ‘ . ‘ of 3rd production is shifted towards the right (A->b.) . b is
seen by the compiler.
• I2 goes to I3 when ‘ . ‘ of the 2nd production is shifted towards the right (A->a.A) .
a is seen by the compiler.
• I3 goes to I4 when ‘ . ‘ of the 3rd production is shifted towards the right (A->b.) . b
is seen by the compiler.
• I3 goes to I6 when ‘ . ‘ of 2nd production is shifted towards the right (A->aA.) . A is
seen by the compiler
• I3 goes to I3 when ‘ . ‘ of the 2nd production is shifted towards the right (A->a.A) .
a is seen by the compiler.
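The closure and goto operations described by these rules can be sketched in a few lines of Python; items are represented as (LHS, RHS, dot-position) triples, a representation of our own choosing:

# LR(0) items, closure and goto for:  S' -> S,  S -> AA,  A -> aA | b
GRAMMAR = {"S'": [("S",)], "S": [("A", "A")], "A": [("a", "A"), ("b",)]}

def closure(items):
    # If the dot precedes a non-terminal, add all of its productions
    # with the dot at position 0 (the RULE stated above).
    items = set(items)
    changed = True
    while changed:
        changed = False
        for (lhs, rhs, dot) in list(items):
            if dot < len(rhs) and rhs[dot] in GRAMMAR:
                for prod in GRAMMAR[rhs[dot]]:
                    item = (rhs[dot], prod, 0)
                    if item not in items:
                        items.add(item)
                        changed = True
    return frozenset(items)

def goto(items, symbol):
    # Shift the dot one place to the right over `symbol`, then close.
    moved = {(lhs, rhs, dot + 1)
             for (lhs, rhs, dot) in items
             if dot < len(rhs) and rhs[dot] == symbol}
    return closure(moved)

I0 = closure({("S'", ("S",), 0)})
print(sorted(I0))              # the start item plus its closure (state I0)
print(sorted(goto(I0, "A")))   # the item set reached on A (state I2)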

STEP 3 – defining the two functions: goto [columns for the non-terminals] and action [columns
for the terminals] in the parsing table.
• $ is the end-of-input marker; the accept action appears in its column.
• 0,1,2,3,4,5,6 denote I0,I1,I2,I3,I4,I5,I6.
• goto(I0, A) = I2, so 2 is added to the A column, row 0.
• goto(I0, S) = I1, so 1 is added to the S column, row 0.
• Similarly, 5 is written in the A column, row 2, and 6 in the A column, row 3.
• I0 goes to I3 on a, so s3 (shift 3) is added to the a column, row 0.
• I0 goes to I4 on b, so s4 (shift 4) is added to the b column, row 0.
• Similarly, s3 is added in the a column, rows 2 and 3, and s4 in the b column, rows 2
and 3.
• I4 is a reduce state, as the ‘ . ‘ is at the end of its item. The item comes from the 3rd
production of the grammar, so r3 (reduce 3) is written in every terminal column of row 4.
• I5 is a reduce state; its item comes from the 1st production, so r1 (reduce 1) is written
in every terminal column of row 5.
• I6 is a reduce state; its item comes from the 2nd production, so r2 (reduce 2) is written
in every terminal column of row 6.


As each cell holds at most one entry, the given grammar is LR(0).

Example:

(0) S’ → S$

(1) S → AA

(2) S → bc

(3) S → baA

(4) A → c



Example:

Example:

(0) E ‘ → E $

(1) E → E + T

(2) E → T

(3) T → T * id

(4) T → id


In state 3, on a *, should we shift or reduce, and why? This is a shift-reduce conflict.


SLR(1)

The simple improvement that SLR(1) makes on the basic LR(0) parser is to reduce only if
the next input token is a member of the Follow set of the non-terminal being reduced. When
filling in the table, we don't assume a reduce on all inputs as we did in LR(0); we selectively
choose the reduction only when the next input symbol is a member of the Follow set.
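In code form, the only change from LR(0) is the set of terminals under which a completed item's reduce action is recorded; a minimal sketch of our own, using the expression grammar from the previous example:

# Filling reduce actions: LR(0) vs SLR(1), for the grammar
#   E' -> E $,  E -> E + T | T,  T -> T * id | id
# In the state containing  E -> T .  and  T -> T . * id,
# LR(0) reduces on every terminal and clashes with the shift on *;
# SLR(1) reduces only on Follow(E) = {+, $}, so the conflict disappears.
TERMINALS = {"+", "*", "id", "$"}
FOLLOW = {"E": {"+", "$"}, "T": {"+", "*", "$"}}

def reduce_targets(item, slr):
    lhs, rhs, dot = item
    assert dot == len(rhs)              # a complete item (dot at the end)
    return FOLLOW[lhs] if slr else TERMINALS

complete = ("E", ("T",), 1)             # the item  E -> T .
print(reduce_targets(complete, slr=False))  # all terminals: conflict on *
print(reduce_targets(complete, slr=True))   # {'+', '$'}: no conflict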

The same grammar as the last example:

Example:

0) S' –> S

1) S –> XX

2) X –> aX

3) X –> b


Let’s parse an example string baab. It is a valid sentence in this language, as shown by this
leftmost derivation:

S → XX → bX → baX → baaX → baab

Now, let’s consider what the states mean. S4 is where X → b is completed; S2 and S6 are
where we are in the middle of processing the two a’s; S7 is where we process the final b; S9
is where we complete the X → aX production; S5 is where we complete S → XX; and S1 is
where we accept.

Example:

LR(0) states and transitions:


SLR(1) parse table:

Why does SLR(1) fail here? The parse table contains a cell holding both a reduce (r) and a
shift (s) entry: a shift-reduce conflict that the Follow-set restriction cannot resolve.


LR(1)

Add lookahead: an LR(1) item pairs an LR(0) item with a lookahead terminal, written [A → α.β, a]; a reduction by a completed item is applied only when the next input symbol equals its lookahead. Consider the following grammar and closure:


Example:

LR(1) parse table:


LALR

An LALR(1) parser merges the LR(1) states that share the same LR(0) core, taking the union of their lookahead sets. This keeps most of the discriminating power of LR(1) while shrinking the table to the size of an SLR(1)/LR(0) table; the merging can introduce reduce-reduce conflicts, but never new shift-reduce conflicts.


Give Example.

