SPCC - 5
Compiler:
A compiler translates source code written in a high-level language (HLL) into target code in assembly or another low-level language. It reports any errors in the source program to the user.
Phases of compiler:
Lexical Analysis:
● Lexical Analysis serves as the foundational phase in compiler design, acting as the first
step in the compilation process. It is commonly referred to as "lexer," "tokenizer," or
simply "scanner."
● Functions:
○ Reading Source Code: The lexer scans the source code, reading it character by character as a stream.
○ Tokenization: It groups sequences of characters into tokens, the fundamental units of the language (see the tokenizer sketch at the end of this section).
○ Whitespace and Comment Handling: The lexer removes excess spaces and
comments (e.g., // or #) from the source code.
○ Error Detection: Detects lexical errors, such as misspelled tokens, identifier length violations, or illegal characters, while scanning the source program.
○ Data Passing: Reads character streams from the source code, validates legal
tokens, and forwards data to the syntax analyzer upon request.
○ Symbol Table Population: Assists in identifying tokens and populating the symbol
table, aiding in further stages of compilation.
● Lexical analyzer reads the input source program, scans the characters and produces a
sequence of tokens that the parser can use for syntactic analysis.
● “get next token” is a command sent from the parser to the lexical analyzer.
● Language -
○ A language is a set of strings over some finite alphabet. Computer languages are treated as such sets, and mathematically set operations can be performed on them.
○ Regular languages (including all finite languages) can be described by means of regular expressions.
● Longest Match Rule -
○ When the lexical analyzer reads the source-code, it scans the code letter by letter;
and when it encounters a whitespace, operator symbol, or special symbols, it
decides that a word is completed.
○ int intvalue;
○ While scanning the input up to ‘int’, the lexical analyzer cannot determine whether it is the keyword int or the first three characters of the identifier intvalue; it must continue reading until the lexeme ends and take the longest match.
● The complete set of tokens forms the set of terminal symbols used in the grammar for the parser. In most languages, the tokens fall into these categories -
○ Keywords
○ Operators
○ Identifiers
○ Constants
○ Literal strings
○ Punctuation
● Lexical analysis is the process of recognizing tokens from the input. Following are the
steps -
○ Store the input in the input buffer
○ Each token is read and a regular expression is built for the corresponding token.
○ Each regular expression is converted into a finite automaton (FA).
○ For each state of the FA, a function is designed; the input symbols and the transition edges correspond to the parameters of these functions.
○ The set of such functions ultimately constitutes the lexical analyzer program.
● Functions of lexical analyser -
○ Tokenization.
○ Report only lexeme-related error messages: exceeding identifier length, unmatched string, illegal characters. No messages relate to syntax or semantics.
○ Eliminate comments, white spaces (Tab, blank space, newline).
● The lexical analyzer uses transition diagrams to keep track of information about the characters seen as the forward pointer scans the input. Transition diagrams are a form of finite automata.
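The following is a minimal tokenizer sketch in Python; the token classes, keyword list, and the function name get_next_token are illustrative assumptions, not a fixed standard. It demonstrates tokenization, elimination of whitespace and comments, the longest match rule, and detection of illegal characters:

    import re

    # Each token class is defined by a regular expression (its pattern).
    # KEYWORD is tried before IDENTIFIER; the \b boundary makes "intvalue"
    # match as one IDENTIFIER rather than keyword "int" + "value",
    # illustrating the longest match rule.
    TOKEN_SPEC = [
        ("WHITESPACE",  r"[ \t\n]+"),            # eliminated, never reported
        ("COMMENT",     r"//[^\n]*|#[^\n]*"),    # eliminated, never reported
        ("KEYWORD",     r"\b(?:int|float|if|else|while)\b"),
        ("IDENTIFIER",  r"[A-Za-z_][A-Za-z_0-9]*"),
        ("CONSTANT",    r"\d+"),
        ("OPERATOR",    r"[+\-*/=<>]"),
        ("PUNCTUATION", r"[;,(){}]"),
    ]
    MASTER = re.compile("|".join(f"(?P<{n}>{p})" for n, p in TOKEN_SPEC))

    def get_next_token(source, pos=0):
        """Yields (token, lexeme) pairs on request -- the 'get next token'
        service the parser asks of the lexical analyzer."""
        while pos < len(source):
            m = MASTER.match(source, pos)
            if m is None:  # lexical error: illegal character
                raise SyntaxError(f"illegal character {source[pos]!r} at {pos}")
            pos = m.end()
            if m.lastgroup not in ("WHITESPACE", "COMMENT"):
                yield m.lastgroup, m.group()

    print(list(get_next_token("int intvalue = 42; // declaration")))
    # [('KEYWORD', 'int'), ('IDENTIFIER', 'intvalue'), ('OPERATOR', '='),
    #  ('CONSTANT', '42'), ('PUNCTUATION', ';')]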
Lexemes, Tokens and Patterns:
● A lexeme is a sequence of characters within the source code that adheres to the defined pattern of a token, essentially serving as an occurrence (instance) of that token.
● Tokens, on the other hand, are specific sequences of characters that encapsulate
meaningful units of information within the source program. These units can include
identifiers, keywords, constants (literals), operators, separators, and special characters.
● For instance, when considering a keyword, it operates as a token with a distinct pattern,
which is essentially a predefined sequence of characters representing that keyword. This
pattern is utilized to identify and categorize the keyword within the source code.
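● As a concrete illustration: in the statement int count = 10; the lexeme count is an instance of the token identifier (pattern: a letter followed by letters or digits), the lexeme 10 is an instance of the token constant (pattern: one or more digits), and the lexeme int matches the fixed character sequence that forms the pattern of the keyword token.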
Symbol Table: The symbol table will contain the following types of information for the
input strings in a source program -
● The lexeme (input string) itself
● Corresponding token
● Its semantic component (e.g., variable, operator, constant, functions, procedure,
etc.)
● Data type
● Pointers to other entries (when necessary)
● The Symbol Table undergoes various operations, with key focus on the following pivotal
tasks:
○ Addition of Symbols - During initialization, the Symbol Table is populated with
reserved words, standard identifiers, and operators. As the scanner processes new
lexemes, they are dynamically added to the table and associated with a token
class. Furthermore, the semantic analyzer enriches these lexemes with pertinent
properties and attributes.
○ Organization - The Symbol Table can be structured in diverse ways, each with its
own advantages and drawbacks. A conventional approach involves organizing it as
an array of records. However, this method necessitates either a linear search for
retrieval or continual sorting to maintain orderliness.
● By optimizing the organization of the Symbol Table, compilers can streamline the lookup
process, thereby enhancing overall efficiency and performance.
● Some other ways of organizing the symbol table are -
○ Unordered List
○ Binary Search Tree
○ String table and name table
○ Hash table and name table
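A minimal sketch of a hash-table-organized symbol table in Python (a dict serves as the hash table; the field names and the helper functions add_symbol and lookup are illustrative assumptions):

    # Symbol table organized as a hash table, holding the kinds of
    # information listed above.
    symbol_table = {}

    def add_symbol(lexeme, token, semantic_role, data_type=None):
        """Insert a new entry; called by the scanner when a new lexeme is
        seen. The semantic analyzer later enriches it with attributes."""
        if lexeme not in symbol_table:
            symbol_table[lexeme] = {
                "token": token,
                "role": semantic_role,   # variable, constant, function, ...
                "type": data_type,
            }
        return symbol_table[lexeme]

    def lookup(lexeme):
        """Hash-table lookup is O(1) on average -- the optimization the
        notes mention, versus linear search over an array of records."""
        return symbol_table.get(lexeme)

    # During initialization the table is populated with reserved words:
    for kw in ("int", "float", "if", "while"):
        add_symbol(kw, "KEYWORD", "reserved word")

    add_symbol("count", "IDENTIFIER", "variable", "int")
    print(lookup("count"))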
Syntax analysis:
● The Syntax Analyzer adheres to the production rules outlined by Context-Free Grammar
(CFG). Context-Free Grammar is formally represented as G(V, T, P, S), where:
○ V represents a set of non-terminal symbols.
○ T represents a set of terminal symbols, where the intersection of V and T is empty.
○ P denotes a set of rules, with each rule structured as P: V → (V ∪ T)*. In simpler terms, the left-hand side of each production rule in P is a single non-terminal with no contextual dependency on either side.
○ S stands for the start symbol, signifying the initial non-terminal symbol from which
the derivation of valid strings commences.
● CFG serves as a foundational framework for defining the syntax and structure of
programming languages, providing a formalized structure for syntactic analysis by
compilers and parsers.
● A derivation tree or parse tree is an ordered rooted tree that graphically represents how a string is derived from a context-free grammar, i.e., its syntactic structure.
● Leftmost and Rightmost Derivation of a String-
○ Leftmost derivation − A leftmost derivation is obtained by applying production to
the leftmost variable in each step.
○ Rightmost derivation − A rightmost derivation is obtained by applying production
to the rightmost variable in each step.
● A grammar is said to be ambiguous if, for some string generated by it, it produces more than one
○ Parse tree
○ Derivation tree
○ Syntax tree
○ Leftmost derivation
○ Rightmost derivation
● An ambiguous grammar creates confusion for the parser (see the worked example after this list).
● Left and Right Recursive Grammars-
○ In a context-free grammar G, if there is a production in the form X → Xa where X
is a non-terminal and ‘a’ is a string of terminals, it is called a left recursive
production. The grammar having a left recursive production is called a left
recursive grammar.
○ And if in a context-free grammar G, if there is a production in the form X → aX
where X is a non-terminal and ‘a’ is a string of terminals, it is called a right
recursive production. The grammar having a right recursive production is called a
right recursive grammar.
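As a worked example of ambiguity, consider the grammar E → E + E | E x E | id. For the string id + id x id there are two distinct leftmost derivations, and hence two parse trees:
○ E ⇒ E + E ⇒ id + E ⇒ id + E x E ⇒ id + id x E ⇒ id + id x id
○ E ⇒ E x E ⇒ E + E x E ⇒ id + E x E ⇒ id + id x E ⇒ id + id x id
Since the same string has more than one parse tree, the grammar is ambiguous, and the parser cannot tell which tree reflects the intended grouping.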
LL(1) Parser:
● It is a non-recursive predictive parser.
● It is a top-down parser.
● In this,
○ L - Scanning input from left to right
○ L - Producing a leftmost derivation
○ 1- One input symbol of lookahead at each step
● A grammar is LL(1) iff, whenever A → α | β are two distinct productions:
○ First(α) ∩ First(β) = ∅
○ At most one of α and β can derive the empty string ε
○ If β ⇒* ε, then First(α) ∩ Follow(A) = ∅ (and symmetrically for α)
● INPUT: Contains the string to be parsed, with $ as its end marker.
● STACK: Contains a sequence of grammar symbols, with $ as its bottom marker. Initially the stack contains only $.
● PARSING TABLE: A two-dimensional array M[A, a], where A is a non-terminal and a is a terminal.
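A minimal sketch of the non-recursive predictive parsing loop in Python; the grammar, the table contents, and the function name ll1_parse are illustrative assumptions (the table would normally be computed from FIRST and FOLLOW sets):

    # Standard expression grammar after removing left recursion:
    #   E  -> T E'        E' -> + T E' | eps
    #   T  -> F T'        T' -> * F T' | eps
    #   F  -> ( E ) | id
    TABLE = {
        ("E",  "id"): ["T", "E'"],  ("E",  "("): ["T", "E'"],
        ("E'", "+"):  ["+", "T", "E'"],
        ("E'", ")"):  [],           ("E'", "$"): [],           # E' -> eps
        ("T",  "id"): ["F", "T'"],  ("T",  "("): ["F", "T'"],
        ("T'", "*"):  ["*", "F", "T'"],
        ("T'", "+"):  [], ("T'", ")"): [], ("T'", "$"): [],    # T' -> eps
        ("F",  "id"): ["id"],       ("F",  "("): ["(", "E", ")"],
    }
    NONTERMINALS = {"E", "E'", "T", "T'", "F"}

    def ll1_parse(tokens):
        stack = ["$", "E"]        # stack holds $ and the start symbol
        tokens = tokens + ["$"]   # $ marks the end of the input
        i = 0
        while stack[-1] != "$":
            top, a = stack[-1], tokens[i]
            if top == a:          # terminal on top: match and advance
                stack.pop(); i += 1
            elif top in NONTERMINALS and (top, a) in TABLE:
                stack.pop()       # expand by M[A, a], pushing the RHS reversed
                stack.extend(reversed(TABLE[(top, a)]))
            else:
                raise SyntaxError(f"unexpected token {a!r}")
        return tokens[i] == "$"   # accept when both stack and input reach $

    print(ll1_parse(["id", "+", "id", "*", "id"]))   # True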
NOTEBOOK
Left Recursive Grammar:
● A production of grammar is said to have left recursion if the leftmost variable of its RHS
is the same as the variable of its LHS.
● A grammar containing a production having left recursion is called a Left Recursive Grammar.
● Left recursion can be eliminated by converting it to right recursion. Elimination of left recursion -
○ A → Aα / β (where β does not begin with an A.)
○ We can eliminate it as follows -
■ A → βA’
■ A’ → αA’ / ε
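○ For example, the left recursive grammar E → E + T / T (here α is + T and β is T) becomes:
■ E → T E’
■ E’ → + T E’ / ε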
NOTEBOOK
Right Recursive Grammar:
● A production of grammar is said to have right recursion if the rightmost variable of its
RHS is the same as the variable of its LHS.
● A grammar containing a production having right recursion is called Right Recursive
Grammar.
● Right recursion does not create any problem for top-down parsers; therefore, there is no need to eliminate right recursion from the grammar.
Grammar with common prefixes:
● If the RHS of more than one production starts with the same symbol, then such a grammar is called a Grammar With Common Prefixes.
● A → αβ1 / αβ2 / αβ3
● This kind of grammar creates a problematic situation for Top down parsers.
● Top-down parsers cannot decide which production must be chosen to parse the string in hand.
● To remove this confusion, we use left factoring.
● It converts a non-deterministic CFG into a deterministic CFG.
● We make one production for each common prefix.
● The common prefix may be a terminal, a non-terminal, or a combination of both.
● The rest of the derivation is added by new productions.
● The grammar obtained after the process of left factoring is called a Left Factored Grammar (see the worked example below).
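● For example, A → αβ1 / αβ2 / αβ3 is left factored as:
○ A → αA’
○ A’ → β1 / β2 / β3
● A concrete instance: S → iEtS / iEtSeS / a becomes S → iEtSS’ / a with S’ → eS / ε.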
NOTEBOOK
Bottom Up parsers:
LR parser -
● It is a non-recursive shift-reduce parser – LR(k).
○ L - Left-to-right scanning of the input stream
○ R - Construction of a rightmost derivation in reverse
○ k - Number of lookahead symbols used to make parsing decisions
● An LR-Parser uses -
○ States to memorize information during the parsing process.
○ An action table to make decisions (such as shift or reduce) and to compute states.
○ A goto table to compute states.
● S-R and R-R conflicts -
○ A shift-reduce (S-R) conflict occurs when the parser cannot decide whether to shift the next input symbol or to reduce by a production.
○ A reduce-reduce (R-R) conflict occurs when more than one production is applicable for a reduction in the same state.
● Advantages of LR parser -
○ LR parsers can handle a large class of context-free grammars.
○ The LR parsing method is the most general non-backtracking shift-reduce parsing method.
○ An LR parser can detect a syntax error as soon as it occurs in a left-to-right scan of the input.
○ LR grammars can describe more languages than LL grammars.
● Disadvantages of LR parser -
○ It is too much work to construct an LR parser by hand.
○ It needs an automated parser generator.
○ If the grammar contains ambiguities or other constructs that are difficult to parse in a left-to-right scan of the input, conflicts arise in the tables.
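As a worked example (using the illustrative grammar E → E + T / T, T → id), a shift-reduce parse of id + id constructs the rightmost derivation in reverse:

    Stack        Input        Action
    $            id + id $    shift
    $ id         + id $       reduce T → id
    $ T          + id $       reduce E → T
    $ E          + id $       shift
    $ E +        id $         shift
    $ E + id     $            reduce T → id
    $ E + T      $            reduce E → E + T
    $ E          $            accept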
NOTEBOOK
Operator Grammars:
● Operator grammars have the property that no production has an empty (null) right side and no right side has two adjacent nonterminals. This property enables the implementation of efficient operator-precedence parsers.
● Rule 1-
○ If precedence of b is higher than precedence of a, then we define a < b.
○ If precedence of b is the same as precedence of a, then we define a = b.
○ If precedence of b is lower than precedence of a, then we define a > b.
● Rule 2-
○ An identifier is always given a higher precedence than any other symbol. $ symbol
is always given the lowest precedence.
● Rule 3-
○ If two operators have the same precedence, then we go by checking their
associativity.
E → E+E | ExE | id
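For this grammar, applying the rules above (id has the highest precedence, x is higher than +, both operators are left-associative, and $ is lowest) gives the following operator-precedence table; each cell is the relation between the symbol on the stack (row) and the incoming symbol (column):

          +       x       id      $
    +     >       <       <       >
    x     >       >       <       >
    id    >       >       error   >
    $     <       <       <       accept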
NOTEBOOK
Semantic Analysis:
● Semantic analysis checks the semantic consistency of the code.
● It uses the syntax tree of the previous phase along with the symbol table to verify that
the given source code is semantically consistent.
● It also checks whether the code is conveying an appropriate meaning.
CFG + semantic rules = Syntax Directed Definitions
● The semantic analyzer is expected to recognize:
○ Type mismatch.
○ Undeclared variable.
○ Reserved identifier misuse.
○ Multiple declaration of variables in a scope.
○ Accessing an out of scope variable.
○ Actual and formal parameter mismatch.
● Functions of semantic analysis:
○ Helps you store the type information gathered and save it in the symbol table or syntax tree
○ Allows you to perform type checking
○ In the case of a type mismatch, where no exact type-conversion rule satisfies the desired operation, a semantic error is reported (see the sketch after this list)
○ Collects type information and checks for type compatibility
○ Checks whether the source language permits the operands or not
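A minimal sketch of such checks in Python over a tiny tuple-based AST; the node shapes, the type rules, and the function name check are illustrative assumptions, not a real compiler's API:

    def check(node, symbols):
        """Returns the type of an expression node, reporting semantic errors."""
        kind = node[0]
        if kind == "num":
            return "int"
        if kind == "str":
            return "string"
        if kind == "var":                    # undeclared-variable check
            name = node[1]
            if name not in symbols:
                raise TypeError(f"undeclared variable '{name}'")
            return symbols[name]
        if kind == "+":                      # type-compatibility check
            lt = check(node[1], symbols)
            rt = check(node[2], symbols)
            if lt != rt:                     # no conversion rule: type mismatch
                raise TypeError(f"type mismatch: {lt} + {rt}")
            return lt

    symbols = {"count": "int"}               # filled in from declarations
    print(check(("+", ("var", "count"), ("num", 1)), symbols))   # int
    # check(("+", ("var", "count"), ("str", "a")), symbols) -> type mismatch
    # check(("var", "total"), symbols)                      -> undeclared variable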
Syntax directed translation:
● In syntax directed translation, we associate some informal notations, called semantic rules, with the grammar.
● In syntax directed translation, every non-terminal can have zero, one, or more attributes, depending on the type of attribute. The values of these attributes are evaluated by the semantic rules associated with the production rules.
● In a semantic rule, an attribute (e.g., VAL) may hold anything: a string, a number, a memory location, or a complex record.
● In syntax directed translation, whenever a construct is encountered in the programming language, it is translated according to the semantic rules defined for it in that particular language.
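● For example, for a simple desk calculator the production E → E1 + T may carry the semantic rule E.VAL = E1.VAL + T.VAL, so that the VAL attribute of E is computed from the VAL attributes of its children when the production is applied.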
Syntax directed translation scheme:
● The syntax directed translation scheme is a context-free grammar.
● The syntax directed translation scheme is used to evaluate the order of semantic rules.
In the translation scheme, the semantic rules are embedded within the right side of the
productions.
● The position at which an action is to be executed is shown by enclosing the action between braces. It is written within the right side of the production.
● Annotated Parse Tree – The parse tree containing the values of the attributes at each node for a given input string is called an annotated or decorated parse tree.
● Features –
○ High level specification
○ Hides implementation details
○ Explicit order of evaluation is not specified
Implementation of SDT:
● Syntax directed translation is implemented by constructing a parse tree and performing the actions in a left-to-right, depth-first order.
● SDT is implemented by parsing the input and producing a parse tree as a result (a minimal sketch follows).
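A minimal sketch in Python of SDT for a desk calculator; the grammar, the token format, and the function names parse_E and parse_T are illustrative assumptions. The semantic action computing val runs at the point where it is embedded, in left-to-right depth-first order:

    # Grammar (left recursion removed):  E -> T (+ T)*,  T -> num
    def parse_T(tokens):
        # T -> num, with semantic rule T.val = num.lexval
        return int(tokens[0]), tokens[1:]

    def parse_E(tokens):
        val, tokens = parse_T(tokens)       # E.val initialized to T.val
        while tokens and tokens[0] == "+":
            rhs, tokens = parse_T(tokens[1:])
            val = val + rhs                 # embedded action: E.val = E.val + T.val
        return val, tokens

    print(parse_E(["2", "+", "3", "+", "4"])[0])   # prints 9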