SEN 317 Lecture 3

Syntax Analysis, or Parsing, is a crucial phase in compiler construction that verifies the syntax of source code by checking it against formal grammar rules and producing a parse tree. It ensures syntactic correctness, detects errors, and serves as a bridge to semantic analysis. The document also discusses the types of grammars, their applications, and the role of regular expressions in tokenization during lexical analysis.

Lecture 3

Introduction to Syntax Analysis

Compiler Construction- NUN 2024 Austin Olom Ogar

Introduction to Syntax Analysis
Definition:
• Syntax Analysis, also known as Parsing, is the process by which a compiler verifies the syntax of
a programming language’s source code. It checks whether the input string of symbols (typically
tokens generated during lexical analysis) conforms to the formal grammar rules of the
language.
• The syntax analyzer takes tokens from the lexical analyzer as input and produces a parse tree
or syntax tree as output, representing the syntactic structure of the input according to a formal
grammar.
Goals of Syntax Analysis:
• To validate the syntactical correctness of a program by ensuring it follows the defined grammar
rules.
• To construct a parse tree that visually represents the program structure, which aids in further
stages of compilation such as semantic analysis and code generation.
Example:
If we have a mathematical expression like 3 + (5 * 2), the syntax analyzer will check if the
arrangement of tokens 3, +, (, 5, *, 2, and ) follows the rules of arithmetic expression grammar.
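As a rough illustration of the check described above, the following Python sketch tokenizes an expression and parses it with a small recursive-descent parser. The grammar (expr, term, factor), the function names tokenize and parse, and the nested-tuple tree shape are assumptions made for this example and do not come from the lecture.

```python
import re

# One token is a run of digits or a single operator/parenthesis.
TOKEN_RE = re.compile(r"\s*(\d+|[-+*/()])")

def tokenize(text):
    """Split an arithmetic expression into a list of token strings."""
    tokens, pos = [], 0
    text = text.rstrip()
    while pos < len(text):
        match = TOKEN_RE.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected character at position {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens

def parse(tokens):
    """Return a nested-tuple tree for the tokens, or raise SyntaxError."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        tok = peek()
        pos += 1
        return tok

    def expr():    # expr -> term (('+' | '-') term)*
        node = term()
        while peek() in ("+", "-"):
            op = advance()
            node = (op, node, term())
        return node

    def term():    # term -> factor (('*' | '/') factor)*
        node = factor()
        while peek() in ("*", "/"):
            op = advance()
            node = (op, node, factor())
        return node

    def factor():  # factor -> NUMBER | '(' expr ')'
        tok = advance()
        if tok is not None and tok.isdigit():
            return int(tok)
        if tok == "(":
            node = expr()
            if advance() != ")":
                raise SyntaxError("expected ')'")
            return node
        raise SyntaxError(f"unexpected token {tok!r}")

    tree = expr()
    if peek() is not None:
        raise SyntaxError(f"unexpected token {peek()!r}")
    return tree

print(parse(tokenize("3 + (5 * 2)")))  # ('+', 3, ('*', 5, 2))
```

Dropping the closing parenthesis, as in 3 + (5 * 2, makes parse raise a SyntaxError, which is the kind of structural error the syntax analyzer is responsible for.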
Role of Syntax Analysis in Compiler Design
Role and Importance:
• Syntax Analysis is the second phase of compilation, following Lexical Analysis. Its main role is to interpret the
syntactic structure of the program to ensure it is meaningful and adheres to language rules.
• Syntax Analysis bridges the gap between the lexically analyzed tokens and the semantic meaning of the code by
structuring them into a coherent parse tree.
• The parser’s output is used by the subsequent phases, particularly Semantic Analysis, which checks for
meaningfulness beyond the syntax and applies the semantic rules of the language.
Main Functions:
• Grammar Enforcement: Syntax Analysis checks that the source code follows language-specific grammatical rules.
• Error Detection and Recovery: It can detect syntax errors (such as missing parentheses or unclosed brackets) early in the
compilation process and, where possible, apply error recovery techniques to continue parsing.
• Generation of Parse Tree: By organizing tokens into a hierarchical structure, the parser makes it easier to handle complex
expressions and statements, which simplifies tasks in subsequent compilation stages.

Types of Parsers:
• There are two main types of parsers used in compiler design: Top-Down Parsers (e.g., Recursive Descent) and Bottom-Up Parsers
(e.g., LR Parsers). Each type has specific advantages depending on the structure of the grammar it is handling.
Difference between Syntax Analysis and Lexical Analysis
Purpose:
• Lexical Analysis: Transforms the input character stream into tokens (the smallest meaningful units) by
identifying keywords, operators, identifiers, etc.
• Syntax Analysis: Takes the stream of tokens produced by the lexical analyzer and arranges them into a
hierarchical structure (parse tree) according to the language’s grammar.
Output:
• Lexical Analysis: Produces a series of tokens (e.g., IDENTIFIER, NUMBER, PLUS).
• Syntax Analysis: Produces a parse tree or syntax tree that represents the syntactic structure of the
source code.
Error Detection:
• Lexical Analysis: Primarily detects errors in token structure, such as invalid characters or unrecognized
symbols.
• Syntax Analysis: Detects errors related to the sequence and structure of tokens, such as misplaced
operators or unbalanced parentheses.

Example of a Flow in Compilation:

• If the source code is int x = 5 + ;, the lexical analyzer would identify the tokens as int, x, =, 5, +, ;.
• The syntax analyzer would then detect an error in the syntax, as the + operator lacks an operand.
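The two-phase flow above can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the helper names lex and check_declaration are hypothetical, and the only statement form accepted is int IDENT = operand (+ operand)* ;, which is far narrower than a real language.

```python
import re

def lex(source):
    """Lexical phase: split source into identifier/keyword, integer, or symbol lexemes."""
    return re.findall(r"[A-Za-z_]\w*|\d+|\S", source)

def check_declaration(tokens):
    """Syntax phase: accept  int IDENT = operand (+ operand)* ;  or raise SyntaxError."""
    pos = 0

    def next_token():
        nonlocal pos
        tok = tokens[pos] if pos < len(tokens) else "<end of input>"
        pos += 1
        return tok

    def expect(text):
        tok = next_token()
        if tok != text:
            raise SyntaxError(f"expected {text!r}, found {tok!r}")

    def operand():
        tok = next_token()
        if not (tok.isdigit() or tok.isidentifier()):
            raise SyntaxError(f"expected an operand, found {tok!r}")

    expect("int")
    if not next_token().isidentifier():
        raise SyntaxError("expected an identifier after 'int'")
    expect("=")
    operand()
    while pos < len(tokens) and tokens[pos] == "+":
        pos += 1       # consume '+'
        operand()      # every '+' must be followed by an operand
    expect(";")

tokens = lex("int x = 5 + ;")
print(tokens)  # ['int', 'x', '=', '5', '+', ';']
try:
    check_declaration(tokens)
except SyntaxError as err:
    print("syntax error:", err)  # syntax error: expected an operand, found ';'
```

Note that the lexical phase succeeds here: every lexeme is a valid token on its own. The error only appears when the token sequence is checked against the statement structure.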
Grammar and Language Theory
Grammar in Compiler Design:
• Grammar in compiler design is a formal framework that specifies the structure of the strings in a given
language. It defines the syntax rules for arranging symbols, operators, and keywords within a
programming language.
• A grammar is often represented as a set of production rules, which are used by a parser to analyze the
syntactical structure of source code and generate a parse tree, reflecting the hierarchical organization
of the language.
Components of Grammar:

• Non-Terminal Symbols: Abstract symbols representing groups or patterns in the language (e.g.,
expressions, statements).
• Terminal Symbols: The basic symbols or tokens of the language, like keywords and operators.
• Production Rules: Define how non-terminal symbols can be replaced by groups of terminal and/or
non-terminal symbols.
• Start Symbol: A special non-terminal symbol from which the production process begins.
Example of a Grammar:
• In a simple arithmetic language, a grammar might define an expression (E) as either an expression
followed by + and a term (T), or just a term, with production rules such as:
• E → E + T
• E → T
• T → integer
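To see how these production rules generate a sentence, the sketch below stores the grammar as a Python dictionary and prints a leftmost derivation, replacing the leftmost non-terminal at each step. The GRAMMAR layout and the function name leftmost_derivation are assumptions made for illustration, and num_terms is assumed to be at least 1.

```python
# Each non-terminal maps to its list of alternative right-hand sides.
GRAMMAR = {
    "E": [["E", "+", "T"], ["T"]],
    "T": [["integer"]],
}

def leftmost_derivation(num_terms):
    """Return the sentential forms deriving a sum of num_terms integers."""
    form = ["E"]
    steps = [" ".join(form)]
    expansions_left = num_terms - 1  # how many times E -> E + T is still needed
    while any(symbol in GRAMMAR for symbol in form):
        # Index of the leftmost non-terminal in the current sentential form.
        i = next(k for k, symbol in enumerate(form) if symbol in GRAMMAR)
        if form[i] == "E" and expansions_left > 0:
            rhs = GRAMMAR["E"][0]   # E -> E + T
            expansions_left -= 1
        elif form[i] == "E":
            rhs = GRAMMAR["E"][1]   # E -> T
        else:
            rhs = GRAMMAR["T"][0]   # T -> integer
        form = form[:i] + rhs + form[i + 1:]
        steps.append(" ".join(form))
    return steps

for step in leftmost_derivation(2):
    print(step)
# E
# E + T
# T + T
# integer + T
# integer + integer
```

Each printed line is obtained from the previous one by applying exactly one production rule, starting from the start symbol E and ending when only terminal symbols remain.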
Types of Grammar (Regular, Context-Free, Context-Sensitive, Unrestricted)
According to the Chomsky Hierarchy, grammars are classified into four main types, each with increasing levels of
expressiveness:
Type 3: Regular Grammar:
• The simplest type, used to define regular languages. Its production rules are of the form A → aB or A → a, where A and B
are non-terminal symbols and a is a terminal symbol.
• Regular grammar can be represented by finite automata and is often used in lexical analysis to define token patterns.
Type 2: Context-Free Grammar (CFG):
• In a CFG, each production rule has a single non-terminal symbol on the left-hand side (e.g., A → γ, where A is a
non-terminal, and γ is a string of terminals and/or non-terminals).
• CFGs are powerful enough to describe most programming language constructs and are typically used for syntax analysis
(parsing).
Type 1: Context-Sensitive Grammar:
• These grammars allow more complex production rules, where the left-hand side can contain multiple symbols, and the
replacement of a non-terminal may depend on the surrounding context. Productions follow the form αAβ → αγβ, where A is a
non-terminal, α and β are (possibly empty) strings of terminals and/or non-terminals, and γ is a non-empty string.
• Context-sensitive grammars are less restrictive than context-free grammars and are used to model context-dependent structures.

Type 0: Unrestricted Grammar:

• The most general form of grammar, unrestricted grammars have no constraints on production rules. They are as
powerful as Turing machines and can generate any language that a Turing machine can recognize (the recursively enumerable languages).
• Type-0 grammars are rarely used in practical compiler design due to their complexity.
Type 3 Grammar: Overview and Characteristics
In the Chomsky Hierarchy, Type 3 grammars represent the simplest form of formal grammar, also known as
Regular Grammar. This type of grammar is used to define regular languages, which can be processed by finite
automata. Regular grammars are fundamental in automata theory and computational linguistics, particularly in
applications where simple, linear patterns must be recognized, such as in lexical analysis within compilers.
Characteristics of Type 3 Grammar:
Grammar Structure:
• Rules in Type 3 grammar follow a very restricted format. Each rule must either take the form:
• A → aB (where A and B are non-terminal symbols and a is a terminal), or
• A → a, where A is a non-terminal and a is a terminal.
• This format ensures that productions proceed in a linear and non-nested manner, making them simpler than other types
of grammar.
Direction of Production:
• Type 3 grammars can either be right-linear or left-linear:
• Right-linear grammar: All productions are of the form A → xB or A → x, where non-terminals appear to
the right.
• Left-linear grammar: All productions are of the form A → Bx or A → x, where non-terminals appear to
the left.
• However, left-linear grammars are less commonly used since they can always be converted to equivalent right-linear grammars, which
align more naturally with finite state machines.
Type 3 Grammar: Overview and Characteristics cont.
Limitations:
• Regular grammars cannot handle nested or self-embedding structures such as parenthesis matching or
palindromes. For example, a Type 3 grammar cannot express languages where symbols must be balanced or
nested to arbitrary depth.
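The limitation can be made concrete with a short Python sketch. The regular expression DEPTH_ONE and the function is_balanced are hypothetical names introduced here: the pattern only handles parentheses nested one level deep, while the counter-based check handles any depth because it keeps an unbounded count, which a finite automaton with a fixed number of states cannot do.

```python
import re

# A regular pattern that accepts parentheses nested at most one level deep.
DEPTH_ONE = re.compile(r"(?:[^()]|\([^()]*\))*")

def is_balanced(text):
    """Check parenthesis balance at any depth using a counter."""
    depth = 0
    for ch in text:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:      # a ')' appeared before its matching '('
                return False
    return depth == 0

print(DEPTH_ONE.fullmatch("(1)+(2)") is not None)  # True
print(DEPTH_ONE.fullmatch("((1))") is not None)    # False: depth two is out of reach
print(is_balanced("((1))"))                        # True
print(is_balanced("(()"))                          # False
```

The pattern could be extended to depth two or three by nesting it by hand, but every such pattern has some fixed maximum depth, which is why balanced structures are left to the context-free grammar used by the parser.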
Applications in Computing:
• Type 3 grammars and regular languages are widely used in:
• Lexical Analysis: Recognizing tokens (keywords, identifiers, literals) in programming languages.
• Pattern Matching: Used extensively in tools like regular expressions for searching and validating strings.
• Finite Automata Design: Since regular languages can be represented by regular expressions or by
deterministic or non-deterministic finite automata (DFA or NFA), they are essential in designing efficient
scanners (lexical analyzers) in compilers.

Examples of Type 3 Grammar:

• A typical regular language could be represented by a grammar like:
• S → aS | bS | a | b
• This grammar generates every non-empty string of "a"s and "b"s, such as "ab", "aabb", or "bb".
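Because every rule of a right-linear grammar has the form A → aB or A → a, the grammar can be read directly as a non-deterministic finite automaton: non-terminals are states, A → aB is a transition from A to B on a, and A → a is a transition into an accepting state. The Python sketch below simulates the example grammar this way. The data layout, the ACCEPT marker, and the function name accepts are assumptions made for this illustration.

```python
import re

# Each rule is (terminal, next non-terminal); None means the rule is A -> a.
GRAMMAR = {
    "S": [("a", "S"), ("b", "S"), ("a", None), ("b", None)],
}
ACCEPT = "<accept>"  # extra state reached by rules with no trailing non-terminal

def accepts(grammar, start, text):
    """Simulate the grammar as an NFA and report whether it generates text."""
    states = {start}
    for ch in text:
        next_states = set()
        for state in states:
            for terminal, target in grammar.get(state, []):
                if terminal == ch:
                    next_states.add(ACCEPT if target is None else target)
        states = next_states
    return ACCEPT in states

for word in ["ab", "aabb", "bb", "", "abc"]:
    same_as_regex = re.fullmatch(r"[ab]+", word) is not None
    print(repr(word), accepts(GRAMMAR, "S", word), same_as_regex)
```

For each word, the two printed booleans agree: the grammar and the regular expression [ab]+ describe the same language, and the empty string and "abc" are rejected by both.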
Introduction to Regular Expressions (Regex)
A Regular Expression (often abbreviated as regex or regexp) is a powerful tool used to define search patterns in text. It allows
users to find, match, and manipulate specific text strings efficiently within a larger text body. Regular expressions are widely
applied in data validation, text processing, lexical analysis in compilers, search engines, and text editors.
Components of Regular Expressions:
• Regular expressions use various characters to define patterns:
• Literals: Characters or strings that match themselves, like abc matching "abc".
• Meta-characters: Special characters with specific functions, such as:
• . (dot) matches any character except a newline.
• * (asterisk) matches zero or more occurrences of the preceding element.
• + (plus) matches one or more occurrences of the preceding element.
• ? matches zero or one occurrence.
• | represents an OR operator, such as (cat|dog) to match "cat" or "dog".
• Anchors: Indicate positions in a string.
• ^ matches the start of a string.
• $ matches the end of a string.
• Character Classes: Define sets of characters to match.
• [abc] matches any of "a", "b", or "c".
• \d matches any digit, \w matches any word character, and \s matches whitespace.
• Quantifiers: Specify the number of times an element should appear.
• {n} exactly n occurrences.
• {n,} at least n occurrences.
• {n,m} between n and m occurrences.
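The components listed above can be tried out with Python's re module. In this sketch the helper matches and the EXAMPLES table are hypothetical, chosen only to pair each pattern with one text that matches in full and one that does not.

```python
import re

def matches(pattern, text):
    """True if the whole of text matches pattern."""
    return re.fullmatch(pattern, text) is not None

EXAMPLES = [
    # (pattern, text that matches, text that does not)
    (r"a.c", "abc", "a\nc"),           # . matches any character except a newline
    (r"ab*", "a", "b"),                # * allows zero or more b
    (r"ab+", "abb", "a"),              # + requires at least one b
    (r"colou?r", "color", "colouur"),  # ? allows zero or one u
    (r"cat|dog", "dog", "cow"),        # | chooses between alternatives
    (r"[abc]", "b", "d"),              # character class
    (r"\d{3}", "123", "12"),           # exactly three digits
    (r"\w{2,}", "ab", "a"),            # at least two word characters
    (r"\s{1,2}", "  ", "   "),         # one or two whitespace characters
]

for pattern, good, bad in EXAMPLES:
    print(pattern, matches(pattern, good), matches(pattern, bad))  # ... True False

# Anchors matter with re.search, which looks for a match anywhere in the text.
print(re.search(r"if", "elif") is not None)   # True: "if" occurs inside "elif"
print(re.search(r"^if", "elif") is not None)  # False: "elif" does not start with "if"
```

re.fullmatch behaves as if the pattern were anchored at both ends, so the ^ and $ anchors are shown separately with re.search.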
Introduction to Regular Expressions (Regex) cont.
Applications of Regular Expressions
• Text Search and Replacement: Regular expressions allow for advanced search patterns and replacements in editors and
command-line tools.
• Data Validation: Common in form input validation for email, phone numbers, or zip codes.
• Tokenization in Programming: Lexical analyzers in compilers use regex to categorize tokens in code.

Examples of Regular Expressions for a Compiler Design (Tokenization):

• Identifiers: In many programming languages, identifiers consist of a letter followed by any number of letters or digits.
• Regex: [a-zA-Z][a-zA-Z0-9]*
• This matches strings starting with a letter and followed by any combination of letters and digits.
• Keywords: Reserved words, such as "if" or "while," can be matched as exact patterns.
• Regex for "if": \bif\b
• The \b denotes word boundaries, ensuring that "if" is matched as a whole word and not as a part of another word.
• Integer Constants: In languages like C, integers consist of sequences of digits.
• Regex: [0-9]+
• This matches one or more digits.
• Floating-point Numbers: Regular expressions for floating-point numbers recognize decimal patterns.
• Regex: [0-9]+\.[0-9]+
• This matches numbers with a decimal point, such as "123.45".
• Operators: Many operators are single characters, like +, -, *, or =.
• Regex for operators: [+\-*/=]
• This matches any single character operator.
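The patterns above can be combined into a single tokenizer. The Python sketch below is one possible arrangement, not the lecture's own implementation: the token names, the keyword list (if and while only), the SKIP rule for whitespace, and the function name tokenize are assumptions. The order of the rules matters, because the first alternative that matches wins: FLOAT is listed before INT, and KEYWORD before IDENT.

```python
import re

TOKEN_SPEC = [
    ("FLOAT",   r"[0-9]+\.[0-9]+"),        # must come before INT
    ("INT",     r"[0-9]+"),
    ("KEYWORD", r"\b(?:if|while)\b"),      # must come before IDENT
    ("IDENT",   r"[a-zA-Z][a-zA-Z0-9]*"),
    ("OP",      r"[+\-*/=]"),
    ("SKIP",    r"\s+"),                   # whitespace is matched but not emitted
]
TOKEN_RE = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))

def tokenize(source):
    """Return a list of (token type, lexeme) pairs, or raise ValueError."""
    tokens, pos = [], 0
    while pos < len(source):
        match = TOKEN_RE.match(source, pos)
        if match is None:
            raise ValueError(f"invalid character {source[pos]!r} at position {pos}")
        if match.lastgroup != "SKIP":
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens

print(tokenize("if x1 = 3.14 + 42"))
# [('KEYWORD', 'if'), ('IDENT', 'x1'), ('OP', '='), ('FLOAT', '3.14'), ('OP', '+'), ('INT', '42')]
```

If INT were listed before FLOAT, the input 3.14 would be split at the decimal point and the "." would be reported as an invalid character. The word boundary \b keeps a name such as iffy from being split into the keyword if followed by fy.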
